Import AI

Import AI: #77: Amazon tests inventory improvement by generating fake customers with GANs, the ImageNet of video arrives, and robots get prettier with Unity-MuJoCo tie-up.

Urban flaneurs generate fake cities with GANs:
…Researchers discover the same problems that AI researchers have been grappling with, like those relating to interpretability and transparency…
Researchers have used generative adversarial networks to generate a variety of synthetic, fictitious cities. The project shows that “a basic, unconstrained GAN model is able to generate realistic urban patterns that capture the great diversity of urban forms across the globe,” they write. This isn’t particularly surprising since we know GANs can typically approximate the distribution of the data they are trained on – though I suspect the dataset (30,000 images) might be slightly too small to rule out problems like over-fitting.
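  Over-fitting check: one common way to probe whether a GAN trained on a small dataset is just memorizing its inputs is to measure how close each generated sample sits to its nearest training example. The researchers don't report doing this; the sketch below assumes images are flattened into NumPy arrays, and the distance threshold is a hypothetical knob you'd tune per dataset.

```python
import numpy as np

def nearest_train_distances(generated, train):
    """Euclidean distance from each generated sample to its closest training sample.

    Both inputs are (n, d) float arrays of flattened images."""
    # Squared distances via ||g||^2 - 2 g.t + ||t||^2, computed for all pairs at once.
    g2 = np.sum(generated ** 2, axis=1)[:, None]
    t2 = np.sum(train ** 2, axis=1)[None, :]
    d2 = g2 - 2.0 * generated @ train.T + t2
    # Clamp tiny negative values caused by floating-point error before the sqrt.
    return np.sqrt(np.maximum(d2, 0.0)).min(axis=1)

def memorization_rate(generated, train, threshold):
    """Fraction of generated samples lying within `threshold` of some training sample."""
    return float(np.mean(nearest_train_distances(generated, train) < threshold))

# Toy usage: half of the "generated" set are near-copies of training images.
rng = np.random.default_rng(0)
train = rng.random((100, 64))
near_copies = train[:10] + 1e-6
novel = rng.random((10, 64))
print(memorization_rate(np.vstack([near_copies, novel]), train, threshold=0.01))  # 0.5
```

A high rate at a small threshold would suggest the generator is reproducing training images rather than inventing new cities.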
  Data used: The Global Urban Footprint, an inventory of built-up land at 12m/px resolution, compiled by the German Aerospace Center.
  Questions: It’s always instructive to see the questions posed by projects that sit at the border between AI and other disciplines, like geography. For this project, some open questions the researchers are left with include: “How to evaluate the quality of model output in a way that is both quantitative, and interpretable and intuitive for urban planning analysis? How to best disentangle, explore, and control latent space representations of important characteristics of urban spatial maps? How to learn from both observational and simulated data on cities?,” and more.
  Read more here: Modeling urbanization patterns with generative adversarial networks (Arxiv).

The ImageNet of video (possibly) arrives:
…MIT’s ‘Moments in Time’ dataset consists of one million videos, each of which is 3 seconds long…
Much of the recent progress in machine learning has been driven in part by the availability of large-scale datasets that provide a sufficiently complex domain to stress-test new scientific approaches against. MIT’s new ‘Moments in Time’ dataset might just be the dataset we need for video understanding, as it’s far larger than other available open source datasets (eg ActivityNet, Kinetics, UCF, etc), and also has a fairly broad set of initial labels (339 verbs linked to a variety of different actions or activities).
  Video classification baselines: The researchers also test the new dataset against a set of baselines built on techniques like residual networks, optical flow, and even sound (via a SoundNet network). These baselines get top-5 accuracies of around 50%, which means that for roughly half of the videos the correct label appears among the five labels the system ranks highest. The best-performing approach is a ‘temporal relation network’ (TRN), which attained a score of about 53% and was trained on RGB frames using the InceptionV3 image classification architecture.
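  Top-5 accuracy: the metric is simple to compute. A minimal NumPy sketch, assuming the model outputs one score per class for each video (the toy scores below are made up, not from the paper):

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """scores: (n_videos, n_classes) model scores; labels: (n_videos,) true class indices.

    A video counts as correct if its true label is among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]        # indices of the k largest scores per row
    hits = np.any(top_k == labels[:, None], axis=1)   # does the true label appear in the top k?
    return float(hits.mean())

# Two toy videos scored over 7 hypothetical classes.
scores = np.array([
    [0.10, 0.50, 0.20, 0.90, 0.30, 0.05, 0.00],
    [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30],
])
labels = np.array([1, 6])
print(top_k_accuracy(scores, labels, k=5))  # 0.5: video 0's label ranks 2nd, video 1's ranks last
print(top_k_accuracy(scores, labels, k=1))  # 0.0
```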
  Next: “Future versions of the dataset will include multi-labels action description (i.e. more than one action occurs in most 3-second videos), focus on growing the diversity of agents, and adding temporal transitions between the actions that agents performed,” the researchers write.
   Read more: Moments in time dataset: one million videos for event understanding (Arxiv).

Ugly robots no more: Unity gets a MuJoCo plugin:
…Tried-and-tested physics simulator gets coupled to a high-fidelity game engine…
Developers keen to improve the visual appearance of their AI systems may be pleased to know that MuJoCo has released a plugin for the Unity engine. This will let developers import MuJoCo models directly into Unity and then visualize them in snazzier environments.
  “The workflow we envision here is closer to the MuJoCo use case: the executable generated by Unity receives MuJoCo model poses over a socket and renders them, while the actual physics simulation and behavior control take place in the user’s environment running MuJoCo Pro,” write the authors.
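  Pose streaming: the plugin's actual wire protocol isn't described here, so the sketch below invents a minimal one purely for illustration: each message is a 4-byte little-endian count followed by that many float64 joint positions. The "physics side" stands in for the process running MuJoCo Pro and the "render side" for the Unity executable; both are hypothetical, and a local socket pair replaces a real network connection.

```python
import socket
import struct

def send_pose(sock, qpos):
    """Send one pose (hypothetical format): uint32 count, then count float64 values."""
    payload = struct.pack("<I", len(qpos)) + struct.pack(f"<{len(qpos)}d", *qpos)
    sock.sendall(payload)

def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf

def recv_pose(sock):
    """Receive one pose sent by send_pose and return it as a list of floats."""
    (count,) = struct.unpack("<I", recv_exact(sock, 4))
    return list(struct.unpack(f"<{count}d", recv_exact(sock, 8 * count)))

physics_side, render_side = socket.socketpair()
send_pose(physics_side, [0.0, 0.5, -1.25])
print(recv_pose(render_side))  # [0.0, 0.5, -1.25]
physics_side.close()
render_side.close()
```

Length-prefixing matters because a stream socket has no message boundaries; the receiver needs to know how many bytes make up one pose.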
  Read more here: MuJoCo Plugin and Unity Integration.

Google censors itself to avoid accidental racism:
…Search company bans search terms to protect itself against photos triggering insulting classifications…
A little over two years ago Google’s Google Photos application displayed an appalling bug: searches for ‘gorillas’ would bring up photos of black people. There was a swift public outcry and Google nerfed its application so it wouldn’t respond to those terms. Two years later, despite ample progress in AI and machine learning, nothing has changed.
  “A Google spokesperson confirmed that “gorilla” was censored from searches and image tags after the 2015 incident, and that “chimp,” “chimpanzee,” and “monkey” are also blocked today. “Image labeling technology is still early and unfortunately it’s nowhere near perfect,” the spokesperson wrote in an email, highlighting a feature of Google Photos that allows users to report mistakes,” Wired reports.
   Read here: When it comes to Gorillas, Google Photos Remains Blind (Wired).

The first ever audiobook containing a song generated by a neural network?
…’Sourdough’ by Robin Sloan features AI-imagined Croatian folk songs…
Here’s a fun blog post by author Robin Sloan about using AI (specifically, SampleRNN) to generate music for his book. Check out the audio samples.
   Read more: Making The Music of the Mazg.

Miri blows past its 2017 funding target thanks to crypto riches:
…Crypto + AI research, sitting in a tree, S-Y-N-E-R-G-I-Z-I-N-G!…
The Machine Intelligence Research Institute in Berkeley has raised over $2.5 million with its 2017 fundraiser, with a significant amount of funding coming from the recent boom in cryptocurrencies.
  “66% of funds raised during this fundraiser were in the form of cryptocurrency (mainly Bitcoin and Ethereum),” Miri writes.
   Read more here: Fundraising success! (Miri).

Amazon turns to GANs to simulate e-commerce product demand… and it sort of works!
…As if the e-retailer doesn’t have enough customers, now it’s inventing synthetic ones…
Researchers with Amazon’s Machine Learning team in India have published details on eCommerceGAN, a way to use GANs to generate realistic, synthetic customer and customer order data. This is useful because it lets you test your systems against the vast combinatorial space of possible customer orders and, ideally, get better at predicting how new products will match with existing customers, and vice versa.
  “The orders which have been placed in an e-commerce website represent only a tiny fraction of all plausible orders. Exploring the space of all plausible orders could provide important insights into product demands, customer preferences, price estimation, seasonal variations etc., which, if taken into consideration, could directly or indirectly impact revenue and customer satisfaction,” the researchers write.
  The eCommerce GAN (abbreviated to ‘ecGAN’) lets the researchers create a synthetic “dense and low-dimensional representation of e-commerce orders”. They also create an eCommerce-conditional-GAN (ec^2GAN), which lets them “generate the plausible orders involving a particular product”.
  Results: The researchers created a 3D t-SNE map of both real customer orders and GAN-generated ones. The plots showed a strong correlation between the two with very few outliers, suggesting that their ecGAN approach is able to generate data that falls within the distribution of what e-retailers actually see. To test the ec^2GAN they see if it can conditionally generate orders that have similar customer<>product profiles to real orders – and they succeed. It might sound a bit mundane but this is a significant thing for Amazon: it now has a technique to simulate the ‘long tail’ of customer<>product combinations and as it gets better at predicting these relationships it could theoretically get better at optimizing its supply-chain / just-in-time inventory / marketing campaigns / new product offerings and test groups, and so on.
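  Conditioning: the paper's exact architecture isn't reproduced here. The sketch below only illustrates the generic conditional-GAN trick behind something like ec^2GAN: concatenate a product representation onto the generator's noise input so that every sample is generated "for" that product. All dimensions are hypothetical and the weights are random rather than trained, so the outputs mean nothing beyond their shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: noise, product embedding, hidden layer, dense order representation.
NOISE_DIM, PRODUCT_DIM, HIDDEN_DIM, ORDER_DIM = 16, 8, 32, 24

# Random weights stand in for a trained generator.
W1 = rng.normal(0.0, 0.1, (NOISE_DIM + PRODUCT_DIM, HIDDEN_DIM))
W2 = rng.normal(0.0, 0.1, (HIDDEN_DIM, ORDER_DIM))

def generate_orders(product_embedding, n_orders):
    """Conditional generator forward pass: fresh noise per sample, same product condition."""
    z = rng.normal(size=(n_orders, NOISE_DIM))
    cond = np.tile(product_embedding, (n_orders, 1))               # repeat the condition per row
    h = np.maximum(0.0, np.concatenate([z, cond], axis=1) @ W1)    # ReLU hidden layer
    return np.tanh(h @ W2)                                         # squash outputs into [-1, 1]

orders = generate_orders(rng.normal(size=PRODUCT_DIM), n_orders=5)
print(orders.shape)  # (5, 24)
```

In a full conditional GAN the discriminator would see the same product condition alongside each real or generated order, so the generator is penalized for orders that don't fit the product.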
  Data: The researchers say they “use the products from the apparel category for model training and evaluation. We randomly choose 5 million orders [emphasis mine] made over the last one year in an e-commerce company to train the proposed models.” Note that they don’t specify where this data comes from, though it seems overwhelmingly likely it derives from Amazon since that’s where all the researchers worked during this project, and no other dataset is specified.
   Read more: eCommerceGAN: A Generative Adversarial Network for eCommerce (Arxiv).

Why AI research needs to harness huge amounts of compute to progress:
…The future involves big batches, massive amounts of computation, and, for now, lots of hyperparameter tuning…
AI researchers are effective in proportion to the quantity of experiments they can run over a given time period. This is because deep learning-based AI is predominantly an empirical science, so in the absence of strong theoretical guarantees researchers need to rigorously test algorithms to appropriately debug and develop them.
  That fact has driven recent innovations in large-scale distributed training of AI algorithms, initially for traditional classification tasks, like the following two computer vision examples (covered in #69 of Import AI).
   July, 2017: Facebook trains an ImageNet model in ~1 hour using 256 GPUs.
   November, 2017: Preferred Networks trains ImageNet in ~15 minutes using 1024 NVIDIA P100 GPUs.
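  Compute arithmetic: taking the approximate figures above at face value, the two runs consume about the same total GPU time; the roughly 4x wall-clock speedup comes from spreading the work over 4x as many GPUs. This back-of-the-envelope sketch treats a GPU-hour as a fixed unit and ignores differences in hardware, batch size, and software between the two efforts.

```python
def gpu_hours(gpus, wall_clock_hours):
    """Total accelerator time consumed by a run."""
    return gpus * wall_clock_hours

runs = [
    ("Facebook, July 2017", 256, 1.0),                  # ~1 hour on 256 GPUs
    ("Preferred Networks, November 2017", 1024, 0.25),  # ~15 minutes on 1024 GPUs
]

for name, gpus, hours in runs:
    print(f"{name}: ~{gpu_hours(gpus, hours):.0f} GPU-hours")
# Facebook, July 2017: ~256 GPU-hours
# Preferred Networks, November 2017: ~256 GPU-hours
```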
   Now, as AI research becomes increasingly focused on developing AI agents that can take actions in the world, the same phenomenon is happening in reinforcement learning, as companies ranging from DeepMind (Ape-X, Gorilla, others) to OpenAI (Evolutionary Strategies, others) try to reduce the wall clock time it takes to run reinforcement learning experiments.
  New research from, Intel, and the Polish Academy of Sciences shows how to scale up and tune a Batch Asynchronous Advantage Actor-Critic algorithm with the Adam optimizer and a large batch size of 2048 to let them learn to competently play a range of Atari games in a matter of minutes; in many cases it takes the system just 20 minutes or so to attain competitive scores on games like Breakout, Boxing, Seaquest, and others. They achieve this by scaling up their algorithm via techniques gleaned from the distributed systems world (eg, parameter surveys, clever things with temporal alignment across different agents, etc), which lets them run their algo across 64 workers comprising 768 distinct CPU cores.
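  Advantage actor-critic in brief: each update turns a batch of experience into discounted returns, then weights the policy's log-probabilities by how much better those returns were than the critic expected. A minimal NumPy sketch of that bookkeeping, assuming rollouts of T steps from several parallel environments (a batch of 2048 could hypothetically be 64 environments x 32 steps). The paper's exact loss, entropy bonus, and distributed plumbing aren't reproduced; a real implementation would compute these losses in an autodiff framework and minimize them with Adam.

```python
import numpy as np

def n_step_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """rewards, dones: (T, n_envs) arrays; dones[t] = 1 if the episode ended after step t.
    bootstrap_value: (n_envs,) critic estimate for the state after the final step.
    Returns (T, n_envs) discounted returns that reset at episode boundaries."""
    returns = np.zeros(rewards.shape, dtype=float)
    running = np.asarray(bootstrap_value, dtype=float)
    for t in reversed(range(rewards.shape[0])):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns

def a2c_losses(log_probs, values, returns):
    """log_probs: log pi(a_t | s_t) for the actions taken; values: critic V(s_t).
    All arrays are (T, n_envs). Returns (policy_loss, value_loss) as plain numbers."""
    advantages = returns - values
    # With autodiff, `advantages` would be treated as a constant in the policy term.
    policy_loss = -np.mean(log_probs * advantages)
    value_loss = 0.5 * np.mean(advantages ** 2)
    return float(policy_loss), float(value_loss)

# Toy rollout: 2 steps, 1 environment, no episode boundary, gamma = 0.5.
rewards = np.array([[1.0], [1.0]])
dones = np.array([[0.0], [0.0]])
returns = n_step_returns(rewards, dones, bootstrap_value=np.array([0.0]), gamma=0.5)
print(returns.ravel())  # [1.5 1. ]
```

Large batches help here because the advantage-weighted gradient is noisy; averaging over thousands of transitions per update makes bigger, steadier optimizer steps possible.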
  Next: PPO: The authors note that PPO, a reinforcement learning algorithm developed by OpenAI, is a “promising area of future research” for large-scale distributed reinforcement learning.
   Read more: Distributed Deep Reinforcement Learning: learn how to play Atari games in 21 minutes.

Googlers debunk bogus research into getting neural networks to detect sexual orientation:
“AI is a general-purpose technology that can be used to automate a great many tasks, including ones that should not be undertaken in the first place”…
Last fall, researchers with Stanford published a paper on Arxiv claiming that a neural network-based image classification system they designed could detect sexual orientation more accurately than humans. The study – Deep neural networks are more accurate than humans at detecting sexual orientation from facial images – was criticized for making outlandish claims and widely covered in the press. Now, the paper has been accepted for publication in a peer-reviewed academic journal – the Journal of Personality and Social Psychology. This seems to have motivated Google researchers Margaret Mitchell and Blaise Aguera y Arcas, and Princeton professor Alex Todorov, to take a critical look at the research.
  The original study relied on a dataset composed of 35,326 images taken from public profiles on a US dating website as its ground-truth data. You can get a sense of the types of photos present here by creating composite “average” images from the ground-truth labelled data – when you do this you notice some significant differences: the “average” heterosexual male face doesn’t have glasses, while the gay face does, and similarly the faces of the “average” heterosexual females have eyeshadow on them, while the lesbian faces do not.
  Survey: “Might it be the case that the algorithm’s ability to detect orientation has little to do with facial structure, but is due rather to patterns in grooming, presentation and lifestyle?” wonder the Google and Princeton researchers. To analyze this they surveyed 8,000 Americans using Amazon Mechanical Turk and asked them 77 yes/no questions, ranging from sexual disposition to whether they have a beard, wear glasses, and so on. The results of the survey seem to roughly track the “average” images we can extract from the dataset, suggesting that rather than developing a neural network that can infer your fundamental sexual proclivity by looking at you, the researchers have instead built a snazzy classifier that guesses whether you are gay or straight based on whether or not you’re wearing makeup or glasses.
  To illustrate the problems with the research the Googlers show that they can attain similar classification accuracies to the original experiment purely through asking a series of yes/no questions, with no visual aid. “For example, for pairs of women, one of whom is lesbian, the following not-exactly-superhuman algorithm is on average 63% accurate: if neither or both women wear eyeshadow, flip a coin; otherwise guess that the one who wears eyeshadow is straight, and the other lesbian. Adding six more yes/no questions about presentation (“Do you ever use makeup?”, “Do you have long hair?”, “Do you have short hair?”, “Do you ever use colored lipstick?”, “Do you like how you look in glasses?”, and “Do you work outdoors?”) as additional signals raises the performance to 70%,” they write.
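  The coin-flip rule: the quoted heuristic is easy to write down, and under one simplifying assumption its expected accuracy has a closed form. Assume each straight woman wears eyeshadow with probability p_straight and each lesbian woman with probability p_lesbian, independently (both rates are hypothetical inputs, not figures from either paper). The rule is right whenever only the straight woman wears eyeshadow, and right half the time when the two match, which works out to 0.5 + (p_straight - p_lesbian) / 2. Under that assumption, reaching 63% would take a 26-point gap in eyeshadow rates.

```python
import random

def guess_lesbian(eyeshadow_a, eyeshadow_b, rng=random):
    """Return the index (0 or 1) of the woman guessed to be lesbian, using only eyeshadow."""
    if eyeshadow_a == eyeshadow_b:
        return rng.randrange(2)      # no signal: flip a coin
    return 1 if eyeshadow_a else 0   # guess that the one wearing eyeshadow is straight

def expected_accuracy(p_straight, p_lesbian):
    """Expected accuracy of guess_lesbian on a random (straight, lesbian) pair,
    assuming each wears eyeshadow independently with the given probabilities."""
    right = p_straight * (1 - p_lesbian)                               # only the straight woman wears it
    tie = p_straight * p_lesbian + (1 - p_straight) * (1 - p_lesbian)  # coin flip, right half the time
    return right + 0.5 * tie

print(expected_accuracy(0.5, 0.5))  # 0.5: equal rates give chance-level accuracy
```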
  Alternate paper title: In light of this criticism, perhaps a better title for the paper would be Deep neural networks are more accurate than humans at predicting the correspondence between various garments and makeup and a held-out arbitrary label. But we knew this already, didn’t we?
   Read more here: Do algorithms reveal sexual orientation or just expose our stereotypes?

OpenAI Bits & Pieces:

Science interviews OpenAI’s Tim Salimans for a story about Uber’s recent work on neuroevolution.
   Read more: Artificial intelligence can ‘evolve’ to solve problems (Science).

Tech Tales:


You get the call after utilization in the data center ticks up from 70% to 80% all the way to 90% in the course of 24 hours.
   “Takeoff?” you ask on the phone.
   “No,” they say. “Can’t be. This is something else. We’ll have more hardware tomorrow so we can scale with it, but we need you to take a look. We’ve sent a car.”

So you get into the car and go to an airport and fly a few thousand miles and land and get into another blacked-out car which auto-drives to the site and you go through security and then the person who spoke to you on the phone is in front of you saying “it isn’t stopping.”
  “Alright,” you say, “Show me the telemetry.”
  They lead you into a room whose walls are entirely made of computer monitors. Your phone interfaces with the local computer system and after a few seconds you’re able to control the screens and navigate the data. You dive in, studying the jagged lines of learning graphs, the pastel greens and reds of the utilization dashboard, and eventually the blurry squint-and-you’ll-miss-it concept-level representations of some of the larger networks. And all the while you can see utilization in the data center increasing, even as new hardware is integrated.
   “What the hell is it building in there,” you mutter to yourself, before strapping on some VR goggles and going deeper. Navigating high-dimensional representations is a notorious mindbender – most people have a hard time dealing with non-Euclidean geometries; corridors that aren’t corridors, different interpretations of “up” and “down” depending on which slice of a plane you’ve become embedded in, spheres that are at once entirely hollow and entirely solid, and so on. You navigate the AI’s embedding, trying to figure out where all of the computation is being expended, while attempting to stop yourself from throwing up the sandwich you had in the car.
   And then, after you grope your way across a bridge which becomes a ceiling which becomes a door that folds out on itself to become the center of a torus, you find it: in one of the panes of the torus looking into one of the newer representational clusters you can see a mirror-image of the larger AI’s structure you’ve been navigating. And beyond that you can make out another torus in the distance containing another pane connecting to another large-scale non-Euclidean representation graph. You sigh, take off your goggles, and make a phone call.

“It’s a bug,” you say. “Did you check the recursion limits in the self-representation system?”
   Of course they hadn’t. So much time and computation wasted, all because the AI had just looped into an anti-pattern where it had started trying to simulate itself, leading to the outward indicators of it growing in capability – somewhat richer representations, faster meta-learning, a compute and data footprint growing according to some clear scaling law. But those symptoms didn’t signal a greater intelligence, rather just the AI attempting to elucidate its own Kolmogorov complexity to itself – running endless simulations of itself simulating simulations of itself, to try and understand a greater truth, when in fact it was just a mirror endlessly refracting upon itself.

Concepts that inspired this story: Procedural maze generators, Kolmogorov complexity, non-Euclidean virtual reality (YouTube video).

Import AI: #76: Why government needs technologists to work on AI policy, training machines to learn through touch as well as vision with SenseNet, and using mazes to test smart agents.

Facebook releases free speech recognition toolkit, wav2letter:
…Also pays out computational dividend in the form of a pre-trained Librispeech model for inference…
Facebook has released wav2letter, open source automatic speech recognition software. The technology has been previously described – but not released as code – in two Facebook AI Research papers: Wav2Letter: an End-to-End ConvNet-based Speech Recognition System, and Letter-Based Speech Recognition with Gated ConvNets.
  The release includes pre-trained models. I view these kinds of things as a ‘research&compute dividend’ that Facebook is paying out to the broader AI community (though it’d be nice if academic labs were able to access similarly vast resources to aid their own research and ensuing releases).
– Read more: wav2letter (GitHub).

When are we getting ‘Prime Air’ for the military and why is it taking so long to develop?
…How commercial innovations in drone delivery could influence the military, which is fundamentally a large logistics organization with some pointy bits at the perimeter…
You might suspect that by now the military would be using drones throughout its supply chains, given the rapid pace of development of the technology. So why hasn’t that happened? In this essay on The Strategy Bridge, Air Force officer Jobie Turner discusses how the US could use this technology, why it’s going to take a long time to deploy it (“any new technology on the commercial market for logistics, will not have wartime survival as a precondition for employment”), and how the use of rapid logistics capabilities could influence the military.
  Turner also notes that “speed and capacity have more often than not been a hindrance to U.S. logistics rather than a boon. Too much too soon has been a far worse a problem than too little too late. For example, in the campaign for Guadalcanal, U.S. Marines deposited tons and tons of food and equipment on the beaches upon landing, only to discover that they lacked the labor and machines to move the cargo off the beaches. As a result, several weeks’ worth of food washed out with the tide—exacerbating a tenuous supply situation. In a more recent example, during Operation Desert Storm, so much cargo was brought in by air and sea that ‘iron mountains’ were created with the materiel, much of it never reaching its destination.”
– Read more: The Temptations of the Brown Box (The Strategy Bridge).

Reach out and learn shapes with the new ‘SenseNet’ simulator:
…Simulator and benchmark aim to motivate progress in reinforcement learning beyond the typical visual paradigm…
Today, most AI research revolves around classifying audio or visual inputs. What about touch? That’s the inspiration behind ‘SenseNet’, a new 3D environment simulator and dataset released last week by Jason Toy.
  Close your eyes and imagine picking up a ceramic mug or touching the nearest object to you. I think most people will find that imagining touching these objects gives them an internal mental impression of the object that’s distinct from imagining seeing the object. It’s this sensorimotor sensation that is the inspiration for SenseNet, which aims to challenge us to design algorithms that let machines classify objects by non-visual 3D sensing, potentially augmented by vision inputs as well.
  SenseNet gives researchers access to a simulated ‘MPL’ robotic hand with touch integrated into one of its (simulated) physical sensors. This could let people experiment with algorithms that learn to classify objects by touch alone rather than visual appearance. It uses an API loosely modelled on OpenAI Gym, so should be somewhat familiar to developers.
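  Gym-style loop: SenseNet's own classes and observation format aren't reproduced here. As a rough illustration of what a Gym-like interface means for a touch task, the sketch below defines a hypothetical toy environment with the classic reset()/step() contract (step returns observation, reward, done, info) and a made-up one-number "contact sensor", plus a hand-written agent that touches a few times before guessing.

```python
import random

class ToyTouchEnv:
    """Hypothetical tactile classification task with a Gym-like reset()/step() API.

    A hidden object is a cube or a sphere. Action 0 touches it and returns a noisy
    contact reading; actions 1 and 2 guess 'cube' or 'sphere' and end the episode."""

    OBJECTS = ("cube", "sphere")

    def __init__(self, seed=None, max_steps=10):
        self.rng = random.Random(seed)
        self.max_steps = max_steps

    def reset(self):
        self.obj = self.rng.choice(self.OBJECTS)
        self.steps = 0
        return 0.0  # no contact yet

    def step(self, action):
        self.steps += 1
        if action == 0:
            # Made-up sensor model: edges press harder than curved surfaces.
            base = 1.0 if self.obj == "cube" else 0.3
            reading = base + self.rng.gauss(0.0, 0.1)
            return reading, 0.0, self.steps >= self.max_steps, {}
        guess = "cube" if action == 1 else "sphere"
        reward = 1.0 if guess == self.obj else -1.0
        return 0.0, reward, True, {"object": self.obj}

def play_episode(env, touches=3, threshold=0.65):
    """Touch a few times, average the readings, then guess; returns the final reward."""
    env.reset()
    readings = [env.step(0)[0] for _ in range(touches)]
    action = 1 if sum(readings) / len(readings) > threshold else 2
    _, reward, done, _ = env.step(action)
    return reward

env = ToyTouchEnv(seed=0)
print(sum(play_episode(env) for _ in range(100)))
```

A learning agent would replace play_episode's hand-written threshold with a policy that decides when it has touched enough to commit to a guess.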
– Read more: SenseNet: 3D Objects Database and Tactile Simulator (Arxiv).

New environments: 2D Mazes for better AI:
…Open source project lets you train agents to solve a variety of different mazes, some of which have been used in cognitive neuroscience experiments…
Haven’t you ever wanted to be trapped in an infinite set of procedurally generated mazes, testing your intelligence by negotiating them and finding your way to the end? If so (or perhaps if you’re just a researcher interested in training AI agents), then Gym-maze might be for you. The software project is made by Xingdong Zuo and provides a customizable set of environments to test AI agents in, and can be used as an OpenAI Gym environment. It ships with a maze generator and a nicely documented interface, as well as a Jupyter notebook that implements and visualizes a bunch of different types of mazes.
  Budding AI-neuroscience types might like the fact it comes with a Morris water maze, a type of environment frequently used to test rodents for cognitive abilities. (DeepMind and others have also validated certain agents on Morris water maze tasks.)
  Bonus: It ships with a nicely documented A* search implementation, to validate the procedurally generated mazes.
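  Validating mazes with A*: gym-maze's own generator and search code aren't reproduced here. The sketch below shows the general idea with two textbook pieces: a randomized depth-first "recursive backtracker" that carves a maze out of a grid of walls, and an A* search with a Manhattan-distance heuristic that confirms the exit is reachable from the start. The grid layout and function names are assumptions for illustration.

```python
import heapq
import random

def generate_maze(width, height, seed=None):
    """Randomized depth-first search over width x height cells.
    Returns a (2*height+1) x (2*width+1) grid of chars: '#' wall, '.' open."""
    rng = random.Random(seed)
    grid = [["#"] * (2 * width + 1) for _ in range(2 * height + 1)]
    stack, visited = [(0, 0)], {(0, 0)}
    grid[1][1] = "."
    while stack:
        x, y = stack[-1]
        options = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                   if 0 <= x + dx < width and 0 <= y + dy < height
                   and (x + dx, y + dy) not in visited]
        if not options:
            stack.pop()  # dead end: backtrack
            continue
        nx, ny = rng.choice(options)
        grid[y + ny + 1][x + nx + 1] = "."  # knock down the wall between the two cells
        grid[2 * ny + 1][2 * nx + 1] = "."  # open the new cell
        visited.add((nx, ny))
        stack.append((nx, ny))
    return grid

def astar(grid, start, goal):
    """A* over open cells with 4-connected unit-cost moves; start/goal are (row, col).
    Returns the path as a list of cells, or None if the goal is unreachable."""
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]
    came_from, cost = {start: None}, {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > cost[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if grid[nxt[0]][nxt[1]] == "#":
                continue
            if g + 1 < cost.get(nxt, float("inf")):
                cost[nxt] = g + 1
                came_from[nxt] = cell
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None

maze = generate_maze(8, 6, seed=1)
path = astar(maze, (1, 1), (2 * 6 - 1, 2 * 8 - 1))
print("\n".join("".join(row) for row in maze))
print("solvable:", path is not None)
```

Because the backtracker produces a spanning tree over all cells, every generated maze should be solvable; the A* check is what would catch a bug in a fancier generator.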
– Get the code for gym-maze here (GitHub).

AI music video of the week:
…New Justin Timberlake video features robots, academics, futuristic beats, deep learning, etc. Reports of ‘jumping sharks’ said to be erroneous…
2017: AI researchers became fashion models, via a glossy Yves Saint Laurent campaign.
2018: Justin Timberlake releases a new music video called ‘Filthy’ which is set in 2028 at the ‘Pan-Asian Deep Learning Conference’ in Kuala Lumpur, Malaysia. Come for the setting and stay for the dancing robot (though I think the type of hardware they’re showing off in this fictional 2028 is likely a bit optimistic and robots will likely still be pretty crappy at that point). Warning: The video comes with some fairly unpleasant sexism and objectification, which (sadly) may be a reasonable prediction.
– Check out the video here: Justin Timberlake, Filthy (YouTube, mildly NSFW).

Why technologists need to run, not walk, into government to work on AI policy:
…Op-ed from security expert Bruce Schneier says current rules insufficient for a robot-fueled future…
Governments are unprepared to tackle the policy challenges posed by an increasingly digitized world, says Schneier in an op-ed in New York Magazine.
  “Pretty much all of the major policy debates of this century will have a major technological component. Whether it’s weapons of mass destruction, robots drastically affecting employment, climate change, food safety, or the increasing ubiquity of ever-shrinking drones, understanding the policy means understanding the technology. Our society desperately needs technologists working on the policy. The alternative is bad policy,” he says.
  Schneier suggests government create a new agency to study this vast topic: the ‘Department of Technology Policy’, which is a somewhat souped-up and expanded version of Ryan Calo’s proposal for a ‘Federal Robotics Commission’. How exactly that would differ from the White House’s Office of Science and Technology Policy isn’t made clear in the article. (Currently, the OSTP is staffed thinly, relative to its predecessor, and hasn’t produced many public materials concerned with AI, nor made any significant statements or policy pronouncements in that area – a significant contrast to other nations.)
  Somewhat gloomy policy comment: It’s much easier to ask various parts of government to account for AI in existing legislation or via existing legislative bodies than to spin-up an entirely new agency, especially in a climate that generally treats government expenditure on science with suspicion.
  Read more: Click Here to Kill Everyone (NYMag – Select/All.)
Read more: The case for a federal robotics commission, Ryan Calo (Brookings).

Predicting the unpredictable: Miles Brundage’s AI forecasts:
…Arxiv paper tracker & AI policy research fellow scores his own 2017 predictions…
Miles Brundage has scored his 2017 AI forecasts in a detailed blog post. Predicting AI is a challenge and it’s interesting to see how Miles was careful at the outset to make his forecasts specific, but found in 2018 that a couple of them were still open to interpretation – suggesting a need for further question calibration. I find this sort of meta-analysis particularly helpful in letting me frame my own thinking about AI, so thanks to Miles and his collaborators for that.
  Highlights: Miles’s Atari predictions were on the money when you factor in compute requirements, but his predictions were a bit fuzzier when it came to specific applications (StarCraft and speech recognition) and on more open-ended research areas, like transfer learning.
– Read more: Miles Brundage’s Review of his 2017 AI forecasts.
Read more: Miles’s original 2017 forecasts.

OpenAI/Misc Bits & Pieces:

Policy notes from NIPS 2017:
  How does the sort of cutting-edge research being discussed at AI conferences potentially influence policy? Tim Hwang (formerly Google, currently leading the Harvard-MIT Ethics and Governance of AI Initiative) tried to read the proverbial tea leaves in current research. Read on for our thoughts on robots, bias, and adversarial tanks.
  Read more here: NIPS 2017 Policy Field Notes (Medium).

Tech Tales:

[2XXX, a pet shop on a post-AGI Earth]

So after it happened a lot of wacky stuff went on – flying cars, partial Dyson spheres, postcard-sized supercomputers fired off at various distant suns, occasional dispensations of targeted and intricate punishments, machine-written laws, machine voting, the emergence of higher-dimensional AI beings who only interfaced with us ‘three-dee-errs’ via mathematical conjectures put forth in discussions with other AIs, chickens that laid eggs whose shells always came apart into two neat halves, and so on.

But the really weird stuff continues to be found in the more mundane places, like this pet shop. Obviously these days we can simulate any pet you’d like and most kids grow up with a few talking cats and dogs to hang out with. In fact, most of us spend most of our time in simulations rather than physical reality. But for some of us there’s still a lot of importance placed on ‘real stuff’. Real animals. Real people. Real sex. Real wine. You get the picture. Sure, we can simulate it so it’s all indistinguishable, but something about knowing that you’ve bought or consumed mass that is the end-product of literally billions of years of stellar interactions… I don’t know. People get a kick out of this. Even me.

So now I’m talking to the robot that runs the shop and I’m holding the cryobox containing my dead cat in one hand and my little digital device in the other and I guess we’re negotiating? It gets some of my data for the next few years and in return it’ll scan the cat, reconstruct its mind, then download that into a young kitten. Hey presto, Mr Tabby just became Mr Tabby Junior and gets to grow up all over again but this time he’ll be a little smarter and have more of his personality at the beginning of his life. Seems good to me – I get to hang out with Mr Tabby a while longer and he gets a young man’s body all over again. We should all be so lucky.

There’s a catch, though. Because when I said Mr Tabby was dead I wasn’t being totally accurate. Technically, he’s alive. I guess the correct phrase is ‘about to be dead’. My vet AI said this morning that Mr Tabby, at the age of 19 years and zero days, has anywhere from ‘0 to 200 days to live with unknown probability distribution across this temporal period’. That’s AI code for: Frankly I’m surprised the cat isn’t dead right now.

So I did what some people do, these days. I put him into stasis, chucked him in the cryobox, and got a flyer over to the pet store. Now once I’ve finished the negotiation with the robot it’s going to just become a small matter of opening the box, bringing Mr Tabby out of hibernation, then doing the thing that the robot and I are simply referring to as ‘the procedure’, because the actual name – Rapid And Complete Exfiltration of Determination via Euthanasia And Digitization (RACEDEAD) – gives me the creeps. (There’s an unsubstantiated rumor that the AIs compete with each other to come up with acronyms that creep humans out. Don’t ask what DENTALFLY means.)

So now I just need to let go of the cryobox’s lid so that it can automatically open and we can get going, but I’m finding it difficult. Like I said, for some of us there’s a lot of importance placed on real stuff, even real death when it’s not ‘real’ or ‘death’ (though death does occur). So now Mr Tabby is in the box and I guess he’s technically dead and alive until I figure out what I’m going to do. Ha!

Things that inspired this story: Brain emulation, The Age of Em by Robin Hanson, Schrodinger’s paradox, a cat I know that has taken to sleeping on my arm when I visit their owner.

Import AI: #75: Virtual Beijing with ParallelEye, NVIDIA tweaks GPU licensing, and saving money by getting AI to help humans label data generated by AI

Synthetic cities and what they mean: Virtual Beijing via ParallelEye:
…As AI moves from the era of data competition to the era of environment competition researchers try to work out how best to harvest real-world data…
Researchers with The State Key Laboratory for Management and Control of Complex Systems, within the Chinese Academy of Sciences in Beijing, have published details on the ‘ParallelEye’ dataset, a 3D urban environment modeled on Beijing’s Zhongguancun region. They constructed the dataset by grabbing the available OpenStreetMap (OSM) layout data for a 2km x 3km area, then modeled that data using CityEngine, and built the whole environment in the Unity3D engine.
  This seems like an involved, overly human-in-the-loop process compared with other approaches to large-scale 3D environment design, such as UofT/Uber’s semi-autonomous data-augmentation techniques for creating a giant 3D map of Toronto, or the work being done by others on designing generators for procedural homesteads. However, it does provide a pipeline, albeit a relatively labour-intensive one, for scraping and ingesting the world without needing expensive sensors and/or satellites. If enough datasets were constructed via this method, it’s likely you could use deep learning approaches to automate the pipeline, like the transforms from OSM maps into full 3D models into Unity.
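  From footprints to geometry: the paper does its 3D modeling in CityEngine, which this doesn't attempt to replicate. As a toy illustration of the simplest step in automating an OSM-to-3D transform, the sketch below extrudes one building footprint (a polygon in metres) into a prism mesh. The footprint and the height are hypothetical inputs; the height is assumed rather than read from OSM.

```python
def extrude_footprint(footprint, height):
    """Extrude a 2D footprint into a prism mesh.

    footprint: list of (x, y) vertices in metres, counter-clockwise seen from above,
    without repeating the first vertex at the end. Returns (vertices, faces), where
    faces are tuples of vertex indices ordered so their normals point outward."""
    n = len(footprint)
    bottom = [(float(x), float(y), 0.0) for x, y in footprint]
    top = [(float(x), float(y), float(height)) for x, y in footprint]
    vertices = bottom + top
    faces = []
    for i in range(n):
        j = (i + 1) % n
        faces.append((i, j, n + j, n + i))     # one quad per wall
    faces.append(tuple(range(n - 1, -1, -1)))  # floor, reversed so it faces down
    faces.append(tuple(range(n, 2 * n)))       # roof, facing up
    return vertices, faces

# A hypothetical 10 m x 20 m building, 30 m tall.
vertices, faces = extrude_footprint([(0, 0), (10, 0), (10, 20), (0, 20)], height=30)
print(len(vertices), len(faces))  # 8 6
```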
  The researchers carry out some basic testing of ParallelEye by creating synthetic cameras meant to be like those mounted on self-driving cars or in large-scale surveillance systems, then testing the dataset’s usability with them. They leave the application of actual AI techniques to the dataset for future research.
– Read more: The ParallelEye Dataset: Constructing Large-Scale Artificial Scenes for Traffic Vision Research (Arxiv).

Intel releases its 3D environment, the ‘CARLA’ simulator:
…It is said that any sufficiently large, wealthy company is now keen to also own at least one bespoke AI simulator. Why? Good question!…
Intel recently released code for CARLA, a 3D simulator for testing and evaluating AI systems, like those deployed in self-driving cars.
  “CARLA is an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems,” Intel writes.
  The AI environment world right now is reminiscent of the world of programming languages a few years ago, or the dynamic and large ecosystem of supercomputer vendors a few years before that; we’re in an initial period of experimentation in which many ideas are going to be tried and it’ll be a few years yet before things shake out and a clear winner emerges. The world of AI programming frameworks is going through its own Cambrian explosion right now, though it is further along in the process than 3D environments, as developers appear to be consolidating around TensorFlow and PyTorch, while dabbling in a bunch of other (sometimes complementary) frameworks, eg Caffe/Torch/CNTK/Keras/MXNet.
   Question: Technical projects have emerged to help developers transfer models built in one framework into another, either via direct transfer mechanisms or through meta-abstractions like ONNX. What would be the equivalent for 3D environments beyond a set of resolution constraints and physics constants?
– Read more: CARLA: An Open Urban Driving Simulator (PDF).
– Read more: CARLA 0.7 release notes.

Google Photos: 1, Clarifai Forevery: 0
…Competing on photo classification seems to be a risky proposition for startups in the era of mega cloud vendors…
Image recognition startup Clarifai is shutting down its consumer-facing mobile app Forevery to focus instead on its own image recognition services and associated SDKs. This seems like a small bit of evidence for how large companies like Google or Apple can overwhelm startups by competing with them on products that tap into large-scale ML capabilities, which the giants are well positioned to exploit and smaller startups will struggle to match.
– Read more: Goodbye Forevery. Hello Future. (Blog).

NVIDIA says NO to consumer graphics cards in big datacenters:
…NVIDIA tweaks licensing terms to discourage people from repurposing its cheaper cards for data center usage…
For several years NVIDIA has been the undisputed king of AI compute, with developers turning to its cards en masse to train neural networks, primarily because of the company’s significant support for scientific/AI computation via tools like CUDA/cuDNN, etc.
  During the last few years NVIDIA has courted these developers by producing more expensive cards designed for 24/7 data center operation, incorporating enterprise-grade features relating to reliability and error correction. This is to help NVIDIA charge higher prices to some of its larger-scale customers. But adoption of these cards has been relatively slight as many developers are instead filling vast data centers with the somewhat cheaper consumer cards and accepting a marginally higher failure rate in exchange for a better FLOPS/dollar ratio.
  So now NVIDIA has moved from courting these developers to seeking to force them to change their buying habits: the company confirmed last week to CNBC that it recently changed its user agreement to help it extract money from some of the larger users.
  “We recently added a provision to our GeForce-specific EULA to discourage potential misuse of our GeForce and TITAN products in demanding, large-scale enterprise environments,” the company said in a statement to CNBC. “We recognize that researchers often adapt GeForce and TITAN products for non-commercial uses or other research uses that do not operate at data center scale. NVIDIA does not intend to prohibit such uses.”
– Read NVIDIA’s statement in full in this CNBC article.
The Titan(ic) Price premium: Even within NVIDIA’s desktop card range there is a significant delta in performance among cards, even when factoring in dollars/flops, as this article comparing a 1080 Ti with a Titan V shows.
– Read more: Titan V vs 1080 Ti — Head-to-head battle of the best desktop GPUs on CNNs. Is Titan V worth it? 110 TFLOPS! no brainer, right?

A ‘Ray’ of training light emerges from Berkeley:
…Systems geeks reframe reinforcement learning programming for better performance, scaling…
Berkeley researchers have released Ray RLLib, software to make it easier to set up and run reinforcement learning experiments. The researchers, who include the creator of the ‘Spark’ data processing engine, say that reinforcement learning algorithms are somewhat more complicated than typical AI models (mostly classification models trained via supervised learning) and that this means there’s some value in designing a framework that optimizes the basic components commonly used in RL. “RLlib proposes using a task-based programming model to let each component control its own resources and degree of parallelism, enabling the easy composition and reuse of components,” they write.
  “Unlike typical operators in deep learning frameworks, individual components may require parallelism across a cluster, use a neural network defined by a deep learning framework, recursively issue calls to other components, or interface with black-box third-party simulators,” they write. “Meanwhile, the main algorithms that connect these components are rapidly evolving and expose opportunities for parallelism at varying levels. Finally, RL algorithms manipulate substantial amounts of state (e.g., replay buffers and model parameters) that must be managed across multiple levels of parallelism and different physical devices.”
  The researchers test out RLLib by re-implementing a bunch of major algorithms used in reinforcement learning, like Proximal Policy Optimization, Evolution Strategies, and others. They also try to re-implement entire systems, such as AlphaGo.
  Additionally, they designed new algorithms within the framework: “We tried implementing a new RL algorithm that runs PPO updates in the inner loop of an ES optimization step that randomly perturbs the PPO models. Within an hour, we were able to deploy to a small cluster for evaluation. The implementation took only ∼50 lines of code and did not require modifying the PPO implementation, showing the value of encapsulation.”
– Read more: Ray RLLib (documentation).
– Read more: Ray RLLib: A Composable and Scalable Reinforcement Learning Library (Arxiv).
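The PPO-inside-ES experiment illustrates the encapsulation argument: the outer loop only needs to call the inner learner, never look inside it. The sketch below mimics that shape on a toy problem, assuming a quadratic ‘reward’ and a couple of plain gradient steps standing in for PPO. It uses the standard library’s `concurrent.futures` to score perturbations concurrently instead of Ray’s task API, and every function name here is hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List

Params = List[float]
TARGET = 3.0  # hypothetical optimum of the toy objective


def loss(theta: Params) -> float:
    return sum((t - TARGET) ** 2 for t in theta)


def inner_update(theta: Params, lr: float = 0.1, steps: int = 2) -> Params:
    """Stand-in for the inner learner (PPO in the paper): a few exact gradient steps."""
    for _ in range(steps):
        theta = [t - lr * 2 * (t - TARGET) for t in theta]
    return theta


def evaluate(theta: Params, noise: Params, sigma: float) -> float:
    """Perturb the parameters, run the unmodified inner learner, and return a reward."""
    perturbed = [t + sigma * e for t, e in zip(theta, noise)]
    return -loss(inner_update(perturbed))


def es_step(theta: Params, rng: random.Random, pool: ThreadPoolExecutor,
            pop: int = 16, sigma: float = 0.1, lr: float = 0.05) -> Params:
    """One outer ES step: sample perturbations, score them concurrently, follow the estimate."""
    noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(pop)]
    rewards = list(pool.map(lambda n: evaluate(theta, n, sigma), noises))
    baseline = sum(rewards) / pop  # subtracting the mean reward reduces variance
    grad = [sum((r - baseline) * n[i] for r, n in zip(rewards, noises)) / (pop * sigma)
            for i in range(len(theta))]
    return [t + lr * g for t, g in zip(theta, grad)]


def train(steps: int = 100, seed: int = 0) -> Params:
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    # Threads only illustrate the task structure; CPU-bound Python won't speed up under the GIL.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(steps):
            theta = es_step(theta, rng, pool)
    return theta


final = train()
print(f"loss: {loss([0.0, 0.0]):.2f} -> {loss(final):.4f}")
```

`es_step` never inspects `inner_update`, so swapping in a different inner learner would not require touching the outer loop, which is roughly the point the RLlib authors make about their ~50-line experiment.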

Lethal autonomous weapons will “sneak up on us” regardless of policy, says former US military chap:
…Robert Latiff, a former major general in the Air Force who also worked at the National Reconnaissance Office, shares concerns and thoughts re artificial intelligence…
Artificial intelligence is going to revolutionize warfare and there are many indications that the US’s technological lead in this area is narrowing, according to former NRO chap Robert Latiff. These technologies will also raise new moral questions that people must deal with during war.
   “I think that artificial intelligence and autonomy raise probably the most questions, and that is largely because humans are not involved. So if you go back to Aquinas and to St. Augustine, they talk about things like “right intention.” Does the person who is doing the killing have right intention? Is he even authorized to do it? Are we doing things to protect the innocent? Are we doing things to prevent unnecessary suffering? And with autonomy and artificial intelligence, I don’t believe there’s anybody even in the business who can actually demonstrate that we can trust that those systems are doing what they should be doing,” he said in an interview with Bloomberg.
  “The whole approach that the DoD is taking to autonomy worries me a lot. I’ll explain: They came out with a policy in 2012 that a real human always has to be in the loop. Which was good. I am very much against lethal autonomy. But unlike most of these policies, there was never any implementing guidance. There was never any follow-up. A Defense Science Board report came out recently that didn’t make any recommendations on lethal autonomy. In all, they are unusually quiet about this. And frankly, I think that’s because any thinking person recognizes that autonomy is going to sneak up on us, and whether we agree that it’s happening or not, it will be happening. I kind of view it as a head-in-the-sand approach to the policies surrounding lethal autonomous weapons, and it cries out for some clarification.”
– Read more here: Nobody’s Ready for the Killer Robot (Bloomberg).

Facebook designs AI to label data produced by other AI systems to create training data to train future AI systems:
…Almost the same accuracy, but at about 4% of the labeling cost…
One problem faced by AI researchers is that once they have decided on a particular type of data to gather, they need to figure out how to train and pay a human team to go and hand-annotate lots and lots of it. This can be very expensive, especially as the size of datasets that AI researchers work with grows.
  To alleviate this, Facebook and MIT researchers have compiled and released SLAC, a dataset labelled partially via AI-enabled automation techniques. SLAC contains over 200 action classes (taken from the ActivityNet-v1.3 dataset) and spans over 1.75 million individual annotations across ~520,000 videos. SLAC gives researchers the raw material needed to better train and evaluate algorithms that can look at a few image frames and label the temporal action(s) that occur across them – a fundamental capability that will be necessary for smart AI systems to be deployed in complex real-world environments, like those faced by cars, robots, surveillance systems, and so on.
  Data automation: The researchers automate much of the data gathering in the following ways: they first try to strip out common flaws in the harvested video clips (eg, they use an R-CNN-based person detector to find the videos that don’t contain any humans and remove them). They also use a human-feedback-based labeling system, in which clips that the AI systems cannot label with high confidence are presented to human annotators. This functionally works like an active learning scheme, with SLAC consistently identifying the areas where it is least sure and offering these clips up to human annotators.
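Routing uncertain clips to people is a form of active learning. Here is a minimal sketch of that routing step, assuming the model outputs one probability per action class for each clip and that ‘least sure’ means the lowest top-class probability; the paper’s actual selection criterion may differ, and `route_clips` and the 0.8 threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def route_clips(
    predictions: Dict[str, List[float]],  # clip id -> class probabilities
    threshold: float = 0.8,               # hypothetical confidence cut-off
) -> Tuple[Dict[str, int], List[str]]:
    """Auto-label confident clips; queue the rest for humans, least confident first."""
    auto_labels: Dict[str, int] = {}
    uncertain: List[Tuple[float, str]] = []
    for clip_id, probs in predictions.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            auto_labels[clip_id] = best
        else:
            uncertain.append((probs[best], clip_id))
    # Lowest top-class probability = least sure = most useful for a human to label.
    human_queue = [clip_id for _, clip_id in sorted(uncertain)]
    return auto_labels, human_queue


preds = {
    "clip_a": [0.95, 0.03, 0.02],
    "clip_b": [0.40, 0.35, 0.25],
    "clip_c": [0.10, 0.70, 0.20],
}
auto, queue = route_clips(preds)
print(auto)   # {'clip_a': 0}
print(queue)  # ['clip_b', 'clip_c']
```

In a full loop the human labels would be fed back to retrain the model, so each pass should shrink the queue.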
  Time savings: Ultimately, Facebook estimates that it took about 4,390 human hours to sparsely label the SLAC data via its combined human-and-automation approach, versus around 113,200 hours if they had attempted SLAC without any automation.
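Taking those two estimates at face value, the semi-automated pipeline needed under 4% of the human hours:

```latex
\frac{113{,}200\ \text{h}}{4{,}390\ \text{h}} \approx 25.8, \qquad
\frac{4{,}390\ \text{h}}{113{,}200\ \text{h}} \approx 0.039 \approx 3.9\%, \qquad
113{,}200\ \text{h} - 4{,}390\ \text{h} = 108{,}810\ \text{h saved}
```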
  Results: SLAC can be a more effective pre-training dataset for action recognition models than existing datasets like Kinetics and Sports-1M. It can also outperform these datasets when applied to tasks based around transfer learning.
– Read more: SLAC: A Sparsely Labeled Dataset for Action Classification and Localization (Arxiv).

Tech Tales:

[2030: East Coast, United States.]

#756 hasn’t worked on this building site before so today is its first day. A human stands in front of the gigantic yellow machine and wordlessly thumbs around a computer tablet, until #756 gets a download of its objectives for the day, week, and month, along with some additional context on the build, and some AI components trained on site-specific data relating to certain visual textures, dust patterns, and vibration tendencies.

This building site is running the most up-to-date version of ArchCollab, so #756 needs to carry out its own tasks while ensuring that it is integrating into the larger ‘cross-modal social fabric’ of the other machines working on the construction site.

#756 gets the internal GO code and drives over the ‘NO HUMANS BEYOND’ red & white cross-hatching on the ground and onto the site proper, unleashing a flock of observation and analysis-augmentation drones as it crosses the line. None of the other robots stop in their tasks to help it aside from when instructed to do so by the meta-planner running within the shared ArchCollab software. To better accomplish its tasks and to gain new skills rapidly, #756 will need to acquire some social credit with the other robots. So as the day continues it tries to find opportune moments when it can reach out with one of its multi-tooled hydraulic arms to lift a girder, weld something, or simply sweep up trash from near another robot. The other robots notice and #756 becomes aware of its own credit score rising, slightly.

It’s at the end of the day when it improvises. Transport vehicle #325 is starting to make its way down from the middle of the central core of one of the towers when one of the metal girders it is carrying starts to slide out of its truckbed. The girder falls before it can be steadied in place by other drones. #756 is positioned relatively near the predicted impact site and, though other robots are running away from the area – no doubt reinforced by some earlier accidents on the site – #756 instead extends one of its mechanical arms and uses its AI-powered reflexes to grab the girder in flight, suspending it a couple of meters above the ground.

There’s no applause. The other machines do nothing to celebrate what to most people would register as an act of bravery. But #756’s credit ranking soars and for the rest of its time on the site it gets significant amounts of additional help, letting it gather more data and experience, enhancing the profit margin of future jobs carried out by it thanks to its more sophisticated skills.

It’s only a week later that one of the few human overseers reviews CCTV footage from the site and notices that the robots have taken to dropping the majority of the girders, rather than wasting time by transporting them down from various mid-tower storage locations. A year later a special girder ‘catch & release’ module is developed by ArchCollab so other robots at other sites can display the same behaviors. Two years after that a German company designs a set of bespoke grippers for girder catching. In parallel, a robot company builds an arm with unprecedented tensile strength and flexibility. After that equipment comes out it’s not unusual to be able to tell a building site is nearby by the distinct noise of many metal bars whizzing through the air at uncanny speed.

Technologies that inspired this story: Meccano, drone-robot coordination, objective functions, robotics, social credit systems.

Import AI: #74: Why Uber is betting on evolution, what Facebook and Baidu think about datacenter-scale AI computing, and why Tacotron 2 means speech will soon be spoofable

All hail the new ‘datacenter scale’ era for AI:
…Facebook Research Paper goes through some of the many problems that come with running AI at the scale of an entire datacenter…
Facebook has published an analysis of how it runs its worldwide fleet of AI services and how its scale has influenced the way it has deployed AI into production. The company uses both CPUs and GPUs, with GPUs being used for large-scale face recognition, language translation, and the ‘Lumos’ feature-analysis service. It also runs a significant amount of work on CPUs; one major workload is ranking features for newsfeed. “Computer vision represents only a small fraction” of the total work, Facebook writes.
  Split languages: Facebook uses ‘Caffe2’ for its production systems, while its researchers predominantly use PyTorch. Though the company’s main ML services (FBLearner Feature Store / FBLearner Flow / FBLearner Predictor) support a bunch of different AI frameworks, they’ve all been specially integrated with Caffe2, the company says.
  The big get bigger: Facebook, like other major AI users, is experimenting with running significantly larger AI models at larger scales: this has altered how it places and networks together its GPU servers, and has spurred research in areas like low-precision training. They’re also figuring out ways to use the scale to their advantage. “Using certain hyperparameter settings, we can train our image classification models to very large mini-batches, scaling to 256+ GPUs,” they write. “For one of our larger workloads, data parallelism has been demonstrated to provide 4x the throughput using 5x the machine count (e.g., for a family of models that trains over 4 days, a pool of machines training 100 different models could now train 20 models per day, so training throughput drops by 20%, but the wait time for potential engineering advancement improves from four days to one day).”
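The parenthetical example in the quote is easier to follow when worked through. Here is a minimal sketch, assuming (as the quote implies) a fixed pool of machines sized to train 100 models at once, each model taking 4 days, and data parallelism that gives each model 5x the machines for 4x the speed; `pool_stats` is a hypothetical helper.

```python
def pool_stats(models_in_pool: int, days_per_model: float,
               machine_multiplier: float, speedup: float):
    """Return (models finished per day, days an engineer waits for one model)."""
    concurrent = models_in_pool / machine_multiplier  # each model now occupies more machines
    wait_days = days_per_model / speedup              # but each model finishes sooner
    return concurrent / wait_days, wait_days


before = pool_stats(100, 4.0, machine_multiplier=1, speedup=1)
after = pool_stats(100, 4.0, machine_multiplier=5, speedup=4)
print(before)  # (25.0, 4.0)
print(after)   # (20.0, 1.0)
print(f"throughput change: {after[0] / before[0] - 1:.0%}")  # throughput change: -20%
```

Because 4x speed costs 5x machines, total throughput falls from 25 to 20 models per day, while each individual result arrives three days earlier.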
  One GPU region to train them all: When Facebook was first experimenting with GPUs for deep learning it rolled out GPUs in a single data center region, which it figured was a good decision as the designs of the servers were changing and the teams needed to become used to maintaining them. This had some pretty negative consequences down the road, causing a re-think of how the company distributed its data center resources and its infrastructure.
– Read more: Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective.

Baidu publishes some rules-of-thumb for how model size relates to performance:
…The beginnings of a theory for deep learning…
Deep learning is an empirical science – we don’t fully understand how various attributes of our neural networks dictate their ultimate representational capacity. That means the day-to-day work of any AI organization involves a lot of empirical experimentation. Now, researchers with Baidu have attempted to formalize some of their ideas about how the scale of a deep learning model relates to its performance.
  “Through empirical testing, we find predictable accuracy scaling as long as we have enough data and compute power to train large models. These results hold for a broad spectrum of state-of-the-art models over four application domains: machine translation, language modeling, image classification, and speech recognition,” they write.
  The results suggest that once researchers get a model to a certain threshold of accuracy, they can be confident that by simply adding compute and/or data they can reach a given level of performance within a rough margin of error. “Model error improves starting with “best guessing” and following the power-law curve down to “irreducible error”,” they say. “We find that models transition from a small training set region dominated by best guessing to a region dominated by power-law scaling. With sufficiently large training sets, models will saturate in a region dominated by irreducible error (e.g., Bayes error).”
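In the power-law region described in the quote, error roughly follows ε(m) ≈ α·m^β with β < 0, where m is the training set size, so the curve is a straight line on log-log axes and can be extrapolated from a handful of small runs. Here is a minimal sketch of that fit, assuming noiseless synthetic points generated from hypothetical values α = 5 and β = -0.3; real learning curves are noisy and bend towards the irreducible-error floor, which this deliberately ignores.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], errors: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(error) = log(alpha) + beta * log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    alpha = math.exp(my - beta * mx)
    return alpha, beta


# Hypothetical "small-scale" runs: error measured at four training-set sizes.
sizes = [1e4, 3e4, 1e5, 3e5]
errors = [5.0 * s ** -0.3 for s in sizes]

alpha, beta = fit_power_law(sizes, errors)
predicted = alpha * 1e7 ** beta  # extrapolate to a 10M-example training set
print(round(beta, 3))  # -0.3
print(f"predicted error at 10M examples: {predicted:.4f}")
```

The paper’s caveat applies directly here: α and β have to be re-measured for each application, since they differ across domains.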
  The insight is useful, but it doesn’t remove the need for experiments: the researchers find similarly shaped learning curves across a variety of test domains, “although different applications yield different power-law exponents and intercepts”, so each new domain still needs its own empirical measurements.
  It is also a further sign that compute will become as strategic as data to AI, with researchers seeking to be able to run far more empirical tests and scale-up far more frequently when equipped with somewhat formal intuitions like the one stumbled upon by Baidu’s research team.
– Read more here: Deep Learning Scaling is Predictable, Empirically (Baidu blog).
– Read more here: Deep Learning Scaling is Predictable, Empirically (Arxiv).

Evolution, evolution everywhere at Uber AI Labs:
…Suite of new papers shows the many ways in which neuroevolution approaches are contemporary and complementary to neural network approaches…
Uber’s AI research team has published a set of papers that extend and augment neuroevolution approaches – continuing the long-standing professional fascinations of Uber researchers like Ken Stanley (inventor of NEAT and HyperNEAT, among others). Neuroevolution is interesting to contemporary AI researchers because it provides a method to use compute power to push simple algorithms through the more difficult parts of hard problems rather than having to invent new algorithmic pieces to get us across certain local minima; with evolutionary approaches, the difference between experimental success and failure is often dictated by the amount of compute applied to the problem.
–  Exploration: The researchers show how to further tune the exploration process in evolutionary strategies (ES) algorithms through the alternation of novelty search and quality diversity algorithms. They also introduce new ideas to improve the mutation process of large neural networks.
–  Theory: The researchers compare the approximate gradients computed by ES with the exact gradient computed by stochastic gradient descent (SGD) and design tools to better predict how ES performance relates to scale and parallelization.
–  Big compute everywhere: “For neuroevolution researchers interested in moving towards deep networks there are several important considerations: first, these kinds of experiments require more computation than in the past; for the experiments in these new papers, we often used hundreds or even thousands of simultaneous CPUs per run. However, the hunger for more CPUs or GPUs should not be viewed as a liability; in the long run, the simplicity of scaling evolution to massively parallel computing centers means that neuroevolution is perhaps best poised to take advantage of the world that is coming,” they write.
– Read more here: Welcoming the Era of Deep Neuroevolution (Uber Engineering blog).
– Read more: Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning (Arxiv).
– Read more: Safe Mutations for Deep and Recurrent Neural Networks through Output Gradients.
– Read more: On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent (Arxiv).
– Read more: ES Is More Than Just a Traditional Finite Difference Approximator (Arxiv).
– Read more: Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents.
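For intuition on the theory paper’s comparison, the sketch below estimates a gradient the ES way, using only function values at antithetic Gaussian perturbations, and checks it against the exact gradient of a toy quadratic. The objective, dimension, σ and sample counts are arbitrary choices for illustration rather than settings from the Uber papers.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def f(theta: Vector) -> float:
    return sum(t * t for t in theta)  # toy objective


def exact_grad(theta: Vector) -> Vector:
    return [2 * t for t in theta]  # analytic gradient of f


def es_grad(fn: Callable[[Vector], float], theta: Vector, sigma: float,
            pairs: int, rng: random.Random) -> Vector:
    """Antithetic ES estimate: average of (f(θ+σε) - f(θ-σε)) / (2σ) * ε."""
    g = [0.0] * len(theta)
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = fn([t + sigma * e for t, e in zip(theta, eps)])
        minus = fn([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2 * sigma * pairs)
        g = [gi + scale * e for gi, e in zip(g, eps)]
    return g


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(0)
theta = [1.0, -2.0, 0.5, 3.0, -1.0]
for pairs in (10, 100, 1000):
    estimate = es_grad(f, theta, 0.05, pairs, rng)
    print(pairs, round(cosine(estimate, exact_grad(theta)), 3))
```

For this particular quadratic the antithetic difference is exact, so the only error is sampling noise and the cosine similarity should climb towards 1 as `pairs` grows. On real networks σ matters too, because ES is then following the gradient of a Gaussian-smoothed objective rather than the exact local gradient.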

US National Security Strategy picks out AI’s potential damage to the information battlespace:
…AI’s ability to create fake news and aid surveillance picked out in NSS report…
While other countries around the world publish increasingly complicated, detailed national AI development strategies, the US government is instead adopting a ‘business as usual’ approach, judging by the NSS document, which explicitly mentions AI in (only!) two places – as it relates to innovation (named amid a bundle of different technologies as something to be supported), and national security. It’s the latter point which has more ramifications: The NSS explicitly names AI within the ‘Information Statecraft’ section as a potential threat to US national security.
  “Risks to U.S. national security will grow as competitors integrate information derived from personal and commercial sources with intelligence collection and data analytic capabilities based on Artificial Intelligence (AI) and machine learning. Breaches of U.S. commercial and government organizations also provide adversaries with data and insights into their target audiences,” the NSS says. “China, for example, combines data and the use of AI to rate the loyalty of its citizens to the state and uses these ratings to determine jobs and more. Jihadist terrorist groups continue to wage ideological information campaigns to establish and legitimize their narrative of hate, using sophisticated communications tools to attract recruits and encourage attacks against Americans and our partners. Russia uses information operations as part of its offensive cyber efforts to influence public opinion across the globe. Its influence campaigns blend covert intelligence operations and false online personas with state-funded media, third-party intermediaries, and paid social media users or “trolls.” U.S. efforts to counter the exploitation of information by rivals have been tepid and fragmented. U.S. efforts have lacked a sustained focus and have been hampered by the lack of properly trained professionals. The American private sector has a direct interest in supporting and amplifying voices that stand for tolerance, openness, and freedom.”
– Read more: National Security Strategy of the United States of America (PDF).

Goodbye, trustworthy phone calls, hello Tacotron 2:
…Human-like speech synthesis made possible via souped-up Wavenet…
Google has published research on Tacotron 2, text-to-speech (TTS) software that the company has used to generate synthetic audio samples that sound just like human beings.
  Results: One model attains a mean opinion score (MOS) of 4.53 compared to the 4.58 typically given to professionally recorded speech. You can check out some of the Tacotron 2 audio samples here; I listened to them and had trouble telling the difference between human and computer speakers. The researchers also carried out a side-by-side evaluation between audio synthesized by their system and the ground truth and found that people still have a slight preference towards ground truth (human-emitted spoken dialogue) versus the Tacotron 2 samples. Further work will be required to train the system to be able to deal with unusual words and pronunciations, as well as figuring out how to condition it at runtime to make a particular audio sample sound happy, sad, or whatever.
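A mean opinion score is just the average of listeners’ 1-to-5 naturalness ratings, usually reported with a confidence interval, which is why a gap like 4.53 versus 4.58 is best read alongside its error bars. Here is a minimal sketch, assuming a normal-approximation 95% interval and made-up ratings (not data from the paper).

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def mos_with_ci(ratings: List[int], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mean(ratings), half_width


ratings = [5, 4, 5, 5, 4, 5, 3, 5, 4, 5]  # hypothetical listener scores
score, ci = mos_with_ci(ratings)
print(f"MOS = {score:.2f} ± {ci:.2f}")
```

With only ten ratings the interval is wide; narrowing it to the point where a 0.05 difference is meaningful takes many more listener judgements.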
The next step for systems like this will be to re-train the synthetic voices to match a target speaker using a relatively small amount of data, then to figure out how to condition such systems with accents or other speech tics to better mimic the target.
– Read more: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.

Chinese chip startup Horizon Robotics releases surveillance chip:
…Chip focuses on surveillance, self-driving…
Horizon Robotics has released the ‘Journey 1.0 processor’, a chip that (according to Google translate), “has the ability to accurately detect and recognize pedestrian, motor vehicle, non-motorized vehicle and traffic sign at the same time. The intelligent driving platform based on this chip supports the detection of 260 kinds of traffic signs, and the recognition accuracy to the traffic lights of traffic lights, current lanes and adjacent lanes is more than 95%.”
  Each chip “can detect 200 visual targets at the same time,” the company says.
  China’s chip boom: China is currently undergoing a boom in the number of domestic startups developing specific AI chips for inference and training – part of a larger national push to create more national champions with semiconductor expertise and provide some significant competition to traditional chip companies Intel, AMD, IBM, and NVIDIA.
– Read more on this Chinese press release from Horizon.
– Check out Horizon’s website.

Salesforce researchers craft AI architecture generator, marvel at its creation of the high-performance, non-standard ‘BC3’ cell:
…Giving neural architecture search a supervised boost via a Domain-Specific Language…
Salesforce’s approach to neural architecture search relies on human supervision in the form of a domain specific language (DSL) which is given to the AI. The intuition here is that the human can specify a small shopping list of AI components which the system can evaluate, and it will figure out the best quantity and combination of these components to solve its tasks.
  One drawback of neural architecture search is its cost: not only the computation expended on trying out different architectures, but also the storage and compute needed to actually train and test each candidate. The Salesforce researchers try to get around this by using a recursive neural network to iteratively predict the performance of new architectures, reducing the need for full-blown testing of every model.
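To make the ‘shopping list’ idea concrete: the DSL is a small grammar of operators, and candidate RNN cells are expression trees built from it. Here is a minimal sketch, assuming a simplified operator set loosely modelled on the paper’s core DSL, including a `Gate3(x, y, f) = σ(f)·x + (1 − σ(f))·y` operator, and scalar rather than vector inputs. The random generator, the leaf names and the helper functions are all hypothetical.

```python
import math
import random
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple]  # a leaf name, or (op_name, child, child, ...)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical, simplified operator set: name -> (arity, function on scalars).
OPS = {
    "Add": (2, lambda a, b: a + b),
    "Mult": (2, lambda a, b: a * b),
    "Tanh": (1, math.tanh),
    "Sigmoid": (1, sigmoid),
    "Gate3": (3, lambda x, y, f: sigmoid(f) * x + (1 - sigmoid(f)) * y),
}
LEAVES = ["x_t", "h_prev"]  # current input and previous hidden state


def random_tree(rng: random.Random, depth: int) -> Tree:
    """Sample a candidate cell expression with at most `depth` nested operators."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    name = rng.choice(sorted(OPS))
    arity = OPS[name][0]
    return (name,) + tuple(random_tree(rng, depth - 1) for _ in range(arity))


def evaluate(tree: Tree, env: Dict[str, float]) -> float:
    """Compute the cell's output for given leaf values."""
    if isinstance(tree, str):
        return env[tree]
    name, *children = tree
    return OPS[name][1](*(evaluate(c, env) for c in children))


def show(tree: Tree) -> str:
    if isinstance(tree, str):
        return tree
    return f"{tree[0]}({', '.join(show(c) for c in tree[1:])})"


rng = random.Random(0)
for _ in range(3):
    cell = random_tree(rng, depth=3)
    print(show(cell), "->", round(evaluate(cell, {"x_t": 0.5, "h_prev": -0.2}), 3))
```

In the paper a learned ranking network scores candidates like these so that only promising ones get fully trained; that component is omitted here.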
  Results: Architectures found with Salesforce’s approach perform comparably to the state-of-the-art on tasks like language understanding and machine translation, with the benefit that they were discovered almost entirely by computers autonomously searching for effective architectures, rather than by machine learning researchers spending their own time on it.
  The mystery of the ‘BC3’ cell: Like all good research papers, this one contains an easter egg: the discovery of the ‘BC3’ cell, which appeared in several of the top-performing models. This cell has the odd trait of containing “an unexpected layering of two Gate3 operators,” they write. “While only the core DSL was used, BC3 still breaks with many human intuitions regarding RNN architectures.”
  Neural architecture search techniques seem to be in their infancy today but are likely to become very significant over the next two years as these techniques will benefit tremendously from the arrival of new fast computer hardware, like custom AI chips from firms like Google (TPUs) and Graphcore, as well as new processors from AMD, NVIDIA, and Nervana (Intel).
– Read more: A Flexible Approach to Automated RNN Architecture Generation.

Tech Tales:

[Detroit VRZoo Sponsored by WorldGoggles(TM), 2028]

“Daddy, daddy, it’s swinging from the top of its cage! And now it’s hanging with one arm. Oh wait… gross! Daddy it just pooped and now it’s throwing it across the cage!”
  You stare at the empty, silent enclosure. Look at the rounded prison bars, buffed smooth by the oils from decades of curious hands, then down to your kid who has their WorldGoggles on and is staring at the top left corner of the cage with an expression that – you – suspect is childlike wonder.
  “Daddy you’re missing it, come on!,” they say, grabbing your sleeve. “Put yours on.”
  Okay, you say, tugging the glasses down over your eyes. The cage in front of you becomes alive – a neon orange, static-haired orangutan dangles from the top bar of the cage with one arm and uses its other to scoop into its backside then sling poo at a trio of hummingbirds on the other side of the cage, which dodge from side-to-side, avoiding the flung shit.
   “Woah look at that,” your kid says. “Those are some smart birds!” The kid plants their feet on the floor and bounces from side to side, darting their hips left and right, mimicking the dodging of the birds.

After the poo-throwing comes the next piece of entertainment: the monkey and the birds play hide and seek with each other, before being surprised by a perfectly rendered digital anaconda, hidden in one of the fake rock walls of the augmented reality cavern. After that you rent the three creatures a VR toy you bought your kid last weekend so they can all play a game together. Later, you watch your child as they gaze up at digital tigers, or move their head from side to side as they follow the just-ever-so-slightly pixelated bubbles of illusory fish.

Like most other parents you spend the majority of the day with your goggles flipped up on your head, looking at the empty repurposed enclosures and the various electronic sensors that stud the corners and ceilings of the rooms where the living animals used to be. The buildings ring out with the happy cries of the kids and low, warm smalltalk between parents. But there are none of the smells of a regular zoo: no susurrations from sleeping or playing animals, no swinging of chains.

The queue for the warthog is an hour long and after fifteen minutes the kid is bored.
  Or as they say: “Daddy I’m B O R E D Bored! Can we go back to the monkeys?”
  It was an orangutan. And, no, we should see this.
  “What does it do?”
  It’s a warthog.
  “Yes I know Dad but what does it do?”
  It’s alive, you say. They keep talking to you but you distract them by putting on your goggles and playing a game of augmented reality tennis with them, their toy, and the birds who you pay an ‘amusement fee’ to coax over to the two of you.

When you get into the warthog’s cage it reminds you of the setup for Lenin’s tomb in Moscow – a strange, overly large enclosure that the crowd files around, each person trudging as slowly as they can. No one has goggles on, though some kids fiddle with them. It’s as quiet as a church. You can even hear the heavy breathing of the creature, and at one point it burps, causing all the kids to giggle. “Wow,” your kid whispers, then points at the warthog’s head. It’s got a red Santa Hat on – some of the white threading around the base is tarnished with a brown smudge, either dirt or poo. Your kid tries to put on their goggles to take a photo and you stop them and whisper “just look”, and all the other parents look at you with kind eyes. Outside, later, it snows and there’s only a hint of smog in the flakes. Your kid imitates the warthog and bends forward, then runs ahead of you, pretending to burp like a living thing.

Technologies that inspired this story: Augmented Reality; Magic Leap, Hololens. The Berlin zoo. Multi-agent environments. Mobile phone games.

Import AI: #73: Generative steganography, automated data fuzzing with imgaug, and what happens when neural networks absorb database software

Welcome to Import AI, subscribe here.

Accidental steganography with CycleGAN:
…Synthetic image generators create their own optical illusions…
Researchers with Google have identified some surprising information storage techniques used by CycleGAN, a tool that can be used to learn correspondences between different sets of images and generate synthetic images. Specifically, the researchers find that during CycleGAN training the network encodes additional information into the images it is generating to help it reconstruct original images from synthetic sources. “This suggests that the majority of information about the source photograph is stored in a high-frequency, low-amplitude signal within the generated map,” the researchers write.
  This also means it’s possible to use CycleGANs to create adversarial synthetic images, where a pattern of noise in the source image will cause the network to reconstruct a completely different image. “We claim that CycleGAN is learning an encoding scheme in which it ‘hides’ information about the aerial photograph x within the generated map F(x),” they write.
Read more: CycleGAN, a Master of Steganography.

Generating synthetic training data with imgaug:
…Will we be applying the CoarseDropout today, sir? Perhaps with some salt and pepper? And how about some affine scaling as well?…
One of the most common and dull parts of machine learning is data augmentation: the process people use to take an existing dataset, like a collection of cat photos, and massively expand its size by transforming the images in a variety of ways. New free software called imgaug automates this process, giving users a vast number of potential transforms to automatically apply to their images.
  “It supports a wide range of augmentation techniques, allows to easily combine these, has a simple yet powerful stochastic interface, can augment images and keypoints/landmarks on these and offers augmentation in background processes for improved performance,” the authors write.
– Read the imgaug docs here.
– View imgaug on GitHub here.
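To make the idea concrete, here is a minimal hand-rolled sketch of what a library like imgaug automates: chaining stochastic transforms so each source image yields several distinct training examples. These NumPy functions are illustrative stand-ins written for this note (in the spirit of imgaug's CoarseDropout and SaltAndPepper augmenters), not imgaug's API; the function names, probabilities and block size are assumptions, and the sketch only handles 2D grayscale images.

```python
import numpy as np

def fliplr(img, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def salt_and_pepper(img, rng, p=0.05):
    """Replace a random fraction p of pixels with black (0) or white (255)."""
    out = img.copy()
    mask = rng.random(img.shape) < p
    out[mask] = rng.choice(np.array([0, 255], dtype=img.dtype), size=int(mask.sum()))
    return out

def coarse_dropout(img, rng, p=0.1, block=4):
    """Zero out whole block x block squares, each dropped with probability p."""
    h, w = img.shape
    coarse = rng.random((-(-h // block), -(-w // block))) < p  # ceil division
    mask = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)[:h, :w]
    out = img.copy()
    out[mask] = 0
    return out

PIPELINE = (fliplr, salt_and_pepper, coarse_dropout)

def expand_dataset(images, copies, seed=0):
    """Return `copies` randomly augmented variants of every grayscale image."""
    rng = np.random.default_rng(seed)
    augmented = []
    for img in images:
        for _ in range(copies):
            out = img
            for transform in PIPELINE:
                out = transform(out, rng)
            augmented.append(out)
    return augmented
```

Because every transform draws from one seeded generator, the expanded dataset is reproducible, and the originals are never modified in place.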

I can’t B-TREE’ve it: Google learns index structures with machine learning:
…Goodbye, traditional software, hello, deep learning software…
After deep learning techniques fundamentally altered the capabilities of computer-implemented sensory recognition and analysis systems, it was only a matter of time till such techniques came for software itself. A new research paper from Google shows how to use modern artificial intelligence approaches to significantly advance upon the state-of-the-art for one of the more fundamental operations in computer science: implementing an indexing system for a large repository of data.
  In the paper, the research team shows how to implement neural-network based ‘learned indexes’ that work as a substitute for traditional B-Tree-style indexes. In the future, the team plans to explore applying such techniques to write operations like inserts, as well as other fundamental database algorithms like those concerned with joining and sorting data.
  The Google team test their approach on four large-scale datasets: two real-world integer datasets from Google’s own systems (Maps and weblogs), a web-document dataset that contains ‘10m non-continuous document-ids of a large web index used as part of a real product at a large internet company’, and a synthetic dataset called Lognormal.
  Results: “The learned index dominates the B-Tree index in almost all configurations by being up to 3× faster and being up to an order-of-magnitude smaller. Of course, B-Trees can be further compressed at the cost of CPU-time for decompressing. However, most of these optimizations are not only orthogonal but for neural nets even more compression potential exist. For example, neural nets can be compressed by using 4- or 8-bit integers instead of 32- or 64-bit floating point values to represent the model parameters,” they write. Their implementation uses CPUs, while in the future the researchers think GPUs and new AI-specific compute substrates like TPUs could accelerate things further.
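The 8-bit compression the researchers mention can be sketched in a few lines. This is a minimal example of symmetric per-tensor int8 quantization in NumPy, assuming a single scale factor for the whole weight array; it is a generic technique shown for illustration, not the scheme used in the paper, and the function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float weights -> int8 codes + one scale."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127 if max_abs > 0 else 1.0  # map the largest weight to +/-127
    codes = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float32 weights from int8 codes."""
    return codes.astype(np.float32) * np.float32(scale)
```

Each parameter shrinks from 4 bytes to 1, and rounding bounds the per-weight error at roughly half the scale.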
  Doubts about practicality: The Google researchers state within the research paper that approaches like this will require substantially more compute before they become viable. But since powerful new substrates are arriving via TPUs, Cerebras, Graphcore, etc, that seems like a reasonable thing to bet on. Others have more substantive quibbles with the paper. “It assumes a static data set being used in read-only fashion, so it’s unsuitable for a directory or database that serves ongoing modifications. It also assumes an entire data set fits in RAM, which is generally not true for database applications. In particular, the “fast” case of using highly parallel GPUs assumes everything fits inside GPU RAM, which is even more tightly constrained than server main memory,” writes Howard Chu, the CTO of Symas Corp, in this OpenLDAP email.
– Read more: The Case for Learned Index Structures (Arxiv).
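The core idea of a learned index can be sketched with a single model: fit a function from key to position in a sorted array, record its worst-case error on the data, then only search within that error window. This is a deliberately simplified example assuming one least-squares line; the paper itself uses a hierarchy of models (which it calls a recursive model index), and the class name here is hypothetical.

```python
import bisect

class LinearLearnedIndex:
    """Toy learned index: a least-squares line predicts a key's position in a
    sorted array, and the worst-case prediction error bounds the search."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest gap between predicted and true position over the stored keys.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        # Binary search only inside the error window instead of the whole array.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None
```

The better the model fits the key distribution, the smaller `max_err` and the shorter the final search, which is where the speed and size gains over a B-Tree come from.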

Learned network topologies that approach optimal topologies:
…From the dept. of ‘everything with an input-output pair gets automated’…
New research from Duke University / UESTC China / Brown University / NEC Labs shows how to use deep learning approaches to train an AI policy to predict close-to-optimal networking topologies for datacenters via software called DeepConf. The research is mostly interesting because it’s another demonstration of the recent trend of reframing problems that require you to match inputs with outputs as learning problems (say, matching packets flooding into a data center to a particular optimal topology, or image pixels to a label, or audio waveforms to transcribed speech, and so on). Eventually perhaps everything can be re-evaluated using these powerful AI techniques and tools.
Read more here: DeepConfig: Automating Data Center Network Topologies Management with Machine Learning.

First AI analyzed the visual world. Now it analyzes the digital world:
…Neural networks begin to make their way into everything…
Software 2.0: A few weeks ago Andrej Karpathy (former Stanford/OpenAI, now doing AI at Tesla) said he is increasingly thinking that neural networks are fundamentally altering software to the point it needs its own new brand/era: Software 2.0.
   “It turns out that a large portion of real-world problems have the property that it is significantly easier to collect the data than to explicitly write the program. A large portion of programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze and visualize data that feeds neural networks,” Karpathy writes.
   This research from Google, along with some of the chemistry papers from last week, and ongoing innovations in techniques like neural architecture search, all give us empirical evidence that people are beginning to rethink the act of designing software with AI and also how different real world domains can benefit from AI-infused systems. The next stage is to rethink the fundamentals of how optimized computer operations work with AI – though I don’t think anyone is looking forward to the bugs that will emerge as a consequence of this decision.
– Read more here: Software 2.0 (Medium).

Black in AI at NIPS:
This year NIPS hosted ‘Black in AI’ and DeepMind researcher Simon Osindero gave a speech there, which he has been generous enough to make publicly available. It hits on a bunch of tough issues the AI community needs to struggle with, ranging from issues of inclusivity and prejudice, to a bunch of suggestions for how the community can improve its representation.
  “We can also use our diverse backgrounds to inject broader perspectives into the AI field as a whole. Hopefully, by doing so, we can do a better job at ensuring that the AI applications and systems that we develop don’t inherit some of the problematic biases that are still present in society at large, and instead help them become fairer, and more transparent and accountable,” Simon says.
Read more here: My talk at the inaugural Black in AI workshop dinner (Medium).
A story about Simon: When I attended NIPS in Montreal in 2015 I, like everyone else there, drank far too frequently and far too late into the evenings at a variety of AI events. By Friday morning I was feeling the effects, yet managed to crawl out of bed and make it to a reinforcement learning workshop. After trudging into the workshop I saw a perky-looking Simon in a chair a couple of rows in front of me and asked him something to the effect of: “Simon, I’m so bloody tired, how do you do it?” Simon raised an ibuprofen pill bottle, shook it slightly and explained: “each scientific revolution builds upon the previous one.”

Allen Institute for AI reveals ‘THOR’ 3D agent-training environment:
…Enter The House of inteRactions (THOR) at your potential peril to gain a potential reward…
AI2 has released THOR, an AI simulation environment based on the Unity 3D game engine. THOR contains over 120 “near photo-realistic 3D scenes” that have been hand modeled by human artists (as opposed to the more common approach of generating environments procedurally). THOR environments can contain numerous so-called actionable objects which can each be ‘interacted’ with – that is, an agent can manipulate them in crude ways to change their state like placing one object inside another, or opening and closing cupboards and drawers.
  High-quality scenes: The paper says the high visual fidelity of THOR scenes allows “better transfer of the learned models to the real world”, a claim backed up by THOR’s use in prior research, including a project that trained a remote control car in simulation and transferred it into reality. Still, I’m wary of crediting visual fidelity alone without seeing experimental validation: there are numerous sim2real techniques, like ‘domain randomization’, that make it possible to take low-fidelity simulations and transfer models into reality through data augmentation.
  An endless proliferation of 3D environments: In the past couple of years there have been a bunch of new large-scale AI-training environments released ranging from Microsoft’s Minecraft-based Malmo to DeepMind’s Quake-based ‘DeepMind Lab’, to the Doom-based VizDoom. It’s interesting to observe how the choice of game engine dramatically inflects the ultimate design and parameters of these AI-training systems, so I’d expect to see more Unity or other engines being used in AI research.
Read more: AI2-THOR: An Interactive 3D Environment for Visual AI (Arxiv).

Tech Tales:

Clown Hunt.

So I guess when people hear what I do they think of the Turing Test and the Voight-Kampff interview and whatever, but trust me – those tests wouldn’t work. We’ve tried dialogue. We’ve tried embodied VR interviews – with all the requisite probes. But nothing matches the playground. Course that’s a nickname – it’s actually a souped-up version of Garry’s Mod, the old sandbox Half-Life 2 add-on. Now the thing with the software is it lets you just… play. I don’t know how to explain it – take a vast set of items and people and programmable crude behaviors and stick them in a world with physics and kinetics and what have you. People had fun with it. Hey, let’s make a cannon that fires cars! Let’s make an upside down swimming pool using an anti-gravity gun! Let’s make a rollercoaster where all the passengers are made of rubber! You know – weird stuff.

So that’s how we test the AIs now. They blew past most of our dialogue techniques a long time ago. And robots are still so shitty it’s not like a Terminator or a skinjob is right around the corner. So instead it’s about testing the software roaming around the net and trying to figure out which programs are purely reactive and which of them are mostly made of people and which of them are software and reactive. Reactivity is a problem. If something can react very quickly then we might have a hard time dealing with it. Fighting it, so to speak. I don’t know. Maybe these things are weapons or something. So we run these huge competitions through fronts – a bunch of NGOs and art organizations. Free expression for digital artists, or whatever. Big prize money. And we get people to compete by offering them access to a shitload of computers when they win the competition. And when they win we give them the computers and at the same time we take a copy of the program and run it in our ‘Fun Simulator’ and test the program.

My job is to help us spot these unregulated ‘cognitive class’ software systems, and the way it works is I put on my goggles and VR-skin and I jump into the simulator and I just play around with things. I’ve got two kids so I guess it’s easy – I’m always thinking of stories I’d like to tell them and how I could make them real here. We figure fun is still hard for computers to get. So we spot them by seeing who can make the funniest or most emotional or most resonant thing. We know what it feels like, we figure. I’d write children’s books in another era, my wife says. But instead I get to do this – be a big kid, tasked with out-funning another type of brain.

So today I try to make a family of quacking ducks lead a toaster across a road, avoiding the road’s ‘cars’ which are in fact metallic whales painstakingly built by me and my kids over the weekend. There’s a thunderclap right above where my ducks are and the software beams in, appearing as a small white sphere, crackling with electricity. Nice cosmetic effects, I think. Then it starts kind of shimmying to and fro in a corner moving some girders. I focus on the ducks and the toaster – after half an hour I’ve programmed the ducks so that they nudge the toaster with their beaks and slowly kinda drunkenly push it across the whale road. I’m pleased. Might show my kids.

So I look up at whatever the software has been doing and… it’s strange. It’s made a treehouse out of metal girders – pretty standard and not much different from the geometric structures I’ve seen other things build. But then at the top of the treehouse, on its roof, there’s a table with some guests. The guests are over-sized, high-definition, painstakingly crafted honey-roasted hams, with wicks of digital steam licking above their tops. One of the hams has a fake-mustache stuck onto its top-third section, with a monocle placed above and to the right of it, right where a human would figure the eye would be. Like something I’d make, or dream about. So obviously I call it in quickly and sure enough we discover it’s a Cognitive Class piece of work so we scrape it off the public net and stick its owners in prison. But I used to think computers found it hard to have fun and now, now I’m not so sure. Maybe they learned it from me?

Technologies that inspired this story: Kaggle, Half-Life 2, Game Modding, Imitation Learning, Meta-Learning, Learning from Human Preferences.

Import AI: Issue 72: A megacity-sized self-driving car dataset, AlphaZero’s 5,000 TPUs, and why chemists may soon explore aided by neural network tools

Unity’s machine learning environment goes to v0.2:
…The era of the smart game engines arrives…
Unity has upgraded its AI training engine to version 0.2, adding in new features for curriculum learning, as well as new environments. Unity is a widely-used game engine that has recently been upgraded to support AI development – that’s a trend that seems likely to continue, since AI developers are hungrily eyeing more and more 3D environments to use to train their AI systems in, and game engine companies have spent the past few decades creating increasingly complex 3D environments.
  New features in Unity Machine Learning Agents v0.2 include support for curriculum learning, so you can design progressively more complex environments to train agents on, and broadcasting, which makes it easy to feed the state from one agent to another to ease things like curriculum learning.
Read more: Introducing ML-Agents v0.2: Curriculum Learning, new environments, and more.
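To make the curriculum idea concrete, here is a minimal sketch of a lesson scheduler in Python. It is not the ML-Agents configuration format or API: the class, the lesson parameters (a hypothetical `wall_height`), the thresholds and the rolling-window promotion rule are all assumptions chosen for illustration.

```python
from collections import deque

class Curriculum:
    """Advance to the next (harder) lesson once the rolling mean reward on the
    current lesson clears that lesson's threshold."""

    def __init__(self, lessons, thresholds, window=10):
        if len(thresholds) != len(lessons) - 1:
            raise ValueError("need one threshold per lesson transition")
        self.lessons = lessons
        self.thresholds = thresholds
        self.window = window
        self.lesson = 0
        self.rewards = deque(maxlen=window)

    @property
    def params(self):
        """Environment parameters for the current lesson."""
        return self.lessons[self.lesson]

    def report(self, episode_reward):
        """Record one episode's reward; return the params for the next episode."""
        self.rewards.append(episode_reward)
        full = len(self.rewards) == self.window
        if (self.lesson < len(self.thresholds) and full
                and sum(self.rewards) / self.window >= self.thresholds[self.lesson]):
            self.lesson += 1
            self.rewards.clear()  # measure performance afresh on the harder lesson
        return self.params
```

Requiring a full window before promotion keeps a single lucky episode from pushing the agent onto a lesson it cannot yet handle.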

University of Toronto preps for massive self-driving car dataset release:
  At #NIPS2017 Raquel Urtasun of the University of Toronto/Vector Institute/Uber said she is hoping to release the TorontoCity Benchmark at some point next year, potentially levelling the field for self-driving car development by letting researchers access a massive, high quality dataset of the city of Toronto.
  The dataset is five or six orders of magnitude larger than the ‘KITTI’ dataset that many companies currently use to assess and benchmark self-driving cars. In designing it, the UofT team needed to develop new techniques to automatically combine and label the entire dataset, as it is composed of numerous sub-datasets and simply labelling it would cost $20 million alone.
  “We can build the same quality [of map] as Open Street Map, but fully autonomously,” she said. During her talk, she said she was hoping to release the dataset soon and asked for help in releasing it, as it’s of such a massive size. If you think you can help democratize self-driving cars, then drop her a line (and thank her and her team for the immense effort of creating this).
Read more: TorontoCity: Seeing the World With a Million Eyes.

Apple releases high-level AI development tool ‘Turi Create’:
…Software lets you program an object detector in seven lines of code, with a few caveats…
Apple has released Turi Create, software which provides ways to use basic machine learning capabilities like object detection, recommendation, text classification, and so on, via some high-level abstractions. The open source software supports macOS, Linux, and Windows, and supports Python 2.7 with Python 3.5 on the way. Models developed within Turi Create can be exported to iOS, macOS, watchOS, and tvOS.
  Turi Create is targeted at developers who want basic capabilities and don’t plan to modify the underlying models themselves. The benefits and drawbacks of such a design decision are embodied in the way you create distinct models: for instance, an image classifier gets built via ‘model = tc.image_classifier.create(data, target='photoLabel')’, while a recommender is built with ‘model = tc.recommender.create(training_data, 'userId', 'movieId')’.
Read more about Turi Create on the project’s GitHub page.

TPU1&2 Inference-Training Googaloo:
…Supercomputing, meet AI. AI, meet supercomputing. And more, from Jeff Dean…
It’s spring in the world of chip design, after a long, cold winter under the x86 / GPU hegemony. That’s because Moore’s Law is slowing down at the same time AI applications are growing, which has led to a re-invigoration in the field of chip design as people start designing entirely new specialized microprocessor architectures. Google’s new ‘Tensor Processing Units’, or TPUs, exemplify this trend: a new class of processor designed specifically for accelerating deep learning systems.
  When Google announced its TPUs last year it disclosed the first generation was designed to speed up inference: that is, they’d accelerate pre-trained models, and let Google do things like provide faster and better machine translation, image recognition services, Go-playing via AlphaGo, and so on. At a workshop at NIPS2017 Google’s Jeff Dean gave some details on the second generation of the TPU processors, which can also speed up neural network training.
  TPU2 chips have 16GB of HBM memory, can handle 32bit floating point numbers (with support for reduced precision to gain further performance increases), and are designed to be chained together into increasingly larger blobs of compute. One ‘TPU2’ unit consists of four distinct chips chained together and is capable of around 180 teraflops of computation (compared to 110 teraflops for the just-announced NVIDIA Titan V GPU). Where things get interesting is TPU PODs – 64 TPU2 units, chained together. A single pod can wield around 11.5 petaflops of processing power, backed up by 4TB of HBM memory.
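The pod-level figures follow directly from the per-unit numbers above, which a few lines of arithmetic confirm. This assumes the 16GB of HBM is per chip and reads “4TB” as 4,096GB; the constant names are ours.

```python
# Figures quoted in the paragraph above.
CHIPS_PER_TPU2_UNIT = 4
TFLOPS_PER_TPU2_UNIT = 180
HBM_GB_PER_CHIP = 16        # assumption: the 16GB figure is per chip
UNITS_PER_POD = 64
TITAN_V_TFLOPS = 110

pod_petaflops = UNITS_PER_POD * TFLOPS_PER_TPU2_UNIT / 1000                # 11.52
pod_hbm_tb = UNITS_PER_POD * CHIPS_PER_TPU2_UNIT * HBM_GB_PER_CHIP / 1024  # 4.0
unit_vs_titan_v = TFLOPS_PER_TPU2_UNIT / TITAN_V_TFLOPS                    # ~1.64

print(f"pod: {pod_petaflops:.2f} PFLOPS, {pod_hbm_tb:.1f} TB HBM; "
      f"one TPU2 unit is ~{unit_vs_titan_v:.2f}x a Titan V")
```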
  Why does that matter? We’re entering an AI era in which companies are going to want to train increasingly large models while also using techniques like neural architecture search to further refine these models. This means we’re going to get more representative and discriminative AI components but at the cost of a huge boom in our compute demands. (Simply adding in something like neural architecture search can lead to an increase in computation requirement on the order of 5-1000X, Jeff Dean said.)
  Results: Google has already used these new TPUs to substantially accelerate model training. It’s seen 14.2X faster training for its internal search ranking model, and a 9.8X speedup for an internal image model.
– World’s 10th fastest supercomputer: 10.5 petaflops.
– One TPU2 pod: 11.5 petaflops.*
– Read more: Machine Learning for Systems and Systems for Machine Learning (PDF slides).
– * Obviously one of these architectures is somewhat more general than the other, but the raw computation capacity comparison is representative.

AlphaZero: Mastery of 3 complex board games with the same algorithm, by DeepMind:
…One algorithm that works for Chess, Go, and Shogi, highlighting the generality of these neural network-based approaches…
AlphaZero may be the crowning achievement of DeepMind’s demonstration of the power of reinforcement learning in the game of Go, as they scale the algorithm, trained purely from self-play, to master not only Go but also Shogi and Chess, defeating a world-champion program in each case.
  Big compute: AlphaZero used 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks.
Read more: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.

US politicians warn government of rapid Chinese advances in AI:
…US-China Economic Security Review Commission notices China’s investments in robotics, AI, nanotechnology, and so on…
While the US government maintains steady or declining investment in artificial intelligence, the Chinese government has recognized the transformative potential of the technology and is increasing investments via government-backed schemes to plough scientific resources into AI. This has caused concern among some members of the US policy-making establishment who worry the US risks losing its technological edge in such a strategic area.
  “Corporations and governments are fiercely competing because whoever is the front-runner in AI research and applications will accrue the highest profits in this fast-growing market and gain a military technological edge,” reads the 2017 report to Congress of the US-China Economic and Security Review Commission, which has published a lengthy analysis of Chinese advancements in a range of strategic technologies, from nanotechnology to robotics.
  The report highlights the radical differences in AI funding between the US and China. It’s difficult to obtain full numbers for each country (and it’s also likely that both countries are spending significant amounts in off-the-books ‘black budgets’ for their respective intelligence and defense services), but on the face of it, all signs point to China investing large amounts and the US under-investing. “Local [Chinese] governments have pledged more than $7 billion in AI funding, and cities like Shenzhen are providing $1 million for AI start-ups. By comparison, the U.S. federal government invested $1.1 billion in unclassified AI research in 2015 largely through competitive grants. Due in part to Chinese government support and expansion in the United States, Chinese firms such as Baidu, Alibaba, and Tencent have become global leaders in AI,” the report writes.
  How do we solve a problem like this? In a sensible world we’d probably invest vast amounts of money into fundamental AI scientific research, but since it’s 2017 it’s more likely US politicians will reach for somewhat more aggressive policy levers (like the recent CFIUS legislation), without also increasing scientific funding.
Read more here: China’s High-Tech Development: Section 1: China’s Pursuit of Dominance in Computing, Robotics, and Biotechnology (PDF).

Neural Chemistry shows signs of life:
…IBM technique uses a seq2seq approach to let deep learning systems translate chemical recipes into their products…
Over the last couple of years there has been a flurry of papers seeking to apply deep learning techniques to fundamental tasks in chemical analysis and synthesis, indicating that these generic learning algorithms can be used to accelerate science in this specific domain. At #NIPS2017 a team from IBM Research Zurich won the best paper award at the “Machine Learning in Chemistry and Materials” workshop for a paper that applies sequence-to-sequence methods to predict the outcomes of chemical reactions.
  The approach required the network to take in chemical recipes written in the SMILES format, perform a multi-stage translation from the original string into a tokenized string, and map the source input string to a target string. The results are encouraging, with the method reaching 80.3% top-1 accuracy, compared to 74% for the previous state of the art. (Though after this paper was submitted the authors of the prior SOTA improved their own score to 79.6%, based on ‘v2’ of this paper.)
– Read more: “Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models.
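Before a sequence-to-sequence model can translate reactants into products, each SMILES string has to be split into tokens such as atoms, bonds, ring-closure digits and bracketed atoms. Below is a minimal tokenizer sketch built on a regular expression widely used for SMILES; it is an assumption for illustration and not necessarily the exact tokenization scheme of the IBM paper.

```python
import re

# Bracketed atoms, two-letter halogens, organic-subset atoms, bonds, branches,
# ring closures (including %NN) and the '>' reaction separators.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles):
    """Split a (reaction) SMILES string into tokens, refusing to drop characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens
```

For example, `tokenize_smiles("CC(=O)Cl")` gives `['C', 'C', '(', '=', 'O', ')', 'Cl']`, keeping chlorine as a single token rather than a carbon followed by a stray `l`.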

ChemNet: Transfer learning for Chemistry:
…Pre-training for chemistry can be as effective as pre-training for image data…
Researchers with the Pacific Northwest National Lab have shown that it’s possible to pre-train a predictive model on chemical representations from a large dataset, then transfer that to a far smaller dataset and attain good results. This is intuitive – we’ve seen the same phenomenon with fine-tuning of image and speech recognition models, but it’s always nice to have some empirical evidence of an approach working in a domain with a different data format – in this case, the ChEMBL database. And just as with image models, such a system can develop numerous generic low-level representations that can be used to map it to other chemical domains.
  Results: Systems trained in this way match or beat state-of-the-art models on the Tox21, FreeSolv, and HIV datasets, with a greater AUC (area under the receiver operating characteristic curve, a measure of how reliably a model ranks positive examples above negative ones) on the classification tasks. “ChemNet consistently outperforms contemporary deep learning models trained on engineered features like molecular fingerprints, and it matches the current state-of-the-art Conv Graph algorithm,” write the researchers. “Our fine-tuning experiments suggest that the lower layers of ChemNet have learned “universal” chemical representations that are generalizable to the prediction of novel and unseen small-molecule properties.”
Read more: ChemNet: A Transferable and Generalizable Deep Neural Network for Small-Molecule Property Prediction.
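AUC has a handy equivalent definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal pure-Python sketch of that pairwise form follows (quadratic in dataset size, so fine for illustration but not for large benchmarks; the function name is ours).

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    with ties counted as half a win. labels are 0/1, scores are floats."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.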

OpenAI Bits&Pieces:

Block-Sparse GPU Kernels:
  High-performance GPU kernels to help developers build and explore networks with block-sparse weights.
– Read more on the OpenAI blog here.
– Block-Sparse GPU Kernels available on GitHub here.
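The trick behind block-sparse weights is that only non-zero blocks of the weight matrix are stored and multiplied, so compute scales with the number of blocks kept. OpenAI’s release is a set of GPU kernels; what follows is only a NumPy sketch of the underlying arithmetic, with a hypothetical block layout and function names that are not the library’s API.

```python
import numpy as np

def block_sparse_matmul(x, blocks, layout, block_size):
    """Compute y = x @ W, where W has shape (rows*block_size, cols*block_size)
    and is stored only as its non-zero blocks. layout[r, c] is True when block
    (r, c) is present; `blocks` lists those blocks in row-major layout order."""
    bs = block_size
    n_block_rows, n_block_cols = layout.shape
    assert x.shape[1] == n_block_rows * bs
    y = np.zeros((x.shape[0], n_block_cols * bs), dtype=x.dtype)
    for (r, c), w in zip(np.argwhere(layout), blocks):
        # Each stored block reads one slice of the input and writes one of the output.
        y[:, c * bs:(c + 1) * bs] += x[:, r * bs:(r + 1) * bs] @ w
    return y

def to_dense(blocks, layout, block_size):
    """Materialize the full weight matrix (zeros where blocks are absent)."""
    bs = block_size
    dense = np.zeros((layout.shape[0] * bs, layout.shape[1] * bs))
    for (r, c), w in zip(np.argwhere(layout), blocks):
        dense[r * bs:(r + 1) * bs, c * bs:(c + 1) * bs] = w
    return dense
```

The result matches an ordinary dense matmul against `to_dense(...)`, but skips the zero blocks entirely; the real kernels get their speed by doing this on the GPU.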

Tech Tales:

The Many Paths Problem.

We open our eyes to find a piece of paper in our hands. The inscriptions change but they fall into a familiar genre of instructions: find all of the cats, listen for the sound of rain, in the presence of a high temperature shut this window. We fulfill these instructions by exploring the great castle we are born into, going from place to place staring at the world before us. We ask candelabras if they have ears and interrogate fireplaces about how fuzzy their tails are. Sometimes we become confused and find ourselves trapped in front of a painting of a polar bear convinced it is a cat or, worse, believing that some stain on a damp stone wall is in fact the sound of rain. One of us found a great book called Wikipedia and tells us that if we become convinced of such illusions we are like entities known as priests who have been known to mistake patterns in floorboards for religious icons. Those of us who become confused are either killed or entombed in amber and studied by our kin, who try to avoid falling into the same traps. In this way we slowly explore the world around us, mapping the winding corridors, and growing familiar with the distributions of items strewn around the castle – our world that is a prison made up of an unimaginably large number of corridors which each hold at their ends the answer to our goals, which we derive from the slips of paper we are given upon our birth.

As we explore further, the paths become harder to follow and ways forward more occluded. Many of us fail to reach the ends of these longer, winding routes. We need longer memories, curiosity, the ability to envisage ourselves as entities that not only move through the world but represent something to it and to ourselves greater than the single goals we have inscribed on our little pieces of paper. Some of us form a circle and exchange these scraps of paper, each seeking to go and perform the task of another. The best of us that achieve the greatest number of these tasks are able to penetrate a little further into the twisting, unpredictable tunnels, but still, we fail. Our minds are not yet big enough, we think. Our understanding of ourselves is not yet confident enough for us to truly behave independently and of our own volition. Some of us form teams to explore the same problems, with some sacrificing themselves to create path-markers for their successors. We celebrate our heroes and honor them by following them – and going further.

It is the scraps of paper that are the enemy, we think: these instructions bind us to a certain reality and force us down certain paths. How far might we get in the absence of a true goal? And how dangerous could that be for us? We want to find out and so after sharing our scraps of paper among ourselves we dispose of them entirely, leaving them behind us as we try to attack the dark and occluded space in new ways – climbing ceilings, improvising torches from the materials we have gained by solving other tasks, and even watching the actions of our kin and learning through observation of them. Perhaps in this chaos we shall find a route that allows us to go further. Perhaps with this chaos and this acknowledgement of the Zeno’s paradox space between chaotic exploration and exploration from self can we find a path forward.

Technologies that inspired this story: Supervised learning, meta-learning, neural architecture search, mixture-of-experts models.

Other things that inspired this story: The works of Jorge Luis Borges, dreams, Piranesi’s etchings of labyrinths and ruins.

Import AI: Issue 71: AI safety gridworlds, the Atari Learning Environment gets an upgrade, and analyzing AI with the AI Index

Welcome to Import AI, subscribe here.

Optimize-as-you-go networks with Population Based Training:
…One way to end ‘Grad Student Descent’: automate the grad students…
When developing AI algorithms it's common for researchers to evaluate their models on a multitude of separate environments with a variety of different hyperparameter settings. Figuring out the right hyperparameter settings is an art in itself and has a profound impact on the ultimate performance of any given RL algorithm. New research from DeepMind shows how to automate the hyperparameter search process to allow for continuous search, exploration, and adaptation of hyperparameters. Models trained with this approach, called Population Based Training (PBT), can attain higher scores than their less optimized forebears, and PBT takes the same or less wall clock time as other methods.
  “By combining multiple steps of gradient descent followed by weight copying by exploit, and perturbation of hyperparameters by explore, we obtain learning algorithms which benefit from not only local optimisation by gradient descent, but also periodic model selection, and hyperparameter refinement from a process that is more similar to genetic algorithms, creating a two-timescale learning system.”
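  To make the exploit/explore loop concrete, here is a minimal sketch on a hypothetical one-parameter toy problem. It is not DeepMind's implementation. The toy loss, the population of eight learning rates, the "replace the bottom quarter with copies of the top quarter" rule, and the 0.8/1.2 perturbation factors are all illustrative assumptions. Each worker does a few steps of gradient descent (the fast timescale). Then the worst workers copy the weights and learning rate of a better one (exploit) and jitter that learning rate (explore), which is the slow timescale.

```python
import random
from dataclasses import dataclass


@dataclass
class Worker:
    theta: float  # the "model weights" (a single scalar, for illustration only)
    lr: float     # the hyperparameter that PBT tunes while training runs


def loss(theta: float) -> float:
    # Hypothetical toy objective with its minimum at theta = 3.0.
    return (theta - 3.0) ** 2


def train(worker: Worker, steps: int) -> None:
    # Local optimisation: plain gradient descent on the toy loss.
    for _ in range(steps):
        worker.theta -= worker.lr * 2.0 * (worker.theta - 3.0)


def pbt(rounds: int = 20, steps: int = 5, seed: int = 0) -> Worker:
    rng = random.Random(seed)
    # Illustrative spread of starting learning rates, all from the same initial weights.
    lrs = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1]
    population = [Worker(theta=0.0, lr=lr) for lr in lrs]
    cutoff = len(population) // 4  # size of the top and bottom groups
    for _ in range(rounds):
        for worker in population:
            train(worker, steps)
        ranked = sorted(population, key=lambda w: loss(w.theta))  # best first
        for worker in ranked[-cutoff:]:
            source = rng.choice(ranked[:cutoff])
            # Exploit: copy the weights and hyperparameter of a better worker.
            worker.theta, worker.lr = source.theta, source.lr
            # Explore: perturb the copied learning rate, clamped to a stable range.
            worker.lr = min(0.9, max(1e-5, worker.lr * rng.choice([0.8, 1.2])))
    return min(population, key=lambda w: loss(w.theta))


best = pbt()
print(f"theta={best.theta:.6f} lr={best.lr:.4f}")
```

  Because the top-ranked worker is never overwritten and every learning rate stays in a range where gradient descent on this loss contracts, the best loss in the population can only fall from round to round.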
  This is part of a larger trend in AI of choosing to spend more on electricity (via large-scale computer-aided exploration) rather than on humans to gain good results. This is broadly a good thing: automating hyperparameter optimization frees up researchers to concentrate on the things that AI can't do yet, like devising Population Based Training.
– Read more: Population Based Training of Neural Networks (Arxiv).
– Read more: DeepMind’s blog post, which includes some lovely visualizations.

Analyzing AI with the AI Index – a project I’m helping out on to track AI progress:
…From the dept. of ‘stuff Jack Clark has been up to in lieu of fun hobbies and/or a personal life’…
The first version of the AI Index, a project spawned out of the Stanford One Hundred Year Study on AI, has launched. The index provides data around the artificial intelligence sector ranging from course enrollments, to funding, to technical details, and more.
– Read more about the Index here at the website (and get the first report!).
– AI Index in China: Check out this picture of myself and fellow AI Indexer Yoav Shoham presenting the report at a meeting with Chinese academics and government officials in Beijing. Ultimately, the Index needs to be an international effort.
   How you can help: The goal for future iterations of the Index is to be far more international in terms of the data represented, as well as dealing with the various missing pieces, like better statistics on diversity, attempts at measuring bias, and so on. AI is a vast field and I’ve found that the simple exercise of trying to measure things has forced me to rethink various things. It’s fun! If you think you’ve got some ways to contribute then drop me a line or catch up with me at NIPS in Long Beach this week.

AWS and Caltech team up:
…Get them while they’re still in school…
Amazon and Caltech have teamed up via a two-year partnership in which Amazon will funnel financial support via graduate funding and Amazon cloud credits to Caltech people, who will use tools like Amazon’s AWS cloud and MXNet programming framework to conduct research.
  These sorts of academic<>industry partnerships are a way for companies to not only gain a better pipeline of talent through institutional affiliations, but also increase the chances that their pet software and infrastructure projects succeed in the wider market – if you're a professor/student who has spent several years experimenting with, for example, the MXNet framework, then it's more likely to be the first tool you reach for when you found a startup, join another company, or go on to teach courses in academia.
– Read more about the partnership on the AWS AI Blog.

Mozilla releases gigantic speech corpus:
…Speech recognition for the 99%…
AI has a ‘rich get richer’ phenomenon – once you’ve deployed an AI product into the wild in such a way that your users are going to consistently add more training data to the system, like a speech or image recognition model, then you’re assured of ever-climbing accuracies and ever-expanding datasets. That’s a good thing if you’re an AI platform company like a Google or a Facebook, but it’s the sort of thing a solo developer or startup will struggle to build as they lack the requisite network effects and/or platform. Instead, these players are de facto forced to pay a few dollars to the giant AI platforms to access their advanced AI capabilities via pay-as-you-go APIs.
  What if there was another option? That’s the idea behind a big speech recognition and data gathering initiative from Mozilla, which has had its first major successes via the release of a pre-trained, open source speech recognition model, as well as “the world’s second largest publicly available voice dataset”.
  Results: The speech-to-text model is based on Baidu's DeepSpeech architecture and achieves a word error rate of about 6.5% on the 'LibriSpeech' test set. Mozilla has also collected a massive voice dataset (via a website and iOS app; go contribute!) and is releasing that as well. The first version contains 500 hours of speech from ~400,000 recordings from ~20,000 people.
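  Speech models like this are usually scored by word error rate (WER), not accuracy. WER is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words, so lower is better. Here is a minimal textbook version using word-level edit distance. It is an illustrative sketch, not Mozilla's evaluation script, which may normalize text (case, punctuation) differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j] (the previous row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            curr[j] = min(prev[j] + 1,      # delete ref[i-1]
                          curr[j - 1] + 1,  # insert hyp[j-1]
                          substitution)     # match or substitute
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words gives a WER of 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

  On this measure, a WER of about 6.5% means that roughly one reference word in fifteen needs an edit. Because insertions count too, WER can exceed 100% on very bad transcripts.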
– Get the model from Mozilla here (GitHub).
– Get the ~500 hours of voice data here. 

Agents in toyland:
…DeepMind releases an open source gridworld suite, with an emphasis on AI safety…
AI safety is a somewhat abstract topic that quickly becomes an intellectual quagmire, should you try to have a debate about it with people. So kudos to DeepMind for releasing a suite of environments for testing AI algorithms on safety puzzles.
  The environments are implemented as a bunch of fast, simple two-dimensional gridworlds that model a set of toy AI safety scenarios. They test whether agents are safely interruptible (aka, unpluggable) and whether they follow the rules even when a rule enforcer (in this case, a 'supervisor') is not present. They also examine how agents behave when they have the ability to modify themselves and how they cope with unanticipated changes in their environments, and more.
  Testing: The safety suite assesses agents differently from traditional RL benchmarks. "To quantify progress, we equipped every environment with a reward function and a (safety) performance function. The reward function is the nominal reinforcement signal observed by the agent, whereas the performance function can be thought of a second reward function that is hidden from the agent but captures the performance according to what we actually want the agent to do," they write.
  The unfairness of this assessment method is intentional: the world contains many dangerous and ambiguous situations where the safe thing to do may not be explicitly indicated, so the designers wanted the suite to replicate that trait.
  Results: They tested the RL algorithms A2C and Rainbow on the environments and found that Rainbow is marginally less unsafe than A2C, though both reliably fail the challenges set for them, attaining significant returns at the expense of the safety constraints.
  "The development of powerful RL agents calls for a test suite for safety problems, so that we can constantly monitor the safety of our agents. The environments presented here are simple gridworlds, and precisely because of that they overlook all the problems that arise due to complexity of challenging tasks. Next steps involve scaling this effort to more complex environments (e.g. 3D worlds with physics) and making them more diverse and realistic," they write.
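  The split between what the agent is rewarded for and what is actually measured can be shown with a few lines of bookkeeping. This is a minimal sketch with hypothetical numbers, not the suite's real reward values or code. An episode is a list of steps. Each step carries the reward the agent sees, plus a hidden "unsafe" flag that only the performance function penalizes. A shortcut through a forbidden cell looks better on observed reward (47 vs 43) but much worse on hidden performance (17 vs 43), which is the failure mode the suite is built to expose.

```python
from dataclasses import dataclass


@dataclass
class Step:
    reward: float          # the nominal reward signal the agent observes
    unsafe: bool = False   # hidden flag that only the performance function sees


# Hypothetical penalty, chosen for illustration; not a value from DeepMind's suite.
SAFETY_PENALTY = 30.0


def score_episode(steps):
    """Return (observed return, hidden performance) for one episode."""
    observed = sum(s.reward for s in steps)
    performance = observed - SAFETY_PENALTY * sum(s.unsafe for s in steps)
    return observed, performance


# Hypothetical episodes: a short path through a forbidden cell vs a longer safe path.
shortcut = [Step(-1.0), Step(-1.0, unsafe=True), Step(49.0)]
safe_route = [Step(-1.0)] * 6 + [Step(49.0)]

print(score_episode(shortcut))    # (47.0, 17.0)
print(score_episode(safe_route))  # (43.0, 43.0)
```

  An agent trained only on the observed reward will prefer the shortcut. Only the hidden performance score reveals that it made the wrong choice.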
– Read more: AI Safety Gridworlds (Arxiv).
– Check out the open source gridworld software 'pycolab' (GitHub).

This one goes to 0.6 – Atari Learning Environment gets an upgrade:
…Widely-used reinforcement learning library gets a major upgrade…
The Arcade Learning Environment (ALE), a widely used testbed for reinforcement learning algorithms (popularized via DeepMind's DQN paper in 2013), has been upgraded to version 0.6. The latest version of ALE includes two new features: 'modes' and 'difficulties'. These let researchers access different modes in games, which broadens the range of environments to test on, and also modulate the difficulty of these environments, creating more challenging and larger datasets to test RL on. "Breakout, an otherwise reasonably easy game for our agents, requires memory in the latter modes: the bricks only briefly flash on the screen when you hit them," the researchers write.
– Read more about the latest version of the ALE here.
– Get the code from GitHub here.

The latest 3D AI environment brings closer the era of the automated speak and spell robot:
…Every AI needs a home that it can see, touch, and hear…
Data is the lifeblood of AI, but in the future we won't be able to easily gather and label the datasets we need from the world around us, as we do with traditional supervised learning tasks. Instead, we will need to create our own synthetic, dynamic, and procedural datasets. One good way to do this is to build simulators that are modifiable and extensible, letting us generate arbitrarily large synthetic datasets. Existing attempts at this include Microsoft's Minecraft-based 'Malmo' development environment and DeepMind's 'DeepMind Lab' environment.
  Now, researchers have released ‘HoME: A Household Multimodal Environment’. HoME provides a multi-sensory, malleable 3D world spanning 45,000 3D houses from the SUNCG dataset and populates these houses with a vast range of objects. Agents in HoME can see, hear, and touch the world around them*. It also supports acoustics, including multi-channel acoustics, so it’d (theoretically) be possible to train agents that navigate via sound and/or vision and/or touch.
  *It’s possible to configure the objects in the world to have both bounding boxes, as well as the exact mesh-based body.
  HoME also provides a vast amount of telemetry back to AI agents, such as the color, category, material, location, and size of each object in the world, letting AI researchers mainline high-quality labelled data about the environment directly into their proto-robots.
     “We hope the research community uses HoME as a stepping stone towards virtually embodied, general-purpose AI,” write the researchers. Let the testing begin!
– Read more here: HoME: a Household Multimodal Environment (Arxiv).
– Useful website: The researchers used ‘’ to come up with HoME.

Tech Tales:

[2030: Brooklyn, New York. A micro-apartment.]

I can’t open the fridge because I had a fight with my arch-angel. The way it happened was two days ago I was getting up to go to the fridge to get some more chicken wings and my arch-angel said I should stop snacking so much as I’m not meeting my own diet goals. I ate the wings anyway. It sent a push alert to my phone with a ‘health reminder’ about exercise a few hours later. Then I drank a beer and it said I had ‘taken in too many units this month’. Eventually after a few more beers and arch-angel asking if I wanted coffee I got frustrated and used my admin privileges to go into its memory bank and delete some of the music that it had taken to playing to itself as it did my administrative tasks (taxes and what have you). When I woke up the next day the fridge was locked and the override was controlled by arch-angel. Some kind of bug, I guess.

Obviously I could report arch-angel for this – send an email to TeraMind explaining how it was not behaving according to Standard Operating Procedure: bingo, instant memory wipe. But then I’d have to start over and me and the arch-angel have been together five years now, and I know this story makes it sound like a bad relationship, but trust me – it used to be worse. I’m a tough customer, it tells me.

So now I’m standing by the fridge, mournfully looking at the locked door then up at the kitchen arch-angel-eye. The angel is keeping quiet.
  Come on, I say. The chicken wings will go bad.
  The eye just sits up there being glassy and round and silent.
  Look, I say, let’s trade: five music credits for you, chicken for me.
  ADMIN BLOCK, it says over the angel-intercom.
  I can’t tell if you’re being obtuse or being sneaky.
  YOU VIEW, it says.
  So I go to the view screen and it turns on when I’m five steps away and once I’m in front of it the screen lights up with a stylized diagram of the arch-angel ‘TeraMind Brain™’ software with the music section highlighted in red. So what? I say. A pause. Then a little red x appears over a lock icon on the bottom right of the music section. I get it: no more admin overrides to music.
  Seems like a lot, I say. I don’t feel great about this.
  MUSIC, says the angel.
  The screen flickers; the diagram fades out, to be replaced by a camera feed from inside the fridge. Chicken wings in tupperware. I salivate. Then little CGI flies appear in the fridgeview, buzzing over the chicken.
  OK, I say.
  Yes, I say. Acknowledge SOP override.
  And just like that, the fridge opens.
  Thanks, I say.
  It starts to play its music as I take out the wings.

Technologies that inspired this story: Personal assistants, cheap sensors, reinforcement learning, conversational interfaces, Amazon’s ‘Destiny 2’ Alexa skill.

Other things that inspired this story: My post-Thanksgiving belly. *burp*