Import AI: #77: Amazon tests inventory improvement by generating fake customers with GANs, the ImageNet of video arrives, and robots get prettier with Unity-MuJoCo tie-up.

by Jack Clark

Urban flaneurs generate fake cities with GANs:
…Researchers discover the same problems that AI researchers have been grappling with, like those relating to interpretability and transparency…
Researchers have used generative adversarial networks to generate a variety of synthetic, fictitious cities. The project shows that “a basic, unconstrained GAN model is able to generate realistic urban patterns that capture the great diversity of urban forms across the globe,” they write. This isn’t particularly surprising, since we know GANs can typically approximate the distribution of the data they are trained on – though I suspect the dataset (30,000 images) might be slightly too small to avoid problems like over-fitting.
  Data used: The Global Urban Footprint, an inventory of built-up land at 12m/px resolution, compiled by the German Aerospace Center.
  Questions: It’s always instructive to see the questions posed by projects that sit at the border between AI and other disciplines, like geography. For this project, some open questions the researchers are left with include: “How to evaluate the quality of model output in a way that is both quantitative, and interpretable and intuitive for urban planning analysis? How to best disentangle, explore, and control latent space representations of important characteristics of urban spatial maps? How to learn from both observational and simulated data on cities?,” and more.
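  Sketch: for readers who want a feel for what “a basic, unconstrained GAN” means in practice, here is a minimal PyTorch sketch of that kind of setup – a generator mapping noise to footprint tiles and a discriminator scoring them. The tile size, architecture, and hyperparameters are illustrative assumptions of mine, not details from the paper.

# Minimal sketch (not the paper's code) of an unconstrained GAN trained on
# binary urban-footprint tiles. Tile size, architecture and hyperparameters
# are illustrative assumptions.
import torch
import torch.nn as nn

TILE = 64   # assumed tile resolution
Z = 100     # assumed latent dimension

generator = nn.Sequential(
    nn.Linear(Z, 256), nn.ReLU(),
    nn.Linear(256, TILE * TILE), nn.Sigmoid(),   # pixel = probability of built-up land
)
discriminator = nn.Sequential(
    nn.Linear(TILE * TILE, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_tiles):
    """One GAN update on a batch of flattened real footprint tiles."""
    batch = real_tiles.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator: push real tiles towards 1, generated tiles towards 0.
    fake = generator(torch.randn(batch, Z))
    d_loss = bce(discriminator(real_tiles), ones) + bce(discriminator(fake.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to get the discriminator to label fakes as real.
    g_loss = bce(discriminator(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()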
  Read more here: Modeling urbanization patterns with generative adversarial networks (Arxiv).

The ImageNet of video (possibly) arrives:
…MIT’s ‘Moments in Time’ dataset consists of one million videos, each of which is 3 seconds long…
Much of the recent progress in machine learning has been driven in part by the availability of large-scale datasets providing a sufficiently complex domain to stress-test new scientific approaches against. MIT’s new ‘Moments in Time’ dataset might just be the dataset we need for video understanding, as it’s far larger than other available open source datasets (eg ActivityNet, Kinetics, UCF101, etc), and also has a fairly broad set of initial labels (339 verbs linked to a variety of different actions or activities).
  Video classification baselines: The researchers also test the new dataset on a set of baselines based on systems that use techniques like residual networks, optical flow, and even sound (via usage of a SoundNet network). These baselines get top-5 accuracies of as high as 50% or so, meaning the correct label appears among the system’s five highest-ranked guesses about half the time. The best performing approach is a ‘temporal relation network’ (TRN). This network attained a score of about 53% and was trained on RGB frames using the InceptionV3 image classification architecture.
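  How top-5 is computed: to make that metric concrete, here’s a tiny snippet (mine, not the paper’s) showing the calculation – a prediction counts as correct if the true label appears anywhere in the model’s five highest-scoring classes.

# Illustrative sketch of top-5 accuracy.
import torch

def top5_accuracy(logits, labels):
    """logits: [batch, num_classes] scores; labels: [batch] true class indices."""
    top5 = logits.topk(5, dim=1).indices              # the model's five best guesses
    hits = (top5 == labels.unsqueeze(1)).any(dim=1)   # is the true label among them?
    return hits.float().mean().item()

# e.g. with random scores over the dataset's 339 action classes:
print(top5_accuracy(torch.randn(8, 339), torch.randint(0, 339, (8,))))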
  Next: “Future versions of the dataset will include multi-labels action description (i.e. more than one action occurs in most 3-second videos), focus on growing the diversity of agents, and adding temporal transitions between the actions that agents performed,” the researchers write.
   Read more: Moments in time dataset: one million videos for event understanding (Arxiv).

Ugly robots no more: Unity gets a MuJoCo plugin:
…Tried-and-tested physics simulator gets coupled to a high-fidelity game engine…
Developers keen to improve the visual appearance of their AI systems may be pleased to know that MuJoCo has released a plugin for the Unity engine. This will let developers import MuJoCo models directly into Unity and then visualize them in snazzier environments.
  “The workflow we envision here is closer to the MuJoCo use case: the executable generated by Unity receives MuJoCo model poses over a socket and renders them, while the actual physics simulation and behavior control take place in the user’s environment running MuJoCo Pro,” write the authors.
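  What that looks like in practice: below is a hedged sketch of the MuJoCo side of such a workflow – physics stepped locally via mujoco-py, with poses streamed over a socket to whatever process renders them (e.g. a Unity executable). The port and JSON message format are my own illustrative assumptions; the actual plugin defines its own wire protocol.

# Hedged sketch: physics stays in MuJoCo Pro (here via mujoco_py), and joint
# poses are pushed over a socket to a separate renderer. Message format and
# port are assumptions for illustration, not the plugin's actual protocol.
import json
import socket
import mujoco_py

model = mujoco_py.load_model_from_path("humanoid.xml")   # any MuJoCo model file
sim = mujoco_py.MjSim(model)

conn = socket.create_connection(("localhost", 9000))      # assumed renderer endpoint
for _ in range(1000):
    sim.step()                                            # simulation runs on this side
    pose = {"qpos": sim.data.qpos.tolist(),               # joint positions
            "qvel": sim.data.qvel.tolist()}               # joint velocities
    conn.sendall((json.dumps(pose) + "\n").encode())      # the renderer just draws these
conn.close()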
  Read more here: MuJoCo Plugin and Unity Integration.

Google censors itself to avoid accidental racism:
…Search company bans search terms to protect itself against photos triggering insulting classifications…
A little over two years ago Google’s Google Photos application displayed an appalling bug: searches for ‘gorillas’ would bring up photos of black people. There was a swift public outcry and Google nerfed its application so it wouldn’t respond to those terms. Two years later, despite ample progress in AI and machine learning, nothing has changed.
  “A Google spokesperson confirmed that “gorilla” was censored from searches and image tags after the 2015 incident, and that “chimp,” “chimpanzee,” and “monkey” are also blocked today. “Image labeling technology is still early and unfortunately it’s nowhere near perfect,” the spokesperson wrote in an email, highlighting a feature of Google Photos that allows users to report mistakes,” Wired reports.
   Read here: When It Comes to Gorillas, Google Photos Remains Blind (Wired).

The first ever audiobook containing a song generated by a neural network?
…’Sourdough’ by Robin Sloan features AI-imagined Croatian folk songs…
Here’s a fun blog post by author Robin Sloan about using AI (specifically, SampleRNN) to generate music for his book. Check out the audio samples.
   Read more: Making The Music of the Mazg.

Miri blows past its 2017 funding target thanks to crypto riches:
…Crypto + AI research, sitting in a tree, S-Y-N-E-R-G-I-Z-I-N-G!…
The Machine Intelligence Research Institute in Berkeley has raised over $2.5 million with its 2017 fundraiser, with a significant amount of funding coming from the recent boom in cryptocurrencies.
  “66% of funds raised during this fundraiser were in the form of cryptocurrency (mainly Bitcoin and Ethereum),” Miri writes.
   Read more here: Fundraising success! (Miri).

Amazon turns to GANs to simulate e-commerce product demand… and it sort of works!
…As if the e-retailer doesn’t have enough customers, now it’s inventing synthetic ones…
Researchers with Amazon’s Machine Learning team in India have published details on eCommerceGAN, a way to use GANs to generate realistic, synthetic customer and customer order data. This is useful because it lets you test your system for the vast combinatorial space of possible customer orders and, ideally, get better at predicting how new products will match with existing customers, and vice versa.
  “The orders which have been placed in an e-commerce website represent only a tiny fraction of all plausible orders. Exploring the space of all plausible orders could provide important insights into product demands, customer preferences, price estimation, seasonal variations etc., which, if taken into consideration, could directly or indirectly impact revenue and customer satisfaction,” the researchers write.
  The eCommerce GAN (abbreviated to ‘ecGAN’) lets the researchers create a synthetic “dense and low-dimensional representation of e-commerce orders”. They also create an eCommerce-conditional-GAN (ec^2GAN), which lets them “generate the plausible orders involving a particular product”.
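  A rough illustration of the conditional idea (the dimensions and layers below are my assumptions, not Amazon’s architecture): feed the generator a product embedding alongside noise, so the orders it dreams up all ‘involve’ that particular product.

# Hedged sketch of the conditional-generation idea behind ec^2GAN: concatenate
# a product embedding with noise so the generator emits a plausible, dense,
# low-dimensional order representation tied to that product. All sizes and
# layers are illustrative assumptions.
import torch
import torch.nn as nn

Z, PRODUCT_DIM, ORDER_DIM = 64, 128, 128   # assumed dimensions

class ConditionalOrderGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z + PRODUCT_DIM, 256), nn.ReLU(),
            nn.Linear(256, ORDER_DIM),   # dense representation of a synthetic order
        )

    def forward(self, product_embedding):
        noise = torch.randn(product_embedding.size(0), Z)
        return self.net(torch.cat([noise, product_embedding], dim=1))

# Generate 32 synthetic orders "involving" one product.
gen = ConditionalOrderGenerator()
product = torch.randn(1, PRODUCT_DIM).expand(32, PRODUCT_DIM)
synthetic_orders = gen(product)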
  Results: The researchers created a 3D t-SNE map of both real customer orders and GAN-generated ones. The plots showed a strong correlation between the two with very few outliers, suggesting that their ecGAN approach is able to generate data that falls within the distribution of what e-retailers actually see. To test the ec^2GAN they see if it can conditionally generate orders that have similar customer<>product profiles to real orders – and they succeed. It might sound a bit mundane but this is a significant thing for Amazon: it now has a technique to simulate the ‘long tail’ of customer<>product combinations, and as it gets better at predicting these relationships it could theoretically get better at optimizing its supply-chain / just-in-time inventory / marketing campaigns / new product offerings and test groups, and so on.
  Data: The researchers say they “use the products from the apparel category for model training and evaluation. We randomly choose 5 million orders [emphasis mine] made over the last one year in an e-commerce company to train the proposed models.” Note that they don’t specify where this data comes from, though it seems overwhelmingly likely it derives from Amazon since that’s where all the researchers worked during this project, and no other dataset is specified.
   Read more: eCommerceGAN: A Generative Adversarial Network for eCommerce (Arxiv).

Why AI research needs to harness huge amounts of compute to progress:
…The future involves big batches, massive amounts of computation, and, for now, lots of hyperparameter tuning…
AI researchers are effective in proportion to the quantity of experiments they can run over a given time period. This is because deep learning-based AI is predominantly an empirical science, so in the absence of strong theoretical guarantees researchers need to rigorously test algorithms to appropriately debug and develop them.
  That fact has driven recent innovations in large-scale distributed training of AI algorithms, initially for traditional classification tasks, like the two following computer vision examples (covered in #69 of Import AI).
   July, 2017: Facebook trains an ImageNet model in ~1 hour using 256 GPUs.
   November, 2017: Preferred Networks trains ImageNet in ~15 minutes using 1024 NVIDIA P100 GPUs.
   Now, as AI research becomes increasingly focused on developing AI agents that can take actions in the world, the same phenomenon is happening in reinforcement learning, as companies ranging from DeepMind (Ape-X, Gorila, others) to OpenAI (Evolution Strategies, others) try to reduce the wall-clock time it takes to run reinforcement learning experiments.
  New research from deepsense.ai, Intel, and the Polish Academy of Sciences shows how to scale up and tune a Batch Asynchronous Advantage Actor-Critic algorithm with the ADAM optimizer and a large batch size of 2048 to let them learn to competently play a range of Atari games in a matter of minutes; in many cases it takes the system just 20 minutes or so to attain competitive scores on games like Breakout, Boxing, Seaquest, and others. They achieve this by scaling up their algorithm via techniques gleaned from the distributed systems world (eg, parameter servers, clever things with temporal alignment across different agents, etc), which lets them run their algo across 64 workers comprising 768 distinct CPU cores.
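  The core pattern is simple even if the engineering isn’t: many workers gather experience in parallel, their fragments are fused into one large batch, and a single ADAM update is applied to the shared policy. Here’s a hedged toy sketch of that pattern – the environment, network, and rollout logic are placeholders of mine, not the paper’s BA3C implementation.

# Toy sketch of large-batch, many-worker training: 64 workers each contribute
# a fragment of experience, the fragments form one 2048-sample batch, and a
# single ADAM step is taken. Everything here is a placeholder for illustration.
import multiprocessing as mp
import numpy as np
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, BATCH_SIZE, N_WORKERS = 128, 6, 2048, 64

def collect_fragment(seed):
    """Stand-in for one worker's rollout: returns (observations, actions, advantages)."""
    rng = np.random.default_rng(seed)
    n = BATCH_SIZE // N_WORKERS
    return (rng.standard_normal((n, OBS_DIM), dtype=np.float32),
            rng.integers(0, N_ACTIONS, n),
            rng.standard_normal(n, dtype=np.float32))

if __name__ == "__main__":
    policy = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, N_ACTIONS))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)   # ADAM, as in the paper

    with mp.Pool(N_WORKERS) as pool:                             # 64 parallel workers
        fragments = pool.map(collect_fragment, range(N_WORKERS))

    obs = torch.from_numpy(np.concatenate([f[0] for f in fragments]))
    actions = torch.from_numpy(np.concatenate([f[1] for f in fragments]))
    advantages = torch.from_numpy(np.concatenate([f[2] for f in fragments]))

    # One synchronous policy-gradient step on the full 2048-sample batch.
    logp = torch.log_softmax(policy(obs), dim=1)
    loss = -(logp[torch.arange(len(actions)), actions] * advantages).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()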
  Next: PPO: The authors note that PPO, a reinforcement learning algorithm developed by OpenAI, is a “promising area of future research” for large-scale distributed reinforcement learning.
   Read more: Distributed Deep Reinforcement Learning: learn how to play Atari games in 21 minutes.

Googlers debunk bogus research into getting neural networks to detect sexual orientation:
…“AI is a general-purpose technology that can be used to automate a great many tasks, including ones that should not be undertaken in the first place”…
Last fall, researchers with Stanford published a paper on Arxiv claiming that a neural network-based image classification system they designed could detect sexual orientation more accurately than humans. The study – Deep neural networks are more accurate than humans at detecting sexual orientation from facial images – was criticized for making outlandish claims and was widely covered in the press. Now, the paper has been accepted for publication in a peer-reviewed academic journal – the Journal of Personality and Social Psychology. This seems to have motivated Google researchers Margaret Mitchell and Blaise Aguera y Arcas, and Princeton professor Alex Todorov, to take a critical look at the research.
  The original study relied on a dataset composed of 35,326 images taken from public profiles on a US dating website as its ground-truth data. You can get a sense of the types of photos present here by creating composite “average” images from the ground-truth labelled data – when you do this you notice some significant differences: the “average” heterosexual male face doesn’t have glasses, while the gay face does, and similarly the faces of the “average” heterosexual females have eyeshadow on them, while the lesbian faces do not.
  Survey: “Might it be the case that the algorithm’s ability to detect orientation has little to do with facial structure, but is due rather to patterns in grooming, presentation and lifestyle?” wonder the Google and Princeton researchers. To analyze this they surveyed 8,000 Americans using Amazon Mechanical Turk and asked them 77 yes/no questions, ranging from their sexual orientation to whether they have a beard, wear glasses, and so on. The results of the survey seem to roughly track with the “average” images we can extract from the dataset, suggesting that rather than developing a neural network that can infer your fundamental sexual orientation by looking at you, the researchers have instead built a snazzy classifier that estimates the chance that you are gay or straight based on whether you’re wearing makeup or glasses.
  To illustrate the problems with the research the Googlers show that they can attain similar classification accuracies to the original experiment purely through asking a series of yes/no questions, with no visual aid. “For example, for pairs of women, one of whom is lesbian, the following not-exactly-superhuman algorithm is on average 63% accurate: if neither or both women wear eyeshadow, flip a coin; otherwise guess that the one who wears eyeshadow is straight, and the other lesbian. Adding six more yes/no questions about presentation (“Do you ever use makeup?”, “Do you have long hair?”, “Do you have short hair?”, “Do you ever use colored lipstick?”, “Do you like how you look in glasses?”, and “Do you work outdoors?”) as additional signals raises the performance to 70%,” they write.
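  For the avoidance of doubt about how simple that baseline is, here it is written out as code (the field name is illustrative, since the survey data isn’t public):

# The quoted 63%-accurate heuristic, written out. Given a pair of women where
# exactly one is lesbian, it uses only the answer to "do you wear eyeshadow?".
import random

def guess_which_is_lesbian(person_a, person_b):
    """Each person is a dict like {"wears_eyeshadow": True}. Returns 'a' or 'b'."""
    a, b = person_a["wears_eyeshadow"], person_b["wears_eyeshadow"]
    if a == b:                       # neither or both wear eyeshadow: flip a coin
        return random.choice(["a", "b"])
    return "b" if a else "a"         # guess the eyeshadow-wearer is straight

# e.g. guess_which_is_lesbian({"wears_eyeshadow": True}, {"wears_eyeshadow": False}) -> "b"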
  Alternate paper title: In light of this criticism, perhaps a better title for the paper would be Deep neural networks are more accurate than humans at predicting the correspondence between various garments and makeup and a held-out arbitrary label. But we knew this already, didn’t we?
   Read more here: Do algorithms reveal sexual orientation or just expose our stereotypes?

OpenAI Bits & Pieces:

Science interviews OpenAI’s Tim Salimans for a story about Uber’s recent work on neuroevolution.
   Read more: Artificial intelligence can ‘evolve’ to solve problems (Science).

Tech Tales:

Simulpedia.

You get the call after utilization in the data center ticks up from 70% to 80% all the way to 90% in the course of 24 hours.
   “Takeoff?” you ask on the phone.
   “No,” they say. “Can’t be. This is something else. We’ll have more hardware tomorrow so we can scale with it, but we need you to take a look. We’ve sent a car.”

So you get into the car and go to an airport and fly a few thousand miles and land and get into another blacked-out car which auto-drives to the site and you go through security and then the person who spoke to you on the phone is in front of you saying “it isn’t stopping.”
  “Alright,” you say, “Show me the telemetry.”
  They lead you into a room whose walls are entirely made of computer monitors. Your phone interfaces with the local computer system and after a few seconds you’re able to control the screens and navigate the data. You dive in, studying the jagged lines of learning graphs, the pastel greens and reds of the utilization dashboard, and eventually the blurry squint-and-you’ll-miss-it concept-level representations of some of the larger networks. And all the while you can see utilization in the data center increasing, even as new hardware is integrated.
   “What the hell is it building in there,” you mutter to yourself, before strapping on some VR goggles and going deeper. Navigating high-dimensional representations is a notorious mindbender – most people have a hard time dealing with non-euclidean geometries; corridors that aren’t corridors, different interpretations of “up” and “down” depending on which slice of a plane you’ve become embedded in, spheres that are at once entirely hollow and entirely solid, and so on. You navigate the AI’s embedding, trying to figure out where all of the computation is being expended, while attempting to stop yourself from throwing up the sandwich you had in the car.
   And then, after you grope your way across a bridge which becomes a ceiling which becomes a door that folds out on itself to become the center of a torus, you find it: in one of the panes of the torus looking into one of the newer representational clusters you can see a mirror-image of the larger AI’s structure you’ve been navigating. And beyond that you can make out another torus in the distance containing another pane connecting to another large-scale non-euclidean representation graph. You sigh, take off your glasses, and make a phone call.

“It’s a bug,” you say. “Did you check the recursion limits in the self-representation system?”
   Of course they hadn’t. So much time and computation wasted, all because the AI had just looped into an anti-pattern where it had started trying to simulate itself, leading to the outward indicators of it growing in capability – somewhat richer representations, faster meta-learning, a compute and data footprint growing according to some clear scaling law. But those symptoms didn’t signify a greater intelligence, rather just the AI attempting to elucidate its own Kolmogorov complexity to itself – running endless simulations of itself simulating simulations of itself, to try and understand a greater truth, when in fact it was just a mirror endlessly refracting upon itself.

Concepts that inspired this story: Procedural maze generators, Kolmogorov complexity, non-euclidean virtual reality (YouTube video).