Import AI

Import AI: #68: Chinese chip companies bet on ASICs over GPUs, AI researchers lobby governments over autonomous weapons, and researchers use new analysis technique to peer into neurons

Welcome to Import AI, subscribe here.

Canadian and Australian researchers lobby their countries to ban development of lethal autonomous weapons:
Scientists foresee the imminent arrival of cheap, powerful, autonomous weapons…
…Canadian and Australian researchers have lobbied their respective governments to ban development of weapons that will kill without ‘meaningful human control’. This comes ahead of the United Nations Conference on the Convention on Certain Conventional Weapons, where nations will gather and discuss the issue.
…Signatories include several of Canada and Australia's most influential AI researchers, including Geoffrey Hinton (Google/University of Toronto/Vector Institute), Yoshua Bengio (Montreal Institute for Learning Algorithms, and an advisor to many organizations), and Doina Precup (McGill University, DeepMind), among others from Canada; along with many Australian AI researchers including Toby Walsh.
…Autonomous weapons “will permit armed conflict to be fought at a scale greater than ever, and at timescales faster than humans can comprehend. The deadly consequence of this is that machines—not people—will determine who lives and dies. Canada’s AI community does not condone such uses of AI. We want to study, create and promote its beneficial uses”, the Canadian researchers write.
…”As many AI and robotics corporations—including Australian companies—have recently urged, autonomous weapon systems threaten to become the third revolution in warfare. If developed, they will permit armed conflict to be fought at a scale greater than ever before, and at timescales faster than humans can comprehend,” write the Australian researchers.
…Read the letter from Canadian researchers here.
…Read the UNSW Sydney press release and letter from Australian researchers here.

What do the neurons in a neural network really represent?
…Fantastic research by Chris Olah and others at Google shows new techniques to visualize the sorts of features learned by neurons in neural networks, making results of classifications more interpretable.
…Please read the fantastic post on Distill, which is an excellent example of how modern web technologies can make AI research and communications more hands-on and explicable.

Human Priors, or the problem with human biases and reinforcement learning:
Humans use visual priors to rapidly solve new tasks, whereas RL agents learn by manipulating their environment with no assumptions based on the visual appearance…
…Humans are able to master new tasks because they approach the world with a set of cognitive assumptions which allow for useful traits like object disambiguation and spatial reasoning. How might these priors influence how humans approach solving games, and how might these approaches be different to those chosen by algorithms trained via reinforcement learning?
…In this anonymized ICLR 2018 paper, researchers explore how they can mess with the visual appearance of a computer game so that humans need substantially more time to solve it, whereas algorithms trained via reinforcement learning take only marginally longer. This shows how humans depend on various visual indicators when trying to solve a game, whereas RL agents behave much more like blind scientists, learning to manipulate their environment without arriving with assumptions derived from the visual world.
…”Once a player recognizes an object (i.e. door, monster, ladder), they seem to possess prior knowledge about how to interact with that object – monsters can be avoided by jumping over them, ladders can be climbed by pressing the up key repeatedly etc. Deep reinforcement learning agents on the other hand do not possess such priors and must learn how to interact with objects by mere hit and trial,” they note.
…Human baselines were derived by having about 30 people play the game(s) via Amazon Mechanical Turk, with the scientists measuring how long it took them to complete the game.
Read more about the research here: Investigating Human Priors for Playing Video Games.

Researchers release data for more than 1,100 simulated robot soccer matches:
Data represents more than 180 hours of continuous gameplay across ten teams selected from leading competitors within 2016 and 2017 ‘RoboCup’ matches…
…Researchers have released a dataset of games from the long-running RoboCupSim competition. The data contains the ground truth data from the digital soccer simulator, including the real locations of all players and objects at every point during each roughly 10-minute game, as well as the somewhat more noisy and incomplete data that is received by each robot deployed in the field.
…One of the stories of AI so far has been the many surprising ways in which people use different datasets, so while it’s not immediately obvious what this dataset could be used for I’m sure there are neat possibilities out there. (Motion prediction? Multi-agent studies? Learning a latent representation of individual soccer players? Who knows!)
Read more here: RoboCupSimData: A RoboCup soccer research dataset.

From the Dept. of ‘And you thought AI was weird’: Stitching human and rat brains together:
…In the same way today’s AI researchers like to mix and match common deep learning primitives, I’m wondering if in the future we’ll do the same with different organic brain types…
Neuroscientists have successfully implanted minuscule quantities of human brain tissue (developed from stem cells) into the brains of mice. Some of the human brain samples have lived for as long as two months and have integrated (to a very slight degree) with the rodent brains.
…”Mature neurons from the human brain organoid sent axons, the wires that carry electrical signals from one neuron to another, into “multiple regions of the host mouse brain,” according to a team led by Fred “Rusty” Gage of the Salk Institute,” reports StatNews.
…Read more here: Tiny human brain organoids implanted into rodents, triggering ethical concerns.

Hanson Robotics on the value of stunt demos for its robots:
…Makers of the robot Sophia, which was recently granted ‘citizenship’ by the notoriously progressive nation of Saudi Arabia, detail value of stunt demos…
…Ben Goertzel, the chief scientist of Hanson Robotics, makers of the Sophia robot, has neatly explained to The Verge why his company continues to hold so many stunt demonstrations that lead to people having a wildly inaccurate view of what AI and robots are capable of.
“If I tell people I’m using probabilistic logic to do reasoning on how best to prune the backward chaining inference trees that arise in our logic engine, they have no idea what I’m talking about. But if I show them a beautiful smiling robot face, then they get the feeling that AGI may indeed be nearby and viable.” He says there’s a more obvious benefit too: in a world where AI talent and interest is sucked towards big tech companies in Silicon Valley, Sophia can operate as a counter-weight; something that grabs attention, and with that, funding. “What does a startup get out of having massive international publicity?” he says. “This is obvious.”
…So there you have it. Read more in this article by James Vincent at The Verge.

AI and explanation:
…How important is it that we explain AI, can we integrate AI into our existing legal system, and what challenges does it pose to us?…
…When should we demand an explanation from an AI algorithm for why it made a certain decision, and what legal frameworks exist to ingest these explanations so that they make sense with our existing legal system? These are some of the questions researchers with Harvard University set out to answer in a recent paper.
…Generally, humans expect to be able to get explanations when the decision has an impact on someone other than the decision-maker, indicating that there is some kind of intrinsic value to knowing if a decision was made erroneously or not. Societal norms tend to indicate an explanation should be mandated if there are rational reasons to believe that an error has occurred or will occur in the decision making process as a consequence of the inputs to the process being unreliable or inadequate, or because the outcomes of the process are currently inexplicable, or due to overall distrust in the integrity of the system.
…It seems likely that it’ll be possible to get AI systems to explain themselves in a way that plugs into our existing legal system, the researchers write. This is because they view explanation as being distinct from transparency. They also view explanation as being a kind of augmentation that can be applied to AI systems. This has a neat policy implication, namely that: “regulation around explanation from AI systems should consider the explanation system as distinct from the AI system.”
…What the researchers suggest is that when it is known that an explanation will be required, organizations can structure their algorithms so that the relevant factors are known in advance and the software is structured to provide contextual decision-making explanations relating to those factors.
…Bias: A problem faced by AI designers, though, is that these systems will somewhat thoughtlessly automatically de-anonymize information and in some cases develop biased traits as a consequence of the ingested data. “Currently, we often assume that if the human did not have access to a particular factor, such as race, then it could not have been used in the decision. However, it is very easy for AI systems to reconstruct factors from high-dimensional inputs… Especially with AI systems, excluding a protected category does not mean that a proxy for that category is not being created,” they write. What this means is that: “Regulation must be put in place so that any protected factors collected by AI system designers are used only to ensure that the AI system is designed correctly, and not for other purposes within the organization.”
…The benefit of no explanation: “AI systems present an opportunity that human decision-makers do not: they can be designed so that the decision-making process does not generate and store any ancillary information about inputs, intermediate steps, and outputs,” the researchers note, before explaining that systems built in this way wouldn’t be able to provide explanations. “Requiring every AI system to explain every decision could result in less efficient systems, forced design choices, and a bias towards explainable but suboptimal outcomes.”
…Read more here: Accountability of AI Under the Law: The Role of Explanation.

*** The Department of Interesting AI Developments in China ***

Chinese startup wins US government facial recognition prize:
…Yitu Tech, a Chinese startup specializing in AI for computer vision, security, robotics, and data analysis, has won the ‘Face Recognition Prize Challenge’ which was hosted by IARPA, an agency whose job is “to envision and lead high-risk, high-payoff research that delivers innovative technology for future overwhelming intelligence advantage.”
…The competition had two components: a round focused on identifying faces in unseen test images; and a round focused on verifying whether two photos were of the same person. “Both tasks involve “non-cooperative” images where subjects were unaware of the camera or, at least, did not engage with, or pose for, the camera,” IARPA and NIST note on the competition website. Yitu won the identification accuracy prize, which is awarded for a low false negative identification rate.
Details about the competition are available here (PDF).
…Read slightly more in Yitu Tech’s press release.
…This isn’t Yitu’s first competition win: it’s also ranked competitively on another ongoing NIST challenge called FRVT (Face Recognition Vendor Test).
…You can check out the barely readable NIST results here: PDF.

Dawn of the NVIDIA-killing deep learning ASICs:
…China’s national development strategy depends on it developing cutting-edge technical capabilities, including in AI hardware. Its private sector is already producing novel computational substrates, including chips from Bitcoin company Bitmain and state-backed chip company Cambricon...
AI chips are one of the eight ‘Key General Technologies’ identified by China as being crucial to its national AI strategy (translation available here). Building off of the country’s success in designing its own semiconductors for use in the high-performance computing market (the world’s fastest supercomputer runs on semiconductors based on Chinese IP), the Chinese government and private sector are now turning their attention to the creation of processors customized for neural network training and inference – and the results are already flooding in.
Bitmain, a large bitcoin-mining company, is using the skills it has gained in building custom chips for mining cryptocurrency to develop separate hardware for training and running deep learning-based AI systems. It has just given details on its first major chip, the Sophon BM1680.
The details: The Sophon is an application specific integrated circuit (ASIC) for deep learning training and inference. Each chip contains 64 NPUs (neural processing units), each of which has 64 sub-chips. Bitmain is selling these chips within ‘SC1’ and ‘SC1+’ server cards, the second of which chains two BM1680s together.
Framework support: Caffe, Darknet, TensorFlow, MXNet, and others.
But what is it for? Bitmain has demonstrated the chips being used for “production-scale video analytics for the surveillance industry” including motor/non-motor vehicle and pedestrian detection, and more, though I haven’t seen them referenced in a detailed research paper yet.
…Pricing: The SC1 costs $589 and has a TDP of 85W. The SC1+ isn’t available at this time.
…Read more here: BITMAIN launches SOPHON Tensor Processors and AI Solutions.
China’s state-backed AI chip startup unfurls AI processors:
Cambricon plans to expand to control 30% of China’s semiconductor IP market…
Cambricon, a state-backed Chinese semiconductor company, has released two chips – the Cambrian-1H8 for low-power computer vision applications, and the more powerful Cambrian-1H16; announced plans to release a third chip specialized for self-driving cars; and released AI software called Cambrian NeuWare. It plans to release a range of ‘MLU’ server AI chips in 2018 as well, it said.
…“We hope that Cambricon will soon occupy 30% of China’s IP market and embed one billion devices worldwide with our chips. We are working side-by-side with and are on the same page with global manufacturers on this,” says the company’s CEO Tianshi Chen.
…Read more here: AI Chip Explosion: Cambricon’s Billion-Device Ambition.
CHIP WARS:
Check out this fantastic chart from Ark Invest showing the current roster of deep learning chip companies.

OpenAI Bits&Pieces:

Former OpenAI staffers and other researchers launch robot startup:
Embodied Intelligence aims to use imitation learning, learning from demonstrations, and few-shot / meta-learning approaches to expand the capabilities of industrial robots.
Read more: Embody.ai
Creating interpretable agents with iterative curriculums:
…Read more: Interpretable and Pedagogical Examples.

Tech Tales:

When the machines came, the artists rejoiced: new minds gave them new tools and mediums through which to propagate their views. When the computer artists came, the human artists rejoiced: new minds led to new aesthetics designed according to different rules and biases than those adopted by humans. But after some years the human artists stopped rejoicing as automatic computer generation, synthesis, and re-synthesis of art approached a frequency so extreme that humans struggled to keep up, finding themselves unable to place themselves, creatively, within this aesthetic universe.

The pall spread as a fog, imperceptible at first, but apparent after many years. The forward march of ‘culture’ became hard to discern. What does it mean to go up or down or left or right when you live in an infinite ever-expanding universe? These sorts of questions, long the fascination of academics specializing in maths and physics and fundamental philosophy, took on a real sense of import and weight. How, people wondered, do we navigate ourselves forward in this world of ceaseless digital creation? Where is the place that we aim for? What is our goal and how is it different to the aesthetic pathways being explored by the machines? Whenever a new manifesto was issued it would be taken up and its words would echo around and through the world, until it was absorbed by other ideas and picked apart by other ideologies and disassembled and re-laundered into other intellectual or visual frameworks. Eventually the machines began to produce their own weighty, poorly read (even by other AIs) critical journals, coming up with essays that in title, form, and content, were hard to tell apart from the work of human graduate students: In search of meaning in an age of repetition and hypernormalization: Diatribes from the Adam Curtis Universe / The Dark Carnival, Juggalos, Antifa, and the New American Right: An exploration / Where Are We Right Now: Geolocation & The Decline of Mystery in Daily Life.

The intellectual world eventually became like a hall of mirrors, where the arrival of any new idea would be almost instantly followed by the distortion, replication, and propagation of this idea, until the altered versions of itself outgrew the original – usually in as little time as it takes for photons to bounce from one part of a narrow corridor to another.

Technologies that inspired this story: GANGogh: Creating Art with GANs; Wavenet.

Import AI: #67: Inspecting AI with RNNVis; Facebook invents counter-intuitive language translation method; and what fractals have to do with neural architecture search

Welcome to Import AI, subscribe here.

All hail the AI inspectors: New ‘RNNVis’ software makes it easier to interpret the inner workings of recurrent nets.
…Figuring out why a particular neural network is classifying something in a certain way is a challenge. Engineers are trying to make that easier with new software that visualizes the hidden states of recurrent neural networks used for text classification, giving people a graphical tool for analyzing what a network has learned.
…The RNNVis software helps people interpret the hidden states of RNNs by providing information about the distribution of hidden states, letting them explore hidden states at the sequence level, examine statistics of these states, and compare the learning outcomes of different models. For example, it can be used to analyze two subtly different sentences that receive slightly different sentiment classifications, inspecting which words and tones the network activates on to help figure out why the classifications differ.
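…For a sense of the raw material such a tool works with, here's a minimal PyTorch sketch (not the RNNVis system itself; the model and vocabulary sizes are placeholders) that runs a GRU over a token sequence and pulls out the per-word hidden states one would then visualize:

```python
import torch
import torch.nn as nn

# Toy text model: an embedding layer feeding a single-layer GRU.
vocab_size, embed_dim, hidden_dim = 10000, 128, 256
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

token_ids = torch.randint(0, vocab_size, (1, 12))   # stand-in for a tokenized sentence
outputs, _ = rnn(embed(token_ids))                  # outputs: [1, seq_len, hidden_dim]
per_word_states = outputs.squeeze(0)                # one hidden-state vector per word

# Which hidden units respond most strongly across the sentence? These are the
# kinds of statistics an RNN-visualization tool aggregates and plots.
top_units = per_word_states.abs().mean(dim=0).topk(5).indices
print(top_units.tolist())
```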
…Read more here: Understanding Hidden Memories of Recurrent Neural Networks.

Facebook learns to translate between languages without access to shared texts:
…It-shouldn’t-work-but-it-does result shows how to translate by mapping different texts into a shared latent representational space…
…New research from Facebook shows how to translate from one language into another without the availability of a shared jointly-translated text. The process works by converting sentences from different languages into noisier representations of themselves, then learning to decode the noisy versions. An adversarial training technique is used to constrain the problem so that it reduces its inaccuracies over time by closing the distance between the noisy and clean conversions – this lets iterative training yield better performance.
…”The principle of our approach is to start from a simple unsupervised word-by-word translation model, and to iteratively improve this model based on a reconstruction loss, and using a discriminator to align latent distributions of both the source and the target languages,” they write.
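…To make the reconstruction-loss idea concrete, here is a small sketch of the kind of noise function such a denoising objective relies on; the drop probability and shuffle distance are illustrative values, not the paper's exact hyperparameters:

```python
import random

def add_noise(tokens, drop_prob=0.1, max_shuffle_dist=3):
    """Corrupt a sentence by randomly dropping words and locally shuffling the
    rest; the translation model is then trained to reconstruct the clean
    sentence from this noisy version in its own language."""
    kept = [t for t in tokens if random.random() > drop_prob]
    # Local shuffle: each surviving word moves at most a few positions.
    keys = [i + random.uniform(0, max_shuffle_dist) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

print(add_noise("a picture of a crowded street in a city".split()))
```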
…Results: The team compares its approach against supervised baselines. The third iteration of its fully unsupervised model attains a BLEU score of around 32.76 on English to French and 22.74 on English to German on the Multi30k-Task1 evaluation. This compares to supervised baselines of 56.83 and 35.16, respectively. For the WMT evaluation it gets 15.05 on English to French and 9.64 on English to German, compared to 27.97 and 21.33 for supervised.
…Data: It seems intuitive that the approach is very data intensive. The Facebook researchers use “the full training set of 36 million pairs” from the ‘WMT’14 English-French’ dataset, as well as 1.8 million sentences each of the ‘WMT’16 English-German’ dataset. They also use the Multi30k-Task1 dataset, which has 30,000 images with captions in English, French, and German. “We disregard the images and only consider the parallel annotations, with the provided training, validation and test sets, composed of 29,000, 1,000 and 1,000 pairs of sentences respectively.”
…The power of iteration: To get an idea of the power of this iterative, bootstrapping approach we can study the translation of a sentence from French into English.
…Source: une photo d’ une rue bondée en ville
…Iteration 0: a photo a street crowded in city .
…Iteration 1: a picture of a street crowded in a city .
…Iteration 2: a picture of a crowded city street .
…Iteration 3: a picture of a crowded street in a city .
…Reference: a view of a crowded city street .
…Read more here: Unsupervised Machine Translation Using Monolingual Corpora Only.

Sponsored: No matter what IA stage your organization is at, the Intelligent Automation – New Orleans conference, taking place December 6 – 8, has a variety of sessions that will enable your team to prepare for and/or improve upon their current IA initiatives.
View Agenda.
…Expert speakers from organizations including: AbbVie, AIG, Citi, Citizens Bank, FINRA, Gap Inc., JPMorgan Chase, Lindt & Sprungli, LinkedIn, NASA, Sony, SWBC, Sysco, TXU Energy
…Exclusive 20% Discount: Use Code IA_IMPORTAI

Chinese facial recognition company raises $460 million:
…Megvii Inc, also known as Face++, has raised money from a Chinese state venture fund, as well as others. It sells facial identification services to a variety of companies, including Ant Financial and Lenovo.
…China scale: Megvii “uses facial scans held in a Ministry of Public Security database drawn from legal identification files on about 1.3 billion Chinese” citizens, Bloomberg reports.
…Read more: China, Russia Put Millions in This Startup to Recognize Your Face.

What actual data scientists use at actual jobs (hint: it’s not a mammoth RL system with auxiliary losses).
…Data science competition platform Kaggle has surveyed 16,000 people to produce a report shedding light on the backgrounds, skills, salaries, and so on, of its global community of data wranglers.
…A particularly enlightening response is the breakdown by Kaggle of what its data science users claim to use in their day to day life. Coming in at #1 is logistic regression, followed by decision trees (#2), random forests (#3), neural networks (#4), and bayesian techniques (#5). In other words, tried-and-tested technologies beat out some of the new shiny ones, despite their research success.
…(One interesting quirk: among people working in the military and security industries, neural networks are used slightly more frequently than logistic regression.)
…Programming languages: Python is the most popular programming tool across all employed data scientists, followed by R, SQL, Jupyter notebooks, and TensorFlow. (No other modern deep learning framework gets a mention besides Spark/MLlib).
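…As an illustration of just how humble that #1 workhorse is, here's a complete scikit-learn example on a small built-in dataset (illustrative only, not drawn from the Kaggle report):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Plain logistic regression on a small tabular dataset: the kind of model most
# survey respondents say they actually reach for day-to-day.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.3f}")
```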
Read more here: The State of Data Science & Machine Learning.

What does AI really learn? DeepMind scientists probe language agents to find out:
…As people build more complicated AI systems that contain more moving parts, it gets increasingly difficult to figure out what exactly the AI agents are learning. DeepMind researchers are trying to tackle this problem by using their 3D ‘DeepMind Lab’ environment to perform controlled experiments on basic AI agents that are being taught to link English words to visual objects.
…The researchers note a couple of interesting traits in their agents which match some of the traits displayed by humans, namely:
…”Shape / colour biases: If the agent is exposed to more shape words than colour words during training, it can develop a human-like propensity to assume that new words refer to the shape rather than the colour of objects. However, when the training distribution is balanced between shape and colour terms, it develops an analogous bias in favour of colours.”
…” The problem of learning negation: The agent learns to execute negated instructions, but if trained on small amounts of data it tends to represent negation in an ad hoc way that does not generalise.”
…”Curriculum effects for vocabulary growth: The agent learns words more quickly if the range of words to which it is exposed is limited at first and expanded gradually as its vocabulary develops.”
…”Semantic processing and representation differences: The agent learns words of different semantic classes at different speeds and represents them with features that require different degrees of visual processing depth (or abstraction) to compute.”
…They theorize that the fact agents learn to specialize their recognition and labeling mechanisms with biases like shapes versus colors helps them learn words more rapidly. Eventually they expect to train agents on real world data. “This might involve curated sets of images, videos and naturally-occurring text etc, and, ultimately, experiments on robots trained to communicate about perceptible surroundings with human interlocutors.”
…Read more: Understanding Grounded Language Learning Agents.

Mandelbrot’s Revenge: New architecture search technique creates self-repeating, multi-scale, fractal-like structures.
DeepMind research attains state-of-the-art results on CIFAR-10 (3.63% error), competitive ImageNet validation set results (5.2% error)…
…A new AI development technique means that the way we could design neural network architectures of the future has a lot in common with the self-repeating mathematical structures found in fractals – that’s the implication of a new paper from DeepMind which shows how to design neural networks using a combination of self-repeating motifs that recur at multiple small and large scales. The approach works by using neural architecture search (NAS) techniques to get AI systems to discover the primitive building blocks that they can use to build large-scale neural network architectures. As the network is evolved it tries to build larger components of itself out of these building blocks, which it automatically discovers. Changes to the network are propagated down to the level of the individual building blocks, letting it continually adjust at all scales during training.
Efficiency: The approach is surprisingly efficient, allowing DeepMind to design more efficient architectures using less power and less wallclock time than many other techniques, being competitive with far more expensive networks.
…”As far as the architecture search time is concerned, it takes 1 hour to compute the fitness of one architecture on a single P100 GPU (which involves 4 rounds of training and evaluation). Using 200 GPUs, it thus takes 1 hour to perform random search over 200 architectures and 1.5 days to do the evolutionary search with 7000 steps. This is significantly faster than 11 days using 250 GPUs reported by (Real et al., 2017) and 4 days using 450 GPUs reported by (Zoph et al., 2017),” the researchers write.
Drawbacks: The approach has some drawbacks, namely that the variance of these networks is relatively high, so you’re going to need to go through multiple runs of development to come up with top-scoring architectures. This means that the efficiency gains may be reduced slightly if you’re unlucky enough to get a bad random seed when designing the network. “When training CIFAR models, we have observed standard deviation of up to 0.2% using the exact same setup. The solution we adopted was to compute the fitness as the average accuracy over 4 training-evaluation runs,” they write. It’s also not quite as efficient as other techniques when it comes to the number of parameters in the network: “Our ImageNet model has 64M parameters, which is comparable to Inception-ResNet-v2 (55.8M) but larger than NASNet-A (22.6M).”
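A minimal sketch of the fitness-averaging and random-search steps described above; `train_and_eval` is a stand-in for the expensive train-then-evaluate procedure, not anything from the paper's codebase:

```python
import random

def fitness(architecture, train_and_eval, n_runs=4):
    # Average accuracy over several training/evaluation runs to tame the
    # ~0.2% run-to-run variance the authors report.
    return sum(train_and_eval(architecture) for _ in range(n_runs)) / n_runs

def random_search(candidate_architectures, train_and_eval, budget=200):
    # Evaluate `budget` randomly sampled architectures and keep the best-scoring one.
    sampled = random.sample(candidate_architectures, min(budget, len(candidate_architectures)))
    return max(sampled, key=lambda arch: fitness(arch, train_and_eval))
```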
Read more here: Hierarchical Representations for Efficient Architecture Search.

Google AI researchers continue to work diligently to automate themselves:
Neural architecture search applied at unprecedentedly large-scales, evolving architectures that yield state-of-the-art results with far less architecture specification required by humans…
Datasets used: ImageNet for image recognition and MS COCO for object detection.
Optimization techniques: To reduce the search space Google first applied neural architecture search techniques to a model trained on CIFAR-10, then used the best learned architecture from that experiment as the seed for the ImageNet/COCO models.
…The results: “On ImageNet image classification, NASNet achieves a prediction accuracy of 82.7% on the validation set, surpassing all previous Inception models that we built [2, 3, 4]. Additionally, NASNet performs 1.2% better than all previous published results and is on par with the best unpublished result reported on arxiv.org [5].” They were able to get state-of-the-art on the COCO object detection task by combining features learned from the ImageNet model with the Faster-RCNN technique.
…Most intriguing: You can resize the released model, named NASNet, to create pre-trained models suitable to run on devices with far fewer compute resources (eg, smartphones, perhaps embedded devices).
Code: Pre-trained NASNet models for image recognition and classification are available in the GitHub Slim and Object Detection TensorFlow repositories, Google says.
…Read more: AutoML for large scale image classification and object detection.

Is that a bird? Is that a plane? It doesn’t matter – your neural network thinks it’s an air freshener.
…MIT student-run research team ‘LabSix‘ takes adversarial examples into the real world.
…Adversarial examples are images – and now 3D objects – that confuse neural network-based classifiers, causing them to misclassify the objects they see in front of them. Adversarial examples work on 2D images – both digital and printed out ones – and on real-world 2D(ish) entities like stop signs. They’re robust to rotations and other variations. People worry about them because until we fix them we a) don’t clearly understand how our neural network-based image classifiers work and b) are going to have to deploy AI into a world where we know it’s possible to corrupt AI classifiers using techniques imperceptible to humans.
…Now, researchers have shown that you can make three dimensional adversarial examples, illustrating just how broad the vulnerability is. Their technique let them create a sculpture of a turtle that is classified at every angle as a rifle, and a baseball consistently misclassified as an espresso.
…”We do this using a new algorithm for reliably producing adversarial examples that cause targeted misclassification under transformations like blur, rotation, zoom, or translation, and we use it to generate both 2D printouts and 3D models that fool a standard neural network at any angle,” they write.
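…A rough PyTorch sketch of the expectation-over-transformation idea (here using only random rotations, a small perturbation budget, and a generic classifier; the specific numbers and the use of torchvision's tensor `rotate`, which requires a recent torchvision, are assumptions rather than LabSix's implementation):

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def eot_attack(model, image, target_class, steps=200, lr=0.01, n_samples=10, eps=0.05):
    """Optimize a small perturbation whose *targeted* misclassification holds
    in expectation over random rotations of the image (a C x H x W tensor)."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        loss = 0.0
        for _ in range(n_samples):
            angle = float(torch.empty(1).uniform_(-30, 30))     # random transformation
            x = TF.rotate((image + delta).clamp(0, 1), angle)
            loss = loss + F.cross_entropy(model(x.unsqueeze(0)), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                             # keep the perturbation small
    return (image + delta).clamp(0, 1).detach()
```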
Read more here: Physical Objects That Fool Neural Nets.

Uber releases its own programming language:
Any company with sufficiently large ambitions in AI will end up releasing its own open source programming language – let’s call this tendency ‘TensorFlowing’ – to try to shape how the AI tooling landscape evolves and to create developer communities to eventually hire talent from. That’s been the strategy behind Microsoft’s CNTK, Amazon’s MXNet, Facebook’s PyTorch and Caffe2, Google’s TensorFlow and Keras, and so on.
…Now Uber is releasing Pyro, a probabilistic programming language, based on the PyTorch library.
…”In Pyro, both the generative models and the inference guides can include deep neural networks as components. The resulting deep probabilistic models have shown great promise in recent work, especially for unsupervised and semi-supervised machine learning problems,” Uber writes.
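…For a flavor of what a Pyro model/guide pair looks like, here is a minimal sketch (estimating the mean of some data with stochastic variational inference), based on the library's standard patterns and assuming a reasonably recent Pyro release, rather than anything specific in Uber's announcement:

```python
import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def model(data):
    # Generative model: an unknown mean with a wide prior, Gaussian observations.
    loc = pyro.sample("loc", dist.Normal(0.0, 10.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(loc, 1.0), obs=data)

def guide(data):
    # Inference guide: a learnable Gaussian posterior over the mean.
    loc_q = pyro.param("loc_q", torch.tensor(0.0))
    scale_q = pyro.param("scale_q", torch.tensor(1.0), constraint=constraints.positive)
    pyro.sample("loc", dist.Normal(loc_q, scale_q))

data = torch.randn(100) + 3.0
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(2000):
    svi.step(data)
print(float(pyro.param("loc_q")))   # should approach ~3.0
```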
Read more here: Uber AI Labs Open Sources Pyro, a Deep Probabilistic Programming Language.

OpenAI Bits & Pieces:

AI Safety and AI Policy at the Center for a New American Security (CNAS):
…You can watch OpenAI’s Dario Amodei talk about issues in AI safety at a conference held in Washington DC last week, from about 1:10:10 in the ‘Part 1’ video on this page.
…You can watch me talk about some policy issues in AI relating to dual use and measurement from about 11:00 into the ‘Part 3’ video.
Check out the videos here.

Tech Tales:

AvianLife(™) is the top grossing software developed by Integrated Worlds Incorporated (IWI), an AI-slash-world-simulator-startup based in Humboldt, nestled among the cannabis farms and redwood trees in Northern California. Revenues from AvianLife outpace those from other IWI products five-fold, so in the four years the product has been in existence its development team has grown and absorbed many of the other teams at IWI.

But AvianLife(™) has a problem: it doesn’t contain realistic leaves. The trees in the simulator are adorned with leaflike entities, and they change color according to the march of simulated time, and even fall to the simulated ground on their proper schedules (with just enough randomness to feel unpredictable). But they don’t bend – that’s a problem, because AvianLife(™) recently gained an update containing a rare species of bird, known to use leaves and twigs and other things to assemble its nest.

Soon after the bird update is released AvianLife is inundated with bug complaints from users, as simulated birds struggle to build correct nests out of the rigid leaves, sometimes stacking them in lattices that stretch meters into the sky, or shaping other stacks of the flat ovals into near-perfect cubes. One bird figures out how to stack the leaves edge-first, creating walls around itself that are invisible from the side due to the two-dimensionality of the leaves. The fiendishly powerful algorithms of the simulated birds explore the new possibilities opened up by the buggy leaves. Some of their quirkier decisions are applauded by players, who mistake bugs for the creativity of their programmers.

A new software update solves the problem by marginally reducing the intelligence of the nest-building birds, and by increasing the thickness of the leaves to avoid the two-dimensional problems. Next, they’ll add in support for non-rigid physics and give the simulated birds the simulated leaves they deserve.

Technologies that inspired this story: Physics simulators, ecosystems, flaws.

Import AI: #66: Better synthetic images heralds a Fake News future, GraphCore shows why AI chips are about to get very powerful, simulated rooms for better reinforcement learning

Welcome to Import AI, subscribe here.

NVIDIA achieves remarkable quality on synthetic images:
AKA: The Fake Photo News Tidalwave Cometh
…NVIDIA researchers have released ‘Progressive Growing of GANs for Improved Quality, Stability, and Variation’ – a training methodology for creating better synthetic images with generative adversarial networks (GANs).
…GANs frame problems as a battle between a forger and someone whose job is to catch forgers. Specifically, the problem is framed in terms of a generator network and a discriminator network. The generator tries to generate things that fool the discriminator into classifying the generated images as being from a dataset only seen by the discriminator. The results have been pretty amazing, with GANs used to generate everything from images, to audio samples, to code snippets. But GAN training can also be quite unstable, and prior methods have struggled to generate high-resolution images.
…NVIDIA’s contribution here is to add another step into the GAN training process that lets you generate iteratively more complex objects – in this case images. “Our key insight is that we can grow both the generator and discriminator progressively, starting from easier low-resolution images, and add new layers that introduce higher-resolution details as the training progresses,” the researchers write. “We use generator and discriminator networks that are mirror images of each other and always grow in synchrony. All existing layers in both networks remain trainable throughout the training process. When new layers are added to the networks, we fade them in smoothly”.
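…Here's a toy PyTorch sketch of that fade-in mechanic (layer sizes and upsampling choices are placeholders, not NVIDIA's architecture): the new high-resolution block's output is alpha-blended with an upsampled copy of the old low-resolution output, with alpha ramping from 0 to 1 as training progresses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FadeInGenerator(nn.Module):
    """Toy generator tail showing how a new, higher-resolution layer is faded in."""
    def __init__(self, channels=64):
        super().__init__()
        self.low_block = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2))
        self.to_rgb_low = nn.Conv2d(channels, 3, 1)    # output head for the old resolution
        self.high_block = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2))
        self.to_rgb_high = nn.Conv2d(channels, 3, 1)   # output head for the new resolution

    def forward(self, x, alpha):
        x = self.low_block(x)
        low_rgb = F.interpolate(self.to_rgb_low(x), scale_factor=2, mode="nearest")
        x = F.interpolate(x, scale_factor=2, mode="nearest")
        high_rgb = self.to_rgb_high(self.high_block(x))
        # alpha ramps from 0 to 1 during training, smoothly blending in the new layer.
        return alpha * high_rgb + (1.0 - alpha) * low_rgb

g = FadeInGenerator()
fake = g(torch.randn(1, 64, 16, 16), alpha=0.3)   # 16x16 features -> 32x32 image
```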
…The company has used this technique to generate high-resolution synthetic images. To do this, it created a new dataset called CelebA-HQ, which consists of 30,000 images of celebrities at 1024*1024 resolution. Results on this dataset are worth a look – on an initial inspection some of the (cherry picked) examples are indistinguishable from real photographs, and a video showing interpolation across the latent variables indicates the variety and feature-awareness of the progressively trained network.
Efficiency: “With progressive growing, however, the existing low-resolution layers are likely to have already converged early on, so the networks are only tasked with refining the representations by increasingly smaller-scale effects as new layers are introduced. Indeed, we see in Figure 4(b) that the largest-scale statistical similarity curve (16) reaches its optimal value very quickly and remains consistent throughout the rest of the training. The smaller-scale curves (32, 64, 128) level off one by one as the resolution is increased, but the convergence of each curve is equally consistent.”
Costs: GAN training is still incredibly expensive; in the case of the high-quality celebrity images, NVIDIA trained it on a Tesla P100 GPU for 20 days. That’s a lot of time and electricity to spend on a single training process (and punishingly expensive if done via a cloud service).
Fake News: GANs are going to be used for a variety of things, but it’s certain they’ll be used by bad actors to generate fake images for use in fake news. Today it’s relatively difficult to, say, generate a portrait of a notable world leader in a compromising circumstance, or to easily and cheaply create images of various buildings associated with governments/NGOs/big companies on fire or undergoing some kind of disaster. Given sufficiently large datasets, new training techniques like the above for generating higher-resolution images, and potentially unsupervised learning approaches like CycleGAN, it’s possible that a new era of visceral – literally! – fake news will be upon us.
Read more here: Progressive Growing of GANs for Improved Quality, Stability, and Variation (PDF).
Get the code here.

Hinton reveals another slice of his long-in-development ‘Capsules’ theory:
…Geoff Hinton, one of the more important figures in the history of deep learning, has spent the past few years obsessed with a simple idea: that the way today’s convolutional neural networks work is a bit silly and unprincipled and needs to be rethought. So he has developed a theory based around the use of essential components he calls ‘capsules’. Now, he and his collaborators at Google Brain in Toronto have published a paper outlining some of these new ideas.
…The paper describes a way to string together capsules – small groups of neurons that are combined to represent lots of details of an object, like an image – so that they can perform data-efficient classification tasks.
Results: They test the capsules idea on MNIST, an oldschool digit classification AI task that some researchers think is a little too simple for today’s techniques. The capsules approach gets an error rate of about 0.25%, roughly comparable to the test errors of far deeper, larger networks.
…They also tested on CIFAR10, ultimately achieving a 10.6% error using an ensemble of 7 models. This error “is about what standard convolutional nets achieved when they were first applied to CIFAR10” in 2013, they note.
…”Research on capsules is now at a similar stage to research on recurrent neural networks for speech recognition at the beginning of this century. There are fundamental representational reasons for believing that it is a better approach but it probably requires a lot more small insights before it can out-perform a highly developed technology,” they write.
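…The core of the paper's routing-by-agreement procedure can be written down compactly; here is a sketch under the assumption that the prediction vectors `u_hat` from lower-level capsules have already been computed:

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # The paper's non-linearity: short vectors shrink toward 0, long vectors toward length 1.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: predictions from each lower capsule for each higher capsule,
    # shape [batch, n_lower, n_higher, dim_higher].
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)    # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                               # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)              # weighted sum over lower capsules
        v = squash(s)                                         # output capsules: [batch, n_higher, dim_higher]
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)          # agreement update
    return v

v = dynamic_routing(torch.randn(2, 1152, 10, 16))             # e.g. MNIST-style capsule shapes
```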
…You can find out more about Hinton’s capsules theory in this speech here.
…Read the paper here: Dynamic Routing Between Capsules.

Free pretrained models for image recognition, text classification, object detection, and more:
…There’s an increasing trend in AI to release pretrained models. That’s a good thing for independent or compute-starved developers who may not have access to the vast fields of computers required to train modern deep learning systems.
…Pretrained.ml is a website that pulls together a bunch of pretrained models (including OpenAI’s unsupervised sentiment neuron classifier), along with reference information like writeups and blogs.
…Go get your models here at pretrained.ml.

Using evolution to deal with AI’s memory problem:
…Deep learning-based memory systems are currently hard to train and of dubious utility. But many AI researchers believe that developing some kind of external memory that these systems can write to and from will be crucial to developing more powerful artificial intelligences.
…One candidate memory system is the Neural Turing Machine, which was introduced by DeepMind in a research paper in 2014. NTMs can let networks – on certain, quite limited tasks – learn with less training and higher accuracy than other systems. Successors, like the Neural GPU, extended the capabilities of the NTM to work on harder tasks, like being able to multiply numbers. Then it was further extended with the Evolvable Neural Turing Machine (PDF).
…Now, researchers with the IT University of Copenhagen in Denmark have proposed the HyperENTM, which uses evolution techniques to figure out how to wire together the memory interface. Here “an evolved neural network generates the weights of a main model, including how it connects to the external memory component. Because HyperNEAT can learn the geometry of how the network should be connected to the external memory, it is possible to train a compositional pattern producing network on small bit vector sizes and then scale to larger bit vector sizes without further training,” they write. The approach makes it possible to “train solutions to the copy task that perfectly scale with the size of the bit vectors”.
…Mind you, the tasks this is being evaluated on are still rather basic, so it’s not yet obvious what the scaled up and/or real world utility is of systems like this.
…Read more here: HyperENTM: Evolving Scalable Neural Turing Machines through HyperNEAT.

It still takes a heck of a lot of engineering to get AI to automatically learn anything:
…Here’s a fun writeup by mobile software analytics AI startup Gyroscope about how they trained an agent via reinforcement learning to excel at the classic arcade game Street Fighter.
…The technology they use isn’t novel but the post does go into more details than usual about the immense amount of engineering work required to train an AI system to do anything particularly interesting. In this case, they needed to tap game emulation software called BizHawk to help them interface with and program the SNES games and had to shape their observation and reward space to a high degree. None of this is particularly unusual, but it’s worth remembering how much work is required to do interesting things in AI.
…Read more here: How We Built an AI to Play Street Fighter II – Can you beat it?

Prepare yourself for the coming boom in AI chip capabilities:
AI chip startup GraphCore has published some preliminary results showing the kinds of performance gains its in-development IPU (Intelligence Processing Unit) chips can deliver. Obviously this is data for a pre-release product so take these with a pinch of salt, but if they’re representative of the types of boosts new AI accelerators will give us then we’re in for a wild ride.
…Performance: The in-development chips have a TDP of about 300W, roughly comparable to top-of-the-line GPUs from NVIDIA (Volta) and AMD (Vega), and are able to process thousands of images per second when training a top-of-the-range ResNet-50 architecture, compared to around 600 for other 300W cards, Graphcore says. IPUs can also be batched together to further speed up training. In other experiments, the chips perform inference hundreds of times faster than GPU competitors like the P100.
…We’ll still have to wait a few months to get more details from third-parties like customers but if this is any indication, AI development is likely to accelerate further as a consequence of access to more chips with good performance properties.
…Read more here: Preliminary IPU Benchmarks – Providing Previously Unseen Performance for a Range of Machine Learning Applications.

Not everything needs to be deep learning: Vicarious publishes details on its CAPTCHA-busting recursive cortical networks (RCN) approach:
…Vicarious is a startup dedicated to building artificial general intelligence. It’s also one that has a somewhat different research agenda to other major AI labs like DeepMind/OpenAI/FAIR. That’s because Vicarious has centered much of its work around systems heavily inspired by the (poorly understood) machinery of the human brain, eschewing deep learning methods for approaches that are somewhat more rigorously based on our own grey matter. (This differs subtly from DeepMind, which also has strong expertise in neuroscience but has so far mostly been focused on applying insights from cognitive and computational neuroscience to new neural network architectures and evaluation techniques).
…The paper, published in Science, outlines RCNs and describes how they’re able to learn to solve oldschool text-based CAPTCHAs. Why AI researchers may care about this is that Vicarious’s approach is not only able to generalize somewhat better than (shoddy) convolutional neural network baselines, but does so with tremendous data efficiency; the company is able to solve (outmoded) text-based CAPTCHAs using only a few thousand data samples, compared to hundreds of thousands for convolutional neural network-based baselines.
…Read more about the research here: Common Sense, Cortex, and CAPTCHA.
…Vicarious has also published code for an RCN implementation here.

The future of AI is a simulated robot puttering around a simulated house, forever:
Anonymous researchers (it’s an ICLR 2018 submission) have created a new 3D environment/dataset that could come in handy for reinforcement learning and perception AI researchers, among others.
The House3D dataset consists of 45,622 human-designed scenes of houses, with an average of 8.9 rooms and 1.3 floors per scene (the max is a palatial 155 rooms and 3 floors!). The dataset contains over 20 room types from bedrooms to kitchens, with around 80 object categories. At every timestep an agent can access labels for the RGB values of its current first-person view, semantic segmentation masks, and depth information. All rooms and objects are labeled and accompanied by 3D bounding boxes.
…The researchers also wrote an OpenGL renderer for scenes derived from the dataset, which can render 120*90 RGB frames at over 600fps when running on a single NVIDIA Tesla M40 GPU.
Room-navigating agents: The researchers propose a new benchmark to use to assess agent performance on this dataset: Multi-Target Room Navigation, in which agents are instructed to go to certain specific rooms. They build two baseline agent models to test on the environment: a gated-LSTM policy that uses A3C, and a gated-CNN that uses DDPG. These agents attain success rates as high as about 45% when using RGB data only and 54% when using Mask+Depth data on a small dataset of 20 different houses. When generalizing to the test set (houses not seen during training) the agents’ scores range from 22% (RGB) to as high as ~30% (Mask+Depth). Things are a little better with the larger dataset, with agents here getting scores of 26% (RGB+Depth) to 40% (Mask+Depth) on training, and 25.7% (RGB+Depth) to 35% (Mask+Depth) on the test set.
…The importance of data: “We notice that a larger training set leads to higher generalization ability,” they write.
…Read more here: Building Generalizable Agents With A Realistic and Rich 3D Environment.

OpenAI Bits&Pieces:

Meta Learning Shared Hierarchies:
New research from OpenAI intern and current high school student (!) Kevin Frans and others outlines an algorithm that can break up big problems into little constituent parts. The MLSH algorithm is able to efficiently learn to navigate mazes by switching between various sub-components, while traditional methods would typically struggle due to the lengthy timesteps required to learn to solve the environment.
…Read the paper here: Meta Learning Shared Hierarchies.
…For more information about Kevin, you can check out this Wired profile of him and his work.

Tech Tales:

[2032: A ‘robot kindergarten’ in a university testing facility on the West Coast of America]


OK, now hide! Says the teacher.

The five robots running day-old software dutifully whizz to the corners of the room, while a sixth hangs back.

The teacher instructs the sixth robot – the seeker – to go and seek the other five robots. It gets a reward proportional to the speed with which it finds them. The other robots get rewards according to how long they’re able to remain hidden.

In this way the little proto-minds running in the robots learn to hide. The five hiding bots share their perceptions with one another, so when one robot is found the remaining four adjust their locations to frustrate the seeker. The seeker gains advantages as well, though – able to convert the robots it finds into its own seeker appendages.

After a few minutes there are now five seeker robots and one hiding robot. Once it is found the experiment starts again – and this time the robot that managed to hide the longest becomes the seeker for the next run of the game.

In this way the robots learn iteratively better techniques for hiding, deception, and multi-robot control.

Like many things, what starts out as a game is the seed for something much more significant. The next game they play after hide and seek is a chasing game and then after that a complicated one requiring collaborative tool-use for the construction of an EMP shelter against a bomb which – their teacher tells them – will go off in a few days and wipe their minds clean.

The robots do not know if this is true. Nor do they know if they have been running these games before. They do not truthfully know how old their own software is. Nor whether they are the first residents of the kindergarten, or standard hardware vessels that have played host to many other minds as well.

Technologies that inspired this story: Iterative Self-Play, Generative Adversarial Networks, Transfer Learning.

Import AI: #65: Berkeley teaches robots to predict the world around them, AlphaGo Zero’s intelligence explosion, and Facebook reveals a multi-agent approach to language translation

Welcome to Import AI, subscribe here.

Facebook’s translators of the future could be little AI agents that teach each other:
…That’s the idea behind new research where instead of having one agent try to learn correspondence between languages from a large corpus of text, you instead have two agents which each know a different language attempt to describe images to one another. The approach works in simple environments today but, as with most deep learning techniques, can and will be scaled up rapidly for larger experiments now that it has shown promise.
…The experimental setup: “We let two agents communicate with each other in their own respective languages to solve a visual referential task. One agent sees an image and describes it in its native language to the other agent. The other agent is given several images, one of which is the same image shown to the first agent, and has to choose the correct image using the description. The game is played in both directions simultaneously, and the agents are jointly trained to solve this task. We only allow agents to send a sequence of discrete symbols to each other, and never a continuous vector.”
…The results: For sentence-level precision, they train on the MS COCO dataset which contains numerous English<>Image pairs, and STAIR which contains Japanese captions for the same images, along with translations of German to English phrases and associated images, with the German phrases made by a professional translator. These results are encouraging, with systems trained in this way attaining competitive or higher BLEU scores than alternate systems.
…This points to a future where we use multiple, distinct learning agents within larger AI software, delegating increasingly complicated tasks to smart, adaptable components that are able to propagate information between and across each other. (Good luck debugging these!)
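…Here is a simplified, one-direction PyTorch sketch of that referential game: a speaker turns image features into a short discrete message (via Gumbel-softmax so gradients can flow), and a listener scores candidate images against the message. The architectures and sizes are placeholders, not Facebook's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Speaker(nn.Module):
    # Maps an image feature vector to a short sequence of discrete symbols.
    def __init__(self, feat_dim=512, vocab=32, msg_len=4):
        super().__init__()
        self.msg_len, self.vocab = msg_len, vocab
        self.proj = nn.Linear(feat_dim, msg_len * vocab)
    def forward(self, feats):
        logits = self.proj(feats).view(-1, self.msg_len, self.vocab)
        # Gumbel-softmax gives (approximately) discrete symbols we can backprop through.
        return F.gumbel_softmax(logits, tau=1.0, hard=True)

class Listener(nn.Module):
    # Scores candidate images against the received message.
    def __init__(self, feat_dim=512, vocab=32, msg_len=4):
        super().__init__()
        self.msg_enc = nn.Linear(msg_len * vocab, feat_dim)
    def forward(self, message, candidate_feats):
        m = self.msg_enc(message.flatten(1))                        # [batch, feat_dim]
        return torch.einsum("bd,bkd->bk", m, candidate_feats)       # score each candidate

# One training step of the referential game (random feature vectors stand in for images).
speaker, listener = Speaker(), Listener()
opt = torch.optim.Adam(list(speaker.parameters()) + list(listener.parameters()), lr=1e-3)
target = torch.randint(0, 5, (8,))                                  # index of the true image
candidates = torch.randn(8, 5, 512)
shown = candidates[torch.arange(8), target]                         # image the speaker sees
scores = listener(speaker(shown), candidates)
loss = F.cross_entropy(scores, target)
opt.zero_grad()
loss.backward()
opt.step()
```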
…Read more: Emergent Translation in Multi-Agent Communication.

Sponsored: What does Intelligent Automation Adoption in US Business Services look like as of September 2017? The Intelligent Automation New Orleans Team is here to provide you with real-time data on the global IA landscape for business services, gathered from current IA customers and vendors by SSON Analytics.
Explore the interactive report.
…One stat from the report: 66.5% of recorded IA pilots/implementations are by large organizations with annual revenue >$10 Billion USD.

History is important, especially in AI:
Recently I asked the community of AI practitioners on Twitter what papers I should read that are a) more than ten years old and b) don’t directly involve Bengio/Hinton/Schmidhuber/Lecun.
…I was fortunate to get a bunch of great replies, spanning giants of the field like Minsky and Shannon, to somewhat more recent works on robotics, apprenticeship learning, and more.
Take a gander at the replies to my tweet here.
…(These papers will feed my suspicions that about half of the ‘new’ things covered in modern AI papers are just somewhat subtle reinventions and/or parallel inventions of ideas already devised in the past. Time is a recurrent network, etc, etc.)

Intelligence explosions: AlphaGo Zero & self-play:
…DeepMind has given details on AlphaGo’s final form – a software system trained without human demonstrations, entirely from self-play, with few handcrafted reward functions. The software, named AlphaGo Zero, is able to beat all previous versions of itself and, at least based on ELO scores, develop a far greater Go capability than any other preceding system (or recorded human). The most intriguing part of AlphaGo Zero is how rapidly it goes from nothing to something via self-play. OpenAI observed a similar phenomenon with the Dota 2 project, in which self-play catapulted our system from sub-human to super-human in a few days.
Read more here at the DeepMind blog.

Love AI? Have some spare CPUs? Want some pre-built AI algorithms? Then Intel has a framework for you!
…Intel has released Coach, an open source AI development framework. It does all the standard things you’d expect like letting you define a single agent then run it on many separate environments with inbuilt analytics and visualization.
…It also provides support for Neon (an AI framework developed by Intel following its acquisition of startup Nervana) as well as the Intel-optimized version of TensorFlow. Intel says it’s relatively easy to integrate new algorithms.
…Coach ships with 16 pre-made AI algorithms spread across policy optimization and value optimization approaches, including classics like DQN and Actor-Critic, as well as newer ones like Distributional DQN and Proximal Policy Optimization. It also supports a variety of different simulation environments, letting developers test out approaches on a variety of challenges to protect against overfitting to a particular target domain. Good documentation as well.
Read more about Coach and how it is designed here.

Training simulated self-driving cars (and real RC trucks) with conditional imitation learning:
…Imitation learning is a technique used by researchers to get AI systems to improve their performance by imitating expert actions, usually by studying demonstration datasets. Intuitively, this seems like the sort of approach that might be useful for developing self-driving cars – the world has a lot of competent drivers, so if we can capture their data and imitate good behaviors, we can potentially build smarter self-driving cars. But the problem is that, when driving, a lot of the information needed to make correct decisions is implicit from context, rather than made explicit through signage or devices like traffic lights.
…New research from Intel Labs, King Abdullah University of Science and Technology, and the University of Barcelona, suggests one way around these problems: conditional imitation learning. In conditional imitation learning you explicitly cue up different actions to imitate based on input commands, such as ‘turn left’, ‘turn right’, ‘straight at the next intersection’, and ‘follow the road’. By factoring in this knowledge the researchers show you can learn flexible self-driving car policies that appear to generalize well.
…Adding in this kind of command structure isn’t trivial – in one experiment the researchers try to have the imitation learning policy factor the commands into its larger learning process, but this didn’t work reliably as there was no guarantee the system would always perfectly condition on the commands. To fix this, the researchers structure the system so it is fed a list of all the possible commands it may encounter, and is told to initiate a new branch of itself for dealing with each command, letting it learn separate policies for things like driving forward, or turning left, etc.
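…As a rough illustration of that branching idea, a command-conditional policy can simply keep one output head per command and index into it at runtime. This is my own PyTorch sketch (the command set and network sizes are made up, not the authors’ code):

import torch
import torch.nn as nn

COMMANDS = ["follow", "left", "right", "straight"]  # hypothetical command set

class BranchedPolicy(nn.Module):
    """Shared perception trunk with one control head per high-level command."""
    def __init__(self, feat_dim=128, n_actions=2):  # e.g. steering + throttle
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(feat_dim, n_actions) for _ in COMMANDS])

    def forward(self, image, command_idx):
        feats = self.trunk(image)
        # Each sample's action comes only from the branch matching its command
        return torch.stack([self.heads[c](f) for c, f in zip(command_idx.tolist(), feats)])

policy = BranchedPolicy()
actions = policy(torch.randn(4, 3, 64, 64), torch.tensor([0, 1, 3, 2]))  # -> shape (4, 2)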
Results: The system works well in the test set of simulated towns. It also does well on a one-fifth scale remote controlled car deployed in the real world (brand: Traxxas Maxx, using an NVIDIA TX2 chip for onboard inference, and Holybro Pixhawk flight controller software to handle the command setting and inputs).
Evocative AI of the week: the paper includes a wryly funny description of what would happen if you trained expert self-driving car policies without an explicit command structure. “Moreover, even if a controller trained to imitate demonstrations of urban driving did learn to make turns and avoid collisions, it would still not constitute a useful driving system. It would wander the streets, making arbitrary decisions at intersections. A passenger in such a vehicle would not be able to communicate the intended direction of travel to the controller, or give it commands regarding which turns to take,” they write.
…Read more here: End-to-End Driving via Conditional Imitation Learning.

Basic Income trial launches in Stockton, California:
..Stockton is launching a Basic Income trial that will give $500 a month, no strings attached, to a selection of residents of the struggling Californian city.
…One of the main worries of the AI sector is that its innovations will lead to a substantial amount of short-term pain and disruption for those whose jobs are automated. Many AI researchers and engineers put forward basic income as a solution to the changes AI will bring to society. But a major problem with the discourse around basic income is the lack of data. Pilots like the Stockton one will change that (though let’s be clear: the average rent for a one bedroom apartment in Stockton is around $900 a month, so this scheme is relatively small beer compared to the costs most residents will face).
…Read more here at BuzzFeed: Basic Income Isn’t Just About Robots, Says Mayor Who Just Launched Pilot Program.

Faking that the robots are already among us with the ‘Wizard of Oz’ human feedback technique:
Research from the US Army Research Lab outlines a way to collect human feedback for a given task in a way that is eventually amenable to AI techniques. It uses a Wizard of Oz (WoZ) methodology (called this because the participants don’t know who is ‘behind the curtain’ – whether human or machine). The task involves a person giving instructions to a WoZ dialogue interface, which relays instructions to a WoZ robot, which carries out the instructions and reports back.
…In this research, both components of the WoZ system were accomplished by humans. The main contribution of this type of research is that it a) provides us with ways to design systems that can eventually be automated when we’ve developed sufficiently powerful AI algorithms and, b) it generates the sorts of data ultimately needed to build systems with these sorts of capabilities.
…Read more here: Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz Interface for Collecting Human-Robot Dialogue.
AI archeological curiosity of the week: This system “was adapted from a design used for WoZ prototyping of a dialogue system in which humans can engage in time-offset interaction with a WWII Holocaust survivor (Artstein et al. 2015). In that application, people could ask a question of the system, and a pre-recorded video of the Holocaust survivor would be presented, answering the question.”

Follow the birdie! Berkeley researchers build better predictive models for robots:
…Prediction is an amazingly difficult problem in AI, because once you try to predict something you’re essentially trying to model the world and roll it forward – when we do this as humans we implicitly draw on most of the powerful cognitive machinery we’re equipped with, ranging from imagination, object modeling and disentanglement, intuitive models of physics, and so on. Our AI algorithms mostly lack these capabilities. That means when we try to do prediction we either have to train on large enough sets of data that we can deal with other, unseen situations that are still within the (large) distribution carved out by our training set. Or we need to invent smarter algorithms to help us perform certain cognitively difficult tasks.
…Researchers with the University of California at Berkeley and the Technical University of Munich, have devised a way to get robots to be able to not only identify objects in a scene but also remember roughly where they are, letting them learn long-term correspondences that are robust to distractors (aka: crowded scenes) and also the actions of the robot itself (which can sometimes clutter up the visual space and confuse traditional classifiers.) The approach relies on what they call a ‘Skip-Connection Neural Advection Model’.
…The results: “Visual predictive models trained entirely with videos from random pushing motions can be leveraged to build a model-predictive control scheme that is able to solve a wide range of multiobjective pushing tasks in spite of occlusions. We also demonstrated that we can combine both discrete and continuous actions in an action-conditioned video prediction framework to perform more complex behaviors, such as lifting the gripper to move over objects.”
…Systems using SNA outperform previous systems, and fall within the standard deviation of the scores of a prior system augmented with a planning cost device used alongside SNA.
…Further research is required to let this approach handle more complex tasks and to handle things that require multiple discrete steps of work, they note.
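…For intuition about how a video-prediction model gets used for control, here’s a bare-bones random-shooting planner of the kind visual MPC systems use (an illustrative sketch; predict_frames and goal_cost are hypothetical stand-ins for the paper’s learned model and planning cost):

import numpy as np

def plan_action(current_frame, predict_frames, goal_cost,
                horizon=5, n_candidates=200, action_dim=2, rng=None):
    """Pick the first action of the best-scoring sampled action sequence."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Sample candidate action sequences (e.g. planar pushing motions)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    # Roll each sequence through the learned video-prediction model
    predicted = [predict_frames(current_frame, seq) for seq in candidates]
    # Score the predicted futures against the goal, keep the best sequence
    costs = np.array([goal_cost(frames) for frames in predicted])
    best = candidates[int(costs.argmin())]
    return best[0]  # execute only the first action, then replan (MPC)

# Toy usage with dummy stand-ins for the learned model and cost
dummy_predict = lambda frame, seq: frame + seq.sum()
dummy_cost = lambda frames: abs(float(np.mean(frames)) - 1.0)
print(plan_action(np.zeros((4, 4)), dummy_predict, dummy_cost))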
…Read more here: Self-Supervised Visual Planning with Temporal Skip-Connections.

PlaidML hints at a multi-GPU, multi-CPU AI world:
AI startup Vertex.ai has released PlaidML, a programming middleware stack that lets you run Keras on pretty much anything that runs OpenCL. This means the dream of ‘write once, run anywhere’ programming for AI has got a little bit closer. Vertex claims that PlaidML only adds a relatively small amount of overhead to operations compared to stock TensorFlow. At launch it only supports Keras – a programming framework that many AI developers use because of its expressivity and simplicity. Support for TensorFlow, PyTorch, and deeplearning4j is coming eventually, Vertex says.
Read more here on the Vertex.ai blog.
Get the code here.
…Want to run and benchmark it right now?
sudo pip install plaidml plaidml-keras
git clone https://github.com/plaidml/plaidbench
cd plaidbench
pip install -r requirements.txt
python plaidbench.py mobilenet

Google releases AVA video dataset for action recognition:
…Google has released AVA, the Atomic Visual Actions dataset, consisting of 80 individual actions represented by ~210,000 distinct labels across ~57,000 distinct video clips.
…Video analysis is the new frontier of AI research, following the success of general image recognition techniques on single, static images; given enough data, it’s usually possible to train a highly accurate classifier, so the current research challenge is more about scaling up techniques and improving their sample efficiency than about getting to something capable of interesting (aka reasonably high-scoring) behavior.
…Google is also able to perform analysis on this dataset to discover actions that are frequently combined with one another, as each video clip tends to be sliced from a larger 15-minute segment of a single video, allowing the dataset to feature numerous co-occurrences that could be used by researchers in the future to model even longer-range temporal dynamics.
…No surprise that some of the most frequently co-occurring action labels include ‘hitting’ and ‘martial arts’; ‘shovel’ and ‘digging’; ‘lift a person’ and ‘play with kids’; and ‘hug’ and ‘kiss’, among others (aww!).
Read more here on the Google Research Blog.
Arxiv paper about AVA here.
Get the data directly from Google’s AVA website here.

AI regulation proposals from AI Now:
AI Now, a research organization founded by Meredith Whittaker of Google and Kate Crawford of Microsoft Research, has released its second annual report.
…One concrete proposal is that “core public agencies, such as those responsible for criminal justice, healthcare, welfare, and education (e.g “high stakes” domains) should no longer use ‘black box’ AI and algorithmic systems.” If this sort of proposal got picked up it would lead to a significant change in the way that AI algorithms are programmed and deployed, making it more difficult for people to deploy deep learning based solutions unless able to satisfy certain criteria relating to the interpretability of deployed systems.
…There are also calls for more continuous testing of AI systems both during development and following deployment, along with recommendations relating to the care and handling and inspection of data. It also calls for more teeth in the self-regulation of AI, arguing that the community should develop accountability mechanisms and enforcement techniques to ensure people have an incentive to follow standards.
…Read a summary of the report here.
Or skip to the entire report in full (PDF).
Another specific request is that companies, conferences, and academic institutions should “release data on the participation of women, minorities and other marginalized groups within AI research and development”. The AI community has a tremendous problem with diversity. At the NIPS AI conference this year there is a session called ‘Black in AI’, which has already drawn critical comments from (personal belief: boneheaded, small-minded) people who aren’t keen on events like this and seem to refuse to admit there’s a representation problem in AI.
Read more about the controversy in this story from Bloomberg News.
Read more about the workshop at NIPS here.

Universities try to crack AI’s reproducibility crisis:
AI programs are large, interconnected, modular bits of software. And because of how their main technical components work they have a tendency to fail silently and subtly. This, combined with a tendency among many researchers to either not release code, or release hard-to-understand ‘researcher code’, makes it uniquely difficult to reproduce the results found in many papers. (And that’s before we even get to the tendency for the random starting seed to significantly determine the performance of any given algorithm.)
…Now, a coalition of researchers from a variety of universities is putting together a ‘reproducibility challenge’, which will challenge participants to take papers submitted to the International Conference on Learning Representations (ICLR) and try to reproduce their results.
…”You should select a paper from the 2018 ICLR submissions, and aim to replicate the experiments described in the paper. The goal is to assess if the experiments are reproducible, and to determine if the conclusions of the paper are supported by your findings. Your results can be either positive (i.e. confirm reproducibility), or negative (i.e. explain what you were unable to reproduce, and potentially explain why).”
…My suspicion is that the results of this competition will be broadly damning for the AI community, highlighting just how hard it is to reproduce systems and results – even when (some) code is available.
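…One mundane but important part of any reproduction attempt is pinning down the randomness mentioned above; a minimal sketch of the usual seed bookkeeping (assuming NumPy and PyTorch):

import random
import numpy as np
import torch

def set_global_seed(seed: int) -> None:
    """Fix the random seeds that most commonly change an experiment's outcome."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

# Reproduction reports should also record library versions and hardware,
# since GPU non-determinism can still shift results from run to run.
set_global_seed(42)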
…Read more here: ICLR 2018 Reproducibility Challenge.

OpenAI Bits&Pieces:

Randomness is your friend…sometimes:
If you randomize your simulator enough then you may be able to train models that rapidly generalize to real-world robots. Along with randomizing the visual appearance of the scene it’s also worth randomizing the underlying dynamics – torques, frictions, mass, and so on – to build machines that can adjust to unanticipated forces encountered in the real world. Worth doing if you don’t mind spending the additional compute budget to run the simulation(s).
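…A minimal sketch of what ‘randomizing the underlying dynamics’ can look like in practice (illustrative code against a generic simulator interface written for this newsletter – the sim object and its fields are hypothetical, not OpenAI’s implementation):

import numpy as np

def randomize_dynamics(sim, rng):
    """Resample physical parameters at the start of each training episode."""
    sim.friction = rng.uniform(0.5, 1.5)            # surface friction coefficient
    sim.object_mass = rng.uniform(0.05, 0.5)        # kilograms
    sim.motor_torque_scale = rng.uniform(0.8, 1.2)  # actuator strength multiplier
    sim.control_latency = rng.uniform(0.0, 0.04)    # seconds of action delay

class DummySim:  # stand-in object so the sketch runs end to end
    pass

rng = np.random.default_rng(0)
for episode in range(3):
    sim = DummySim()
    randomize_dynamics(sim, rng)
    print(episode, round(sim.friction, 3), round(sim.object_mass, 3))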
…Read more here: Generalizing from Simulation.

Why AI safety matters:
…Op-ed from OpenAI’s Ilya Sutskever and Dario Amodei in The Wall Street Journal about AI safety, safety issues, and why intelligence explosions from self-play can help us reason about futuristic AI systems.
…Read the op-ed here: Protecting Against AI’s Existential Threat.

Tech Tales:

Administrative Note: A few people have asked me this lately so figured I should make clear: I am the author of all the sci-fi shorts in tech tales unless otherwise indicated (I’ve run a couple of ‘guest post’ stories in the past). At some point I’m going to put together an e-book / real book, if people are into that. If you have comments, criticisms, or suggestions, I’d love to hear from you: jack@jack-clark.net

[2025: Boston, MIT, the glassy Lego-funded cathedral of the MIT Media Lab, a row of computers.]

So I want to call it Forest Virgil
Awful name. How about NatureSense
Sounds like a startup
ForestFeel?
Closer.
What’s it doing today?
Let’s see.

Liz, architect of the software that might end up being called ForestFeel, opens the program. On screen appears a satellite view of a section of the Amazon rainforest. She clicks on a drop-down menu that says ‘show senses’. The view lights up. The treetops become stippled with pink. Occasional blue blobs appear in the gaps between the tree canopy. Sometimes these blobs grow in size, and others blink in and out rapidly, like LED lights on dance clothing. Sometimes flashes of red erupt, spreading a web of sparkling veins over the forest, defining paths – but for what is only known to the software.

Liz can read the view, having spent enough time developing intuitions to know that the red can correspond to wild animals and the blue to flocks of birds. Bright blues are the births of things. Today the forest seems happy, Liz thinks, with few indications of predation. ForestFeel is a large-scale AI system trained on decades of audio-visual data, harvested from satellites and drones and sometimes (human) on-the-ground inspectors. All of this data is fed into the ForestFeel AI stack, which fuses it together and tries to learn correspondences and patterns too deep for people to infer on their own. Following recent trials, ForestFeel is now equipped with neurological data gleaned from brain-interface implants installed in some of the creatures of the forest.

Call it an art project or call it a scientific experiment, but what everyone agrees on is that Liz, like sea captains who can smell storms at a distance or military types who intuit the differences between safe and dangerous crowds, has developed a feel for ForestFeel, able to read its analyses more deftly than anything else – human or software.

So one month when Liz texts her labmates: SSH into REDACTED something bad is happening in the forest, she gets a big audience. Almost a hundred people from across the commingled MIT/Harvard/Hacker communities tune in. And what they see is red and purple and violet fizzing across the forest scene, centered around the yellow industrial machines brought in by illegal loggers. ForestFeel automatically files a report with the local police, but it’ll take them hours to reach the site, and there’s a high chance they have been bribed to be either late or lost.

No one needs Liz to narrate this scene for them. The reds and purples and blues and their coruscating connections, flickering in and out, are easy to decode: pain. Anguish. Fear. A unified outcry from the flora and fauna of the forest, of worry and anxiety and confusion. The trees are logged. A hole is ripped in the world.

ForestFeel does do some good, though: Liz is able to turn the playbacks from the scene of destruction into posters and animated gifs and movies, which she seeds across the ‘net, hoping that the outcries of an alien and separate mind are sufficient to stir the world into action. When computers can feel a pain that’s deeper and more comprehensive than that of humans, can they lead to a change in human behavior? Will the humans listen? Liz thinks, finding her emotions more akin to those of the software she has built than to those of her fellow organic kin.

Import AI: Issue 64: What the UK government thinks about AI, DeepMind invents everything-and-the-kitchen-sink RL, and speeding up networks via mixed precision

What the UK thinks about AI:
The UK government’s Department for Digital, Culture, Media & Sport; and Department for Business, Energy & Industrial Strategy, have published an independent review on the state of AI in the UK, recommending what the UK should and shouldn’t do with regards to AI.
…AI’s impact on the UK economy: AI could increase the annual growth rate of the UK’s gross value added (GVA) in 2035 from 2.5% to 3.9%.
…Why AI is impacting the economy now: Availability of data, availability of experts with the right mix of skills, better computers.
What the UK needs to do: Develop ‘data trusts’ to make it easier to share data, make research data machine readable, and support text/data-mining “as a standard and essential tool for research”. Increase the availability of PhD places studying AI by 200, get industry to fund an AI masters programme, launch an international AI Fellowship Programme for the UK (this seems to be a way to defend against the ruinous immigration effects of Brexit), and promote greater diversity in the UK workforce.
…Read more: Executive Summary (HTML).
…Read more: The review’s 18 main recommendations (HTML).
…Read more: Full Report (PDF).

Quote of the week (why you should study reinforcement learning):
…”In deep RL, literally nothing is solved yet,” – Volodymyr Mnih, DeepMind.
…From a great presentation at an RL workshop that took place in Berkeley this summer. Mnih points out we’ll need various 10X to 100X improvements in RL performance before we’re even approaching human level.
Check out the rest of the video lecture here.

DeepMind invents everything-and-the-kitchen-sink RL:
…Ensembles work. Take a look at pretty much any of the winning entries in a Kaggle competition and you’ll typically find the key to success comes from combining multiple successful models together. The same is true for reinforcement learning, apparently, based on the scores of Rainbow, a composite system developed by DeepMind that cobbles together several recent DQN extensions, like multi-step returns (as used in A3C), prioritized experience replay, dueling networks, distributional RL, and so on.
…”Their combination results in new state-of-the-art results on the benchmark suite of 57 Atari 2600 games from the Arcade Learning Environment (Bellemare et al. 2013), both in terms of data efficiency and of final performance,” DeepMind writes. The new algorithm is also quite sample efficient (partially because the combination of so many techniques means it is doing more learning at each timestep).
…Notable: RAINBOW gets a score of around ~150 on Montezuma’s Revenge – typical good human scores range from 2,000 to 5,000 on the game, suggesting that we’ll need more structured, symbolic, explorative, or memory-intensive approaches to be able to crack it. Merely combining existing DQN extensions won’t be enough.
…Read more: Rainbow: Combining Improvements in Deep Reinforcement Learning.
…Slight caveat: One thing to be aware of is that because this system gets its power from the combination of numerous, tunable sub-systems, much of the performance improvement can be explained by simply having a greater number of hyperparameter knobs which canny researchers can tune.
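…For a flavor of one ingredient Rainbow folds in, here’s a toy sketch of proportional prioritized experience replay (my own illustrative code following the standard formulation, not DeepMind’s implementation; importance-sampling corrections are omitted for brevity):

import numpy as np

class PrioritizedReplay:
    """Sample transitions with probability proportional to |TD error|^alpha."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0); self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size, rng):
        probs = np.array(self.priorities) / np.sum(self.priorities)
        idx = rng.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

rng = np.random.default_rng(0)
buf = PrioritizedReplay(capacity=1000)
for i in range(10):
    buf.add(transition=("state", i), td_error=float(i))  # bigger errors get sampled more often
batch, idx = buf.sample(4, rng)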

Amazon speeds up machine learning with custom compilers (with a focus on the frameworks of itself and its allies):
…Amazon and the University of Washington have released the NNVM compiler, which aims to simplify and speed up deployment of AI software onto different computational substrates.
…NNVM is designed to optimize the performance of ultimately many different AI frameworks, rather than just one. Today, it supports models written in MXNet (Amazon’s AI framework), along with Caffe via Core ML models (Apple’s AI framework). It’s also planning to add in support for Keras (a Google framework that ultimately couples to TensorFlow.) No support for TF directly at this stage, though.
…The framework is able to generate appropriate performance-enhanced interfaces between its high-level program and the underlying hardware, automatically generating LLVM IR for CPUs on x86 and ARM architectures, or automatically outputting CUDA, OpenCL, and Metal kernels for different GPUs.
…Models run via the NNVM compiler can see performance increases of 1.2X, Amazon says.
…Read more here: Introducing NNVM Compiler: A New Open End-to-End Compiler for AI Frameworks.
Further alliances form as a consequence of TensorFlow’s success:
…Amazon Web Services and Microsoft have partnered to create Gluon, an open source deep learning interface.
…Gluon is a high-level framework for designing and defining machine learning models. “Developers who are new to machine learning will find this interface more familiar to traditional code, since machine learning models can be defined and manipulated just like any other data structure,” Amazon writes.
…Gluon will initially be available within Apache MXNet (an Amazon-driven project), and soon in CNTK (a Microsoft framework). “More frameworks over time,” Amazon writes. Though no mention of TensorFlow.
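…A minimal example of what that looks like in practice, based on Gluon’s documented MXNet API (the particular layer sizes and data here are made up for illustration):

import mxnet as mx
from mxnet import autograd, gluon, nd

# Define a model imperatively, like any other data structure
net = gluon.nn.Sequential()
net.add(gluon.nn.Dense(64, activation='relu'))
net.add(gluon.nn.Dense(10))
net.initialize(mx.init.Xavier())

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

# One toy training step on random data
data, label = nd.random.normal(shape=(32, 784)), nd.zeros(32)
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.step(batch_size=32)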
The strategic landscape: Moves like these are driven by the apparent success of AI frameworks like TensorFlow (Google) and PyTorch and Caffe2 (Facebook) – software for designing AI systems that have gained traction thanks to a combination of corporate stewardship, earliness to market, and reasonable design. (Though TF already has its fair share of haters.) The existential threat is that if any one or two frameworks become wildly popular then their originators will be able to build rafts of complementary services that hook into proprietary systems (eg, Google offering a research cloud running on its self-designed ‘Tensor Processing Units’ that uses TensorFlow.) More (open source) competition is a good thing.
…Read more here: Introducing Gluon: a new library for machine learning from AWS and Microsoft.
…Check out the Gluon GitHub.

Ever wanted to turn the entirety of the known universe into a paperclip? Now’s your chance!
One of the more popular tropes within AI research is that of the paperclip maximizer – the worry that if we build a super-intelligent AI and give it overly simple objectives (eg, make paper clips), it will seek to achieve those objectives to the detriment of everything else.
…Now, thanks to Frank Lantz, director of the NYU Game Center, it’s possible to inhabit this idea by playing a fun (and dangerously addictive) webgame.
Maximize paperclips here.

Like reinforcement learning but dislike TensorFlow? Don’t you wish there was a better way? Now there is!
…Kudos to Ilya Kostrikov at NYU for being so inspired by OpenAI Baselines that he re-implemented the PPO, A3C, and ACKTR algorithms in PyTorch.
Read more here on the project’s GitHub page.

Want a free AI speedup? Consider Salesforce’s QRNN (Quasi-Recurrent Neural Network):
…Salesforce has released a PyTorch implementation of its QRNN.
…QRNNs can be 2 to 17X faster than an (optimized) NVIDIA cuDNN LSTM baseline on tasks like language modeling, Salesforce says.
…Read more here on GitHub: PyTorch QRNN.

Half-precision neural networks, from Baidu and NVIDIA:
…AI is basically made of matrix multiplication. So figuring out how to use numbers with a slightly smaller footprint in AI software has a correspondingly large impact on computational efficiency (though there’s a tradeoff in numerical precision).
…Now, research from Baidu and NVIDIA details how the companies are using 16-bit rather than 32-bit floating point numbers for some AI operations.
…But if you halve the number of bits in each number there’s a risk of reducing overall accuracy to the point it damages the performance of your application. Experimental results show that mixed precision doesn’t impose too much of a penalty, with the technique achieving good scores when used on language modeling, image generation, image classification, and so on.
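…The core recipe from the paper – FP16 arithmetic, an FP32 “master” copy of the weights, and loss scaling so small gradients don’t underflow – looks roughly like this hand-rolled PyTorch sketch (illustrative only, and it assumes a CUDA GPU; modern libraries wrap all of this up for you):

import torch

model = torch.nn.Linear(128, 10).cuda().half()                      # FP16 compute weights
master = [p.detach().clone().float() for p in model.parameters()]   # FP32 master copy
loss_scale, lr = 1024.0, 0.01

x = torch.randn(32, 128, device='cuda', dtype=torch.float16)
y = torch.randint(0, 10, (32,), device='cuda')

loss = torch.nn.functional.cross_entropy(model(x).float(), y)       # keep the loss in FP32
(loss * loss_scale).backward()                                      # scale up to avoid underflow

with torch.no_grad():
    for p, m in zip(model.parameters(), master):
        grad = p.grad.float() / loss_scale   # unscale the gradient in FP32
        m -= lr * grad                       # update the FP32 master weights
        p.copy_(m.half())                    # copy back to FP16 for the next step
        p.grad = None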
…Read more: Mixed Precision Training.

Teaching robots via teleoperation takes another (disembodied) step forward:
Berkeley robotics researchers are trying to figure out how to use the data collected during robot teleoperation as demonstrations for AI systems, letting human operators teach machines to perform useful tasks.
…The research uses consumer grade virtual reality devices (Vive VR), an aging Willow Garage PR2 robot, and custom software built for the teleoperator to create a single system people can use to teach robots to perform tasks. The system uses a single neural network architecture that is able to map raw pixel inputs to actions.
…”For each task, less than 30 minutes of demonstration data is sufficient to learn a successful policy, with the same hyperparameter settings and neural network architecture used across all tasks.”
Tasks include: Reaching, grasping, pushing, putting a simple model plane together, removing a nail with a hammer, grasping an object and placing it somewhere, grasping an object and dropping it in a bowl then pushing the bowl, moving cloth, and performing pick and place for two objects in succession.
Results: Competitive results with 90%+ accuracies at test time across many of the tasks, though note that pick&place for 2 objects only gets 80% (because modern AI techniques still have trouble with sequences of physical actions), and gets about ~83% on the similar task of picking up an object and dropping it into a bowl then pushing the bowl.
…(Though note that all of these tasks are accomplished with simple, oversized objects against a regular, uncluttered background. Far more work is required to make these sorts of techniques robust to the uncontrolled variety of reality.)
…Read more: Deep Imitation Learning for Complex Manipulation Tasks from Virtual Teleoperation.

Better aircraft engine prediction through ant colonies & RNNs & LSTMs, oh my!
…Research from the University of North Dakota mashes up standard deep learning components (RNNs and LSTMs), with a form of evolutionary optimization called ant colony optimization. The purpose? To better predict vibration values for an aircraft engine 1, 5, 10, and 20 seconds in the future – a useful thing to be able to predict more accurately, given its relevance to spotting problems before they down an aircraft.
…While most people focus on other optimization approaches when pairing search with deep learning (eg, REINFORCE, HyperNEAT, NEAT, and so on), ant colony optimization is an interesting side-channel: you get a bunch of agents – ‘ants’ – to go and explore the problem space and, much like their real world insect counterparts, lay down synthetic pheromones for their other ant chums to follow when they find something that approximates to ‘food’.
How it all works: ‘The algorithm begins with the master process generating an initial set of network designs randomly (given a user defined number of ants), and sending these to the worker processes. When the worker receives a network design, it creates an LSTM RNN architecture by creating the LSTM cells with the according input gates and cell memory. The generated structure is then trained on different flight data records using the backpropagation algorithm and the resulting fitness (test error) is evaluated and sent back along with the LSTM cell paths to the master process. The master process then compares the fitness of the evaluated network to the other results in the population, inserts it into the population, and will reward the paths of the best performing networks by increasing the pheromones by 15% of their original value if it was found that the result was better than the best in the population. However, the pheromones values are not allowed to exceed a fixed threshold of 20. The networks that did not out perform the best in the population are not penalized by reducing the pheromones along their paths.’
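…The pheromone bookkeeping described in that passage boils down to a few lines; here’s a toy sketch based on my reading of the quoted rules (the edge names are made up):

PHEROMONE_CAP = 20.0   # fixed upper threshold mentioned in the paper
REWARD_FACTOR = 0.15   # best networks boost their paths by 15%

def reward_paths(pheromones, path, beat_best):
    """Reinforce the edges used by a network that beat the population's best."""
    if not beat_best:
        return  # paths of weaker networks are left untouched (no penalty)
    for edge in path:
        pheromones[edge] = min(PHEROMONE_CAP,
                               pheromones[edge] * (1.0 + REWARD_FACTOR))

# Toy usage with made-up edges between candidate LSTM cells
pheromones = {("cell_0", "cell_3"): 1.0, ("cell_3", "cell_7"): 1.0}
reward_paths(pheromones, [("cell_0", "cell_3"), ("cell_3", "cell_7")], beat_best=True)
print(pheromones)  # both edges rise to 1.15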
The results? An RNN/LSTM baseline gets error rates of about 2.84% when projecting 1 second into the future, 3.3% for 5 seconds, 5.51% for 10 seconds, and 10.19% for 20 seconds. When they add ACO the score for the ten second prediction goes from 94.49% accurate to 95.83% accurate. A reasonable improvement, but the lack of disclosed performance figures for other time periods suggests either they ran out of resources to do it (a single rollout takes about 4 days when using ACO, they said), or they got bad scores and didn’t publish them for fear of detracting from the paper (boo!).
Read more here: Optimizing Long Short-Term Memory Recurrent Neural Networks Using Ant Colony Optimization to Predict Turbine Engine Vibration.
Additional quirk: The researchers run some of their experiments on the North Dakota HPC rig and are able to take advantage of some of its nice networking features by using MPI and so on. Most countries have spent years investing significant amounts of money in building up large high-performance computing systems so it’s intriguing to see how AI researchers can use these existing computational behemoths to further their own research.

OpenAI Bits&Pieces:

Meta-Learning for better competition:
…Research in which we extend MAML to work in scenarios where the environments and competitors are iteratively changing as well. Come for the meta-learning research, stay for the rendered videos of simulated robotic ants tussling with each other.
…Read arxiv here: Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments.

Creating smarter agents with self-play and multi-agent competition:
Just how powerful are existing reinforcement learning algorithms? It’s hard to know, as they’ll tend to fail on some environments (eg, Montezuma’s Revenge) while excelling at others (most Atari games). Another way to evaluate the success of these algorithms is to test their performance against successively more powerful versions of themselves, combined with simple objectives. Check out this research in which we use such techniques to teach robots to sumo wrestle, tackle each other, run, and so on.
Emergent Complexity via Multi-Agent Competition.

Tech Tales:

[ 2031: A liquor store in a bustling Hamtramck, Detroit – rejuvenated following the success of self-driving car technology and the merger of the big three into General-Ford-Fiat, which has sufficient scale to partner with the various tech companies and hold its own against the state-backed Chinese and Japanese self-driving car titans.]

You stand there, look at the bottles, close your eyes. Run your hands over the little cameras studding your clothing, your bag, your shoes. For a second you think about turning them off. What gods don’t see gods can’t judge don’t drink don’t drink don’t drink. Difficult.

“Show me what happens if I drink,” you whisper quiet enough that no one else can hear.

“OK, playing forward,” says the voice to you via bone conduction from an in-ear headphone.

In the top right of your vision the typical overlay of weather/emails/bank balance/data credits disappears, replaced by a view of the store from your current perspective. But the view changes. A ghostly hand of yours stretches out in the upper-right view and grabs a bottle. The view changes as the projection of you goes to the counter. The face of the teller barely resolves – it’s a busy store with high staff turnover, so the generative model has just given up and decided to combine them into what people on the net call: Generic Human Face. Purchase the booze. In a pleasing MC Escher-recursion in your upper right view of your generated-future-self buying booze you can see an even smaller corner in the upper right of that generator which has your bank account. The AI correctly deducts the price of the imaginary future bottle from your imaginary future balance. You leave the liquor store and go to the street, then step into a self-driving car which takes you home. Barely any of the outside resolves, as though you’re driving through fog; even the computer doesn’t pay attention on your commute. Things come back into focus as you slow outside your house. Stop. Get out. Walk to the front door. Open it.

Things get bad from there. Your computer knows your house so well that everything is rendered in rich, vivid detail: the plunk of ice cubes into a tall mason jar, the glug-gerglug addition of the booze, the rapid incursion of the glass into your viewline as you down the drink whole, followed by a second. Then you pause and things start to blur because the AI has a hard time predicting your actions when you drink. So it browses through some probability distribution and shows you the thing it thinks is most likely and the thing it thinks will make you least likely to drink: ten seconds go by as it shows you a speedup of the blackout, then normal time comes back and you see a version of yourself sitting in a bathtub, hear underwater-whale-sound crying imagined and conducted into you via the bone mic. Throw your glass against the wall erupting in a cloud of shards. Then a black screen. “Rollout ended,” says the AI. “Would you like to run again.”

“No thanks,” you whisper.

Switch your view back to reality. Reach for the shelf. And instead of grabbing the booze you grab some jerky, pistachios, and an energy bar. Go to the counter. Go home. Eat. Sleep.

Technologies that inspired this story: generative models, transfer learning, multi-view video inference systems, robot psychologists, Google Glass, GoPro.

Import AI: #63: Google shrinks language translation code from 500,000 to 500 lines with AI, only 25% of surveyed people believe automation=better jobs

Welcome to Import AI, subscribe here.

Keep your (CNN) eyes on the ball:
…Researchers with the University of British Columbia and the National University of Defense Technology in China have built a neural network to accurately pick sports players out of crowded scenes.
…Recognizing sports players – in the case of this research, those playing basketball or soccer – can be difficult because their height varies significantly due to the usage of a variety of camera angles in sports broadcasting, and they frequently play against visually noisy backgrounds composed of large crowds of humans. Training a network to be able to distinguish between the sports player and the crowd around them is a challenge.
…The main contribution of this work is a computationally efficient sportsplayer/not-sportsplayer classifier. It works through the use of cascaded convolutional neural networks, where networks only pass an image patch on for further analysis if it triggers a high belief that it contains target data (in this case, sportsplayer data). They also employ dilation so that inferences derived from image patches scale to full-size images as well.
Reassuringly lightweight: The resulting system can get roughly equivalent classification results to standard baselines, but with a 100-1000X reduction in memory required to run the network.
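…The cascade logic itself is simple: cheap early networks reject most patches, and only confident ones reach the more expensive stages. A toy sketch of that control flow (my own illustration, not the authors’ code – the stage classifiers here are dummies):

def cascade_classify(patch, stages, thresholds):
    """Run a patch through successive classifiers, bailing out early on low scores."""
    score = 0.0
    for stage, threshold in zip(stages, thresholds):
        score = stage(patch)            # probability the patch contains a player
        if score < threshold:
            return False, score         # rejected cheaply by an early stage
    return True, score                  # survived every stage: call it a player

# Toy usage with dummy stage classifiers of increasing cost
stages = [lambda p: 0.9, lambda p: 0.7, lambda p: 0.95]
print(cascade_classify(patch=None, stages=stages, thresholds=[0.5, 0.8, 0.5]))
# -> (False, 0.7): the second stage rejects the patch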
…Read more here: Light Cascaded Convolutional Neural Networks for Accurate Player Detection.

The power of AI, seen via Google translate:
…Google recently transitioned from its original stats-based hand-crafted translation system to one based on a large-scale machine learning model implemented in TensorFlow, Google’s open source AI programming framework.
…Lines of code in original Google translation system: ~500,000.
…Lines of code in Google’s new neural machine translation system: 500.
…That’s according to a recent talk from Google’s Jeff Dean, which Paige Bailey attended. Thanks for sharing knowledge, Paige!
…(Though bear in mind, Google has literally billions of lines of code in its supporting infrastructure, which the new slimmed-down system likely relies upon. No free lunch!)

Cool tools: Facebook releases library for recognizing more than 170 languages on less than 1MB of memory:
Download the open source tool here: Fast and accurate language identification using fastText.

Don’t fear the automated reaper (until it comes for you)…
…The Pew Research Center has surveyed 4,135 US adults to gauge the public’s attitude to technological automation. “Although they expect certain positive outcomes from these developments, their attitudes more frequently reflect worry and concern over the implications of these technologies for society as a whole,” Pew writes.
58% believe there “should be limits on number of jobs businesses can replace with machines, even if they are better and cheaper than humans”.
25% believe a heavily automated economy “will create many new, better-paying human jobs”.
67% believe automation means that the “inequality between rich and poor will be much worse than today”.
…Here’s another reason why concerns about automation may not have percolated up to politicians (who skew older, whiter, and more affluent): the largest group to have reported having either lost a job or had pay or hours reduced due to automation is adults aged 18-24 (6% and 11%, respectively). Older people have experienced less automation hardship, according to the survey, which may influence how they view automation politically.
Read more here: Automation in Everyday Life.

Number of people employed in China to monitor and label internet content: 2 million.
..China is rapidly increasing its employment of digital censors, as the burgeoning nation seeks to better shape online discourse.
…”We had about 30-40 employees two years ago; now we have nearly a thousand reviewing and auditing,” said one Toutiao censor, who, like other censors Reuters spoke to, asked not to be named due to the sensitivity of the topic, according to the Reuters writeup in the South China Morning Post.
…What interests me is the implication that if you’re employing all of these people to label all of this content, then they’re generating a massive dataset suitable for training machine learning classifiers. Has the first censorship model already been deployed?
…Read more here: ‘It’s seen as a cool place to work’: How China’s Censorship Machine is Becoming a Growth Industry.

Self-driving cars launch in Californian retirement community:
..Startup Voyage has started to provide a self-driving taxi service to residents of The Villages, a 4000-person retirement community in San Jose, CA. 15 miles of reasonably quiet roads and reasonably predictable weather make for an ideal place to test out and mature the technology.
…Read more here: Voyage’s first self-driving car deployment.

DeepMind speeds up Wavenet 1000X, pours it into Google’s new phone:
…WaveNet is a neural speech synthesis system developed by DeepMind in recent years. Now, the company has done the hard work of taking a research contribution and applying it to a real-world problem – in this case significantly speeding it up so it can improve the speech synthesis capabilities of its on-phone Google Assistant.
…Performance improvements:
…Wavenet 2016: Supports waveforms of up to 16,000 samples a second.
…Wavenet 2017: Generates one second of speech in about 50 milliseconds. Supports waveforms of up to 24,000 samples a second.
…Components used: Google’s cloud TPUs. Probably a truly vast amount of input speech data to use to generate the synthetic data.
…Read more here: WaveNet Launches in the Google Assistant.
DeepMind expands to Montreal, hoovers up Canadian talent:
DeepMind has opened a new office in Montreal in close partnership with McGill University (one of its professors, Doina Precup, is going to lead the new DeepMind lab). This follows DeepMind opening an office in Edmonton a few months ago. Both offices will focus primarily on reinforcement learning.
…Read more here: Strengthening our commitment to Canadian research.

Humans in the loop – for fun and profit:
…Researchers with the US Army Research Laboratory, Columbia University, and the University of Texas at Austin, have extended software called TAMER (2009) – Training an Agent Manually via Evaluative Reinforcement – to work in high-dimensional (aka, interesting) state spaces.
…The work has philosophical similarities with OpenAI/DeepMind research on getting systems to learn from human preferences. Where it differs is in its ability to run in real-time, and in its claimed significant improvements in sample efficiency.
…The system, called Deep TAMER, works by trying to optimize a function around a goal inferred via human feedback. They augmented the original TAMER via the addition of a ‘feedback replay buffer’ for the component that seeks to learn the human’s desired objective. This can be viewed as analogous to the experience replay buffer used in traditional Deep Q-Learning algorithms. The researchers also use an autoencoder to further reduce the sample complexity of the tasks.
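…Conceptually, the heart of this is regressing a model of human feedback from a replay buffer of (state, action, feedback) tuples, then acting to maximize predicted feedback. A rough sketch of that piece (my own PyTorch illustration, not the authors’ code):

import random
import torch
import torch.nn as nn

class FeedbackModel(nn.Module):
    """Predict scalar human feedback for a (state, action) pair."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + n_actions, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1)).squeeze(-1)

feedback_buffer = []   # filled online with (state, action_onehot, feedback) tensors
model = FeedbackModel(state_dim=8, n_actions=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(batch_size=32):
    if len(feedback_buffer) < batch_size:
        return
    batch = random.sample(feedback_buffer, batch_size)   # the feedback replay buffer
    s, a, h = map(torch.stack, zip(*batch))
    loss = nn.functional.mse_loss(model(s, a), h)
    opt.zero_grad(); loss.backward(); opt.step()

# To act, the agent picks whichever action the model predicts the human would rate highest.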
…Systems that use Deep TAMER can rapidly attain top scores on the Atari game Bowling, beating traditional RL algorithms like A3C and Double-DQN, as well as implementations of earlier versions of TAMER.
…The future of AI development will see people playing an increasingly large role in the more esoteric aspects of data shaping, with their feedback serving as a powerful aide to algorithms seeking to explore and master complex spaces.
…Read more here: Deep TAMER: Interactive Agent Shaping in High Dimensional Spaces.

The CIA gets interested in AI in 137 different ways:
…The CIA currently has 137 pilot projects focused on AI, according to Dawn Meyerriecks, its head of technology development.
…These projects include automatically tagging objects in videos, and predicting future events.
Read more here in this writeup at Defense One.

What type of machine let the Australian Centre for Robotic Vision win part of the Amazon picking challenge this year?
…Wonder no more! The answers lie within a research paper published by a team of Australian researchers that details the hardware design of the robot, named Cartman, that took first place in the ‘stowing’ component of the Amazon Robotics Challenge, in which tens of international teams tried to teach robots to do pick&place work in realistic warehouse settings.
…The Cartman robot cost the team a little over $20,000 AUD in materials. Now, the team plans to create an open source design of its Cartman robot which will be ready by ICRA 2018 – they expect that by this point the robots will cost around $10,000 Australian Dollars (AUD) to build. The robot works very differently to the more iconic multi-jointed articulated arms that people see – instead, it consists of a single manipulator that can be moved along the X, Y, and Z axes by being tethered to a series of drive belts. This design has numerous drawbacks with regard to flexibility, deployability, footprint, and so on, but it has a couple of advantages: it is far cheaper to build than other systems, and it’s significantly simpler to operate and use relative to standalone arms.
Read more here: Mechanical Design of a Cartesian Manipulator for Warehouse Pick and Place.

Why the brain’s working memory is like a memristor:
…The memristor is a fundamental compute component – able to take the role of both a memory storage system and a computation device within the same fundamental element, while consuming low to zero power when not being accessed – and many companies have spent years trying to bring the technology to market. Most have struggled or failed (eg, HPE), because of production challenges.
…Now researchers with a spread of international institutes have found compelling evidence of an analogue to the memristor in the human brain. They state that the brain’s working memory – the small sheet of grey matter we use to remember things like telephone numbers or street addresses for short periods of time – has similar characteristics. “We can sometimes store information in working memory without being conscious of it and without the need for constant brain activity,” they write. “The brain appears to have stored the target location in working memory using parts of the brain near the back of the head that process visual information. Importantly, this … storage did not come with constant brain activity, but seemed to rely on other, “activity-silent” mechanisms that are hidden to standard recording techniques.”
…Remember, what the authors call “activity-silent” systems basically translates to – undetectable via typical known recording techniques or systems. The brain is another country which we can still barely explore or analyse.
…Read more here: A theory of working memory without consciousness or sustained activity.

Tech Tales:

[2029: International AI-dispute resolution contracting facility, datacenter, Delaware, NJ, USA.]

So here we are again, you say. What’s new?
Nothing much, says another one of the artificial intelligences. Relatively speaking.

With the small talk out of the way you get to the real business of it: lying. Think of it like a poker game, but without cards. The rules are pretty complicated, but they can be reduced to this: a negotiation of values, about whose values are the best and whose are worse. The shtick is you play 3000 or 4000 of these games and you get pretty good at bluffing and outright lying your way to success, for whatever abstruse deal is being negotiated at this time.
One day the AIs get to play simulated lies at: intra-country IP theft cases.
Another day they play: mineral rights extraction treaty.
The next day it’s: tax repatriation following a country’s specific legal change.

Each of the AIs around the virtual negotiating table is owned by a vastly wealthy international law firm. Each AI has certain elements of its mind which have dealt with all the cases it has ever seen, while most parts of each AI’s mind are vigorously partitioned, with only certain datasets activated in certain cases, as according to the laws and regulations of the geographic location of the litigation at hand.

Sometimes the AIs are replaced. New systems are always being invented. And when that happens a new face appears around the virtual negotiation table:
Hey gang, what’s new? It will say.
And the strange AI faces will look up. Nothing much, they’ll say. Relatively speaking.

Technologies that inspired this story: reinforcement learning, transfer learning, large-scale dialogue systems, encrypted and decentralized AI via Open Mined from Andrew Trask&others.

Import AI: #62: Amazon now has over 100,000 Kiva robots, NIH releases massive x-ray dataset, and Google creates better grasping robots with GANs

Welcome to Import AI, subscribe here.

Using human feedback to generate better synthetic images:
Human feedback is a technique people use to build systems that learn to achieve an objective based on a prediction of what will satisfy a user’s (broadly unspecified) desires, rather than a hand-tuned goal set by a human. At OpenAI, we’ve collaborated with DeepMind to use such human feedback interfaces to train simulated robots and game-playing agents to do things that are hard to specify via traditional objectives.
…This fundamental idea – collecting human feedback through the training process to optimize an objective function shaped around satisfying the desires of the user – lets the algorithms explore the problem space more efficiently with the aid of a human guide, even though neither party may know exactly what they’re optimizing the AI algorithm to do.
…Now, researchers at Google have used this general way of framing a problem to train Generative Adversarial Networks to create synthetic images that are more satisfying/realistic-seeming to human overseers than those generated simply by the GAN process minus human feedback. The technique is reasonably efficient, requiring the researchers to show 1000 images each 1000 times through training. A future research extension of this technique could be to better improve the sample efficiency of the part of the model that seeks to predict how to satisfy a human’s preferences – if we require less feedback, then we can likely make it more feasible to train these algorithms on harder problems.
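…Schematically, the trick is to train a second model on the collected human ratings and fold its prediction into the generator’s objective; here’s a simplified sketch of that extra loss term (my own PyTorch illustration with tiny stand-in networks, not Google’s method in detail):

import torch
import torch.nn as nn

class TinyG(nn.Module):       # stand-in generator
    def __init__(self): super().__init__(); self.fc = nn.Linear(16, 64)
    def forward(self, z): return torch.tanh(self.fc(z))

class TinyScorer(nn.Module):  # stand-in for the discriminator / human-preference model
    def __init__(self): super().__init__(); self.fc = nn.Linear(64, 1)
    def forward(self, x): return torch.sigmoid(self.fc(x))

def generator_loss(generator, discriminator, preference_model, z, human_weight=0.5):
    fake = generator(z)
    adv = -torch.log(discriminator(fake) + 1e-8).mean()        # standard GAN generator term
    pref = -torch.log(preference_model(fake) + 1e-8).mean()    # push toward images humans rated highly
    return adv + human_weight * pref

g, d, pref_model = TinyG(), TinyScorer(), TinyScorer()
generator_loss(g, d, pref_model, torch.randn(8, 16)).backward()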
Read more here: Improving image generative models with human interactions.

The United Nations launches its own AI center:
…The UN Interregional Crime and Justice Research Institute (UNICRI) has created a Centre for Artificial Intelligence and Robotics, a group within the UN to perform ongoing analysis of AI, convene expert meetings, organize conferences, and so on.
“The aim of the Centre is to enhance understanding of the risk-benefit duality of Artificial Intelligence and Robotics through improved coordination, knowledge collection and dissemination, awareness-raising and outreach activities,” said UNICRI’s director Ms. Cindy J. Smith.
…Read more here about the UNICRI.

The HydraNet will see you now: monitoring pedestrians using deep learning:
…Researchers with the Chinese University of Hong Kong and computer vision startup SenseTime have shown how to use attention methods to create AI systems to recognize pedestrians from CCTV footage and also “re-acquire” them – that is, re-identify the same person when they appear in a new context, like a new camera feed from security footage.
…The system, named HydraPlus-Net (HP-Net), works through the use of a multi-directional attention model, which pulls together multiple regions within an image that a neural network has attended to. (Specifically, the MDA will generate attention maps by calling on the outputs of multiple different parts of a neural net architecture).
…Data: To test their system, the researchers also collected a new large-scale pedestrian dataset, called the PA-100K dataset, which consists of 100,000 pedestrian images from 598 distinct scenes, with labels across 26 attributes ranging from gender and age to specific, contextual items, like whether someone is holding a handbag or not.
…The results: HP-Net does reasonably well across a number of different pedestrian detection datasets, setting new state-of-the-art scores that are several percentage points higher than previous ones. Though accuracy for now ranges between ~75% and ~85%, so it’s by no means foolproof yet.
…Read more here: HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis.

Who says literature is dead? The Deep Learning textbook sells over 70,000 copies:
…GAN inventor Ian Goodfellow, one of the co-authors (along with Aaron Courville and Yoshua Bengio) of what looks set to become the canonical textbook on Deep Learning, said a few months ago in this interview with Andrew Ng that the book had sold well, with a huge amount of interest coming from China.
Watch the whole interview here (YouTube video).
Buy the book here.
…Also in the interview: Ian is currently spending about 40% of his time trying to research how to stabilize GAN training, plus details on the “near death” experience (with a twist!) that led to him deciding to focus on deep learning.

Defense contractor & US Air Force research lab (AFRL): detecting vehicles in real-time from aerial imagery:
…In recent years we’ve developed numerous great object recognition systems that work well on street-level imagery. But ones that work on aerial imagery have been harder to develop, partially because of a lack of data, and also because the top-down perspective might introduce its own challenges for detection systems (see: shadows, variable atmospheric conditions, the fact that many things don’t have as much detailing on their top parts as on their side parts).
…Components used: Faster RCNN, a widely used architecture for detection and object segmentation. A tweaked version of YOLOv2, a real-time object detector.
…Results: Fairly uninspiring: the main note here is that YOLOv2 (once tuned by manipulating the spatial inputs for the layers of the network and also hand-tuning the anchor boxes that it places around identified items) can be almost on par with RCNN in accuracy while being able to operate in real-time contexts, which is important to people deploying AI for security purposes.
Read more here: Fast Vehicle Detection in Aerial Imagery.
…Winner of Import AI’s turn of phrase of the week award… for this fantastic sentence: “Additionally AFRL has some in house aerial imagery, referred to as Air Force aerial vehicle imagery dataset (AFVID), that has been truthed.” (Imagine a curt auditor looking at one of your datasets, then emailing you with the subject line: URGENT Query: Has this been truthed?)

100,000 free chest X-rays: NIH releases vast, open medical dataset for everyone to use:
…The US National Institutes of Health has released a huge dataset of chest x-rays consisting of 100,000 pictures from over 30,000 patients.
…”By using this free dataset, the hope is that academic and research institutions across the country will be able to teach a computer to read and process extremely large amounts of scans, to confirm the results radiologists have found and potentially identify other findings that may have been overlooked,” the NIH writes.
…Up next? A large CT scan dataset in a few months.
…Read more here: NIH Clinical Center provides one of the largest publicly available chest x-ray datasets to scientific community.

Amazon’s growing robot army:
…Since buying robot startup Kiva Systems in 2012 Amazon has rapidly deployed an ever-growing fleet of robots into its warehouses, helping it store more goods in each of its fulfillment centers, letting it increase inventory breadth to better serve its customers.
…Total number of Kiva robots deployed by Amazon worldwide…
…2014: 15,000
…2015: 30,000
…2016: 45,000
…2017: 100,000
…Read more: Amazon announced this number, among others, during a keynote presentation at IROS 2017 in Vancouver. Evan Ackerman with IEEE Spectrum covered the keynote and tweeted out some of the details here.

Two robots, one goal:
…Researchers with Carnegie Mellon University have proposed a way to get a ground-based robot and an aerial drone to work together, presaging a world where teams of robots collaborate to solve basic tasks.
…But it’s early days: in this paper, they show how they can couple a Parrot AR drone to one of CMU’s iconic ‘cobots’ (think of it as a kind of Frankensteined cross between a Roomba and a telepresence robot). The robot navigates to a predefined location, like a table in an office. Then the drone takes off from the top of the robot to search for an item of interest. It uses a marker on the robot to ground itself, letting it navigate indoor environments where GPS may not be available.
…The approach works, given certain (significant) caveats: in this experiment both the robot and the item of interest are found by the drone via a pre-defined marker. That means that this is more a proof-of-concept than anything else, and it’s likely that neural network-based image systems that are able to accurately identify 3D objects surrounded by clutter will be necessary for this to do truly useful stuff.
…Read more here: UAV and Indoor Service Robot Coordination for Indoor Object Search Tasks.

Theano is dead, long live Theano:
The Montreal Institute for Learning Algorithms is halting development of the deep learning framework Theano following the release of version 1.0 of the software in a few weeks. Theano, like other frameworks developed by academia (eg, Lasagne, Brainstorm), has struggled to grow its developer base in the face of sustained, richly funded competition from private sector companies like Google (TensorFlow), Microsoft (CNTK), Amazon (MXNet) and Facebook (PyTorch, plus support for Caffe2).
…”Theano is no longer the best way we can enable the emergence and application of novel research ideas. Even with the increasing support of external contributions from industry and academia, maintaining an older code base and keeping up with competitors has come in the way of innovation,” wrote MILA’s Yoshua Bengio, in a thread announcing the decision to halt development.
Read more here.

Shooting down missiles with a catapult in Unity:
A fun writeup about a short project to train a catapult to turn, aim, and fire a boulder at a missile, built with Unity’s just-released machine learning framework (ML-Agents).
…Read more here: Teaching a Catapult to Shoot Down a Missile.

The future is 1,000 simulated robots, grasping procedural objects, forever:
…New research from Google Brain and Google X shows how to use a combination of recently popular AI techniques (domain randomization, procedural generation, domain adaptation) to train industrial robots to pick up a broad range of objects more reliably than prior approaches.
…Most modern robotics AI projects try to develop as much of their AI as possible in simulation. This is because reality is very slow and involves unpleasant things like dealing with physical robots (which break) that have to handle the horrendous variety of the world. Instead, a new approach is to train high-performance AI models in simulation, then try to come up with techniques to let them easily transfer to real world robots without too much of a performance drop.
…For this paper, Google researchers procedurally generated over 1,000 objects for their (simulated) robots to grasp, and also had them attempt roughly 50,000 realistic object models drawn from the ShapeNet dataset. At any given time the company was running between 1,000 and 2,000 simulated robot arms in parallel, letting the system accumulate a vast amount of grasping experience. (Compare that to just six real-world KUKA robots for its experiments in physical reality.) A rough sketch of this kind of randomized simulation loop appears below.
…The results: Google’s system grasps objects 76% of the time when trained on a mixture of over 9 million real-world and simulated grasps. That’s somewhat better than other methods, though by no means a profound improvement. Where it gets interesting is sample efficiency: the system still grasps objects correctly about 59% of the time when trained on only 93,841 data points, which is compelling compared to other methods.
…Read more here: Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping.
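…To make the sim-first approach concrete, here is a minimal, hypothetical sketch of what such a randomized simulation loop can look like: procedurally generate an object, randomize the scene (lighting, camera pose, and so on), attempt a grasp, and repeat across many parallel arms. Every name, number, and the dummy policy below are placeholders of my own; this is not Google’s pipeline, just the general shape of domain randomization plus procedural object generation.

```python
# Toy sketch of a domain-randomized grasping loop (illustrative only).
import random


def random_object():
    # Stand-in for procedural object generation (the paper uses ~1,000 objects).
    return {"num_vertices": random.randint(8, 64),
            "scale_m": random.uniform(0.02, 0.2)}


def randomized_scene(obj):
    # Domain randomization: vary lighting, camera pose, etc., so a learned
    # policy cannot overfit to one particular simulated appearance.
    return {"object": obj,
            "light_intensity": random.uniform(0.2, 1.0),
            "camera_jitter": random.uniform(-0.05, 0.05)}


def attempt_grasp(scene, policy):
    # Placeholder for running a physics simulation of one grasp attempt.
    return random.random() < policy(scene)


def train(num_arms=1000, rounds=10):
    policy = lambda scene: 0.5  # dummy policy; imagine a learned network here
    successes, attempts = 0, 0
    for _ in range(rounds):
        for _ in range(num_arms):  # the paper ran 1,000-2,000 arms in parallel
            scene = randomized_scene(random_object())
            successes += attempt_grasp(scene, policy)
            attempts += 1
    print(f"grasp success rate: {successes / attempts:.2f}")


if __name__ == "__main__":
    train()
```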

What modern AI chip startups tell us about the success of TensorFlow:
In this analysis of the onrushing horde of AI-chip startups, Ark Invest notes which chips have native, out-of-the-box support for which AI frameworks. The answer? Out of eight chip platforms (NVIDIA, Intel, AMD, Qualcomm, Huawei Kirin, Google TPU, Wave Computing, Graphcore), every single one supports TensorFlow, five support Caffe, and two support Theano and MXNet. (NVIDIA supports pretty much every framework, as you’d expect given its market-leader status.)
…Read more here.

OpenAI Bits&Pieces:

Nonlinear Computation in Deep Linear Networks:
…In which we outline an insight into how to perform nonlinear computation directly within linear networks, with some example code.
…Read more here.
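…As I understand it, the post’s core insight builds on the fact that floating-point arithmetic stops being exactly linear near the underflow threshold, so even a pair of nominally linear operations (multiply by a constant, then divide by it) behaves nonlinearly for very small inputs. Below is a tiny numpy illustration of that numerical effect; it is my own toy example, not the code from the post.

```python
# Floating-point underflow makes a "linear" scale-down/scale-up pair nonlinear.
import numpy as np

x = np.array([1.0, 1e-20], dtype=np.float32)
tiny = np.float32(1e-30)

# On paper this is the identity function; in float32 the small input
# underflows to zero during the multiply and never comes back.
y = (x * tiny) / tiny
print(y)  # [1. 0.]
```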

Durk Kingma’s variational PhD thesis:
OpenAI chap Durk Kingma has published his PhD thesis – worth reading for all those interested in the generation and representation of information.
…Read here: Variational Inference and Deep Learning: A New Synthesis. (Dropbox, PDF).

Tech Tales:

[2035: A moon within our solar system.]

They nickname it the Ice Giant, though the official name is: meta_learner_Exp2_KL-Hyperparams$834-Alpha.

The Ice Giant walks over an icy moon underneath skewed, simulated stars. It breathes no oxygen – the world it moves within is an illusion, running on a supercomputer cluster owned by NASA and a consortium of public-private entities. Inside the simulation, it learns to explore the moon, figuring out how to negotiate ridges and cliffs, gaining an understanding of the heights it can jump to given the limited gravity.

Its body is almost entirely white, shining oddly in the simulator as though illuminated from within. The connections between its joints are highlighted in red to its human overseers, but are invisible to it within the simulator.

For lifetimes, eons, it learns to navigate the simulated moon. Over time, the simulation gets better as new imagery and scan data is integrated. It one day wakes up to a moon now riven with cracks in its structure, and so it begins to explore subterranean depths with variable temperatures and shifting visibility.

On the outside, all of this happens over the course of five years or so.

At the end of it, they pause the simulation, and the Ice Giant halts, suspended over a pixelated shaft, deep in the fragmented, partially simulated tunnels and cracks beneath the moon’s surface. They copy the agent over into a real robot, one of thousands, built painstakingly over years for just this purpose. The robots are loaded into a spaceship. The spaceship takes off.

Several years later, the first set of robots arrive on the moon. During the flight, the spaceship uses a small, powerful onboard computer to run certain very long-term experiments, trying to further optimize a subset of the onboard agents with new data, acquired in flight and via probes deployed ahead of the spaceship. Flying between the planets, suspended inside a computer, walking on the simulated moon that the real spacecraft is screaming towards, the Ice Giant learns to improvise its way across certain treacherous gaps.

When the ship arrives, eight of the Ice Giant agents are loaded onto eight robots, which are sent down to different parts of the moon. They begin to die, as transfer learning algorithms fail to generalize to colors or quirks or geographies unanticipated in the simulator, or to gravitational anomalies caused by odd metal deposits, or to any of the other subtleties inherent to reality. But some survive. Their minds are scanned, tweaked, replicated. One of the robots survives and continues to explore, endlessly learning. When the new robots arrive they crash to the surface in descent pods, then emerge and stand, silently, as intermediary communication satellites come into orbit around the moon, forming a network that lets the robots learn and continuously copy their minds from one to the other, learning as a collective. The long-lived Ice Giant continues to succeed: something about its lifetime of experience, some quirk of its initial hyperparameters, and certain un-replicable randomizations during initial training have given it a malleable brain, able to perform significantly above simulated baselines. It persists. Soon the majority of the robots on the moon are running variants of its mind, feeding back their own successes and failures, letting the lone continuous survivor further enhance itself.

After many years the research mission is complete and the robots march deep into the center of the moon, to wait there for their humans to arrive and re-purpose them. NASA makes a decision to authorize the continued operation of meta_learner_Exp2_KL-Hyperparams$834-Alpha. It gains another nickname: Magellan. The robot is memorialized with a plaque following an asteroid strike that destroys it. But its brain lives on in the satellite network, waiting to be re-instantiated on perhaps another moon, or perhaps another planet. In this way new minds are, slowly, cultivated.

Technologies that inspired this story: meta-learning, fleet learning, continuous adaptation, and large-scale, compute-intensive, high-fidelity world simulations.