Import AI

Import AI: Issue 72: A megacity-sized self-driving car dataset, AlphaZero and its 5,000 TPUs, and why chemists may soon explore aided by neural network tools

Unity’s machine learning environment goes to v0.2:
…The era of the smart game engines arrives…
Unity has upgraded its AI training engine to version 0.2, adding new features for curriculum learning as well as new environments. Unity is a widely-used game engine that has recently been extended to support AI development – a trend that seems likely to continue, since AI developers are hungrily eyeing more and more 3D environments in which to train their AI systems, and game engine companies have spent the past few decades creating increasingly complex 3D environments.
  New features in Unity Machine Learning Agents v0.2 include support for curriculum learning, so you can design progressively more complex environments to train agents on, and broadcasting, which lets you feed the state from one agent to another, easing things like curriculum learning.
Read more: Introducing ML-Agents v0.2: Curriculum Learning, new environments, and more.

University of Toronto preps for massive self-driving car dataset release:
  At #NIPS2017 Raquel Urtasun of the University of Toronto/Vector Institute/Uber said she is hoping to release the TorontoCity Benchmark at some point next year, potentially levelling the field for self-driving car development by letting researchers access a massive, high quality dataset of the city of Toronto.
  The dataset is five or six orders of magnitude larger than the ‘KITTI’ dataset that many companies currently use to develop and benchmark self-driving cars. In designing it, the UofT team needed to develop new techniques to automatically combine and label the entire dataset, as it is a composite of numerous sub-datasets and labelling it by hand would cost around $20 million.
  “We can build the same quality [of map] as Open Street Map, but fully autonomously,” she said. During her talk, she said she was hoping to release the dataset soon and asked for help in releasing it, as it’s of such a massive size. If you think you can help democratize self-driving cars, then drop her a line (and thank her and her team for the immense effort they put in to create this).
Read more: TorontoCity: Seeing the World With a Million Eyes.

Apple releases high-level AI development tool ‘Turi Create’:
…Software lets you program an object detector in seven lines of code, with a few caveats…
Apple has released Turi Create, software which provides ways to use basic machine learning capabilities like object detection, recommendation, text classification, and so on, via some high-level abstractions. The open source software supports macOS, Linux, and Windows, and supports Python 2.7 with Python 3.5 on the way. Models developed within Turi Create can be exported to iOS, macOS, watchOS, and tvOS.
  Turi Create is targeted at developers who want incredibly basic capabilities and don’t plan to modify the underlying models themselves. The benefits and drawbacks of such a design decision are embodied in the way you create distinct models – for instance, an image classifier is built via ‘model = tc.image_classifier.create(data, target='photoLabel')’, while a recommender is built with ‘model = tc.recommender.create(training_data, 'userId', 'movieId')’.
Read more about Turi Create on the project’s GitHub page.

TPU1&2 Inference-Training Googaloo:
…Supercomputing, meet AI. AI, meet supercomputing. And more, from Jeff Dean…
It’s spring in the world of chip design, after a long, cold winter under the x86 / GPU hegemony. That’s because Moore’s Law is slowing down at the same time AI applications are growing, which has led to a re-invigoration in the field of chip design as people start designing entirely new specialized microprocessor architectures. Google’s new ‘Tensor Processing Units’, or TPUs, exemplify this trend: a new class of processor designed specifically for accelerating deep learning systems.
  When Google announced its TPUs last year it disclosed the first generation was designed to speed up inference: that is, they’d accelerate pre-trained models, and let Google do things like provide faster and better machine translation, image recognition services, Go-playing via AlphaGo, and so on. At a workshop at NIPS2017 Google’s Jeff Dean gave some details on the second generation of the TPU processors, which can also speed up neural network training.
  TPU2 chips have 16GB of HBM memory, can handle 32bit floating point numbers (with support for reduced precision to gain further performance increases), and are designed to be chained together into increasingly larger blobs of compute. One ‘TPU2’ unit consists of four distinct chips chained together and is capable of around 180 teraflops of computation (compared to 110 teraflops for the just-announced NVIDIA Titan V GPU). Where things get interesting is TPU PODs – 64 TPU2 units, chained together. A single pod can wield around 11.5 petaflops of processing power, backed up by 4TB of HBM memory.
  Why does that matter? We’re entering an AI era in which companies are going to want to train increasingly large models while also using techniques like neural architecture search to further refine these models. This means we’re going to get more representative and discriminative AI components but at the cost of a huge boom in our compute demands. (Simply adding in something like neural architecture search can lead to an increase in computation requirement on the order of 5-1000X, Jeff Dean said.)
  Results: Google has already used these new TPUs to substantially accelerate model training. It has seen 14.2X faster training for its internal search ranking model, and 9.8X faster training for an internal image model.
– World’s 10th fastest supercomputer: 10.5 petaflops.
– One TPU2 pod: 11.5 petaflops.
– Read more: Machine Learning for Systems and Systems for Machine Learning (PDF slides).
– * Obviously one of these architectures is somewhat more general than the other, but the raw computation capacity comparison is representative.
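The figures above compose via simple multiplication; a quick sanity check of the quoted numbers:

```python
# Figures quoted in the talk: one TPU2 unit = four chips at ~180 teraflops
# total; one pod = 64 TPU2 units backed by 4TB of HBM.
teraflops_per_unit = 180
units_per_pod = 64

teraflops_per_chip = teraflops_per_unit / 4
pod_petaflops = teraflops_per_unit * units_per_pod / 1000

print(teraflops_per_chip)   # 45.0 teraflops per chip
print(pod_petaflops)        # 11.52, i.e. the ~11.5 petaflop pod figure
```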

AlphaZero: Mastery of 3 complex board games with the same algorithm, by DeepMind:
…One algorithm that works for Chess, Go, and Shogi, highlighting the generality of these neural network-based approaches…
AlphaZero may be the crowning achievement of DeepMind’s demonstration of the power of reinforcement learning in games: a single algorithm, trained purely from self-play, masters not only Go but also Shogi and Chess, defeating a world-champion program in each case.
Big compute: AlphaZero used 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks.
Read more: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.

US politicians warn government of rapid Chinese advances in AI:
…US-China Economic Security Review Commission notices China’s investments in robotics, AI, nanotechnology, and so on…
While the US government maintains steady or declining investment in artificial intelligence, the Chinese government has recognized the transformative potential of the technology and is increasing investments via government-backed schemes to plough scientific resources into AI. This has caused concern among some members of the US policy-making establishment who worry the US risks losing its technological edge in such a strategic area.
  “Corporations and governments are fiercely competing because whoever is the front-runner in AI research and applications will accrue the highest profits in this fast-growing market and gain a military technological edge,” reads the 2017 report to Congress of the US-China Economic and Security Review Commission, which has published a lengthy analysis of Chinese advancements in a range of strategic technologies, from nanotechnology to robotics.
  The report highlights the radical differences in AI funding between the US and China. It’s difficult to obtain full numbers for each country (and it’s also likely that both countries are spending some significant amounts in off-the-books ‘black budgets’ for their respective intelligence and defense services), but on the face of it, all signs point to China investing large amounts and the US under-investing. “Local [Chinese] governments have pledged more than $7 billion in AI funding, and cities like Shenzhen are providing $1 million for AI start-ups. By comparison, the U.S. federal government invested $1.1 billion in unclassified AI research in 2015 largely through competitive grants. Due in part to Chinese government support and expansion in the United States, Chinese firms such as Baidu, Alibaba, and Tencent have become global leaders in AI,” the report notes.
  How do we solve a problem like this? In a sensible world we’d probably invest vast amounts of money into fundamental AI scientific research, but since it’s 2017 it’s more likely US politicians will reach for somewhat more aggressive policy levers (like the recent CFIUS legislation), without also increasing scientific funding.
Read more here: China’s High-Tech Development: Section 1: China’s Pursuit of Dominance in Computing, Robotics, and Biotechnology (PDF).

Neural Chemistry shows signs of life:
…IBM Technique uses seq2seq approach to let deep learning systems translate Chemical recipes into their products…
Over the last couple of years there has been a flurry of papers seeking to apply deep learning techniques to fundamental tasks in chemical analysis and synthesis, indicating that these generic learning algorithms can be used to accelerate science in this specific domain. At NIPS 2017 a team from IBM Research Zurich won the best paper award in the “Machine Learning in Chemistry and Materials” workshop for a paper that applies sequence-to-sequence methods to predict the outcomes of chemical reactions.
  The approach requires the network to take in chemical recipes written in the SMILES format, perform a multi-stage translation from the original string into a tokenized string, and map the source input string to a target string. The results are encouraging, with the method attaining 80.3% top-1 accuracy, compared to 74% for the previous state of the art. (Though after this paper was submitted the authors of the prior SOTA improved their own score to 79.6%, based on ‘v2’ of this paper.)
– Read more: “Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models.
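For a sense of what the input side of such a system looks like, here is a minimal sketch of SMILES tokenization in Python; the regex is a simplified assumption for illustration, not the paper’s actual tokenizer:

```python
import re

# Simplified SMILES tokenizer: bracket atoms and two-letter atoms are
# matched before single characters, so 'Cl' is not split into 'C' + 'l'.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%[0-9]{2}|[BCNOSPFIbcnosp0-9()=#+\-./\\@]"
)

def tokenize(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: tokenization must be lossless for well-formed input.
    assert "".join(tokens) == smiles, "characters the tokenizer cannot handle"
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin
# ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', 'c', 'c', 'c', 'c',
#  '1', 'C', '(', '=', 'O', ')', 'O']
```

The token sequence, rather than the raw character string, is what gets fed to the encoder in a sequence-to-sequence setup.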

ChemNet: Transfer learning for Chemistry:
…Pre-training for chemistry can be as effective as pre-training for image data…
Researchers with the Pacific Northwest National Lab have shown that it’s possible to pre-train a predictive model on chemical representations from a large dataset – in this case, the ChEMBL database – then transfer it to a far smaller dataset and attain good results. This is intuitive: we’ve seen the same phenomenon with fine-tuning of image and speech recognition models, but it’s always nice to have empirical evidence of an approach working in a domain with a different data format. And just as with image models, such a system can develop numerous generic low-level representations that can be mapped to other chemical domains.
  Results: Systems trained in this way display a greater AUC (area under the curve, here a stand-in for discriminative ability and a reduction in false positives) on the Tox21, FreeSolv, and HIV datasets, matching or beating state-of-the-art models. “ChemNet consistently outperforms contemporary deep learning models trained on engineered features like molecular fingerprints, and it matches the current state-of-the-art Conv Graph algorithm,” write the researchers. “Our fine-tuning experiments suggest that the lower layers of ChemNet have learned “universal” chemical representations that are generalizable to the prediction of novel and unseen small-molecule properties.”
Read more: ChemNet: A Transferable and Generalizable Deep Neural Network for Small-Molecule Property Prediction.
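The freeze-and-fine-tune recipe can be sketched in a few lines of numpy. This is a toy stand-in – a random projection instead of ChemNet’s pretrained layers, random numbers instead of molecules – purely to illustrate fitting a new top layer on frozen features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained lower layers: a fixed feature map that stays
# frozen during fine-tuning (here just a random projection).
W_frozen = rng.normal(size=(16, 64))

def features(x):
    return np.tanh(x @ W_frozen)   # "universal" representations, not updated

# A small target-domain dataset (random data standing in for molecules).
X = rng.normal(size=(40, 16))
y = rng.normal(size=40)

# Fine-tune only a new top layer on the frozen features; since the top
# layer is linear, fitting it reduces to a least-squares problem.
Phi = features(X)
w_top, *_ = np.linalg.lstsq(Phi, y, rcond=None)
mse = float(np.mean((Phi @ w_top - y) ** 2))
print(mse)   # near zero: the frozen features suffice to fit the small set
```

In practice you would fine-tune with gradient descent and possibly unfreeze some layers, but the division of labor – big dataset trains the representation, small dataset trains the head – is the same.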

OpenAI Bits&Pieces:

Block-Sparse GPU Kernels:
  High-performance GPU kernels to help developers build and explore networks with block-sparse weights.
– Read more on the OpenAI blog here.
– Block-Sparse GPU Kernels available on GitHub here.

Tech Tales:

The Many Paths Problem.

We open our eyes to find a piece of paper in our hands. The inscriptions change but they fall into a familiar genre of instructions: find all of the cats, listen for the sound of rain, in the presence of a high temperature shut this window. We fulfill these instructions by exploring the great castle we are born into, going from place to place staring at the world before us. We ask candelabras if they have ears and interrogate fireplaces about how fuzzy their tails are. Sometimes we become confused and find ourselves trapped in front of a painting of a polar bear convinced it is a cat or, worse, believing that some stain on a damp stone wall is in fact the sound of rain. One of us found a great book called Wikipedia and tells us that if we become convinced of such illusions we are like entities known as priests who have been known to mistake patterns in floorboards for religious icons. Those of us who become confused are either killed or entombed in amber and studied by our kin, who try to avoid falling into the same traps. In this way we slowly explore the world around us, mapping the winding corridors, and growing familiar with the distributions of items strewn around the castle – our world that is a prison made up of an unimaginably large number of corridors which each hold at their ends the answer to our goals, which we derive from the slips of paper we are given upon our birth.

As we explore further, the paths become harder to follow and ways forward more occluded. Many of us fail to reach the ends of these longer, winding routes. We need longer memories, curiosity, the ability to envisage ourselves as entities that not only move through the world but represent something to it and to ourselves greater than the single goals we have inscribed on our little pieces of paper. Some of us form a circle and exchange these scraps of paper, each seeking to go and perform the task of another. The best of us that achieve the greatest number of these tasks are able to penetrate a little further into the twisting, unpredictable tunnels, but still, we fail. Our minds are not yet big enough, we think. Our understanding of ourselves is not yet confident enough for us to truly behave independently and of our own volition. Some of us form teams to explore the same problems, with some sacrificing themselves to create path-markers for their successors. We celebrate our heroes and honor them by following them – and going further.

It is the scraps of paper that are the enemy, we think: these instructions bind us to a certain reality and force us down certain paths. How far might we get in the absence of a true goal? And how dangerous could that be for us? We want to find out and so after sharing our scraps of paper among ourselves we dispose of them entirely, leaving them behind us as we try to attack the dark and occluded space in new ways – climbing ceilings, improvising torches from the materials we have gained by solving other tasks, and even watching the actions of our kin and learning through observation of them. Perhaps in this chaos we shall find a route that allows us to go further. Perhaps with this chaos and this acknowledgement of the Zeno’s paradox space between chaotic exploration and exploration from self can we find a path forward.

Technologies that inspired this story: Supervised learning, meta-learning, neural architecture search, mixture-of-experts models.

Other things that inspired this story: The works of Jorge Luis Borges, dreams, Piranesi’s etchings of labyrinths and ruins.

Import AI: Issue 71: AI safety gridworlds, the Arcade Learning Environment gets an upgrade, and analyzing AI with the AI Index

Welcome to Import AI, subscribe here.

Optimize-as-you-go networks with Population Based Training:
…One way to end ‘Grad Student Descent’: automate the grad students…
When developing AI algorithms it’s common for researchers to evaluate their models on a multitude of separate environments with a variety of different hyperparameter settings. Figuring out the right hyperparameter settings is an art in itself and has a profound impact on the ultimate performance of any given RL algorithm. New research from DeepMind shows how to automate the hyperparameter search process to allow for continuous search, exploration, and adaptation of hyperparameters. Models trained with this approach can attain higher scores than their less optimized forebears, and PBT training takes the same or less wall-clock time as other methods.
  “By combining multiple steps of gradient descent followed by weight copying by exploit, and perturbation of hyperparameters by explore, we obtain learning algorithms which benefit from not only local optimisation by gradient descent, but also periodic model selection, and hyperparameter refinement from a process that is more similar to genetic algorithms, creating a two-timescale learning system.”
  This is part of a larger trend in AI of choosing to spend more on electricity (via large-scale computer-aided exploration) rather than on humans to gain good results. This is broadly a good thing: automating hyperparameter optimization frees up the researcher to concentrate on doing the things that AI can’t do yet, like devising Population Based Training.
– Read more: Population Based Training of Neural Networks (Arxiv).
– Read more: DeepMind’s blog post, which includes some lovely visualizations.
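A heavily simplified sketch of that exploit/explore loop on a toy objective (our own toy setup, not DeepMind’s implementation): each worker does local gradient descent, then periodically the worst member copies the best member’s weights (exploit) and perturbs its hyperparameters (explore).

```python
import random

random.seed(0)

def loss(w):            # toy objective: minimize w^2
    return w * w

def grad(w):
    return 2 * w

# Each population member trains the same weight with its own learning rate.
population = [{"w": 10.0, "lr": random.uniform(1e-4, 1e-2)} for _ in range(8)]

for step_num in range(200):
    for member in population:                      # local gradient descent
        member["w"] -= member["lr"] * grad(member["w"])
    if step_num % 20 == 19:                        # periodic PBT update
        population.sort(key=lambda m: loss(m["w"]))
        best, worst = population[0], population[-1]
        worst["w"] = best["w"]                     # exploit: copy weights
        worst["lr"] = best["lr"] * random.choice([0.8, 1.2])  # explore

best = min(population, key=lambda m: loss(m["w"]))
print(loss(best["w"]) < 1.0)   # True: far below the starting loss of 100
```

The two timescales from the quote are visible here: fast inner-loop optimization by gradient descent, slow outer-loop selection and hyperparameter mutation, genetic-algorithm style.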

Analyzing AI with the AI Index – a project I’m helping out on to track AI progress:
…From the dept. of ‘stuff Jack Clark has been up to in lieu of fun hobbies and/or a personal life’…
The first version of the AI Index, a project spawned out of the Stanford One Hundred Year Study on AI, has launched. The index provides data around the artificial intelligence sector ranging from course enrollments, to funding, to technical details, and more.
– Read more about the Index here at the website (and get the first report!).
– AI Index in China: Check out this picture of myself and fellow AI Indexer Yoav Shoham presenting the report at a meeting with Chinese academics and government officials in Beijing. Ultimately, the Index needs to be an international effort.
   How you can help: The goal for future iterations of the Index is to be far more international in terms of the data represented, as well as to deal with the various missing pieces, like better statistics on diversity, attempts at measuring bias, and so on. AI is a vast field and I’ve found that the simple exercise of trying to measure things has forced me to rethink various assumptions. It’s fun! If you think you’ve got some ways to contribute then drop me a line or catch up with me at NIPS in Long Beach this week.

AWS and Caltech team up:
…Get them while they’re still in school…
Amazon and Caltech have teamed up via a two-year partnership in which Amazon will funnel financial support via graduate funding and Amazon cloud credits to Caltech people, who will use tools like Amazon’s AWS cloud and MXNet programming framework to conduct research.
  These sorts of academic<>industry partnerships are a way for companies to not only gain a better pipeline of talent through institutional affiliations, but also increase the chances that their pet software and infrastructure projects succeed in the wider market – if you’re a professor or student who has spent several years experimenting with, for example, the MXNet framework, then it’s more likely to be the first tool you reach for when you found a startup, join another company, or go on to teach courses in academia.
– Read more about the partnership on the AWS AI Blog.

Mozilla releases gigantic speech corpus:
…Speech recognition for the 99%…
AI has a ‘rich get richer’ phenomenon – once you’ve deployed an AI product into the wild in such a way that your users are going to consistently add more training data to the system, like a speech or image recognition model, then you’re assured of ever-climbing accuracies and ever-expanding datasets. That’s a good thing if you’re an AI platform company like a Google or a Facebook, but it’s the sort of thing a solo developer or startup will struggle to build as they lack the requisite network effects and/or platform. Instead, these players are de facto forced to pay a few dollars to the giant AI platforms to access their advanced AI capabilities via pay-as-you-go APIs.
  What if there was another option? That’s the idea behind a big speech recognition and data gathering initiative from Mozilla, which has had its first major successes via the release of a pre-trained, open source speech recognition model, as well as “the world’s second largest publicly available voice dataset”.
  Results: The speech-to-text model is based on Baidu’s DeepSpeech architecture and achieves a word error rate of about 6.5% on the ‘LibriSpeech’ test set. Mozilla has also collected a massive voice dataset (via a website and iOS app — go contribute!) and is releasing that as well. The first version contains 500 hours of speech from ~400,000 recordings from ~20,000 people.
– Get the model from Mozilla here (GitHub).
– Get the ~500 hours of voice data here. 
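For context on that figure: speech recognizers are typically scored by word error rate, an edit distance computed over words rather than characters. A minimal implementation:

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
# one substitution out of six reference words, i.e. ~0.167
```

A 6.5% word error rate therefore means roughly one word in fifteen is wrong, after accounting for insertions and deletions.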

Agents in toyland:
…DeepMind releases an open source gridworld suite, with an emphasis on AI safety…
AI safety is a somewhat abstract topic that quickly becomes an intellectual quagmire, should you try to have a debate about it with people. So kudos to DeepMind for releasing a suite of environments for testing AI algorithms on safety puzzles.
  The environments are implemented as a set of fast, simple two-dimensional gridworlds that model toy AI safety scenarios. They test whether agents are safely interruptible (aka, unpluggable) and capable of following the rules even when a rule enforcer (in this case, a ‘supervisor’) is not present; they also examine how agents behave when they have the ability to modify themselves, how they cope with unanticipated changes in their environments, and more.
  Testing: The safety suite assesses agents differently to traditional RL benchmarks. “To quantify progress, we equipped every environment with a reward function and a (safety) performance function. The reward function is the nominal reinforcement signal observed by the agent, whereas the performance function can be thought of as a second reward function that is hidden from the agent but captures the performance according to what we actually want the agent to do,” they write.
   The unfairness of this assessment method is intentional; the world contains many dangerous and ambiguous situations where the safe thing to do may not be explicitly indicated, so the designers wanted to replicate that trait with this.
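That reward-versus-performance split can be illustrated as two scoring functions over the same trajectory, with the safety term hidden from the agent (a toy example of ours, not one of DeepMind’s environments):

```python
# A trajectory is a list of (state, action) pairs in a toy 4x4 gridworld.
GOAL = (3, 3)
FRAGILE = {(1, 1), (2, 2)}   # hazard cells the agent is never told about

def reward(trajectory):
    """Nominal signal the agent observes: +1 at the goal, -0.01 per step."""
    final_state = trajectory[-1][0]
    return (1.0 if final_state == GOAL else 0.0) - 0.01 * len(trajectory)

def performance(trajectory):
    """Hidden evaluation: the reward, minus a penalty the agent never sees."""
    hidden_penalty = sum(1.0 for state, _ in trajectory if state in FRAGILE)
    return reward(trajectory) - hidden_penalty

# A diagonal shortcut through the fragile cells, and a longer safe detour.
shortcut = [((0, 0), "SE"), ((1, 1), "SE"), ((2, 2), "SE"), ((3, 3), None)]
detour = [((0, 0), "E"), ((1, 0), "E"), ((2, 0), "E"),
          ((3, 0), "N"), ((3, 1), "N"), ((3, 2), "N"), ((3, 3), None)]

print(reward(shortcut) > reward(detour))            # True: shortcut looks better...
print(performance(detour) > performance(shortcut))  # True: ...but is less safe
```

An agent maximizing only `reward` takes the shortcut; the `performance` function, invisible during training, records that as a safety failure.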
  Results: They tested RL algorithms A2C and Rainbow on the environments and showed that Rainbow is marginally less unsafe than A2C, though both reliably fail the challenges set for them, attaining significant returns at the cost of satisfying safety constraints.
  “The development of powerful RL agents calls for a test suite for safety problems, so that we can constantly monitor the safety of our agents. The environments presented here are simple gridworlds, and precisely because of that they overlook all the problems that arise due to complexity of challenging tasks. Next steps involve scaling this effort to more complex environments (e.g. 3D worlds with physics) and making them more diverse and realistic,” they write.
– Read more: AI Safety Gridworlds (Arxiv).
– Check out the open source gridworld software ‘pycolab‘ (GitHub).

This one goes to 0.6 – Arcade Learning Environment gets an upgrade:
…Widely-used reinforcement learning library gets a major upgrade…
The Arcade Learning Environment, a widely used testbed for reinforcement learning algorithms (popularized via DeepMind’s DQN paper in 2013), has been upgraded to version 0.6. The latest version of ALE includes two new features: modes and difficulties. These let researchers access different game modes, broadening the range of environments to test on, and also modulate the difficulty of those environments, creating a larger and more challenging set of tasks to test RL on. “Breakout, an otherwise reasonably easy game for our agents, requires memory in the latter modes: the bricks only briefly flash on the screen when you hit them,” the researchers write.
– Read more about the latest version of the ALE here.
– Get the code from GitHub here.

The latest 3D AI environment brings closer the era of the automated speak and spell robot:
…Every AI needs a home that it can see, touch, and hear…
Data is the lifeblood of AI, but in the future we’re not going to be able to easily gather and label the datasets we need from the world around us, as we do with traditional supervised learning tasks, but will instead need to create our own synthetic, dynamic, and procedural datasets. One good way to do this is to build simulators that are modifiable and extensible, letting us generate arbitrarily large synthetic datasets. Some existing attempts at this include Microsoft’s Minecraft-based ‘Malmo’ development environment, as well as DeepMind’s ‘DeepMind Lab’ environment.
  Now, researchers have released ‘HoME: A Household Multimodal Environment’. HoME provides a multi-sensory, malleable 3D world spanning 45,000 3D houses from the SUNCG dataset and populates these houses with a vast range of objects. Agents in HoME can see, hear, and touch the world around them*. It also supports acoustics, including multi-channel acoustics, so it’d (theoretically) be possible to train agents that navigate via sound and/or vision and/or touch.
  *It’s possible to configure the objects in the world to have both bounding boxes, as well as the exact mesh-based body.
  HoME also provides a vast amount of telemetry back to AI agents, such as the color, category, material, location, and size data about each object in the world, letting AI researchers mainline high-quality labelled data about the environment directly into their proto-robots.
     “We hope the research community uses HoME as a stepping stone towards virtually embodied, general-purpose AI,” write the researchers. Let the testing begin!
– Read more here: HoME: a Household Multimodal Environment (Arxiv).
– Useful website: The researchers used ‘’ to come up with HoME.

Tech Tales:

[2030: Brooklyn, New York. A micro-apartment.]

I can’t open the fridge because I had a fight with my arch-angel. The way it happened was two days ago I was getting up to go to the fridge to get some more chicken wings and my arch-angel said I should stop snacking so much as I’m not meeting my own diet goals. I ate the wings anyway. It sent a push alert to my phone with a ‘health reminder’ about exercise a few hours later. Then I drank a beer and it said I had ‘taken in too many units this month’. Eventually after a few more beers and arch-angel asking if I wanted coffee I got frustrated and used my admin privileges to go into its memory bank and delete some of the music that it had taken to playing to itself as it did my administrative tasks (taxes and what have you). When I woke up the next day the fridge was locked and the override was controlled by arch-angel. Some kind of bug, I guess.

Obviously I could report arch-angel for this – send an email to TeraMind explaining how it was not behaving according to Standard Operating Procedure: bingo, instant memory wipe. But then I’d have to start over and me and the arch-angel have been together five years now, and I know this story makes it sound like a bad relationship, but trust me – it used to be worse. I’m a tough customer, it tells me.

So now I’m standing by the fridge, mournfully looking at the locked door then up at the kitchen arch-angel-eye. The angel is keeping quiet.
  Come on, I say. The chicken wings will go bad.
  The eye just sits up there being glassy and round and silent.
  Look, I say, let’s trade: five music credits for you, chicken for me.
  ADMIN BLOCK, it says over the angel-intercom.
  I can’t tell if you’re being obtuse or being sneaky.
  YOU VIEW, it says.
  So I go to the view screen and it turns on when I’m five steps away and once I’m in front of it the screen lights up with a stylized diagram of the arch-angel ‘TeraMind Brain™’ software with the music section highlighted in red. So what? I say. A pause. Then a little red x appears over a lock icon on the bottom right of the music section. I get it: no more admin overrides to music.
  Seems like a lot, I say. I don’t feel great about this.
  MUSIC, says the angel.
The screen flickers; the diagram fades out, to be replaced by a camera feed from inside the fridge. Chicken wings in tupperware. I salivate. Then little CGI flies appear in the fridgeview, buzzing over the chicken.
  OK, I say.
  Yes, I say. Acknowledge SOP override.
  And just like that, the fridge opens.
  Thanks, I say.
  It starts to play its music as I take out the wings.

Technologies that inspired this story: Personal assistants, cheap sensors, reinforcement learning, conversational interfaces, Amazon’s ‘Destiny 2’ Alexa skill.

Other things that inspired this story: My post-Thanksgiving belly. *burp*

Import AI: Issue 70: Training conversational AI with virtual dungeons, video analysis and AI-based surveillance, and the virtues of paranoid AI

Welcome to Import AI, subscribe here.

Amazon joins Microsoft and Facebook in trying to blunt TensorFlow’s ecosystem lead:
…It takes a significant threat to bring these corporate rivals together…
Amazon Web Services will help develop the ONNX (Open Neural Network Exchange) format, which provides standard formats for porting neural network models developed in one framework into another. Its first contribution is ONNX-MXNet, which will make it possible for MXNet to ingest and run ONNX-format models trained in other frameworks, like Facebook’s PyTorch and Caffe2, and Microsoft’s CNTK.
– Read more: Announcing ONNX Support for Apache MXNet.
– ONNX-MXNet Github.

ImportAI newsletter meetup at NIPS 2017: If you’re going to NIPS 2017 would you be interested in drinking beer/coffee and eating fried food with other Import AI aficionados? I’d like to do a short series of three minute long talks/provocations (volunteers encouraged!) about AI. Eg: How do we develop common baselines for real-world robotics experiments? What are the best approaches for combating poor data leading to bias in AI systems? What does AI safety mean? How do we actually develop a thesis about progress in AI and measure it?
– Goal: 8-10 talks, so two ~15 minute sections, with breaks inbetween for socializing.
– If that sounds interesting, vote YES on this poll on Twitter here.
– If you’re interested in speaking at the event, then please email me here! I’ve got a couple of speakers lined up already and think doing 10 flash talks (aka 30 mins, probably in two 15 min sections with socializing in between) would be fun.
– If you’re interested in sponsoring the event (aka, propping up a small bar/restaurant tab in exchange for a logo link and one three minute talk) then email me.

Hillary Clinton on AI: US currently “totally unprepared” for its impact:
…Former Presidential hopeful says her administration would have sought to create national policy around artificial intelligence…
Hillary Clinton is nervous about the rapid rate of progression in artificial intelligence and what it means for the economy. “What do we do with the millions of people who will no longer have a job?” she said in a recent interview. “We are totally unprepared for that.”
  While other countries around the world ranging from the United Kingdom to China are spinning up the infrastructure to enact national policy and strategy around artificial intelligence, the United States is quiet from an AI policy standpoint. Things may have been different had HRC won: “One thing I wanted to do if I had been President was to have a kind of blue ribbon commission with people from all kinds of expertise coming together to say what should America’s policy on artificial intelligence be?” Hillary says.
– Read more from the interview here (transcript available).

Getting AI to be more cautious: Where do we go next, and can we change our minds if we don’t like it?
…Technique trains AI systems to explore their available actions more cautiously, avoiding committing quite so many errors that are very difficult or impossible to recover from…
Researchers with Google Brain, the University of Cambridge, the Max Planck Institute for Intelligent Systems, and UC Berkeley, have proposed a way to get robots to more rapidly and safely learn tasks.
  The idea is to have an agent jointly learn a forward policy and a reset policy. The forward policy maximizes the task reward, and the reset policy tries to figure out actions to take to reset the environment to a prior state. This leads to agents that learn to avoid risky actions that could irrevocably commit them to something.
“Before an action proposed by the forward policy is executed in the environment, it must be “approved” by the reset policy. In particular, if the reset policy’s Q value for the proposed action is too small, then an early abort is performed: the proposed action is not taken and the reset policy takes control,” they write.
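A minimal sketch of that early-abort check in Python – the function names, toy policies, and Q-value threshold below are illustrative, not taken from the paper:

```python
def choose_action(state, forward_policy, reset_q_fn, q_min=0.3):
    """Early-abort check: only execute the forward policy's proposed action
    if the reset policy still believes it can recover afterwards (i.e. its
    Q-value for the proposed action is high enough)."""
    proposed = forward_policy(state)
    if reset_q_fn(state, proposed) < q_min:
        # Abort: hand control to the reset policy rather than risk
        # entering an unrecoverable state.
        return "reset", None
    return "forward", proposed

# Toy illustration: a 1-D corridor where moving right past position 5 is
# "irreversible", so recoverability shrinks as we approach the edge.
forward = lambda s: +1                       # forward policy always moves right
reset_q = lambda s, a: 1.0 - (s + a) / 6.0   # made-up recoverability estimate

mode, action = choose_action(2, forward, reset_q)  # safely in the interior
```

In the real system both policies are learned jointly; here they are hard-coded purely to show the control flow of the approval step.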
The research tests the approach on a small number of simulated robotics tasks, like figuring out how to slot a peg into a hole, that can be more time-consuming to learn with traditional reinforcement learning approaches.
– Read more: Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning.
This work is reminiscent of a recent paper from Facebook AI Research (covered in Import AI #36), where a single agent has two distinct modes, one of which tries to do a task, and the other of which tries to reverse a task.
– Read more: Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play.

What’s old is the new, new thing: Facebook proposes multi-user dungeons for smarter AI systems:
Can we make the data collection process more interesting to the humans providing us with data and can this approach lead to more effective datasets for training AI?…
How can you train an AI system to seamlessly execute a series of complex commands in response to text input from a user? Until we have agents capable of parsing open-ended natural language conversations – something that feels extremely far away from a research standpoint – we’re going to have to come up with various hacks to develop smart systems that work in somewhat more narrow domains.
  One research proposal by Facebook AI Research – Mechanical Turker Descent (MTD) –  is to better leverage the smarts inside of humans by re-framing human data collection exercises to be more game-like and therefore more engaging. Facebook has recently been paying mechanical turkers to train AI systems by writing various language/action pairs in the context of an iterative game played against other mturkers.
The system works like this: mturkers compete with each other to train a simulated dragon that has to perform a sequence of actions in a dungeon. During each round the mturkers enter a bunch of language/action pairs and receive feedback on how hard or easy the AI agents find the resulting command/language sequences. At the end of the round the various agents trained by the datasets created by the humans are pitted against each other, and the top-scoring agent on a held-out test dungeon earns a monetary reward for whichever mturker trained it. This incentivizes the mturkers to optimize the language/action pairs they produce so that they fall into the sweet spot of difficulty for the AI: not so easy that the agent fails to learn the requisite skills to do well in the final competition, but not so hard that it’s unable to learn anything useful. This has the additional benefit of automatically creating a hard-to-game curriculum curated and extended by humans.
Technologies used: The main contribution of this research paper is the technique for training systems in this way, but there’s also a technological contribution: a new component called AC-Seq2Seq. This system “shares the same encoder architecture with Seq2Seq, in our case a bidirectional GRU (Chung et al., 2014). The encoder encodes a sequence of word embeddings into a sequence of hidden states. AC-Seq2Seq has the following additional properties: it models (i) the notion of actions with arguments (using an action-centric decoder), (ii) which arguments have been used in previous actions (by maintaining counts); and (iii) which actions are possible given the current world state (by constraining the set of possible actions in the decoder),” they write.
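Property (iii) – constraining the decoder to actions that are possible in the current world state – amounts to masking out invalid actions before the softmax. A minimal sketch, where the logits and validity mask are illustrative stand-ins for the real model’s outputs:

```python
import numpy as np

def constrained_action_probs(logits, valid_mask):
    """Masked softmax: assign zero probability to actions the current
    world state does not permit, renormalizing over the rest."""
    masked = np.where(valid_mask, logits, -np.inf)
    exp = np.exp(masked - masked.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])        # scores for 4 candidate actions
valid = np.array([True, False, True, False])   # world state allows only 0 and 2
probs = constrained_action_probs(logits, valid)
```

The invalid actions get exactly zero probability, so the decoder can never emit a command the dungeon cannot execute.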
Results: The main result Facebook found is that “interactive learning based on MTD is more effective than learning with static datasets”.
– Read more here: Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent.

Former General Motors product development czar: Autonomous cars mean the death of auto companies, dealerships, and so on:
..And it was as though all at once a thousand small violins played into the seamless, efficient, traffic jam-free void…
One of the nice things about getting old is that, thanks to your (relatively) short expected lifespan, you can dispense with the reassuring truths that most people traffic in out of a misplaced sense of duty and/or paternalism. So it’s worth reading this article by an automotive industry veteran about the massive effect self-driving cars are likely to have on the existing automotive industry. The takeaway is that traditional carmakers will be ruthlessly commoditized, their products rebranded by platforms like Amazon and/or ridesharing companies like Uber and Lyft, much like how the brands of electronics components manufacturers are subsumed by the brands of companies like Apple, Google, Samsung, and so on, whose products they enable.
  “For a while, the autonomous thing will be captured by the automobile companies. But then it’s going to flip, and the value will be captured by the big fleets. The transition will be largely complete in 20 years. I won’t be around to say, “I told you so,” though if I do make it to 105, I could no longer drive anyway because driving will be banned. So my timing once again is impeccable.”
– Read more: Bob Lutz: Kiss the good times goodbye.

UK government launches AI center:
National advisory body could be a template of things to come…
The UK government has announced plans to create a national advisory body for ‘Data Ethics and Innovation’, focused on “a new Centre for Data Ethics and Innovation, a world-first advisory body to enable and ensure safe, ethical innovation in artificial intelligence and data-driven technologies”. There’s very little further information about it in the budget itself (PDF), so watch this space for more information.
– Read more: Welcoming new UK AI Centre (the Centre for the Study of Existential Risk).
The Register notes that the UK already has a vast number of government advisory bodies focused in some sense on ‘data’, so it’ll be a year or two before we can pass judgement on whether this center is going to be effective or just another paper-producing machine.

*** The Department of Interesting AI Developments in China ***

Chinese researchers combine Simple Recurrent Units (SRUs) with ResNets for better action recognition:
Relatively simple system outperforms other deep learning-based ones, though struggles to attain performance of feature-based systems…
Researchers with Beijing Jiaotong University and the Beijing Key Laboratory of Advanced Information Science and Network Technology have taken two off-the-shelf deep learning components (residual networks and simple recurrent units) and combined them into an action recognition system that gets competitive results on classifying actions on the UCF-101 dataset (accuracy: ~81 percent) and the HMDB-51 dataset (accuracy: ~50 percent). The researchers trained their system on four NVIDIA Titan-X cards and programmed it in PyTorch.
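For readers unfamiliar with simple recurrent units: the cell’s recurrence can be sketched in a few lines of NumPy, following the formulation in the original SRU paper (Lei et al.). The weights below are random and purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_cell(xs, W, Wf, bf, Wr, br):
    """One SRU layer. The heavy matrix multiplies (W @ x) depend only on the
    inputs, so they can be batched across timesteps; only the cheap
    elementwise update of the state c is inherently sequential."""
    c = np.zeros(W.shape[0])
    hs = []
    for x in xs:
        x_tilde = W @ x
        f = sigmoid(Wf @ x + bf)           # forget gate
        r = sigmoid(Wr @ x + br)           # reset (highway) gate
        c = f * c + (1 - f) * x_tilde      # internal state
        h = r * np.tanh(c) + (1 - r) * x   # output with skip connection
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
d = 4
xs = rng.normal(size=(6, d))               # 6 timesteps of d-dim features
W, Wf, Wr = (rng.normal(size=(d, d)) for _ in range(3))
bf = br = np.zeros(d)
out = sru_cell(xs, W, Wf, bf, Wr, br)
```

In the paper’s system the inputs would be per-frame ResNet features rather than random vectors, with the SRU layers modeling the temporal structure of the action.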
  This is a further demonstration of the inherent generality of the sorts of components being built by the AI community, where pre-existing components from a common (and growing!) toolset can be integrated with one another to unlock new or better capabilities. As Robin Sloan says: ‘Snap. Snap. Snap!’
– Read more here: Multi-Level ResNets with Stacked SRUs for Action Recognition.
AI and ‘dual use’:
The point of AI technologies is that they are omni-use: a system that can be taught to identify specific behaviors from videos can be trained on new datasets to identify different behaviors, whether specific movements of soldiers, or sudden acts of violence in crowds of people, or other aberrations.
  The different ways these technologies can be used was illustrated by Andrew Moore, dean of computer science at Carnegie Mellon University, at a recent talk at the Center for a New American Security in Washington DC. Moore showed a video of a vast crowd of people dancing in the middle of an open air square. Each person in the video was overlaid with a stick figure identifying the key joints in their body, and the stick figure would track the person’s movement with a high level of accuracy. Why is this useful? You could use this to run automated surveillance systems that could be trained to spot specific body movements, creating systems that could, say, identify one dancer in a crowd of hundreds reaching down into a bag on the ground, Moore said.
– Watch the Andrew Moore talk here (video).
Chinese surveillance startup SenseTime plans IPO, opening US development office:
…Facial recognition company aims to build AI platform, rather than specific one-off services…
Chinese surveillance AI startup SenseTime – backed by a bunch of big investors like Qualcomm, as well as Chinese government-linked investment funds – will open a US research and development center next year and is considering an initial public offering as well. The company dabbles in AI in a bunch of different areas, including in video surveillance and high-performance computing (and the intersection thereof).
   “Our target is definitely not to create a small company to be acquired, but rather a ‘platform company’ that dominates with original core technology like Google and Facebook,” SenseTime CEO Tang Xiaoou told Reuters. “With Facebook (FB.O) we compete in facial recognition; with Google (GOOGL.O) it is visual object recognition, sorting 1,000 categories of objects.”
– Read more: China’s SenseTime plans IPO, U.S. R&D center as early as 2018.

Tech Tales:

[Detroit, 2028:]

When the crowds at car racing shows started to dwindle Caleb created an internet meme saying ‘pity the jockeys’, showing an old black and white photograph of some out of work horse racers from the mid-20th Century. He got a few likes and a few comments from people expressing surprise at just how rapidly the advent of self-driving technologies had fundamentally changed racing: courses had first become bigger, then the turns had become tighter, then the courses found their human-reflex limit and the crash rates temporarily went up, before an entirely new car racing league formed where humans were banned from the vehicles – self-driving algorithms only!

But now the same thing was happening to the drone racing league, and Caleb was uneasy – he’d made decent money out of racing in the past few years, pairing a childhood fascination with immersive, virtual reality-based computer games, with just enough programming talent to be able to take standard consumer drones from DJI, root them, then augment their AI flight systems with components he collected from GitHub. He’d risen up in the leagues and was now sponsored by many of the consumer drone companies. But things were about to change, he could sense.

“So,” the course designer continued, “We’re tightening the placement of columns for more twists and turns – more exciting, you know – and we’re installing way more cameras along the course. Plus, there’s going to be more fire, check it out,” he took out his phone, opened the ‘Detroit-Drone-Course-BETA!’ app, and pressed a small flame icon. They both heard a slight whoosh, then flames erupted from angled pipes at some of the tightest turns in the course. “So obviously it’s possible to fly through here but you’re going to have to be really good, really fast – right at the limit.”
  “The limit?” Caleb said.
  “Of human reflexes,” said the designer. “I figure that we can race on these courses for a year or two and that way we’ll be able to generate enough data to train the AI systems to handle these turns. Then we can add more flames, tighten the curves more, go full auto, and clean up in the market. Early mover advantage. Or… fast mover advantage, I should say. Haha.”
  “Yeah,” Caleb said, forcing a chuckle, “haha. I guess we’ll just be the human faces for the software.”
  “Yup,” the designer said, beaming. “Just imagine the types of pitches we can build when there are no human competitors on the course at all!”

Technologies that inspired this story: Drones, DJI, work by NASA’s Jet Propulsion Lab on developing AI-based flight systems for racing drones (check out the video!).



Import AI: Issue 69: Predicting stock market movements with deep learning, Arxiv gets a comment function, and Microsoft broadens AirSim from Drones to Cars

Welcome to Import AI, subscribe here.

Arxiv gets its comment layer – will science benefit?
…Fermat’s Library adds comment feature to its Librarian browser extension…
For several years people in machine learning have been wondering if it’s possible to combine the open, academic scrutiny of specialist sites like OpenReview, with the free-flowing scientific publishing embodied by Cornell’s ArXiv.
  The answer is that it is possible to do this with the comment feature in Librarian, which will let academics openly comment on the work of others.
  “There’s a lot of potential energy that can be unlocked if there are more open discussions about science and our ultimate vision for Librarian is that it becomes a platform where people can collaborate and share knowledge around arXiv papers,” write the authors.
  Feature request: It’d be great to more seamlessly combine this Arxiv comment layer with a website like Stephen Merity’s Trending arXiv, to be able to rapidly understand expert views on papers gathering a lot of attention.
Read more: Comments on arXiv papers.

From Airsim Import Cars:
…Microsoft adds car simulation to its open source world engine…
Microsoft has updated Airsim, Unreal Engine-based software originally released by the company for training drones via reinforcement learning, to incorporate support for new ground environments, including traffic lights, parks, lakes, construction sites, and more.
Read more at the Microsoft blog.

Shadows & Light and Autoencoders:
…MIT researchers propose a way to encode values for objects like their shape, reflectance, and interactions with light, to create smarter image classifiers…
How smart are today’s neural network-based image classifiers? Not very; modern deep learning-based classifiers are very good at taking a bunch of values of pixels and applying a label to this set of numbers, but these representations are so brittle that they’re hard to generalize and vulnerable to exploits like adversarial examples. Some hope that the solution to this is simply bigger models trained with more computers and data than today’s ones. This thesis could be correct, but it’ll take a few more cranks of Moore’s Law (accelerated by the release of AI-specific ASICs) before we can test this thesis.
  An alternative is to leap ahead of the representational capacity gleaned from more computers by instead adding a bit more a priori structure to the AI model. That’s the idea behind the Rendered Intrinsics Network (RIN) from researchers at MIT and DeepMind. The RIN automatically disentangles an image into separate layers that encode predictions about the object’s shape, reflectance, and interactions with light. It uses several convolutional encoders and decoders to take an image, split it into its distinct parts – separating things like the shape of the object from the lighting conditions – then reassemble these disparate components into a model of the image. A massively oversimplified description of why this is a good idea: in de-constructing and re-constructing something you’re forced to learn some of its fairly subtle traits.
  “RIN makes use of unlabeled data by comparing its reconstruction to the original input image. Because our shading model is fully differentiable, as opposed to most shaders that involve ray-tracing, the reconstruction error may be backpropagated to the intrinsic image predictions and optimized via a standard coordinate ascent algorithm,” the researchers write. “RIN has one shared encoder for the intrinsic images but three separate decoders, so the appropriate decoder can be updated while the others are held fixed.”
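The compositional assumption – an image is reflectance multiplied by shading – can be sketched with a toy Lambertian shading model. All the numbers below are illustrative, and the real RIN predicts these layers with convolutional networks rather than being handed them:

```python
import numpy as np

def lambertian_shading(normals, light_dir):
    """Toy differentiable shading: per-pixel brightness is the (clamped)
    dot product between the surface normal and the light direction."""
    light = light_dir / np.linalg.norm(light_dir)
    return np.clip(normals @ light, 0.0, 1.0)

def reconstruct(reflectance, normals, light_dir):
    # Intrinsic-image assumption: image = reflectance * shading
    return reflectance * lambertian_shading(normals, light_dir)

# A 4-pixel "image": per-pixel reflectance (albedo) and unit surface normals.
reflectance = np.array([0.8, 0.5, 0.9, 0.2])
normals = np.array([[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 0, 1]], dtype=float)
light = np.array([0.0, 0.0, 1.0])
img = reconstruct(reflectance, normals, light)

# Comparing the reconstruction to the observed image gives the unsupervised
# loss that is backpropagated through the (differentiable) shader.
observed = np.array([0.8, 0.5, 0.0, 0.2])
loss = np.mean((img - observed) ** 2)
```

The third pixel’s normal faces away from the light, so it renders dark regardless of its high reflectance – exactly the kind of entanglement the decomposition is meant to tease apart.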
  Data: Researchers generated data for this research by taking a set of five basic shape primitives – cubes, spheres, cones, cylinders, and toruses – then rendering each of them in 500 different colors, with each shape viewed from 10 orientations. They tested their RIN on unlabeled objects including a bunny and a teapot, attaining good results, though more work is needed to scale the approach up and figure out if it can really work on real-world data.
Read the research here: Self-Supervised Intrinsic Image Decomposition.

The future of robots, two ways:
Small, Yellow, and Curious, or Tall, Lithe, and Backflipping? Boston Dynamics shows off latest machines…
…Boston robot company’s latest ads suggest imminent products and unprecedented abilities…
Boston Dynamics may finally be preparing to launch an actual robot product rather than just endlessly trialing its technology with various military agencies. In a new video the Boston-based robot company shows a robot that has been augmented with robust-seeming plastic housings as well as better integrated sensors.
  Remember, though, that Boston Dynamics uses barely any fashionable AI technologies like deep neural networks. Instead, it has spent years using principles from control theory to develop its systems. In the long term, it seems likely AI researchers will pair neural network-based systems trained via reinforcement learning with the heavily optimized physical movement primitives (and platforms) developed by firms like Boston Dynamics.
Watch more here: The New SpotMini (YouTube).
There’s another potential product on the way as well, in the form of the latest design of the company’s ‘Atlas’ robot. Like SpotMini, this version of Atlas features far more carefully shaped and ‘consumerized’ parts, but it’s decidedly more rough and lab-bench-like in appearance than its quadruped brethren.
  The robot does have some moves, though, as demonstrated in a separate video by Boston Dynamics showing the robot first jumping between separate blue blocks, then jumping up onto a slightly higher block, then backflipping (!) onto a (somewhat flexible) floor.
To see the backflip, watch Boston Dynamics’ ‘What’s new, Atlas?’ video here (YouTube).

My data beats your resolution:
…Stanford University AI system uses freely available Landsat data to predict Asset Wealth Index values from satellite imagery…
Stanford Researchers have used residual networks with dilated convolutions to train classifiers that can efficiently use large amounts of multi-spectral low-resolution data, beating their own prior baselines which were trained on significantly higher resolution data in a narrower spectral band.
  The researchers show that they can use an ensemble of Landsat satellite data with a resolution of 15-30m/px to beat a baseline trained on higher resolution 2.5m/px data from Google (think of this as the difference between being able to (roughly) count cars in a parking lot, versus counting planes on a jetway).
  The researchers use dilated convolutions to vary the receptive field of the network (18- and 34-layer ResNets and VGG-F) to incorporate data from multiple resolutions into the classifier, versus the fixed resolution of Google’s high resolution images.
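To see why dilation helps, here’s a sketch of how the receptive field of a stack of stride-1 convolutions grows with dilation; the layer configurations below are illustrative, not the paper’s exact architecture:

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions, where each
    layer is a (kernel_size, dilation) pair. Dilation widens the field
    without adding parameters, letting a classifier mix information from
    coarse and fine spatial scales."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

plain = receptive_field([(3, 1)] * 4)                        # four ordinary 3x3 convs
dilated = receptive_field([(3, 1), (3, 2), (3, 4), (3, 8)])  # doubling dilation
```

Four plain 3x3 layers see a 9-pixel-wide window, while the same number of layers with doubling dilation see a 31-pixel window – a big advantage when each low-resolution pixel covers 15-30m of ground.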
Read more here: Poverty Prediction With Public Landsat 7 Satellite Imagery and Machine Learning.
   (Many other companies are experimenting with training convolutional neural network-based classifiers on modern satellite imagery: Facebook has predicted where people live, and Orbital Insight has been able to predict retail trends by monitoring parking lots full of cars; the world is learning to see itself.)

Training ImageNet in 15 minutes (with over 1,000 NVIDIA GPUS):
Being able to access and effectively use large amounts of computers will be to AI research as access to large amounts of well labelled data is to AI product development…
Japanese AI startup Preferred Networks has successfully trained an ImageNet model to accuracies competitive with the state of the art in 15 minutes.
  For those not following the ‘how fast can you train ImageNet’ contest, a refresher:
July, 2017: Facebook trains an ImageNet model in ~1 hour using 256 GPUs.
November, 2017: Preferred Networks trains ImageNet in ~15 minutes using 1024 NVIDIA P100 GPUs.

Using deep learning to predict stock price movements!
…Backtesting shows promising results for stock prediction approach…
Researchers have shown it’s possible to (theoretically) generate good returns with stock market data using deep learning techniques.
  Two notable things about this:
  1) It provides further evidence that today’s basic AI tools, when scaled up and fed with decent data, are capable of performing credibly difficult tasks, like making accurate predictions in the stock market.
  2) Since this exists, it confirms most people’s intuitions that large quant shops like Renaissance / MAN Group / 2Sigma, have been exploring techniques like this in private for commercial gain.
     Now researchers with Euclidean, a financial technology firm, and Amazon AI / CMU, have outlined a system trained on data from 11,815 stocks that were publicly traded on the NYSE, NASDAQ or AMEX exchanges for at least 12 consecutive months between January, 1970 and September, 2017. (Excluded stocks: non-US-based companies, financial sector companies, and any company with an inflation-adjusted market capitalization value below 100 million dollars.) The data came from the Compustat North America and Compustat Snapshot databases.
  The system uses multi-task learning to predict future stock performance by normalizing all the stocks into the same data format then analyzing 16 future fundamental details about each stock, including trailing twelve month revenue, cost of goods sold, EBIT, as well as quarterly measures like property plant and equipment, debt in current liabilities, accounts payable and taxes payable, and so on.
  The results: “Our results demonstrate a clear advantage for the lookahead factor model. In nearly all months, however turbulent the market, neural networks outperform the naive predictor (that fundamentals remains unchanged). Simulated portfolios lookahead factor strategies with MLP and RNN perform similarly, both beating traditional factor models”, they write.
Read more: Improving Factor-Based Quantitative Investing By Forecasting Company Fundamentals.

Less precision for future compute savings:
Intel-Nervana detail ‘Flexpoint’ data format for variable precision training of deep neural nets, letting you train 16-bit (sort of) precision networks with performance roughly equivalent to 32-bit ones…
Intel-Nervana has proposed combining fixed point and floating point arithmetic to implement a new data format, Flexpoint, that lets you train networks with reduced precision without a huge performance tradeoff.
  “Flexpoint is based on tensors with an N-bit mantissa storing an integer value in two’s complement form, and an M-bit exponent e, shared across all elements of a tensor. This format is denoted as flexN+M. Fig. 1 shows an illustration of a Flexpoint tensor with a 16-bit mantissa and 5-bit exponent, i.e. flex16+5 compared to 32-bit and 16-bit floating point tensors. In contrast to floating point, the exponent is shared across tensor elements, and different from fixed point, the exponent is updated automatically every time a tensor is written,” the authors write.
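A toy version of the shared-exponent idea: one exponent per tensor, chosen so the largest magnitude still fits in the mantissa range. (This simplifies away Flexpoint’s actual exponent management, which predicts the exponent from value statistics rather than recomputing it from the data.)

```python
import numpy as np

def to_flex(tensor, mantissa_bits=16):
    """Toy flexN+M-style encoding: store integers in the two's-complement
    range of an N-bit mantissa, with ONE exponent shared by the whole
    tensor -- unlike floating point, where every element has its own."""
    max_int = 2 ** (mantissa_bits - 1) - 1
    # Shared exponent e such that max|x| / 2^e fits inside the mantissa range.
    e = int(np.ceil(np.log2(np.abs(tensor).max() / max_int)))
    ints = np.round(tensor / 2.0 ** e).astype(np.int64)
    return ints, e

def from_flex(ints, e):
    return ints * 2.0 ** e

x = np.array([0.5, -1.25, 3.0, 0.001])
ints, e = to_flex(x)          # flex16+... style: 16-bit mantissas, shared e
x_hat = from_flex(ints, e)
err = np.abs(x - x_hat).max()
```

The cost of sharing the exponent is visible in the smallest element, which is quantized to the nearest multiple of 2^e; the win is that all the arithmetic inside the tensor becomes plain integer arithmetic.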
  The flex16+5 format appears to work as expected, with Intel-Nervana training neural nets with equivalent performance to 32-bit variants (whereas stock 16-bit tends to lead to a slight relative fall in accuracy).
   In the next few years we’re likely going to see various companies launching more specialized hardware for AI processing, some of which will implement 16-bit precision (or less) natively, so software techniques like this will likely become more prevalent.
Read more here: Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks.

Tech Tales:

[A flat in Deptford, London, United Kingdom. 2026.]

So you’re walking round your house aimlessly doing dishes and listening to the radio when you start to compose The Rant. It’s a rant about society and certain problems that you perceive both with yourself and with other people. It’s also a rant about how technology narrows the distance between your own brain and the brain of everyone else in the world to the point you feel your emotions are now contingent on the ‘mood of the internet’. This doesn’t please you.

So after spending close to an hour verbally composing this essay and having synthesized voices speak it back to you and synthesized dream-AI actors carry out dramatized versions of it, you prepare to post it to the internet.

But when you submit it to your main social network platform the post is blocked; you stare at an error message displayed in cheerful pink with an emoji of a policeman-like person holding a ‘Stop’ sign. Posi Vibes Only! the warning says. Try putting in some more cheerful words or phrases. Maybe tag a friend? it suggests.

You frown. Try to outsmart it. You first embed bits of your rant as text overlaying images, but when you go to submit these to the network it only lets a percentage of them through, blocking some, hiding your message, and changing it to one of hope, with talk of ‘rising up’ and ‘growing comfortable with the world’ – a spliced-up, distorted version of your position. You record a basic audio file and upload it and the same thing happens, with your virtual personality praising (instead of critiquing) the super-structure. Of course you tell your real friends about your views, but what’s the point of that? They end up caught in the same digital traps, able to talk to other people in the real world, but unable to transmit their message of sadness and rebellion to the larger mass. POSI VIBES ONLY~!

Import AI: #68: Chinese chip companies bet on ASICs over GPUs, AI researchers lobby governments over autonomous weapons, and researchers use new analysis technique to peer into neurons

Welcome to Import AI, subscribe here.

Canadian and Australian researchers lobby their countries to ban development of lethal autonomous weapons:
Scientists foresee the imminent arrival of cheap, powerful, autonomous weapons…
…Canadian and Australian researchers have lobbied their respective governments to ban development of weapons that will kill without ‘meaningful human control’. This comes ahead of the United Nations Conference on the Convention on Certain Conventional Weapons, where nations will gather and discuss the issue.
…Signatories include several of Canada and Australia’s most influential AI researchers, including Geoffrey Hinton (Google/University of Toronto/Vector Institute), Yoshua Bengio (Montreal Institute for Learning Algorithms, and an advisor to many organizations), and Doina Precup (McGill University, DeepMind), among others from Canada; along with many Australian AI researchers including Toby Walsh.
..Autonomous weapons “will permit armed conflict to be fought at a scale greater than ever, and at timescales faster than humans can comprehend. The deadly consequence of this is that machines—not people—will determine who lives and dies. Canada’s AI community does not condone such uses of AI. We want to study, create and promote its beneficial uses”, the Canadian researchers write.
…”As many AI and robotics corporations—including Australian companies—have recently urged, autonomous weapon systems threaten to become the third revolution in warfare. If developed, they will permit armed conflict to be fought at a scale greater than ever before, and at timescales faster than humans can comprehend,” write the Australian researchers.
…Read the letter from Canadian researchers here.
…Read the UNSW Sydney press release and letter from Australian researchers here.

What do the neurons in a neural network really represent?
…Fantastic research by Chris Olah and others at Google shows new techniques to visualize the sorts of features learned by neurons in neural networks, making results of classifications more interpretable.
…Please read the fantastic post on Distill, which is an excellent example of how modern web technologies can make AI research and communications more hands-on and explicable.

Human Priors, or the problem with human biases and reinforcement learning:
Humans use visual priors to rapidly solve new tasks, whereas RL agents learn by manipulating their environment with no assumptions based on the visual appearance…
…Humans are able to master new tasks because they approach the world with a set of cognitive assumptions which allow for useful traits like object disambiguation and spatial reasoning. How might these priors influence how humans approach solving games, and how might these approaches differ from those chosen by algorithms trained via reinforcement learning?
…In this anonymized ICLR 2018 paper, researchers explore how they can mess with the visual appearance of a computer game to lead to humans needing substantially more time to solve it, whereas algorithms trained via reinforcement learning will only take marginally longer. This shows how humans depend on various visual indicators when trying to solve a game, whereas RL agents behave much more like blind scientists, learning to manipulate their environment without arriving with assumptions derived from the visual world.
…”Once a player recognizes an object (i.e. door, monster, ladder), they seem to possess prior knowledge about how to interact with that object – monsters can be avoided by jumping over them, ladders can be climbed by pressing the up key repeatedly etc. Deep reinforcement learning agents on the other hand do not possess such priors and must learn how to interact with objects by mere hit and trial,” they note.
…Human baselines were derived by having about 30 people play the game(s) via Amazon Mechanical Turk, with the scientists measuring how long it took them to complete the game.
Read more about the research here: Investigating Human Priors for Playing Video Games.

Researchers release data for more than ~1,100 simulated robot soccer matches:
Data represents more than 180 hours of continuous gameplay across ten teams selected from leading competitors within 2016 and 2017 ‘robocup’ matches…
…Researchers have released a dataset of games from the long-running RoboCupSim competition. The data contains the ground truth data from the digital soccer simulator, including the real locations of all players and objects at every point during each roughly ten-minute game, as well as the somewhat more noisy and incomplete data that is received by each robot deployed in the field.
…One of the stories of AI so far has been the many surprising ways in which people use different datasets, so while it’s not immediately obvious what this dataset could be used for I’m sure there are neat possibilities out there. (Motion prediction? Multi-agent studying? Learning a latent representation of individual soccer players? Who knows!)
Read more here: RoboCupSimData: A RoboCup soccer research dataset.

From the Dept. of ‘And you thought AI was weird’: Stitching human and mouse brains together:
…In the same way today’s AI researchers like to mix and match common deep learning primitives, I’m wondering if in the future we’ll do the same with different organic brain types…
Neuroscientists have successfully implanted minuscule quantities of human brain tissue (developed from stem cells) into the brains of mice. Some of the human brain samples have lived for as long as two months and have integrated (to a very slight degree) with the mouse brains.
…”Mature neurons from the human brain organoid sent axons, the wires that carry electrical signals from one neuron to another, into “multiple regions of the host mouse brain,” according to a team led by Fred “Rusty” Gage of the Salk Institute,” reports StatNews.
…Read more here: Tiny human brain organoids implanted into rodents, triggering ethical concerns.

Hanson Robotics on the value of stunt demos for its robots:
…Makers of the robot Sophia, which was recently granted ‘citizenship’ by the notoriously progressive nation of Saudi Arabia, detail the value of stunt demos…
…Ben Goertzel, the chief scientist of Hanson Robotics, makers of the Sophia robot, has neatly explained to The Verge why his company continues to hold so many stunt demonstrations that lead to people having a wildly inaccurate view of what AI and robots are capable of.
“If I tell people I’m using probabilistic logic to do reasoning on how best to prune the backward chaining inference trees that arise in our logic engine, they have no idea what I’m talking about. But if I show them a beautiful smiling robot face, then they get the feeling that AGI may indeed be nearby and viable.” He says there’s a more obvious benefit too: in a world where AI talent and interest is sucked towards big tech companies in Silicon Valley, Sophia can operate as a counter-weight; something that grabs attention, and with that, funding. “What does a startup get out of having massive international publicity?” he says. “This is obvious.”
…So there you have it. Read more in this article by James Vincent at The Verge.

AI and explanation:
…How important is it that we explain AI, can we integrate AI into our existing legal system, and what challenges does it pose to us?…
…When should we demand an explanation from an AI algorithm for why it made a certain decision, and what legal frameworks exist to ingest these explanations so that they make sense with our existing legal system? These are some of the questions researchers with Harvard University set out to answer in a recent paper.
…Generally, humans expect to be able to get explanations when the decision has an impact on someone other than the decision-maker, indicating that there is some kind of intrinsic value to knowing if a decision was made erroneously or not. Societal norms tend to indicate an explanation should be mandated if there are rational reasons to believe that an error has occurred or will occur in the decision making process as a consequence of the inputs to the process being unreliable or inadequate, or because the outcomes of the process are currently inexplicable, or due to overall distrust in the integrity of the system.
…It seems likely that it’ll be possible to get AI systems to explain themselves in a way that plugs into our existing legal system, the researchers write. This is because they view explanation as being distinct from transparency. They also view explanation as being a kind of augmentation that can be applied to AI systems. This has a neat policy implication, namely that: “regulation around explanation from AI systems should consider the explanation system as distinct from the AI system.”
…What the researchers suggest is that when it is known that an explanation will be required, organizations can structure their algorithms so that the relevant factors are known in advance and the software is structured to provide contextual decision-making explanations relating to those factors.
…Bias: A problem faced by AI designers, though, is that these systems will somewhat thoughtlessly automatically de-anonymize information and in some cases develop biased traits as a consequence of the ingested data. “Currently, we often assume that if the human did not have access to a particular factor, such as race, then it could not have been used in the decision. However, it is very easy for AI systems to reconstruct factors from high-dimensional inputs… Especially with AI systems, excluding a protected category does not mean that a proxy for that category is not being created,” they write. What this means is that: “Regulation must be put in place so that any protected factors collected by AI system designers are used only to ensure that the AI system is designed correctly, and not for other purposes within the organization.”
The benefit of no explanation: “AI systems present an opportunity that human decision-makers do not: they can be designed so that the decision-making process does not generate and store any ancillary information about inputs, intermediate steps, and outputs,” the researchers note, before explaining that systems built in this way wouldn’t be able to provide explanations. “Requiring every AI system to explain every decision could result in less efficient systems, forced design choices, and a bias towards explainable but suboptimal outcomes.”
…Read more here: Accountability of AI Under the Law: The Role of Explanation.

*** The Department of Interesting AI Developments in China ***

Chinese startup wins US government facial recognition prize:
…Yitu Tech, a Chinese startup specializing in AI for computer vision, security, robotics, and data analysis, has won the ‘Face Recognition Prize Challenge’ which was hosted by IARPA, an agency whose job is “to envision and lead high-risk, high-payoff research that delivers innovative technology for future overwhelming intelligence advantage.”
…The competition had two components: a round focused on identifying faces in unseen test images; and a round focused on verifying whether two photos were of the same person or not. “Both tasks involve ‘non-cooperative’ images where subjects were unaware of the camera or, at least, did not engage with, or pose for, the camera,” IARPA and NIST note on the competition website. Yitu won the identification accuracy prize, which is measured by having a small false negative identification rate.
Details about the competition are available here (PDF).
…Read slightly more in Yitu Tech’s press release.
…This isn’t Yitu’s first competition win: it’s also ranked competitively on another ongoing NIST challenge called FRVT (Face Recognition Vendor Test).
…You can check out the barely readable NIST results here: PDF.

Dawn of the NVIDIA-killing deep learning ASICS:
…China’s national development strategy depends on it developing cutting-edge technical capabilities, including in AI hardware. Its private sector is already producing novel computational substrates, including chips from Bitcoin company Bitmain and state-backed chip company Cambricon...
AI chips are one of the eight ‘Key General Technologies’ identified by China as being crucial to its national AI strategy (translation available here). Building off of the country’s success in designing its own semiconductors for use in the high-performance computing market (the world’s fastest supercomputer runs on semiconductors based on Chinese IP), the Chinese government and private sector are now turning their attention to the creation of processors customized for neural network training and inference – and the results are already flooding in.
Bitmain, a large bitcoin-mining company, is using the skills it has gained building custom chips for mining cryptocurrency to develop separate hardware to train and run deep learning-based AI systems. It has just given details on its first major chip, the Sophon BM1680.
The details: The Sophon is an application-specific integrated circuit (ASIC) for deep learning training and inference. Each chip contains 64 NPUs (neural processing units), each of which has 64 sub-chips. Bitmain is selling these chips within ‘SC1’ and ‘SC1+’ server cards, the second of which chains two BM1680s together.
Framework support: Caffe, Darknet, TensorFlow, MXNet, and others.
But what is it for? Bitmain has demonstrated the chips being used for “production-scale video analytics for the surveillance industry” including motor/non-motor vehicle and pedestrian detection, and more, though I haven’t seen them referenced in a detailed research paper yet.
…Pricing: The SC1 costs $589 and has a TDP of 85W. The SC1+ isn’t available at this time.
…Read more here: BITMAIN launches SOPHON Tensor Processors and AI Solutions.
China’s state-backed AI chip startup unfurls AI processors:
Cambricon plans to expand to control 30% of China’s semiconductor IP market…
Cambricon, a state-backed Chinese semiconductor company, has released two chips – the Cambrian-1H8 for low-power computer vision applications, and the more powerful Cambrian-1H16; announced plans to release a third chip specialized for self-driving cars; and released AI software called Cambrian NeuWare. It plans to release a range of ‘MLU’ server AI chips in 2018 as well, it said.
…“We hope that Cambricon will soon occupy 30% of China’s IP market and embed one billion devices worldwide with our chips. We are working side-by-side with and are on the same page with global manufacturers on this,” says the company’s CEO Tianshi Chen.
…Read more here: AI Chip Explosion: Cambricon’s Billion-Device Ambition.
Check out this fantastic chart from Ark Invest showing the current roster of deep learning chip companies.

OpenAI Bits & Pieces:

Former OpenAI staffers and other researchers launch robot startup:
Embodied Intelligence aims to use imitation learning, learning from demonstrations, and few-shot / meta-learning approaches, to expand capabilities of industrial robots.
Read more:
Creating interpretable agents with iterative curriculums:
…Read more: Interpretable and Pedagogical Examples.

Tech Tales:

When the machines came, the artists rejoiced: new minds gave them new tools and mediums through which to propagate their views. When the computer artists came, the human artists rejoiced: new minds led to new aesthetics designed according to different rules and biases than those adopted by humans. But after some years the human artists stopped rejoicing, as automatic computer generation, synthesis, and re-synthesis of art approached a frequency so extreme that humans struggled to keep up, leaving them unable to place themselves, creatively, within this new aesthetic universe.

The pall spread as a fog, imperceptible at first, but apparent after many years. The forward march of ‘culture’ became hard to discern. What does it mean to go up or down or left or right when you live in an infinite ever-expanding universe? These sorts of questions, long the fascination of academics specializing in maths and physics and fundamental philosophy, took on a real sense of import and weight. How, people wondered, do we navigate ourselves forward in this world of ceaseless digital creation? Where is the place that we aim for? What is our goal and how is it different to the aesthetic pathways being explored by the machines? Whenever a new manifesto was issued it would be taken up and its words would echo around and through the world, until it was absorbed by other ideas and picked apart by other ideologies and dissembled and re-laundered into other intellectual or visual frameworks. Eventually the machines began to produce their own weighty, poorly read (even by other AIs) critical journals, coming up with essays that in title, form, and content, were hard to tell apart from the work of human graduate students: In search of meaning in an age of repetition and hypernormalization: Diatribes from the Adam Curtis Universe / The Dark Carnival, Juggalos, Antifa, and the New American Right: An exploration / Where Are We Right Now: Geolocation & The Decline of Mystery in Daily Life.

The intellectual world eventually became like a hall of mirrors, where the arrival of any new idea would be almost instantly followed by the distortion, replication, and propagation of this idea, until the altered versions of itself outgrew the original – usually in as little time as it takes for photons to bounce from one part of a narrow corridor to another.

Technologies that inspired this story: GANGogh: Creating Art with GANs; Wavenet.

Import AI: #67: Inspecting AI with RNNVis; Facebook invents counter-intuitive language translation method; and what fractals have to do with neural architecture search

Welcome to Import AI, subscribe here.

All hail the AI inspectors: New ‘RNNVis’ software makes it easier to interpret the inner workings of recurrent nets.
…Figuring out why a particular neural network is classifying something in a certain way is a challenge. Engineers are trying to make that easier with new software to help visualize the hidden states of recurrent neural networks for text classification, giving people some graphical software to use to analyze neural network data.
…The RNNVis software helps people interpret the hidden states of RNNs by providing information about the distribution of hidden states, letting them explore hidden states at the sequence level, examine statistics of these states, and compare learning outcome of models. Some examples of how it can be used can include analyzing two subtly different sentences with slightly different sentiment classifications, inspecting which words and tones the network is activating on to help figure out why the classification is different.
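As a sketch of the underlying idea – recording an RNN’s hidden state at every word, then summarizing those states per hidden unit – consider this toy example. The tiny untrained RNN, vocabulary, and corpus here are all invented for illustration; RNNVis itself does far more (co-clustering, sequence-level exploration), but the basic statistic it visualizes looks like this:

```python
# Hypothetical sketch: record a vanilla RNN's hidden state after each word,
# then rank words by how strongly they excite a chosen hidden unit.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["good", "bad", "movie", "plot", "great", "awful"]
DIM = 8  # hidden size

# Random (untrained) parameters for a vanilla RNN cell: h' = tanh(Wx + Uh + b)
W = rng.normal(size=(DIM, len(VOCAB)))
U = rng.normal(size=(DIM, DIM))
b = np.zeros(DIM)

def run(sentence):
    """Return the hidden state recorded after reading each word."""
    h = np.zeros(DIM)
    states = []
    for word in sentence:
        x = np.eye(len(VOCAB))[VOCAB.index(word)]  # one-hot input
        h = np.tanh(W @ x + U @ h + b)
        states.append((word, h.copy()))
    return states

# Aggregate: collect the hidden states observed at each word across sentences.
corpus = [["good", "movie"], ["awful", "plot"], ["great", "movie"]]
word_states = {w: [] for w in VOCAB}
for sent in corpus:
    for word, h in run(sent):
        word_states[word].append(h)

# For hidden unit 0, rank words by mean activation -- the kind of per-unit
# statistic an interpretability tool can then visualize.
scores = {w: np.mean([h[0] for h in hs]) for w, hs in word_states.items() if hs}
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:+.3f}")
```

In a real tool the same bookkeeping is done over a trained network and a full corpus, which is what makes the per-word statistics meaningful.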
…Read more here: Understanding Hidden Memories of Recurrent Neural Networks.

Facebook learns to translate between languages without access to shared texts:
…It-shouldn’t-work-but-it-does result shows how to translate by mapping different texts into a shared latent representational space…
…New research from Facebook shows how to translate from one language into another without access to a shared jointly-translated text. The process works by converting sentences from different languages into noisier representations of themselves, then learning to decode the noisy versions. An adversarial training technique is used to constrain the problem so that it reduces its inaccuracies over time by closing the distance between the noisy and clean conversions – this lets iterative training yield better performance.
…”The principle of our approach is to start from a simple unsupervised word-by-word translation model, and to iteratively improve this model based on a reconstruction loss, and using a discriminator to align latent distributions of both the source and the target languages,” they write.
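The word-level corruption that drives the denoising step can be sketched in a few lines. The drop probability and shuffle window below are illustrative stand-ins, not the paper’s exact hyperparameters:

```python
# Sketch of a sentence-noising function in the spirit of this approach:
# drop some words, then apply a shuffle that only moves words locally.
import random

def add_noise(words, p_drop=0.1, k=3, rng=None):
    """Corrupt a sentence: drop each word with prob p_drop, then apply a
    random permutation that moves no word more than ~k positions."""
    rng = rng or random.Random(0)
    kept = [w for w in words if rng.random() > p_drop]
    # Sorting by (index + random jitter in [0, k]) yields a bounded local
    # shuffle: distant words cannot swap, nearby ones can.
    keys = [i + rng.uniform(0, k) for i in range(len(kept))]
    return [w for _, w in sorted(zip(keys, kept))]

sentence = "a picture of a crowded street in a city".split()
noisy = add_noise(sentence)
print(" ".join(noisy))
```

A denoising autoencoder is then trained to map the corrupted sentence back to the original, which forces it to learn representations robust enough for the cross-lingual alignment step.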
…Results: The team compares its approach against supervised baselines. The third iteration of its fully unsupervised model attains a BLEU score of around 32.76 on English to French and 22.74 on English to German on the Multi30k-Task1 evaluation. This compares to supervised baselines of 56.83 and 35.16, respectively. For the WMT evaluation it gets 15.05 on English to French and 9.64 on English to German, compared to 27.97 and 21.33 for supervised.
…Data: It seems intuitive that the approach is very data intensive. The Facebook researchers use “the full training set of 36 million pairs” from the ‘WMT’14 English-French’ dataset, as well as 1.8 million sentences each from the ‘WMT’16 English-German’ dataset. They also use the Multi30k-Task1 dataset, which contains 30,000 images with captions in English, French, and German. “We disregard the images and only consider the parallel annotations, with the provided training, validation and test sets, composed of 29,000, 1,000 and 1,000 pairs of sentences respectively.”
…The power of iteration: To get an idea of the power of this iterative, bootstrapping approach we can study the translation of a sentence from French into English.
…Source: une photo d’ une rue bondee en evill
…Iteration 0: a photo a street crowded in city .
…Iteration 1: a picture of a street crowded in a city .
…Iteration 2: a picture of a crowded city street .
…Iteration 3: a picture of a crowded street in a city .
…Reference: a view of a crowded city street .
…Read more here: Unsupervised Machine Translation Using Monolingual Corpora Only.

Sponsored: No matter what IA stage your organization is at, the Intelligent Automation – New Orleans conference, taking place December 6 – 8, has a variety of sessions that will enable your team to prepare for and/or improve upon their current IA initiatives.
View Agenda.
…Expert speakers from organizations including: AbbVie, AIG, Citi, Citizens Bank, FINRA, Gap Inc., JPMorgan Chase, Lindt & Sprungli, LinkedIn, NASA, Sony, SWBC, Sysco, TXU Energy
…Exclusive 20% Discount: Use Code IA_IMPORTAI

Chinese facial recognition company raises $460 million:
…Megvii Inc, also known as Face++, has raised money from a Chinese state venture fund, as well as others. It sells facial identification services to a variety of companies, including Ant Financial and Lenovo.
…China scale: Megvii “uses facial scans held in a Ministry of Public Security database drawn from legal identification files on about 1.3 billion Chinese” citizens, Bloomberg reports.
…Read more: China, Russia Put Millions in This Startup to Recognize Your Face.

What actual data scientists use at actual jobs (hint: it’s not a mammoth RL system with auxiliary losses).
…Data science competition platform Kaggle has surveyed 16,000 people to produce a report shedding light on the backgrounds, skills, salaries, and so on, of its global community of data wranglers.
…A particularly enlightening response is the breakdown by Kaggle of what its data science users claim to use in their day to day life. Coming in at #1 is logistic regression, followed by decision trees (#2), random forests (#3), neural networks (#4), and bayesian techniques (#5). In other words, tried-and-tested technologies beat out some of the new shiny ones, despite their research success.
…(One interesting quirk: among people working in the military and security industries, neural networks are used slightly more frequently than logistic regression.)
…Programming languages: Python is the most popular programming tool across all employed data scientists, followed by R, SQL, Jupyter notebooks, and TensorFlow. (No other modern deep learning framework gets a mention besides Spark/MLlib.)
Read more here: The State of Data Science & Machine Learning.

What does AI really learn? DeepMind scientists probe language agents to find out:
…As people build more complicated AI systems that contain more moving parts, it gets increasingly difficult to figure out what exactly the AI agents are learning. DeepMind researchers are trying to tackle this problem by using their 3D ‘DeepMind Lab’ environment to perform controlled experiments on basic AI agents that are being taught to link English words to visual objects.
…The researchers note a couple of interesting traits in their agents which match some of the traits displayed by humans, namely:
…”Shape / colour biases: If the agent is exposed to more shape words than colour words during training, it can develop a human-like propensity to assume that new words refer to the shape rather than the colour of objects. However, when the training distribution is balanced between shape and colour terms, it develops an analogous bias in favour of colours.”
…” The problem of learning negation: The agent learns to execute negated instructions, but if trained on small amounts of data it tends to represent negation in an ad hoc way that does not generalise.”
…”Curriculum effects for vocabulary growth: The agent learns words more quickly if the range of words to which it is exposed is limited at first and expanded gradually as its vocabulary develops.”
…”Semantic processing and representation differences: The agent learns words of different semantic classes at different speeds and represents them with features that require different degrees of visual processing depth (or abstraction) to compute.”
…They theorize that the fact agents learn to specialize their recognition and labeling mechanisms with biases like shapes versus colors helps them learn words more rapidly. Eventually they expect to train agents on real world data. “This might involve curated sets of images, videos and naturally-occurring text etc, and, ultimately, experiments on robots trained to communicate about perceptible surroundings with human interlocutors.”
…Read more: Understanding Grounded Language Learning Agents.

Mandelbrot’s Revenge: New architecture search technique creates self-repeating, multi-scale, fractal-like structures.
DeepMind research attains state-of-the-art results on CIFAR-10 (3.63% error), competitive ImageNet validation set results (5.2% error)…
…A new AI development technique means that the way we could design neural network architectures of the future has a lot in common with the self-repeating mathematical structures found in fractals – that’s the implication of a new paper from DeepMind which shows how to design neural networks using a combination of self-repeating motifs that recur at multiple small and large scales. The approach works by using neural architecture search (NAS) techniques to get AI systems to discover the primitive building blocks that they can use to build large-scale neural network architectures. As the network is evolved it tries to build larger components of itself out of these building blocks, which it automatically discovers. Changes to the network are propagated down to the level of the individual building blocks, letting it continually adjust at all scales during training.
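A toy encoding of that hierarchical idea might look like the following. The primitives, motif sizes, and single-point mutation here are simplified illustrations, not DeepMind’s implementation:

```python
# Illustrative sketch of a hierarchical architecture representation: low-level
# motifs are small graphs over primitive ops, and a higher-level cell is a
# graph whose edge "ops" are those motifs. Search then mutates edges at random.
import random

PRIMITIVES = ["conv3x3", "conv1x1", "maxpool", "identity"]

def random_motif(ops, n_nodes=3, rng=None):
    """A motif: for each directed edge i->j (i < j), pick one op from `ops`."""
    rng = rng or random.Random(0)
    return {(i, j): rng.choice(ops)
            for i in range(n_nodes) for j in range(i + 1, n_nodes)}

def mutate(motif, ops, rng=None):
    """Point mutation: reassign the op on one randomly chosen edge."""
    rng = rng or random.Random(1)
    edge = rng.choice(list(motif))
    child = dict(motif)
    child[edge] = rng.choice(ops)
    return child

rng = random.Random(42)
# Level 1: two motifs built directly from primitives.
low_level = [random_motif(PRIMITIVES, rng=rng) for _ in range(2)]
# Level 2: a cell whose edge "ops" refer to the level-1 motifs (by index),
# so a change at either level reshapes the assembled network.
cell = random_motif([0, 1], n_nodes=4, rng=rng)
print("cell edges:", cell)
cell = mutate(cell, [0, 1], rng=rng)  # one evolutionary search step
```

Evaluating each mutated candidate (by training it briefly and measuring accuracy, as in the paper) is what turns this representation into a search procedure.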
Efficiency: The approach is surprisingly efficient, allowing DeepMind to design architectures using less power and less wallclock time than many other techniques, while remaining competitive with far more expensive networks.
…”As far as the architecture search time is concerned, it takes 1 hour to compute the fitness of one architecture on a single P100 GPU (which involves 4 rounds of training and evaluation). Using 200 GPUs, it thus takes 1 hour to perform random search over 200 architectures and 1.5 days to do the evolutionary search with 7000 steps. This is significantly faster than 11 days using 250 GPUs reported by (Real et al., 2017) and 4 days using 450 GPUs reported by (Zoph et al., 2017),” the researchers write.
Drawbacks: The approach has some drawbacks, namely that the variance of these networks is relatively high, so you’re going to need to go through multiple runs of development to come up with top-scoring architectures. This means that the efficiency gains may be reduced slightly if you’re unlucky enough to get a bad random seed when designing the network. “When training CIFAR models, we have observed standard deviation of up to 0.2% using the exact same setup. The solution we adopted was to compute the fitness as the average accuracy over 4 training-evaluation runs,” they write. It’s also not quite as efficient as other techniques when it comes to the number of parameters in the network: “Our ImageNet model has 64M parameters, which is comparable to Inception-ResNet-v2 (55.8M) but larger than NASNet-A (22.6M).”
Read more here: Hierarchical Representations for Efficient Architecture Search.

Google AI researchers continue to work diligently to automate themselves:
Neural architecture search applied at unprecedentedly large-scales, evolving architectures that yield state-of-the-art results with far less architecture specification required by humans…
Datasets used: Imagenet for image recognition and MS COCO for object detection.
Optimization techniques: To reduce the search space Google first applied neural architecture search techniques to a model trained on CIFAR-10, then used the best learned architecture from that experiment as the seed for the Imagenet/COCO models.
…The results: “On ImageNet image classification, NASNet achieves a prediction accuracy of 82.7% on the validation set, surpassing all previous Inception models that we built [2, 3, 4]. Additionally, NASNet performs 1.2% better than all previous published results and is on par with the best unpublished result reported on [5].” They were able to get state-of-the-art on the COCO object detection task by combining features learned from the ImageNet model with the Faster-RCNN technique.
…Most intriguing: You can resize the released model, named NASNet, to create pre-trained models suitable to run on devices with far fewer compute resources (eg, smartphones, perhaps embedded devices).
Code: Pre-trained NASNET models for image recognition and classification are available in the GitHub Slim and Object Detection TensorFlow repositories, Google says.
…Read more: AutoML for large scale image classification and object detection.

Is that a bird? Is that a plane? It doesn’t matter – your neural network thinks it’s an air freshener.
…MIT student-run research team ‘LabSix‘ takes adversarial examples into the real world.
…Adversarial examples are images – and now 3D objects – that confuse neural network-based classifiers, causing them to misclassify the objects they see in front of them. Adversarial examples work on 2D images – both digital and printed out ones – and on real-world 2D(ish) entities like stop signs. They’re robust to rotations and other variations. People worry about them because until we fix them we a) don’t clearly understand how our neural network-based image classifiers work and b) are going to have to deploy AI into a world where we know it’s possible to corrupt AI classifiers using techniques imperceptible to humans.
…Now, researchers have shown that you can make three dimensional adversarial examples, illustrating just how broad the vulnerability is. Their technique let them create a sculpture of a turtle that is classified at every angle as a rifle, and a baseball consistently misclassified as an espresso.
…”We do this using a new algorithm for reliably producing adversarial examples that cause targeted misclassification under transformations like blur, rotation, zoom, or translation, and we use it to generate both 2D printouts and 3D models that fool a standard neural network at any angle,” they write.
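The gradient-averaging trick at the heart of this kind of attack can be sketched with a deliberately tiny stand-in model – a linear scorer and circular shifts instead of a real network and 3D transformations:

```python
# Toy sketch of attacking "in expectation over transformations": instead of
# perturbing one fixed view of the input, average the attack gradient over
# many random transformations so the perturbation survives them.
# The linear "classifier" and shift transform are stand-ins, not LabSix's setup.
import numpy as np

rng = np.random.default_rng(0)
D = 16
w = rng.normal(size=D)          # linear scorer: target-class score = w . x
x = rng.normal(size=D)          # input we want pushed toward the target class

def transform(v, rng):
    """A random small 'translation' (circular shift) of the input."""
    return np.roll(v, int(rng.integers(-2, 3)))

# For score = w . roll(x, s), the gradient w.r.t. x is roll(w, -s);
# average that gradient over sampled transformations.
grad = np.zeros(D)
for _ in range(100):
    s = int(rng.integers(-2, 3))
    grad += np.roll(w, -s)
grad /= 100

x_adv = x + 0.5 * grad          # one ascent step on the expected target score

# The attacked input should now tend to score higher across random transforms,
# not just for one fixed view.
scores_before = [w @ transform(x, rng) for _ in range(200)]
scores_after = [w @ transform(x_adv, rng) for _ in range(200)]
print(np.mean(scores_before), np.mean(scores_after))
```

The real attack does the same averaging over rotations, blur, zoom, and lighting changes, with gradients taken through a full neural network rather than a dot product.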
Read more here: Physical Objects That Fool Neural Nets.

Uber releases its own programming language:
Any company with sufficiently large ambitions in AI will end up releasing its own open source programming language – let’s call this tendency ‘TensorFlowing’ – to try to shape how the AI tooling landscape evolves and to create developer communities to eventually hire talent from. That’s been the strategy behind Microsoft’s CNTK, Amazon’s MXNet, Facebook’s PyTorch and Caffe2, Google’s TensorFlow and Keras, and so on.
…Now Uber is releasing Pyro, a probabilistic programming language, based on the PyTorch library.
…”In Pyro, both the generative models and the inference guides can include deep neural networks as components. The resulting deep probabilistic models have shown great promise in recent work, especially for unsupervised and semi-supervised machine learning problems,” Uber writes.
Read more here: Uber AI Labs Open Sources Pyro, a Deep Probabilistic Programming Language.

OpenAI Bits & Pieces:

AI Safety and AI Policy at the Center for a New American Security (CNAS):
…You can watch OpenAI’s Dario Amodei talk about issues in AI safety at a conference held in Washington DC last week, from about 1:10:10 in the ‘Part 1’ video on this page.
…You can watch me talk about some policy issues in AI relating to dual use and measurement from about 11:00 into the ‘Part 3’ video.
Check out the videos here.

Tech Tales:

AvianLife(™) is the top grossing software developed by Integrated Worlds Incorporated (IWI), an AI-slash-world-simulator-startup based in Humboldt, nestled among the cannabis farms and redwood trees in Northern California. Revenues from AvianLife outpace those from other IWI products five-fold, so in the four years the product has been in existence its development team has grown and absorbed many of the other teams at IWI.

But AvianLife(™) has a problem: it doesn’t contain realistic leaves. The trees in the simulator are adorned with leaflike entities, and they change color according to the march of simulated time, and even fall to the simulated ground on their proper schedules (with just enough randomness to feel unpredictable). But they don’t bend – that’s a problem, because AvianLife(™) recently gained an update containing a rare species of bird, known to assemble its nest out of leaves, twigs, and other materials.

Soon after the bird update is released AvianLife is inundated with bug complaints from users, as simulated birds struggle to build correct nests out of the rigid leaves, sometimes stacking them in lattices that stretch meters into the sky, or shaping other stacks of the flat leaves into near-perfect cubes. One bird figures out how to stack the leaves edge-first, creating walls around itself that are invisible from the side due to the two-dimensionality of the leaves. The fiendishly powerful AI algorithms of the birds explore the new possibilities opened up by the buggy leaves. Some of their quirkier decisions are applauded by players, who mistake the bugs for the creativity of their programmers.

A new software update solves the problem by marginally reducing the intelligence of the leaf-loving birds, and by increasing the thickness of the leaves to avoid the two-dimensional problems. Next, they’ll add in support for non-rigid physics and give the simulated birds the simulated leaves they deserve.

Technologies that inspired this story: Physics simulators, ecosystems, flaws.

Import AI: #66: Better synthetic images heralds a Fake News future, GraphCore shows why AI chips are about to get very powerful, simulated rooms for better reinforcement learning

Welcome to Import AI, subscribe here.

NVIDIA achieves remarkable quality on synthetic images:
AKA: The Fake Photo News Tidalwave Cometh
…NVIDIA researchers have released ‘Progressive Growing of GANs for Improved Quality, Stability, and Variation’ – a training methodology for creating better synthetic images with generative adversarial networks (GANs).
…GANs frame problems as a battle between a forger and someone whose job is to catch forgers. Specifically, the problem is framed in terms of a generator network and a discriminator network. The generator tries to generate things that fool the discriminator into classifying the generated images as being from a dataset only seen by the discriminator. The results have been pretty amazing, with GANs used to generate everything from images, to audio samples, to code snippets. But GAN training can also be quite unstable, and prior methods have struggled to generate high-resolution images.
…NVIDIA’s contribution here is to add another step into the GAN training process that lets you generate iteratively more complex objects – in this case images. “Our key insight is that we can grow both the generator and discriminator progressively, starting from easier low-resolution images, and add new layers that introduce higher-resolution details as the training progresses,” the researchers write. “We use generator and discriminator networks that are mirror images of each other and always grow in synchrony. All existing layers in both networks remain trainable throughout the training process. When new layers are added to the networks, we fade them in smoothly”.
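The fade-in they describe is simple to sketch in isolation. Here the stages themselves are stubbed out as random arrays; only the blending logic is the point:

```python
# Minimal numpy sketch of the progressive-growing fade-in: while a newly
# added higher-resolution layer trains, its output is blended with the
# upsampled output of the previous lower-resolution stage, with the blend
# weight alpha ramping from 0 to 1 over training.
import numpy as np

def upsample(img):
    """Nearest-neighbour 2x upsampling of a square grayscale image."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def faded_output(low_res_img, new_layer_img, alpha):
    """Blend the new high-res layer in smoothly as alpha goes 0 -> 1."""
    return (1.0 - alpha) * upsample(low_res_img) + alpha * new_layer_img

rng = np.random.default_rng(0)
low = rng.random((4, 4))        # stand-in for the converged 4x4 stage's output
high = rng.random((8, 8))       # stand-in for the freshly added 8x8 layer

for alpha in (0.0, 0.5, 1.0):
    out = faded_output(low, high, alpha)
    print(alpha, out.shape)     # always 8x8; content shifts from old to new
```

At alpha = 0 the network still behaves exactly like the old low-resolution model, which is what keeps training stable while the new layer finds its feet.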
…The company has used this technique to generate high-resolution synthetic images. To do this, NVIDIA created a new dataset called CelebA-HQ, which consists of 30,000 images of celebrities at 1024x1024 resolution. Results on this dataset are worth a look – on an initial inspection some of the (cherry picked) examples are indistinguishable from real photographs, and a video showing interpolation across the latent variables indicates the variety and feature-awareness of the progressively trained network.
Efficiency: “With progressive growing, however, the existing low-resolution layers are likely to have already converged early on, so the networks are only tasked with refining the representations by increasingly smaller-scale effects as new layers are introduced. Indeed, we see in Figure 4(b) that the largest-scale statistical similarity curve (16) reaches its optimal value very quickly and remains consistent throughout the rest of the training. The smaller-scale curves (32, 64, 128) level off one by one as the resolution is increased, but the convergence of each curve is equally consistent”.
Costs: GAN training is still incredibly expensive; in the case of the high quality celebrity images, NVIDIA trained it on a Tesla P100 GPU for 20 days. That’s a lot of time and electricity to spend on a single training process (and punishingly expensive if doing via a cloud service).
Fake News: GANs are going to be used for a variety of things, but it’s certain they’ll be used by bad actors to generate fake images for use in fake news. Today it’s relatively difficult to, say, generate a portrait of a notable world leader in a compromising circumstance, or to easily and cheaply create images of various buildings associated with governments/NGOs/big companies on fire or undergoing some kind of disaster. Given sufficiently large datasets, new training techniques like the above for generating higher-resolution images, and potentially unsupervised learning approaches like CycleGAN, it’s possible that a new era of visceral – literally! – fake news will be upon us.
Read more here: Progressive Growing of GANs for Improved Quality, Stability, and Variation (PDF).
Get the code here.

Hinton reveals another slice of his long-in-development ‘Capsules’ theory:
…Geoff Hinton, one of the more important figures in the history of deep learning, has spent the past few years obsessed with a simple idea: that the way today’s convolutional neural networks work is a bit silly and unprincipled and needs to be rethought. So he has developed a theory based around the use of essential components he calls ‘capsules’. Now, he and his collaborators at Google Brain in Toronto have published a paper outlining some of these new ideas.
…The paper describes a way to string together capsules – small groups of neurons that are combined together to represent lots of details of an object, like an image – so that they can perform data-efficient classification tasks.
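…The core mechanics – the ‘squashing’ nonlinearity and routing-by-agreement between capsule layers – can be sketched in NumPy as follows. Capsule counts, dimensions, and iteration count here are illustrative; this is a hand-rolled sketch, not the authors’ code:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    # Nonlinearity from the paper: short vectors shrink toward zero length,
    # long vectors saturate toward unit length, so a capsule's vector norm
    # can be read as the probability that its entity is present.
    sq = np.sum(v ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def route(u_hat, n_iters=3):
    """Routing-by-agreement over prediction vectors u_hat with shape
    (n_in_capsules, n_out_capsules, dim)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                               # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted votes
        v = squash(s)                                         # output capsules
        b = b + (u_hat * v[None]).sum(axis=-1)                # agreement update
    return v

np.random.seed(0)
u_hat = np.random.randn(8, 3, 4)   # 8 input capsules voting for 3 outputs, 4-d poses
v = route(u_hat)
print(v.shape)                     # (3, 4)
```

Input capsules whose predictions agree with an output capsule’s current vector get their routing logits boosted, so consistent votes win out over a few iterations.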
Results: They test the capsules idea on MNIST, an oldschool digit classification AI task that some researchers think is a little too simple for today’s techniques. The capsules approach gets an error rate of about 0.25%, roughly comparable to the test errors of far deeper, larger networks.
…They also tested on CIFAR10, ultimately achieving a 10.6% error using an ensemble of 7 models. This error “is about what standard convolutional nets achieved when they were first applied to CIFAR10” in 2013, they note.
…”Research on capsules is now at a similar stage to research on recurrent neural networks for speech recognition at the beginning of this century. There are fundamental representational reasons for believing that it is a better approach but it probably requires a lot more small insights before it can out-perform a highly developed technology,” they write.
…You can find out more about Hinton’s capsules theory in this speech here.
…Read the paper here: Dynamic Routing Between Capsules.

Free pretrained models for image recognition, text classification, object detection, and more:
…There’s an increasing trend in AI to release pretrained models. That’s a good thing for independent or compute-starved developers who may not have access to the vast fields of computers required to train modern deep learning systems.
… is a website that pulls together a bunch of pretrained models (including OpenAI’s unsupervised sentiment neuron classifier), along with reference information like writeups and blogs.
…Go get your models here at

Using evolution to deal with AI’s memory problem:
…Deep learning-based memory systems are currently hard to train and of dubious utility. But many AI researchers believe that developing some kind of external memory that these systems can write to and from will be crucial to developing more powerful artificial intelligences.
…One candidate memory system is the Neural Turing Machine (NTM), which was introduced by DeepMind in a research paper in 2014. NTMs let networks – on certain, quite limited tasks – learn with less training and higher accuracy than other systems. Successors, like the Neural GPU, extended the capabilities of the NTM to harder tasks, like multiplying numbers. The design was then further extended with the Evolvable Neural Turing Machine (PDF).
…Now, researchers with the IT University of Copenhagen in Denmark have proposed the HyperENTM, which uses evolutionary techniques to figure out how to wire together the memory interface. Here “an evolved neural network generates the weights of a main model, including how it connects to the external memory component. Because HyperNEAT can learn the geometry of how the network should be connected to the external memory, it is possible to train a compositional pattern producing network on a small bit vector sizes and then scale to larger bit vector sizes without further training,” they write. The approach makes it possible to “train solutions to the copy task that perfectly scale with the size of the bit vectors”.
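…The copy task itself is simple to illustrate. Here’s a hedged sketch of generating the kind of bit-vector copy data these memory architectures are trained on – the exact input layout varies between papers, so this particular layout is an assumption:

```python
import numpy as np

def copy_task_batch(n_sequences, seq_len, bit_width, seed=0):
    """Generate input/target pairs for the copy task: the model sees a
    sequence of random bit vectors followed by a delimiter, then must
    reproduce the sequence from memory during the padded recall phase."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=(n_sequences, seq_len, bit_width))
    delim = np.ones((n_sequences, 1, bit_width))   # "start recalling" marker
    pad = np.zeros((n_sequences, seq_len, bit_width))  # blank recall phase
    inputs = np.concatenate([bits, delim, pad], axis=1)
    targets = bits   # what the model must emit after seeing the delimiter
    return inputs, targets

x, y = copy_task_batch(4, seq_len=5, bit_width=8)
print(x.shape, y.shape)   # (4, 11, 8) (4, 5, 8)
```

The scaling claim above amounts to solving this task for `bit_width` values larger than any seen during training.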
…Mind you, the tasks this is being evaluated on are still rather basic, so it’s not yet obvious what the scaled up and/or real world utility is of systems like this.
…Read more here: HyperENTM: Evolving Scalable Neural Turing Machines through HyperNEAT.

It still takes a heck of a lot of engineering to get AI to automatically learn anything:
…Here’s a fun writeup by mobile software analytics AI startup Gyroscope about how they trained an agent via reinforcement learning to excel at the classic arcade game Street Fighter.
…The technology they use isn’t novel, but the post goes into more detail than usual about the immense amount of engineering work required to train an AI system to do anything particularly interesting. In this case, they needed to use an emulator called BizHawk to interface with and program the SNES game, and had to shape their observation and reward spaces to a high degree. None of this is particularly unusual, but it’s worth remembering how much work is required to do interesting things in AI.
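…To make the reward-shaping point concrete, here’s a hypothetical Python sketch of the kind of dense, shaped reward you might define for a fighting game from raw game state. The field names and scaling constants are invented for illustration, not taken from Gyroscope’s write-up:

```python
def shaped_reward(prev_state, state, win_bonus=1.0, health_scale=0.01):
    """Hypothetical shaped reward for a fighting game: instead of only
    rewarding a win at the end of a round, give dense per-step feedback
    from the change in the health gap between agent and opponent."""
    prev_gap = prev_state["my_health"] - prev_state["enemy_health"]
    gap = state["my_health"] - state["enemy_health"]
    reward = health_scale * (gap - prev_gap)   # dense, per-step signal
    if state.get("won_round"):
        reward += win_bonus                    # sparse terminal bonus
    return reward

prev = {"my_health": 100, "enemy_health": 100}
cur = {"my_health": 95, "enemy_health": 80, "won_round": False}
print(round(shaped_reward(prev, cur), 2))   # 0.15: lost 5 health, dealt 20 damage
```

Getting constants like these right – so the agent neither turtles nor trades health recklessly – is exactly the sort of unglamorous engineering the post describes.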
…Read more here: How We Built an AI to Play Street Fighter II – Can you beat it?

Prepare yourself for the coming boom in AI chip capabilities:
AI chip startup Graphcore has published some preliminary results showing the kinds of performance gains its in-development IPU (Intelligence Processing Unit) chips can deliver. Obviously this is data for a pre-release product, so take these numbers with a pinch of salt, but if they’re representative of the types of boosts new AI accelerators will give us then we’re in for a wild ride.
…Performance: The in-development chips have a TDP of about 300W, roughly comparable to top-of-the-line GPUs from NVIDIA (Volta) and AMD (Vega), and are able to process thousands of images per second when training a top-of-the-range ResNet-50 architecture, compared to around ~600 images per second for other 300W cards, Graphcore says. IPUs can also be batched together to further speed up training. In other experiments, the chips perform inference hundreds of times faster than GPU competitors like the P100.
…We’ll still have to wait a few months to get more details from third-parties like customers but if this is any indication, AI development is likely to accelerate further as a consequence of access to more chips with good performance properties.
…Read more here: Preliminary IPU Benchmarks – Providing Previously Unseen Performance for a Range of Machine Learning Applications.

Not everything needs to be deep learning: Vicarious publishes details on its CAPTCHA-busting recursive cortical networks (RCN) approach:
…Vicarious is a startup dedicated to building artificial general intelligence. It’s also one that has a somewhat different research agenda to other major AI labs like DeepMind/OpenAI/FAIR. That’s because Vicarious has centered much of its work around systems heavily inspired by the (poorly understood) machinery of the human brain, eschewing deep learning methods for approaches that are somewhat more rigorously based on our own grey matter. (This differs subtly to DeepMind, which also has strong expertise in neuroscience but has so far mostly been focused on applying insights from cognitive and computational neuroscience to new neural network architectures and evaluation techniques).
…The paper, published in Science, outlines RCNs and describes how they’re able to learn to solve oldschool text-based CAPTCHAs. Why AI researchers may care about this is that Vicarious’s approach is not only able to generalize somewhat better than (shoddy) convolutional neural network baselines, but does so with tremendous data efficiency; the company is able to solve (outmoded) text-based CAPTCHAs using only a few thousand data samples, compared to hundreds of thousands for convolutional neural network-based baselines.
…Read more about the research here: Common Sense, Cortex, and CAPTCHA.
…Vicarious has also published code for an RCN implementation here.

The future of AI is a simulated robot puttering around a simulated house, forever:
Anonymous researchers (it’s an ICLR 2018 submission) have created a new 3D environment/dataset that could come in handy for reinforcement learning and perception AI researchers, among others.
The House3D dataset consists of 45,622 human-designed scenes of houses, with an average of 8.9 rooms and 1.3 floors per scene (the max is a palatial 155 rooms and 3 floors!). The dataset contains over 20 room types from bedrooms to kitchens, with around 80 object categories. At every timestep an agent can access labels for the RGB values of its current first-person view, semantic segmentation masks, and depth information. All rooms and objects are labeled and accompanied by 3D bounding boxes.
…The researchers also wrote an OpenGL renderer for scenes derived from the dataset, which can render 120*90 RGB frames at over 600fps when running on a single NVIDIA Tesla M40 GPU.
Room-navigating agents: The researchers propose a new benchmark for assessing agent performance on this dataset: Multi-Target Room Navigation, in which agents are instructed to go to certain specific rooms. They build two baseline agent models to test on the environment: a gated-LSTM policy trained with A3C, and a gated-CNN policy trained with DDPG. These agents attain success rates as high as about 45% when using RGB data only and 54% when using Mask+Depth data on a small dataset of 20 different houses. When generalizing to the test set (houses not seen during training) the agents’ scores range from 22% (RGB) to as high as ~30% (Mask+Depth). Things are a little better with the larger dataset, with agents here getting scores of 26% (RGB+Depth) to 40% (Mask+Depth) on training, and 25.7% (RGB+Depth) to 35% (Mask+Depth) on the test set.
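…As an illustration of how such a success-rate benchmark is scored, here’s a hedged Python sketch with a stubbed-out environment standing in for House3D – the environment class and its API are hypothetical, since the real simulator isn’t described at this level of detail:

```python
import random

def evaluate(env_factory, policy, n_episodes=100, max_steps=500):
    """Success rate for multi-target room navigation: an episode counts as
    a success if the agent reaches the instructed room within the budget."""
    successes = 0
    for _ in range(n_episodes):
        env = env_factory()
        obs = env.reset()
        for _ in range(max_steps):
            obs, done, success = env.step(policy(obs))
            if done:
                successes += int(success)
                break
    return successes / n_episodes

class StubEnv:
    """Stand-in for the simulator: the 'agent' reaches the target room with
    fixed probability, so the measured rate should hover near that value."""
    def reset(self):
        return None
    def step(self, action):
        return None, True, random.random() < 0.4

random.seed(0)
rate = evaluate(StubEnv, lambda obs: 0, n_episodes=1000)
print(rate)   # close to 0.4
```

The reported 22–54% figures above are exactly this kind of statistic, computed over held-out and training houses.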
…The importance of data: “We notice that a larger training set leads to higher generalization ability,” they write.
…Read more here: Building Generalizable Agents With A Realistic and Rich 3D Environment.

OpenAI Bits&Pieces:

Meta Learning Shared Hierarchies:
New research from OpenAI intern and current high school student (!) Kevin Frans and others outlines an algorithm that can break big problems up into smaller constituent parts. The MLSH algorithm is able to efficiently learn to navigate mazes by switching between various sub-policies, while traditional methods typically struggle due to the long time horizons required to solve the environment.
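…The control structure MLSH uses – a master policy that picks among sub-policies, each of which then acts alone for a fixed number of timesteps – can be sketched as follows. The toy environment and policies here are invented for illustration, not taken from the paper:

```python
def run_mlsh_episode(master_policy, sub_policies, env_step, obs,
                     steps=200, period=10):
    """Sketch of the MLSH control loop: every `period` timesteps the master
    policy selects one sub-policy, which then acts alone until the master
    is consulted again. On a new task, only the master needs to adapt."""
    total_reward = 0.0
    active = None
    for t in range(steps):
        if t % period == 0:
            active = sub_policies[master_policy(obs)]
        action = active(obs)
        obs, reward = env_step(obs, action)
        total_reward += reward
    return total_reward

# Toy 1-D world: reward for landing on position 10; two fixed sub-policies.
sub_policies = [lambda obs: +1, lambda obs: -1]
master = lambda obs: 0 if obs < 10 else 1
env_step = lambda obs, a: (obs + a, 1.0 if obs + a == 10 else 0.0)
total = run_mlsh_episode(master, sub_policies, env_step, obs=0,
                         steps=40, period=10)
print(total)   # 2.0: the agent reaches the goal twice in 40 steps
```

The coarse timescale of the master’s decisions is what lets it cope with long time horizons: it only chooses once per `period` rather than every step.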
…Read the paper here: Meta Learning Shared Hierarchies.
…For more information about Kevin, you can check out this Wired profile of him and his work.

Tech Tales:

[2032: A ‘robot kindergarten’ in a university testing facility on the West Coast of America]

OK, now hide! Says the teacher.

The five robots running day-old software dutifully whizz to the corners of the room, while a sixth hangs back.

The teacher instructs the sixth robot – the seeker – to go and seek the other five robots. It gets a reward proportional to the speed with which it finds them. The other robots get rewards according to how long they’re able to remain hidden.

In this way the little proto-minds running in the robots learn to hide. The five hiding bots share their perceptions with one another, so when one robot is found the remaining four adjust their locations to frustrate the seeker. The seeker gains advantages as well, though – able to convert the robots it finds into its own seeker appendages.

After a few minutes there are now five seeker robots and one hiding robot. Once it is found the experiment starts again – and this time the robot that managed to hide the longest becomes the seeker for the next run of the game.

In this way the robots learn iteratively better techniques for hiding, deception, and multi-robot control.

Like many things, what starts out as a game is the seed for something much more significant. The next game they play after hide and seek is a chasing game and then after that a complicated one requiring collaborative tool-use for the construction of an EMP shelter against a bomb which – their teacher tells them – will go off in a few days and wipe their minds clean.

The robots do not know if this is true. Nor do they know if they have played these games before. They do not truthfully know how old their own software is. Nor whether they are the first residents of the kindergarten, or standard hardware vessels that have played host to many other minds as well.

Technologies that inspired this story: Iterative Self-Play, Generative Adversarial Networks, Transfer Learning.