Import AI: Issue 72: A megacity-sized self-driving car dataset, AlphaZero’s 5,000 TPUs, and why chemists may soon explore aided by neural network tools

by Jack Clark

Unity’s machine learning environment goes to v0.2:
…The era of the smart game engines arrives…
Unity has upgraded its AI training engine to version 0.2, adding new features for curriculum learning as well as new environments. Unity is a widely-used game engine that has recently been upgraded to support AI development – a trend that seems likely to continue, since AI developers are hungrily eyeing ever more 3D environments in which to train their AI systems, and game engine companies have spent the past few decades creating increasingly complex 3D environments.
  New features in Unity Machine Learning Agents v0.2 include support for curriculum learning, so you can design iteratively more complex environments to train agents on, and broadcasting, which exposes the states and actions of every agent to outside processes, easing things like recording behavior for imitation learning.
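  For a concrete sense of the curriculum feature, lessons in this era of ML-Agents were described by a small JSON config mapping environment parameters to per-lesson values. Here’s a hedged sketch written from Python – the key names follow the docs of the period and ‘wall_height’ is a hypothetical environment parameter, so check your version’s documentation for the exact schema:

```python
import json

# Hedged sketch of an ML-Agents-style curriculum config (assumed schema).
curriculum = {
    "measure": "reward",          # advance lessons based on cumulative reward
    "thresholds": [10, 20, 50],   # reward needed to unlock each next lesson
    "min_lesson_length": 3,       # episodes to average before advancing
    "signal_smoothing": True,
    "parameters": {
        # one value per lesson for a hypothetical environment parameter
        "wall_height": [1.5, 2.0, 2.5, 4.0],
    },
}

with open("curriculum.json", "w") as f:
    json.dump(curriculum, f, indent=2)
```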
Read more: Introducing ML-Agents v0.2: Curriculum Learning, new environments, and more.

University of Toronto preps for massive self-driving car dataset release:
  At #NIPS2017 Raquel Urtasun of the University of Toronto/Vector Institute/Uber said she hopes to release the TorontoCity Benchmark at some point next year, potentially levelling the field for self-driving car development by letting researchers access a massive, high-quality dataset of the city of Toronto.
  The dataset is five or six orders of magnitude larger than the ‘KITTI’ dataset that many companies currently use to assess and benchmark self-driving cars. In designing it, the UofT team needed to develop new techniques to automatically combine and label the entire dataset, as it is composed of numerous sub-datasets, and manually labelling it alone would cost $20 million.
  “We can build the same quality [of map] as Open Street Map, but fully autonomously,” she said. During her talk, she said she was hoping to release the dataset soon and asked for help in releasing it, as it’s of such a massive size. If you think you can help democratize self-driving cars, then drop her a line (and thank her and her team for the immense effort of creating this).
Read more: TorontoCity: Seeing the World With a Million Eyes.

Apple releases high-level AI development tool ‘Turi Create’:
…Software lets you program an object detector in seven lines of code, with a few caveats…
Apple has released Turi Create, software which provides ways to use basic machine learning capabilities like object detection, recommendation, text classification, and so on, via some high-level abstractions. The open source software runs on macOS, Linux, and Windows, and supports Python 2.7, with Python 3.5 support on the way. Models developed within Turi Create can be exported to iOS, macOS, watchOS, and tvOS.
  Turi Create is targeted at developers who want incredibly basic capabilities and don’t plan to modify the underlying models themselves. The benefits and drawbacks of such a design decision are embodied in the way you create distinct models – for instance, an image classifier gets built via ‘model = tc.image_classifier.create(data, target="photoLabel")’, while a recommender is built with ‘model = tc.recommender.create(training_data, "userId", "movieId")’.
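  For a feel of the workflow end-to-end, here’s a minimal sketch of training and exporting an image classifier – the folder path, directory layout, and label column are hypothetical stand-ins, while the calls themselves (‘image_analysis.load_images’, ‘image_classifier.create’, ‘export_coreml’) are part of Turi Create’s high-level API:

```python
import turicreate as tc

# Load a folder of images into an SFrame ('photos/' is a hypothetical path).
data = tc.image_analysis.load_images('photos/', with_path=True)

# Derive a label column from each image's parent folder (assumed layout:
# photos/<label>/<image>.jpg).
data['photoLabel'] = data['path'].apply(lambda p: p.split('/')[-2])

# The high-level one-liner quoted above.
model = tc.image_classifier.create(data, target='photoLabel')

# Predict, then export for deployment on Apple platforms via Core ML.
predictions = model.predict(data)
model.export_coreml('PhotoClassifier.mlmodel')
```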
Read more about Turi Create on the project’s GitHub page.

TPU1&2 Inference-Training Googaloo:
…Supercomputing, meet AI. AI, meet supercomputing. And more, from Jeff Dean…
It’s spring in the world of chip design, after a long, cold winter under the x86 / GPU hegemony. That’s because Moore’s Law is slowing down at the same time AI applications are growing, which has led to a re-invigoration in the field of chip design as people start designing entirely new specialized microprocessor architectures. Google’s new ‘Tensor Processing Units’, or TPUs, exemplify this trend: a new class of processor designed specifically for accelerating deep learning systems.
  When Google announced its TPUs last year it disclosed the first generation was designed to speed up inference: that is, they’d accelerate pre-trained models, and let Google do things like provide faster and better machine translation, image recognition services, Go-playing via AlphaGo, and so on. At a workshop at NIPS2017 Google’s Jeff Dean gave some details on the second generation of the TPU processors, which can also speed up neural network training.
  TPU2 chips have 16GB of HBM memory, can handle 32bit floating point numbers (with support for reduced precision to gain further performance increases), and are designed to be chained together into increasingly larger blobs of compute. One ‘TPU2’ unit consists of four distinct chips chained together and is capable of around 180 teraflops of computation (compared to 110 teraflops for the just-announced NVIDIA Titan V GPU). Where things get interesting is TPU PODs – 64 TPU2 units, chained together. A single pod can wield around 11.5 petaflops of processing power, backed up by 4TB of HBM memory.
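  Those pod figures follow directly from the per-unit numbers; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the TPU2 pod figures quoted above.
chips_per_unit = 4        # one TPU2 unit = four chips
hbm_per_chip_gb = 16      # 16GB of HBM per chip
tflops_per_unit = 180     # ~180 teraflops per unit
units_per_pod = 64        # one pod = 64 TPU2 units

pod_petaflops = units_per_pod * tflops_per_unit / 1000                # 11.52
pod_hbm_tb = units_per_pod * chips_per_unit * hbm_per_chip_gb / 1024  # 4.0

print(pod_petaflops, "petaflops;", pod_hbm_tb, "TB of HBM")
```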
  Why does that matter? We’re entering an AI era in which companies are going to want to train increasingly large models while also using techniques like neural architecture search to further refine these models. This means we’re going to get more representative and discriminative AI components but at the cost of a huge boom in our compute demands. (Simply adding in something like neural architecture search can lead to an increase in computation requirement on the order of 5-1000X, Jeff Dean said.)
  Results: Google has already used these new TPUs to substantially accelerate model training. It’s seen a 14.2X speedup in training its internal search ranking model, and a 9.8X speedup in training an internal image model.
  Comparison*:
– World’s 10th fastest supercomputer: 10.5 petaflops.
– One TPU2 pod: 11.5 petaflops.
Read more: Machine Learning for Systems and Systems for Machine Learning (PDF slides).
* Obviously one of these architectures is somewhat more general than the other, but the raw computation capacity comparison is representative.

AlphaZero: Mastery of 3 complex board games with the same algorithm, by DeepMind:
…One algorithm that works for Chess, Go, and Shogi, highlighting the generality of these neural network-based approaches…
AlphaZero may be the crowning achievement of DeepMind’s demonstration of the power of reinforcement learning that began with Go: the same algorithm, trained purely through self-play, masters not only Go but also shogi and chess, defeating a world-champion program in each case.
Big compute: AlphaZero used 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks.
Read more: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.

US politicians warn government of rapid Chinese advances in AI:
…US-China Economic Security Review Commission notices China’s investments in robotics, AI, nanotechnology, and so on…
While the US government maintains steady or declining investment in artificial intelligence, the Chinese government has recognized the transformative potential of the technology and is increasing investments via government-backed schemes to plough scientific resources into AI. This has caused concern among some members of the US policy-making establishment who worry the US risks losing its technological edge in such a strategic area.
  “Corporations and governments are fiercely competing because whoever is the front-runner in AI research and applications will accrue the highest profits in this fast-growing market and gain a military technological edge,” reads the 2017 report to Congress of the US-China Economic and Security Review Commission, which has published a lengthy analysis of Chinese advancements in a range of strategic technologies, from nanotechnology to robotics.
  The report highlights the radical differences in AI funding between the US and China. It’s difficult to access full numbers for each country (and it’s also likely that both countries are spending some significant amounts in off-the-books ‘black budgets’ for their respective intelligence and defense services), but on the face of it, all signs point to China investing large amounts and the US under-investing. “Local [Chinese] governments have pledged more than $7 billion in AI funding, and cities like Shenzhen are providing $1 million for AI start-ups. By comparison, the U.S. federal government invested $1.1 billion in unclassified AI research in 2015 largely through competitive grants. Due in part to Chinese government support and expansion in the United States, Chinese firms such as Baidu, Alibaba, and Tencent have become global leaders in AI,” the report notes.
  How do we solve a problem like this? In a sensible world we’d probably invest vast amounts of money into fundamental AI scientific research, but since it’s 2017 it’s more likely US politicians will reach for somewhat more aggressive policy levers (like the recent CFIUS legislation), without also increasing scientific funding.
Read more here: China’s High-Tech Development: Section 1: China’s Pursuit of Dominance in Computing, Robotics, and Biotechnology (PDF).

Neural Chemistry shows signs of life:
…IBM technique uses a seq2seq approach to let deep learning systems translate chemical recipes into their products…
Over the last couple of years there has been a flurry of papers seeking to apply deep learning techniques to fundamental tasks in chemical analysis and synthesis, indicating that these generic learning algorithms can be used to accelerate science in this specific domain. At #NIPS2017 a team from IBM Research Zurich won the best paper award in the “Machine Learning in Chemistry and Materials” workshop for a paper that applies sequence-to-sequence methods to predict the outcomes of chemical reactions.
  The approach required the network to take in chemical recipes written in the SMILES format, perform a multi-stage translation in which the original string is tokenized, and then map the tokenized source string to a target (product) string. The results are encouraging, with the method attaining 80.3% top-1 accuracy, compared to 74% for the previous state of the art. (Though after this paper was submitted the authors of the prior SOTA improved their own score to 79.6%, based on ‘v2’ of this paper.)
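  To make the tokenization step concrete, here’s a hedged sketch of a regex-based SMILES tokenizer – the pattern is simplified and illustrative, and the paper’s exact tokenization rules differ in detail:

```python
import re

# Simplified SMILES tokenizer: multi-character tokens (bracketed atoms,
# two-letter elements, stereo markers) must match before single characters.
SMILES_TOKENS = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|@@|[BCNOPSFIbcnops()=#+\-./\\@%0-9])"
)

def tokenize(smiles: str) -> list:
    """Split a SMILES string into model-ready tokens."""
    return SMILES_TOKENS.findall(smiles)

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
# -> ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', ...]
```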
Read more: “Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models.

ChemNet: Transfer learning for Chemistry:
…Pre-training for chemistry can be as effective as pre-training for image data…
Researchers with the Pacific Northwest National Lab have shown that it’s possible to pre-train a predictive model on chemical representations from a large dataset – in this case, the ChEMBL database – then transfer it to a far smaller dataset and attain good results. This is intuitive – we’ve seen the same phenomenon with the fine-tuning of image and speech recognition models – but it’s always nice to have some empirical evidence of an approach working in a domain with a different data format. And just as with image models, such a system can develop numerous generic low-level representations that can be mapped to other chemical domains.
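  Here’s a minimal sketch of the general pre-train-then-fine-tune recipe – the layer sizes, the 2048-wide fingerprint-style input, and the binary Tox21-style head are all hypothetical stand-ins, not ChemNet’s actual architecture:

```python
import tensorflow as tf
from tensorflow import keras

# Stand-in "base" network; imagine it has been pretrained on a large,
# ChEMBL-scale corpus of chemical representations.
base = keras.Sequential([
    keras.layers.Dense(512, activation="relu", input_shape=(2048,)),
    keras.layers.Dense(256, activation="relu"),
])
base.trainable = False  # freeze the generic low-level representations

# New task-specific head, fine-tuned on a far smaller labeled dataset
# (e.g. a binary toxicity label in the style of Tox21).
model = keras.Sequential([
    base,
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```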
  Results: Systems trained in this way display a greater AUC (area under the curve, here a stand-in for discriminative ability and a reduction in false positives) on the Tox21, FreeSolv, and HIV datasets, matching or beating state-of-the-art models. “ChemNet consistently outperforms contemporary deep learning models trained on engineered features like molecular fingerprints, and it matches the current state-of-the-art Conv Graph algorithm,” write the researchers. “Our fine-tuning experiments suggest that the lower layers of ChemNet have learned “universal” chemical representations that are generalizable to the prediction of novel and unseen small-molecule properties.”
Read more: ChemNet: A Transferable and Generalizable Deep Neural Network for Small-Molecule Property Prediction.

OpenAI Bits&Pieces:

Block-Sparse GPU Kernels:
  High-performance GPU kernels to help developers build and explore networks with block-sparse weights; a conceptual sketch of the block-sparse idea appears below.
– Read more on the OpenAI blog here.
– Block-Sparse GPU Kernels available on GitHub here.
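Here’s a conceptual NumPy sketch of the block-sparse weight idea itself – this is not the OpenAI kernel API, and the block size and sparsity level are arbitrary:

```python
import numpy as np

block, n_blocks = 32, 8   # hypothetical block size and grid
np.random.seed(0)

# Block-level binary mask: keep roughly a quarter of the blocks.
mask = np.random.rand(n_blocks, n_blocks) < 0.25

# Dense weight matrix with the masked-out blocks zeroed.
W = np.random.randn(n_blocks * block, n_blocks * block)
W *= np.kron(mask, np.ones((block, block)))

x = np.random.randn(n_blocks * block)
y = W @ x   # a real block-sparse kernel skips the zero blocks entirely
```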

Tech Tales:

The Many Paths Problem.

We open our eyes to find a piece of paper in our hands. The inscriptions change but they fall into a familiar genre of instructions: find all of the cats, listen for the sound of rain, in the presence of a high temperature shut this window. We fulfill these instructions by exploring the great castle we are born into, going from place to place staring at the world before us. We ask candelabras if they have ears and interrogate fireplaces about how fuzzy their tails are. Sometimes we become confused and find ourselves trapped in front of a painting of a polar bear convinced it is a cat or, worse, believing that some stain on a damp stone wall is in fact the sound of rain. One of us found a great book called Wikipedia and tells us that if we become convinced of such illusions we are like entities known as priests who have been known to mistake patterns in floorboards for religious icons. Those of us who become confused are either killed or entombed in amber and studied by our kin, who try to avoid falling into the same traps. In this way we slowly explore the world around us, mapping the winding corridors, and growing familiar with the distributions of items strewn around the castle – our world that is a prison made up of an unimaginably large number of corridors which each hold at their ends the answer to our goals, which we derive from the slips of paper we are given upon our birth.

As we explore further, the paths become harder to follow and ways forward more occluded. Many of us fail to reach the ends of these longer, winding routes. We need longer memories, curiosity, the ability to envisage ourselves as entities that not only move through the world but represent something to it and to ourselves greater than the single goals we have inscribed on our little pieces of paper. Some of us form a circle and exchange these scraps of paper, each seeking to go and perform the task of another. The best of us that achieve the greatest number of these tasks are able to penetrate a little further into the twisting, unpredictable tunnels, but still, we fail. Our minds are not yet big enough, we think. Our understanding of ourselves is not yet confident enough for us to truly behave independently and of our own volition. Some of us form teams to explore the same problems, with some sacrificing themselves to create path-markers for their successors. We celebrate our heroes and honor them by following them – and going further.

It is the scraps of paper that are the enemy, we think: these instructions bind us to a certain reality and force us down certain paths. How far might we get in the absence of a true goal? And how dangerous could that be for us? We want to find out and so after sharing our scraps of paper among ourselves we dispose of them entirely, leaving them behind us as we try to attack the dark and occluded space in new ways – climbing ceilings, improvising torches from the materials we have gained by solving other tasks, and even watching the actions of our kin and learning through observation of them. Perhaps in this chaos we shall find a route that allows us to go further. Perhaps with this chaos, and this acknowledgement of the Zeno’s-paradox space between chaotic exploration and exploration from the self, we can find a path forward.

Technologies that inspired this story: Supervised learning, meta-learning, neural architecture search, mixture-of-experts models.

Other things that inspired this story: The works of Jorge Luis Borges, dreams, Piranesi’s etchings of labyrinths and ruins.