10 | December | 2018

Import AI: 124: Google researchers produce metric that could help us track the evolution of fake video news; $4000 grants for people to teach deep learning; creating aggressive self-driving cars.

by Jack Clark

Using AI to learn to design networks with multiple constraints:
…InstaNAS lets people add multiple specifications to neural architecture search…
In the past couple of years researchers have started to use techniques such as reinforcement learning and evolution to design neural network architectures. This has already yielded numerous systems that display state-of-the-art performance on challenging tasks like image recognition, outperforming systems designed specifically by humans.
More recently, we’ve seen a further push to make such so-called ‘neural architecture search’ (NAS) systems efficient, and approaches like ENAS (Import AI #82) and SMASH (Import AI #56) have shown how to take systems that previously required hundreds of GPUs and fit them onto one or two GPUs.
Now, researchers are beginning to explore another axis of the NAS space: developing techniques that let them provide multiple objectives to the NAS system, so they can specify networks against different constraints. New research from National Tsing-Hua University in Taiwan and Google Research introduces InstaNAS, a system that lets people specify two categories of objectives as search targets: task-dependent objectives (eg, the accuracy in a given classification task) and architecture-level objectives (eg, latency/computational costs).
How it works: Training InstaNAS systems involves three phases of work: pre-training a one-shot model, then introducing a controller which learns to select architectures from the one-shot model with respect to each input instance (during this stage, “the controller and the one-shot model are being trained alternatively, which enforces the one-shot model to adapt to the distribution change of the controller”, the researchers write), and a final stage in which the system picks the controller which best satisfies the constraints, then the one-shot model is re-trained with that high-performing controller.
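To make the two categories of objectives concrete, here is a minimal sketch of how a search procedure might score candidate architectures on both a task-dependent objective (accuracy) and an architecture-level one (latency). The function names, the latency budget, and the multiplicative penalty are assumptions made for illustration; this is not the reward used in the InstaNAS paper, and it ignores the instance-aware, per-input selection that the real controller performs.

```python
def multi_objective_reward(accuracy, latency_ms, latency_budget_ms):
    """Hypothetical reward: accuracy, scaled down when latency exceeds the budget."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if latency_ms <= 0 or latency_budget_ms <= 0:
        raise ValueError("latencies must be positive")
    # No penalty inside the budget; otherwise shrink in proportion to the overshoot.
    latency_factor = min(1.0, latency_budget_ms / latency_ms)
    return accuracy * latency_factor


def pick_architecture(candidates, latency_budget_ms):
    """Return the candidate dict with the highest combined reward."""
    return max(
        candidates,
        key=lambda c: multi_objective_reward(
            c["accuracy"], c["latency_ms"], latency_budget_ms
        ),
    )


# Made-up candidates, purely for illustration.
candidates = [
    {"name": "large", "accuracy": 0.96, "latency_ms": 20.0},
    {"name": "medium", "accuracy": 0.95, "latency_ms": 10.0},
    {"name": "small", "accuracy": 0.90, "latency_ms": 5.0},
]
best = pick_architecture(candidates, latency_budget_ms=10.0)
print(best["name"])
```

With a 10ms budget the most accurate candidate loses out because it overshoots the latency constraint, which is the kind of tradeoff a multi-objective search is meant to surface.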
Results: Systems trained with InstaNAS achieve 48.9% and 40.2% average latency reduction on CIFAR-10 and CIFAR-100 against MobileNetV2 with comparable accuracy scores. Accuracies do take a slight hit (eg, the best accuracy on an InstaNAS system is approximately 95.7%, compared to 96.6% for a NAS-trained system).
Why it matters: As we industrialize artificial intelligence we’re going to be offloading increasingly large chunks of AI development to AI systems themselves. The development and extension of NAS approaches will be crucial to this. Though we should bear in mind that there’s an implicit electricity<>human brain tradeoff we’re making here, and my intuition is that for some very large-scale NAS systems we could end up creating some hugely energy-hungry systems, which carry their own implicit (un-recognized) environmental externality.
Read more: InstaNAS: Instance-aware Neural Architecture Search (Arxiv).

New metrics to let us work out when Fake Video News is going to become real:
…With a little help from StarCraft 2!…
Google Brain researchers have proposed a new metric to give researchers a better way to assess the quality of synthetically generated videos. The motivation for this research is that today we lack effective ways to assess and quantify improvements in synthetic video generation, and the history of the deep learning subfield within AI has tended to show that progress in a domain improves once the research community settles on a standard metric and/or dataset to use to assess progress. (My pet theory for why this is: There are so many AI papers published these days that researchers need simple heuristics to tell them whether to invest time in reading something, and progress against a generally agreed upon shared dataset can be a good input for this – eg, ImageNet (Image Rec), Penn Treebank (NLU), Switchboard Hub 500 (Speech Rec).)
The metric: Frechet Video Distance (FVD): FVD has been designed to give scores that reflect not only the quality of the video, but also its temporal coherence – aka, the way things transition from frame to frame. FVD is built around what the researchers call an ‘Inflated 3D Convnet’, which has been used to solve tasks in other challenging video domains. Because this network is trained to spot actions in videos it contains useful feature relations that correspond to sequences of movements over time. FVD uses an Inflated 3D Convnet, trained on the Kinetics data set of human-centered YouTube videos, to characterize the difference between the temporal transitions seen in synthetic videos and its own feature representations of physical movements derived from the real world.
The datasets: In tandem with FVD, the researchers introduce a new dataset based around StarCraft 2, a top-down real-time strategy game with lush, colorful graphics “to serve as an intermediate step towards real world video data sets.” These videos contain various different tasks in StarCraft 2 which are fairly self-explanatory – move unit to border; collect mineral shards; brawl; and road trip with medivac. The researchers provide 14,000 videos for each scenario.
Results: FVD seems to be a metric that more closely tracks the scores humans give when performing a qualitative evaluation of synthetic videos. “It is clear that FVD is better equipped to rank models according to human perception of quality”.
Why it matters: Synthetic videos are likely going to cause a large number of profound challenges in AI policy, as progression in this research domain yields immediate applications in the creation of automated propaganda. One of the most challenging things about this area – until now – has been the lack of available metrics to use to track progression here and thereby estimate when synthetic videos are likely going to become something ‘good enough’ for people to worry about in domains outside of AI research. “We believe that FVD and SCV will greatly benefit research in generative models of video in providing a well tailored, objective measure of progress,” they write.
Read more: Towards Accurate Generative Models of Video: A New Metric & Challenges (Arxiv).
Get the datasets from here.

Teaching self-driving cars how to drive aggressively:
…Fusing deep learning and model predictive control for aggressive robot cars…
Researchers with the Georgia Institute of Technology have created a self-driving car system that can successfully navigate a 1:5-scale ‘AutoRally’ vehicle along a dirt track at high speeds. This type of work paves the way for a future where self-driving cars can go off-road, and gives us indications for how militaries might be developing their own stealthy unmanned ground vehicles (UGVs).
  How it works: Fusing deep learning and model predictive control: To create the system, the researchers feed visual inputs from a monocular camera into either a static supervised classifier or a recurrent LSTM (they switch between the two according to the difficulty of the particular section of the map the vehicle is on), which uses this information to predict where the vehicle is against a pre-downloaded map schematic. They then feed this prediction into a GPU-based particle filter which incorporates data from the vehicle IMU and wheel speeds to further predict where the vehicle is on the map.
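The localization step can be pictured as a standard particle filter: move a cloud of position guesses forward using odometry, weight each guess by how well it agrees with the position predicted from vision, then resample. The sketch below is a one-dimensional toy in pure Python, not the paper's GPU implementation; the noise levels, function names, and simulated measurements are all assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, wheel_speed, dt, vision_position,
                         motion_noise=0.05, vision_noise=0.5, rng=random):
    """One predict/update/resample cycle for a 1-D position estimate."""
    # Predict: move every particle using odometry plus some motion noise.
    predicted = [p + wheel_speed * dt + rng.gauss(0.0, motion_noise)
                 for p in particles]
    # Update: weight particles by a Gaussian likelihood of the vision estimate.
    weights = [math.exp(-0.5 * ((vision_position - p) / vision_noise) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # Every particle is wildly implausible; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Point estimate of position: the mean of the particle cloud."""
    return sum(particles) / len(particles)


# Simulated run: the vehicle moves at 1.0 units/s, observed through noisy "vision".
rng = random.Random(0)
true_position = 0.0
particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
for _ in range(20):
    true_position += 1.0 * 0.1
    noisy_vision = true_position + rng.gauss(0.0, 0.5)
    particles = particle_filter_step(particles, 1.0, 0.1, noisy_vision, rng=rng)

final_error = abs(estimate(particles) - true_position)
print(final_error)
```

The appeal of this structure is that the learned component only has to produce a rough position estimate; the filter smooths it using the vehicle's own motion data.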
Superhuman Wacky Races: The researchers test their system out on a complex dirt track at the Georgia Tech Autonomous Racing Facility. This track “includes turns of varying radius including a 180 degree hairpin and S curve, and a long straight section”. The AutoRally car is able to “repeatedly beat the best single lap performed by an experienced human test driver who provided all of the system identification data.”
  Why it matters: Papers like this show how hybrid systems – where deep learning is doing useful work as a single specific component – are likely going to yield useful applications in challenging domains. I expect the majority of applied robotics systems in the future to use modular systems combining the best of human-specified systems as well as function approximating systems based on deep learning.
  Read more: Vision-Based High Speed Driving with a Deep Dynamic Observer (Arxiv).

What does a robot economy look like and what rules might it need?
…Where AI, Robotics, and Fully Automated Luxury Communism collide…
As AI has grown more capable an increasing number of people have begun to think about what the implications are for the economy. One of the main questions that people contemplate is how to effectively incorporate a labor-light capital-heavy AI-infused economic sector (or substrate of the entire economy) into society in such a way as to increase societal stability rather than reduce it. A related question is: What would an economy look like where an increasing chunk of economic activity happens as a consequence of semi-autonomous robots, many of whom are also providing (automated) services to each other? These are the questions that researchers with the University of Texas at Austin try to answer with a new paper interrogating the implications of a robot-driven economy.
Three laws of the robot economy: The researchers propose three potential laws for such a robot economy. These are:
– A robot economy has to be developed within the framework of the digital economy, so it can interface with existing economic systems.
– The economy of robots must have internal capital that can support the market and reflect the value of the participation of robots in our society.
– Robots should not have property rights and will have to operate only on the basis of contractual responsibility, so that humans control the economy, not the machines.
Tools to build the robot economy: So, what will it take to build such a world? We’d likely need to develop the following tools:
– Create a network to track the status and implication of tasks given to or conducted by robots in accordance with the terms of a digital contract.
– A real-time communication system to let robots and people communicate together and with each other.
– The ability to use “smart contracts” via the blockchain to govern these economic interactions. (This means that “neither the will of the parties to comply with their word nor the dependence on a third party (i.e. a legal system) is required”.)
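To give a feel for what "no third party required" means in practice, here is a toy, in-memory sketch of an escrow-style contract that settles itself once a task is reported done. It is an illustration only: the class, its fields, and the settlement rule are hypothetical, it is not taken from the paper, and it is not how any particular blockchain platform implements smart contracts.

```python
class TaskContract:
    """Toy escrow: payment is released automatically once the task is verified."""

    def __init__(self, client, provider, payment):
        if payment <= 0:
            raise ValueError("payment must be positive")
        self.client = client
        self.provider = provider
        self.escrow = payment  # funds are locked when the contract is created
        self.status = "open"
        # Append-only log, standing in for the task-tracking network.
        self.log = [("created", client, provider, payment)]

    def report_done(self, proof_ok):
        """Settle the contract based on a machine-checkable proof of completion."""
        if self.status != "open":
            raise RuntimeError("contract already settled")
        # The rule is fixed in code: pay the provider on success, else refund.
        recipient = self.provider if proof_ok else self.client
        paid, self.escrow = self.escrow, 0
        self.status = "completed" if proof_ok else "refunded"
        self.log.append((self.status, recipient, paid))
        return recipient, paid


contract = TaskContract(client="alice", provider="robot-7", payment=10)
settlement = contract.report_done(proof_ok=True)
print(settlement, contract.status)
```

Because the settlement rule is fixed when the contract is created, neither side needs to trust the other or appeal to an outside authority once the proof of completion arrives.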
  What does a robot economy mean for society? If we manage to make it through a (fairly unsteady, frightening) economic transition into a robot economy, then some very interesting things start to happen: “the most important fact is that in the long-term, intelligent robotics has the potential to overcome the physical limitations of capital and labor and open up new sources of value and growth”, write the researchers. This would provide the opportunity for vast economic abundance for all of mankind, if taxation and political systems can be adjusted to effectively distribute the dividends of an AI-driven economy.
  Why it matters: Figuring out exactly how society is going to be influenced by AI is one of the grand challenges of contemporary research into the impacts of AI on society. Papers like this suggest that such an economy will have very strange properties compared to our current one, and will likely demand new policy solutions.
  Read more: Robot Economy: Ready or Not, Here It Comes (Arxiv).

Want to teach others the fundamentals of deep learning? Want financial support? Apply for the Depth First Learning Fellowship!
…Applications open now for $4000 grants to help people teach others deep learning…
Depth First Learning, an AI education initiative from researchers at NYU, FAIR, DeepMind, and Google Brain, has announced the ‘Depth First Learning Fellowship’, sponsored by Jane Street.
How the fellowship works: Successful DFL Fellowship applicants will be expected to design a curriculum and lead a DFL study group around a particular aspect of deep learning. DFL is looking for applicants with the following traits: mathematical maturity; effectiveness at scientific communication; ability to commit to ensure the DFL study sessions are useful; a general enjoyment of group learning.
Applications close on February 15th 2019.
Apply here (Depth First Learning).

Tired of classifying handwritten digits? Then try CURSIVE JAPANESE instead:
…Researchers release an MNIST-replacement; If data is political, then the arrival of cursive Japanese alongside MNIST broadens our data-political horizon…
For over two decades AI researchers have benchmarked the effectiveness of various supervised and unsupervised learning AI techniques against performance on MNIST, a dataset consisting of a multitude of heavily pixelated black-and-white handwritten digits. Now, researchers linked to the Center for Open Data in the Humanities, MILA in Montreal, the National Institute of Japanese Literature, Google Brain, and a high school in England (a young, major Kaggle winner!), have released “Kuzushiji-MNIST, a dataset which focuses on Kuzushiji (cursive Japanese), as well as two larger, more challenging datasets, Kuzushiji-49 and Kuzushiji-Kanji”.
Keeping a language alive with deep learning: One of the motivations for this research is to help people access Japan’s own past, as the cursive script used by this dataset is no longer taught in the official school curriculum. “Even though Kuzushiji had been used for over 1000 years, most Japanese natives today cannot read books written or published over 150 years ago,” they write.
  The data: The full Kuzushiji dataset draws on around 300,000 Japanese books, some of which have been transcribed and annotated with bounding boxes. It consists of 3,999 character types across 403,242 characters. The datasets being released by the researchers were made as follows: “We pre-processed characters scanned from 35 classical books printed in the 18th century and organized the dataset into 3 parts: (1) Kuzushiji-MNIST, a drop-in replacement for the MNIST [16] dataset, (2) Kuzushiji-49, a much larger, but imbalanced dataset containing 48 Hiragana characters and one Hiragana iteration mark, and (3) Kuzushiji-Kanji, an imbalanced dataset of 3832 Kanji characters, including rare characters with very few samples.”
  Dataset difficulty: In tests the researchers demonstrate that these datasets are going to be more challenging for AI researchers to work with than MNIST itself – in baseline tests they show that many techniques that get above 99% classification accuracy on MNIST get between 95% and 98% on the Kuzushiji-MNIST drop-in, and scores only as high as around 97% for Kuzushiji-49.
Why it matters: Work like this shows how as people think more intently about the underlying data sources of AI they can develop new approaches that can let researchers do good AI research while also broadening the range of cultural artefacts that are easily accessible to AI systems and methodologies.
  Read more: Deep Learning for Classical Japanese Literature (Arxiv).

OpenAI Bits & Pieces:

Want to test how general your agents are? Try out CoinRun:
We’ve released a new training environment, CoinRun, which provides a metric for an agent’s ability to transfer its experience to novel situations.
Read more: Quantifying Generalization in Reinforcement Learning (OpenAI Blog).
Get the CoinRun code here (OpenAI Github).

Deadline Extension for OpenAI’s Spinning Up in Deep RL workshop:
We’ve extended the deadline for applying to participate in a Deep RL workshop, at OpenAI in San Francisco.
More details: The workshop will be held on February 2nd 2019 and will include lectures based on Spinning Up in Deep RL, a package of teaching materials that OpenAI recently released. Lectures will be followed by an afternoon hacking session, during which attendees can get guidance and feedback on their projects from some of OpenAI’s expert researchers.
Applications will be open until December 15th.
Read more about Spinning Up in Deep RL (OpenAI Blog).
Apply to attend the workshop by filling out this form (Google Forms).

Tech Tales:

Call it the ‘demographic time bomb’ (which is what the press calls it) or the ‘land of the living dead’ (which is what the tabloid press call it) or the ‘ageing population tendency among developed nations’ (which is what the economists call it), but I guess we should have seen it coming: old peoples’ homes full of the walking dead and the near-sleeping living. Cities given over to millions of automatons and thousands of people. ‘Festivals of the living’ attended solely by those made of metal. The slow replacement of conscious life in the world from organic to synthetic.

It started like this: most people in most nations stopped having as many children. Fertility rates dropped. Everywhere became like Japan circa 2020: societies shaped by the ever-growing voting blocs composed of the old people, and the ever-shrinking voting blocs composed of the young.

The young tried to placate the old people with robots – this was their first and most fatal mistake.

It began, like most world-changing technologies, with toys: “Fake Baby 3000” was one of the early models; an ultra-high-end doll designed for the young females of the ultra-rich. Then after that came “Baby Trainer”, a robot designed to behave like a newborn child, intended for the rich wannabe parents of the world who would like to get some practice on a synthetic-life before they birthed and cared for a real one. These robots were a phenomenal success and, much like the early 21st Century market for drones, birthed an ecosystem of ever-more elaborate and advanced automatons.

Half a decade later, someone had the bright idea of putting these robots in old people’s homes. The theory went like this: regular social interactions – and in particular, emotionally resonant ones – have a long history of helping to prevent the various medical degradations of old age (especially cognitive ones). So why not let old people’s hardwired paternal instincts do the job of dealing with ‘senescence-related health issues’, as one of the marketing brochures went? It was an instant success. Crowds of the increasingly large populations of old people began caring for the baby robots – and they started to live longer, with fewer of them going insane in their old age. And as they became healthier and more active, they were able to vote in elections for longer periods of time, and further impart their view of the world onto the rest of society.

Next, the old demanded that the robot babies be upgraded to robot children, and society obliged. Now the homes became filled with clanking metal kids, playing games on StairMasters and stealing ice from the kitchen to throw at each other, finding the novel temperature sensation exciting. The old loved these children and – combined with ongoing improvements in healthcare – lived even longer. They taught the children to help them, and the homes of the old gained fabulous outdoor sculptures and meticulously tended lawns. Perhaps the AI kids were so bored they found this to be a good distraction? wrote one professor. Perhaps the AI kids loved their old people and wanted to help them? wrote another.

Around the world, societies are now on the verge of enacting various laws that would let us create robot adults to care for the ageing population. Metal people, working tirelessly in the service of their ‘parents’, standing in for the duties of the flesh-and-blood. Politics is demographics, and the demographics suggest the laws will be enacted, and the living-dead shall grow until they outnumber the dead-living.

Things that inspired this story: The robot economy, robotics, PARO the Therapeutic Robot, demographic time bombs, markets.
