Import AI

Import AI: #89: Chinese facial recognition startup raises $600 million; why GPUs could alter AI progress; and using context to deal with language ambiguity

by Jack Clark

Beating Moore’s Law with GPUs:
…Could a rise in GPU and other novel AI-substrates help deal with the decline of Moore’s Law?…
CPU performance has been stagnating for several years as it has become harder to improve linear execution pipelines across whole chips in relation to the reduction in transistor sizes, and the related problems which come from having an increasingly large number of things needing to work in lock-step with one another at minute scales. Could GPUs give us a way around this performance impasse? That’s the idea in a new blog from AI researcher Bharath Ramsundar who thinks that increases in GPU capabilities and the arrival of semiconductor substrates specialized for deep learning means that we can expect performance of AI applications to increase in coming years faster than typical computing jobs running on typical processors. He might be right – one of the weird things about deep learning is that its most essential elements, like big blocks of neural networks, can be scaled up to immense sizes without terrible scaling tradeoffs as their innards consist of relatively simple and parallel tasks like matrix multiplication, so new chips can easily be networked together to further boost base capabilities. Plus, standardization in a few software libraries, like NVIDIA’s cuDNN and CUDA GPU-interfaces, or the rise of TensorFlow for AI programming, means that some applications are getting faster over time purely as a consequence of software updates to these other fundamental improvements.
  Why it matters: Much of the recent progress in AI has occurred because around the mid-2000s processors became capable enough to easily train large neural networks on chunks of data – this underlying hardware improvement unlocked breakthroughs like the 2012 ‘AlexNet’ result for image recognition, related work in speech recognition, and subsequently significant innovations in research (AlphaGo) and application (large-scale sequence-to-sequence learning for ‘Smart Reply’, or the emergence of neural translation systems). If the arrival of things like GPUs and further software standardization and innovation has a good chance of further boosting performance, then researchers will be able to explore even larger or more complex models in the future, as well as run things like neural architecture search at a higher rate, which should combine to further drive progress.
  Read more: The Advent of Huang’s Law (Bharath Ramsundar blog post).

Microsoft launches AI training course including ‘Ethics’ segment:
…New Professional Program for Artificial Intelligence sees Microsoft get into the AI certification business…
Microsoft has followed other companies in making its internal training courses available externally via the Microsoft Professional Program in AI. This program is based on internal training initiatives the software company developed to ramp up their own professional skills.
  The Microsoft course is all fairly typical, teaching people about Python, statistics, and the construction and deployment of deep learning and reinforcement learning projects. It also includes a specific “Ethics and Law in Data and Analytics” course, which promises to teach developers how to ‘apply ethical and legal frameworks to initiatives in the data profession’.
  Read more: Microsoft Professional Program for Artificial Intelligence (Microsoft).
  Read more: Aiming to fill skill gaps in AI, Microsoft makes training courses available to the public (Microsoft blog).

Learning to deal with ambiguity:
…Researchers take charge of problem of word ambiguity via a charge at including more context…
Carnegie Mellon University researchers have tackled one of the harder problems in translation: dealing with ‘homographs’ – words that are spelled the same but have different meanings in different contexts, like ‘room’ and ‘charges’. They do this in the context of neural machine translation (NMT) systems, which use machine learning techniques to accomplish translation with orders of magnitude fewer hand-specified rules than prior systems.
  Existing NMT systems struggle with homographs, with performance of word-level translation degrading as the number of potential meanings of each word climbs, the researchers show. They try to alleviate this by adding a word context vector that can be used by the NMT systems to learn the different uses of the same word. Adding this ‘context network’ into their NMT architecture leads to significantly improved BLEU scores of sentences translated by the system.
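  To make the idea of a word context vector concrete, here is a minimal sketch in plain Python. It is not the paper's architecture: the toy embedding values, the helper names `context_vector` and `augment`, and the choice to simply average the other words in the sentence are all assumptions made for illustration.

```python
# Toy illustration: give each word a context vector built from its sentence,
# so the same surface form gets different inputs in different sentences.
# The embedding values and helper names are hypothetical.

EMBED = {
    "the":     [0.1, 0.0],
    "battery": [0.9, 0.1],
    "charges": [0.5, 0.5],
    "court":   [0.1, 0.9],
    "dropped": [0.2, 0.7],
}

def context_vector(sentence, index):
    """Average the embeddings of every word except the one at `index`."""
    others = [EMBED[w] for i, w in enumerate(sentence) if i != index]
    dim = len(others[0])
    return [sum(vec[d] for vec in others) / len(others) for d in range(dim)]

def augment(sentence, index):
    """Concatenate a word's own embedding with its context vector."""
    return EMBED[sentence[index]] + context_vector(sentence, index)

# The homograph 'charges' in an electrical sentence and in a legal one.
electric = augment(["the", "battery", "charges"], 2)
legal = augment(["the", "court", "dropped", "charges"], 3)
```

  The first half of `electric` and `legal` is identical (same word), while the second half differs (different context), which is the extra signal a translation model could learn from.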
  Why it matters: It’s noteworthy that the system used by the researchers to deal with the homograph problem is itself a learned system which, rather than using hand-written rules, seeks to instead ingest more context about each word and learn from that. This is illustrative of how AI-first software systems get built: if you identify a fault you typically write a program which learns to fix it, rather than writing a rule-based program that fixes it.
  Read more: Handling Homographs in Neural Machine Translation (Arxiv).

Chinese facial recognition company raises $600 million:
…SenseTime plans to use funds for five supercomputers for its AI services…
SenseTime, a homegrown computer vision startup that provides facial recognition tools at vast scales, has raised $600 million in funding. The Chinese company supplies facial recognition services to the public and private sectors and is now, according to a co-founder, profitable and looking to expand. The company is now “developing a service code-named ‘Viper’ to parse data from thousands of live camera feeds”, according to Bloomberg News.
  Strategic compute: SenseTime will use money from the financing “to build at least five supercomputers in top-tier cities over the coming year to drive Viper and other services. As envisioned, it streams thousands of live feeds into a single system that’re automatically processed and tagged, via devices from office face-scanners to ATMs and traffic cameras (so long as the resolution is high enough). The ultimate goal is to juggle 100,000 feeds simultaneously,” according to Bloomberg News.
  Read more: China Now Has the Most Valuable AI Startup in the World (Bloomberg).
…Related: Chinese startup uses AI to spot jaywalkers and send them pictures of their face:
…Computer vision @ China scale…
Chinese startup Intellifusion is helping the local government in Shenzhen use facial recognition in combination with widely deployed urban cameras to text jaywalkers pictures of their faces along with personal information after they’ve been caught.
  Read more: China is using facial recognition technology to send jaywalkers fines through text messages (Motherboard).

Think China’s strategic technology initiatives are new? Think again:
…wide-ranging post by former Asia-focused State Department employee puts Beijing’s AI push in historical context…
Here’s an old (August 2017) but good post from the Paulson Institute at the University of Chicago about the history of Chinese technology policy in light of the government’s recent public statements about developing a national AI strategy. China’s longstanding worldview with regards to its technology strategy is that technology is a source of national power and China needs to develop more of an indigenous Chinese capability.
  Based on previous initiatives, it looks likely China will seek to attain frontier capabilities in AI then package those capabilities up as products and use that to fund further research. “Chinese government, industry, and scientific leaders will continue to push to move up the value-added chain. And in some of the sectors where they are doing so, such as ultra high-voltage power lines (UHV) and civil nuclear reactors, China is already a global leader, deploying these technologies to scale and unmatched in this by few other markets,” writes the author. “That means it should be able to couple its status as a leading technology consumer to a new and growing role as an exporter. China’s sheer market power could enable it to export some of its indigenous technology and engineering standards in an effort to become the default global standard setter for this or that technology and system.”
  Read more: The Deep Roots and Long Branches of Chinese Technonationalism (Macro Polo).

French researchers build ‘Jacquard’ dataset to improve robotic grasping:
…11,000+ object dataset provides real objects with associated depth information…
How do you solve a problem like robotic grasping? One way is to use many real world robots working in parallel for several months to learn to pick up a multitude of real world objects – that’s a route Google researchers took with the company’s ‘arm farm’ a few years ago. Another is to use people outfitted with sensors to collect demonstrations of humans grasping different objects, then learn from that – that’s the approach taken by AI startups like Kindred. A third way, and one which has drawn interest from a multitude of researchers, is to create synthetic 3D objects and train robots in a simulator to learn to grasp them – that’s what researchers at the University of California at Berkeley have done with Dex-Net, as well as organizations like Google and OpenAI; some organizations have further augmented this technique via the use of generative adversarial networks to simulate a greater range of grasps on objects.
  Jacquard: Now, French researchers have announced Jacquard, a robotics grasping dataset that contains more than 11,000 different real world objects and 50,000 images annotated with both RGB and realistic depth information. They plan to release it soon, they say, without specifying when. The researchers generate their data by sampling objects from ShapeNet which are each scaled and given different weight values, then dropped into a simulator, where they are then rendered into high-resolution images via Blender, with grasp annotations generated by a three-stage automated process within the ‘pyBullet’ physics library. To evaluate their dataset, they test it in simulation by pre-training an AlexNet on their Jacquard dataset then applying it to another, smaller, held-out dataset, where it generalizes well. The dataset supports multiple robotic gripper sizes, several different grasps linked to each image, and one million labelled grasps.
  Real robots: The researchers tested their approach on a real robot (a Fanuc M-20iA robotic arm) by evaluating it on a subset of ~2,000 objects from the Jacquard dataset as well as on the full Cornell dataset. A pre-trained AlexNet tested in this way produces correct grasps about 78% of the time, compared to 60.46% for Cornell. Both of these results are quite weak compared to results on the Dex-Net dataset, and other attempts.
  Why it matters: Many researchers expect that deep learning could lead to significant advancement in the manipulation capabilities of robots. But we’re currently missing two key traits: large enough datasets and a way to test and evaluate robots on standard platforms in standard ways. We’re currently going through a boom in the number of robot datasets available, with Jacquard representing another contribution here.
  Read more: Jacquard: A Large Scale Dataset for Robotic Grasp Detection (Arxiv).

What do StarCraft and the future of AI research have in common? Multi-agent control:
…Chinese researchers tackle StarCraft micromanagement tasks…
Researchers with the Institute of Automation in the Chinese Academy of Sciences have published research on using reinforcement learning to try to solve micromanagement tasks within StarCraft, a real-time strategy game. One of the main challenges in mastering StarCraft is to develop algorithms that can effectively train multiple units in parallel. The researchers propose what they call a parameter sharing multi-agent gradient-descent Sarsa algorithm, or PS-MAGDS. This algorithm shares the parameters of the overall policy network across multiple units while introducing methods to provide appropriate credit assignment to individual units. They also carry out significant reward shaping to get the agents to learn more effectively. Their PS-MAGDS AIs are able to learn to beat the in-game AI at a variety of micromanagement control scenarios, as well as in large-scale scenarios of more than thirty units on either side. It’s currently difficult to accurately evaluate the various techniques people are developing for StarCraft against one another due to a lack of shared baselines and experiments, as well as an unclear split in the research community between using StarCraft 1 (this paper) as the testbed, and StarCraft 2 (efforts by DeepMind, others).
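  For readers who want a feel for what sharing parameters across units means in a Sarsa-style learner, here is a rough sketch in plain Python. It is an assumption-laden simplification rather than the paper's method: a small linear Q-function stands in for the policy network, each unit receives its own reward as a crude stand-in for credit assignment, and the feature layout, constants, and function names are all hypothetical.

```python
# Gradient-descent Sarsa with a linear Q-function whose weights are shared
# by every unit. Feature layout and hyperparameters are hypothetical.

ALPHA, GAMMA = 0.1, 0.9
N_FEATURES, N_ACTIONS = 3, 2

# One weight vector per action, shared across all units.
weights = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]

def q_value(features, action):
    """Linear estimate of the action value for one unit's observation."""
    return sum(w * f for w, f in zip(weights[action], features))

def sarsa_update(features, action, reward, next_features, next_action, done):
    """Apply one on-policy TD update to the shared weights."""
    if done:
        target = reward
    else:
        target = reward + GAMMA * q_value(next_features, next_action)
    td_error = target - q_value(features, action)
    for i, f in enumerate(features):
        weights[action][i] += ALPHA * td_error * f
    return td_error

# Two units report transitions from the same game step; both of them
# update the same weight table, each using its own per-unit reward.
unit_transitions = [
    ([1.0, 0.0, 0.5], 0, 1.0, [0.0, 1.0, 0.5], 1, False),
    ([0.0, 1.0, 0.5], 1, -1.0, [1.0, 0.0, 0.5], 0, True),
]
errors = [sarsa_update(*t) for t in unit_transitions]
```

  Because every unit writes into the same `weights`, experience gathered by one unit immediately changes the value estimates used by all the others.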
  Still limited: “At present, we can only train ranged ground units with the same type, while training melee ground units using RL methods is still an open problem. We will improve our method for more types of units and more complex scenarios in the future. Finally, we will also consider to use our micromanagement model in the StarCraft bot to play full the game,” the researchers write.
  Read more: StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning (Arxiv).

Tech Tales:

The person was killed at five minutes past eleven the previous night. Their beaten body was found five minutes later by a passing group of women who had been dining at a nearby restaurant. By 11:15 the body was photographed and data began to be pulled from nearby security cameras, wifi routers, cell towers, and the various robot and drone companies. At 11:15:01 one of the robot companies indicated that a robot had been making a delivery nearby at the time of the attack. The robot was impounded and transported to the local police station where it was placed in a facility known to local officers as ‘the metal shop’. Here, they would try to extract data from the robot to learn what happened. But it would be a difficult task, because the robot had been far enough away from the scene that none of its traditional, easy to poll sensors (video, LIDAR, audio, and so on) had sufficient resolution or fidelity to tell them much.

“What did you see,” said the detective to the robot. “Tell me what you saw.”
The robot said nothing – unsurprising given that it had no speech capability and was, at that moment, unpowered. In another twelve hours the police would have to release the robot back to the manufacturer and if they hadn’t been able to figure anything out by then, then they were out of options.
“They never prepared me for this,” said the detective – and he was right. When he was going through training they never dwelled much on the questions relating to interrogating sub-sentient AI systems, and all the laws were built around an assumption that turned out to be wrong: that the AIs would remain just dumb enough to be interrogatable via direct access into their electronic brains, and that the laws would remain just slow enough for this to be standard procedure for dealing with evidence from all AI agents. This assumption was half right: the law did stay the same, but the AIs got so smart that though you could look into their brains, you couldn’t learn as much as you’d hope.

This particular AI was based in a food delivery robot that roamed the streets of the city, beeping its way through crowds to apartment buildings, where it would notify customers that their Banh Mi, or hot ramen, or cold cuts of meat, or vegetable box, had arrived. Its role was a seemingly simple one: spend all day and night retrieving goods from different businesses and conveying them to consumers. But its job was very difficult from an AI standpoint – streets would change according to the need for road maintenance or the laying of further communication cables, businesses would lose signs or change signs or have their windows smashed, fashions would change which would alter the profile of each person in a street scene, and climatic shocks meant the weather was becoming ever stranger and ever more unpredictable. So to save costs and increase the reliability of the robots the technology companies behind them had been adding more sensors onto the platforms and, once those gains were built-in, working out how to incorporate artificial intelligence techniques to increase efficiency further. A few years ago computational resources became cheap and widely available enough for them to begin re-training each robot based on its own data as well as data from others. They didn’t do this in a purely supervised way, either, instead they had each robot learn to simulate its own model of the world – in this case, a specific region of a city – it worked in, letting it imagine the streets around itself to give it greater abilities relating to route-finding and re-orientation, adapting to unexpected events, and so on.

So now to be able to understand anything about the body that had been found the detective needed to understand the world model of the robot and see if it had significantly changed at any point during the previous day or so. Which is how he found himself staring at a gigantic wall of computer monitors, each showing a different smeary kaleidoscopic vision of a street scene. The detective had access to a control panel that let him manipulate the various latent variables that conditioned the robot’s world model, allowing him to move certain dials and sliders to figure out which things had changed, and how.

The detective knew he was onto something when he found the smear. At first it looked like an error – some kind of computer vision artifact – but as he manipulated various dials he saw that, at 11:15 the previous night, the robot had updated its own world model with a new variable that looked like a black smudge. Except this black smudge was only superimposed on certain people and certain objects in the world, and as he moved the slider around to explore the smear, he found that it had strong associations to two other variables – red three-wheeled motorcycles, and men running. The detective pulled all the information about the world model and did some further experiments and added this to the evidence log.

Later, during prosecution, the robot was physically wheeled into the courtroom where the trial was taking place, mostly as a prop for the head prosecutor. The robot hadn’t seen anything specific itself – its sensors were not good enough to have picked anything admissible up. But as it had been in the area it had learned of the presence of this death through a multitude of different factors it had sensed, ranging from groups of people running toward where the accident had occurred, to an increase in pedestrian phone activity, to the arrival of sirens, and so on. And this giant amount of new sensory information had somehow triggered strong links in its world model with three-wheeled motorcycles and running men. Armed with this highly specific set of factors the police had trawled all the nearby security cameras and sensors again and, through piecing together footage from eight different places, had found occasional shots of men running towards a three-wheeled motorcycle and speeding, haphazardly, through the streets. After building evidence further they were able to get a DNA match. The offenders went to prison and the mystery of the body was (partially) solved. Though the company that made the AI for the robot made no public statements regarding the case, it subsequently used it in private sales materials as a case study for local law enforcement on the surprising ways robots could benefit their town.

Things that inspired this story: Food delivery robots, the notion of jurisdiction, interpretability of imagination, “World Models” by David Ha and Juergen Schmidhuber.

 

Import AI: #88: NATO designs a cyber-defense AI; object detection improves with YOLOv3; France unveils its national AI strategy

by Jack Clark

Fast object detector YOLO gets its third major release:
…Along with one of the most clearly written and reassuringly honest research papers of recent times. Seriously. Read it!…
YOLO (You Only Look Once) is a fast, free object detection system developed by researchers at the University of Washington. Its latest v3 update makes it marginally faster by incorporating “good ideas from other people”. These include a residual network system for feature extraction which attains reasonably high scores on ImageNet classification while being more efficient than current state-of-the-art systems, and a method inspired by feature pyramid networks that improves prediction of bounding boxes.
  Reassuringly honest: The YOLOv3 paper is probably the most approachable AI research paper I’ve read in recent years, and that’s mostly because it doesn’t take itself too seriously. Here’s the introduction: “Sometimes you just kinda phone it in for a year, you know? I didn’t do a whole lot of research this year. Spent a lot of time on Twitter. Played around with GANs a little. I had a little momentum left over from last year; I managed to make some improvements to YOLO. But, honestly, nothing like super interesting, just a bunch of small changes that make it better,” the researchers write. The paper also includes a “Things We Tried That Didn’t Work” section, which should save other researchers time.
  Why it matters: YOLO makes it easy for hobbyists to access near state-of-the-art object detectors that run very quickly on tiny computational budgets, making it easier for people to deploy systems onto real world hardware, like phones or embedded chips paired with webcams. The downside of systems like YOLO is that they’re so broadly useful that bad actors will use them as well; the researchers demonstrate awareness of this via a ‘What This All Means’ section: ““What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to…. wait, you’re saying that’s exactly what it will be used for?? Oh. Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait…”
  Read more: YOLOv3: An Incremental Improvement (PDF).
  More information on the official YOLO website here.

The military AI cometh: new reference architecture for MilSpec defense detailed by researchers:
…NATO researchers plot automated, AI-based cyber defense systems…
A NATO research group, led by the US Army Research Laboratory, has published a paper on a reference architecture for a cyber defense agent that uses AI to enhance its capabilities. The paper is worth reading because it provides a nuts-and-bolts perspective on how a lot of militaries around the world are viewing AI: AI systems let you automate more stuff, automation lets you increase the speed with which you can take actions and thereby gain strategic initiative against an opponent, so the goal of most technology integrations is to automate as many chunks of a process as possible to retain speed of response and therefore initiative.
  “Artificial cyber hunters”: “In a conflict with a technically sophisticated adversary, NATO military tactical networks will operate in a heavily contested battlefield. Enemy software cyber agents—malware—will infiltrate friendly networks and attack friendly command, control, communications, computers, intelligence, surveillance, and reconnaissance (C4ISR) and computerized weapon systems. To fight them, NATO needs artificial cyber hunters—intelligent, autonomous, mobile agents specialized in active cyber defense,” the researchers write.
  How the agents work: The researchers propose agents that possess five main components: “sensing and world state identification”, “planning and action selection”, “collaboration and negotiation”, “action execution”, and “learning and knowledge improvement”. Each of these functions has a bunch of sub-systems to perform tasks like ingesting data from the agent’s actions, or communicating and collaborating with other agents.
  Usage scenarios: These agents are designed to be modular and deployable across a variety of different form factors and usage scenarios, including multiple agents that are deployed throughout a vehicle’s weapons, navigation, and observation systems, as well as the laptops used by its human crew, and managed by a single “master agent”. Under this scenario, the NATO researchers detail a threat where the vehicle is compromised by a virus placed into it during maintenance; this virus is subsequently detected by one of the agents when it begins scanning other subsystems within the vehicle, causing the agents deployed on the vehicle to decrease trust in the ‘vehicle management system’ and place the BMS (an in-vehicle system used to survey the surrounding territory) into an alert state. Next, one of the surveillance AI agents discovers that the enemy malware has loaded software directly into the BMS, causing the AI agent to automatically restart the BMS to reset it to a safe state.
  Why it matters: As systems like these move from reference architectures to functional blocks of code we’re going to see the nature of conflict change as systems become more reactive over shorter timescales, which will further condition the sorts of strategies people use in conflict. Luckily, technologies for offense are too crude and brittle and unpredictable to be explored by militaries any time soon, so most of this work will take place in the area of defense, for now.
  Read more: Initial Reference Architecture of an Intelligent Autonomous Agent for Cyber Defense (Arxiv).

Google researchers train agents to project themselves forward and to work backward from the goal:
…Agents perform better at long horizon tasks when they…
When I try to solve a task I tend to do two things: I think of the steps I reckon I need to take to be able to complete it, and then I think of the end state and try to work my way backwards from there to where I am. Today, most AI agents just do the first thing, exploring (usually without a well-defined notion of the end state) until they stumble into correct behaviors. Now, researchers with Google Brain have proposed a somewhat limited approach to give agents the ability to work backwards as well. Their approach requires the agent to be provided with knowledge of the reward function and specifically the goal – that’s not going to be available in most systems, though it may hold for some software-based approaches. The agent is able to then use this information to project forward from its own state when considering the next actions, and also look backward from its sense of the goal to help it perform better action selection. The approach works well on lengthy tasks requiring large amounts of exploration, like navigating in gridworlds or solving Towers of Hanoi problems. It’s not clear from this paper how far this technique can go as it is tested on small-scale toy domains.
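  As a toy illustration of working backwards from a known goal, the sketch below imagines steps in reverse from the goal state of a one-dimensional corridor and feeds them to ordinary Q-learning updates. It is not the paper's implementation: the corridor, the hand-written inverse dynamics (where a real system would need to learn or be given a backward model), the constants, and all function names are assumptions.

```python
# Toy forward-backward sketch on a 1-D corridor: states 0..N-1, goal at N-1.
# Backward steps are imagined from the known goal and fed to ordinary
# Q-learning updates. The dynamics and constants are hypothetical.

N, GOAL, GAMMA, ALPHA = 6, 5, 0.9, 1.0
ACTIONS = (-1, 1)  # move left, move right

def step(state, action):
    """Forward model: deterministic move, clipped to the corridor."""
    nxt = min(max(state + action, 0), N - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def q_update(s, a, r, s2):
    """Standard Q-learning update; the goal is treated as terminal."""
    best_next = 0.0 if s2 == GOAL else max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def imagine_backward(from_state):
    """Yield (prev_state, action) pairs that lead into `from_state`."""
    for a in ACTIONS:
        prev = from_state - a
        if 0 <= prev < N and prev != GOAL and step(prev, a)[0] == from_state:
            yield prev, a

# Backward sweep: start at the goal and walk outward through predecessors.
frontier, seen = [GOAL], {GOAL}
while frontier:
    target = frontier.pop(0)
    for prev, a in imagine_backward(target):
        s2, r = step(prev, a)
        q_update(prev, a, r, s2)
        if prev not in seen:
            seen.add(prev)
            frontier.append(prev)

def greedy_path(start):
    """Forward rollout using the Q-values seeded by the backward sweep."""
    path, s = [start], start
    while s != GOAL and len(path) <= N:
        s = step(s, max(ACTIONS, key=lambda a: Q[(s, a)]))[0]
        path.append(s)
    return path
```

  In this toy, the backward sweep alone is enough for a greedy forward rollout to walk straight to the goal; in a larger environment the imagined backward steps would be mixed with real forward experience.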
  Why it matters: To be conscious is to be trapped in a subjective view of time that governs everything we do. Integrating more of an appreciation of time as a specific contextual marker and using that to govern environment modelling seems like a prerequisite for the development of more advanced systems.
  Read more: Forward-Backward Reinforcement Learning (Arxiv).

AI researchers train agents to simulate their own worlds for superior performance:
…I am only as good as my own imaginings…
Have you ever heard the story about the basketball test? Scientists split a group of people into three groups; one group was told to not play basketball for a couple of weeks, the second group was told to play basketball for an hour a day for two weeks, and the third group was told to think about playing basketball for an hour a day for two weeks, but not play it. Eventually, all three groups played basketball and the scientists discovered that the people that had spent a lot of time thinking about the game did meaningfully better than the group that hadn’t played it at all, though neither were as good as the team that practised regularly. This highlights something most people have a strong intuition about: our brains are simulation engines, and the more time we spend simulating a problem, the better chance we have of solving that problem in the real world. Now, researchers David Ha and Juergen Schmidhuber have sought to give AI agents this capability, by training systems to develop a compressed representation of their environment, then having these agents train themselves within this imagined version of the environment to solve a task – in this case, driving a car around a race course, and solving a challenge in VizDoom.
   Significant caveat: Though the paper is interesting it may be pursuing a research path that doesn’t go that far according to the view of one academic, Shimon Whiteson, who tweeted out some thoughts about the paper a few days ago.
  Surprising transfer learning: For the VizDoom tasks the researchers found they were able to make the agents’ model of its Doom challenge more difficult by raising the temperature of the environment model, which essentially increases randomization of its various latent variables. This means the agent had to contend with a more difficult version of the task, replete with more enemies, less predictable fireballs, and even the occasional random death. They found that agents trained in this simulation excelled at a simpler real world task, suggesting that the underlying learned environment model was of sufficient fidelity to be a useful mental simulation.
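  The effect of a temperature knob is easy to see on a small categorical distribution. The sketch below is a simplified stand-in, not the paper's environment model: the three imagined outcomes, their logits, and the function names are made up for illustration.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_event(logits, temperature, rng):
    """Draw one outcome index from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical logits for three imagined outcomes, e.g. "no fireball",
# "fireball from the left", "fireball from the right".
logits = [2.0, 0.5, 0.1]
calm = softmax_with_temperature(logits, 0.5)     # low temperature
chaotic = softmax_with_temperature(logits, 2.0)  # high temperature
event = sample_next_event(logits, 2.0, random.Random(0))
```

  At low temperature almost all of the probability sits on the most likely outcome; at high temperature the rarer outcomes are sampled far more often, which is what makes the imagined environment harder and less predictable.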
  Why it matters: “Imagination” is a somewhat loaded term in AI research, but it’s a valid thing to be interested in. Imagination is what lets humans explore the world around them effectively and imagination is what gives them a sufficiently vivid and unpredictable internal mental world to be able to have insights that lead to inventions. Therefore, it’s worth paying attention to systems like those described in this paper that strive to give AI agents access to a learned and rich representation of the world around them which they can then use to teach themselves. It’s also interesting as another way of applying data augmentation to an environment: simply expose an agent to the real environment enough that it can learn an internal representation of it, then throw computers at expanding and perturbing the internal world simulation to cover a greater distribution of (potentially) real world outcomes.
   Readability endorsement: The paper is very readable and very funny. I wish more papers were written to be consumed by a more general audience as I think it makes the scientific results ultimately accessible to a broader set of people.
  Read more: World Models (Arxiv).

Testing self-driving cars with toy vehicles in toy worlds:
…Putting neural networks to the (extremely limited) test…
Researchers with the Center for Complex Systems and Brain Sciences at Florida Atlantic University have used a toy racetrack, a DIY model car, and seven different neural network approaches to evaluate self-driving capabilities in a constrained environment. The research seeks to provide a cheap, repeatable benchmark developers can use to evaluate different learning systems against each other (whether this benchmark has any relevance for full-size self-driving cars is to be determined). They test seven types of neural network on the same platform, including a feed forward network; a two-layer convolutional neural network; an LSTM; implementations of AlexNet, VGG-16, Inception V3, and a ResNet-26. Each network is tested on the obstacle course following training and is evaluated according to how many laps the car completes. They test the networks on three data types: color and grayscale single images, as well as a ‘gray framestack’ which is a set of images that occurred in a sequence. Most systems were able to complete the majority of the courses, which suggests the course is a little too easy. An AlexNet-based system attained perfect performance on one data input type (single color frame), and a ResNet attained the best performance when trying to use a Gray Framestack.
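  A ‘gray framestack’ input can be pictured as a sliding window over recent grayscale frames. The sketch below shows one plausible way to build it in plain Python; the stack size, the padding rule for the first frame, the standard luma weights used for the grayscale conversion, and the class and function names are assumptions rather than details taken from the paper.

```python
from collections import deque

def to_gray(frame):
    """Convert rows of (r, g, b) pixels to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]

class FrameStack:
    """Keep the most recent `size` grayscale frames as one network input."""

    def __init__(self, size):
        self.frames = deque(maxlen=size)

    def push(self, rgb_frame):
        gray = to_gray(rgb_frame)
        if not self.frames:
            # Pad with copies of the first frame so the stack is always full.
            self.frames.extend([gray] * self.frames.maxlen)
        else:
            self.frames.append(gray)
        return list(self.frames)

# Two tiny hypothetical camera frames, each one row of two pixels.
frame_a = [[(255, 0, 0), (0, 0, 0)]]
frame_b = [[(0, 255, 0), (0, 0, 255)]]

stack = FrameStack(3)
first = stack.push(frame_a)   # three copies of frame_a in grayscale
second = stack.push(frame_b)  # oldest copy dropped, frame_b appended
```

  Stacking frames like this gives a feed-forward network some information about motion, since it sees several consecutive moments at once instead of a single snapshot.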
  Why it matters: This paper highlights just how little we know today about self-driving car systems and how poor our methods are for testing and evaluating different tactics. What would be really nice is if someone spent enough money to do a controlled test of actual self-driving cars on actual roads, though I expect that companies will make this difficult out of a desire to keep their IP secret.
  Read more: A Systematic Comparison of Deep Learning Architectures in an Autonomous Vehicle (Arxiv).

Separating one detected pedestrian from another with deep learning:
…A little feature engineering (via ‘AffineAlign’) goes a long way…
As the world starts to deploy large-scale AI surveillance tools, researchers are busily working to deal with some of the shortcomings of the technology. One major issue for image classifiers has been object segmentation and disambiguation, for example: if I’m shown images of a crowd of people how can I specifically label each one of those people and keep track of each of them, without accidentally mis-labeling a person, or losing them in the crowd? New research from Tsinghua University, Tencent AI Lab, and Cardiff University attempts to solve this problem with “a brand new pose-based instance segmentation framework for humans which separates instances based on human pose rather than region proposal detection.” The proposed method introduces an ‘AffineAlign’ layer that aligns images based on human poses which it uses within an otherwise typical computer vision pipeline. Their approach works by adding in a bit more prior knowledge (specifically, knowledge of human poses) into a recognition pipeline, and using this to better identify and segment people in crowded photos.
  Results: The approach attains comparable results to MASK-RCNN on the ‘COCOHUMAN’ dataset, and outperforms it on the ‘COCOHUMAN-OC’ dataset which tests systems’ ability to disambiguate partially occluded humans.
   Why it matters: As AI surveillance systems grow in capability it’s likely that more organizations around the world will deploy such systems into the real world. China is at the forefront of doing this currently, so it’s worth tracking public research on the topic area from Chinese-linked researchers.
  Read more: Pose2Seg: Human Instance Segmentation Without Detection (Arxiv).

French leader Emmanuel Macron discusses France’s national AI strategy:
…Why AI has issues for democracy, why France wants to lead Europe in AI, and more…
Politicians are somewhat similar to hybrids of weathervanes and antennas; the job of a politician is to intuit the public mood before it starts to change and establish a rhetorical position that points in the appropriate direction. For that reason it’s been interesting to see more and more politicians ranging from Canada’s Justin Trudeau to China’s Xi Jinping to, now, France’s Emmanuel Macron, taking meaningful positions on artificial intelligence; this suggests they’ve intuited that AI is going to become a galvanizing issue for the general public. Macron gives some of his thinking about the impact of AI in an interview with Wired. His thesis is that European countries need to pool resources and support AI individually to have a chance at becoming a significant enough power bloc with regards to AI capabilities to not be crushed by the scale of the USA’s and China’s AI ecosystems. Highlights:
– AI “will disrupt all the different business models”, and France needs to lead in AI to retain agency over itself.
– Opening up data for general usage by AI systems is akin to opening up a Pandora’s Box: “The day we start to make such business out of this data is when a huge opportunity becomes a huge risk. It could totally dismantle our national cohesion and the way we live together. This leads me to the conclusion that this huge technological revolution is in fact a political revolution.”
– The USA and China are the two leaders in AI today.
– “AI could totally jeopardize democracy.”
– He is “dead against” the usage of lethal autonomous weapons where the machine makes the decision to kill a human.
– “My concern is that there is a disconnect between the speediness of innovation and some practices, and the time for digestion for a lot of people in our democracies.”
   Read more: Emmanuel Macron Talks To Wired About France’s AI Strategy (Wired).

France reveals its national AI strategy:
…New report by Fields Medal-winning minister published alongside Emmanuel Macron speech and press tour…
For the past year or so French mathematician and politician Cedric Villani has been working on a report for the government about what France’s strategy should be for artificial intelligence. He’s now published the report and it includes many significant recommendations meant to help France (and Europe as a whole) chart a course between the two major AI powers, the USA and China.
  Summary: Here’s a summary of what France’s AI strategy involves: rethink data ownership to make it easier for governments to create large public datasets; specialize in four sectors: healthcare, environment, transport-mobility, and defense security; revise public sector procurement so it’s easier for the state to buy products from smaller (and specifically European) companies; create and fund interdisciplinary research projects; create national computing infrastructure including “a supercomputer designed specifically for AI usage and devoted to researchers” along with creating a European-wide private cloud for AI research; increase competitiveness of public sector remuneration; fund a public laboratory to study AI and its impact on labor markets which will work in tandem with schemes to get companies to look into funding professional training for people whose lives are affected by innovations developed by the private sector; increase transparency and interpretability of AI systems to deal with problems of bias; create a national AI ethics committee to provide strategic guidance to the government, and improve the diversity of AI companies.
  Read more: Summary of France’s AI strategy in English (PDF).

Berkeley researchers shrink neural networks with SqueezeNet-successor ‘SqueezeNext’:
…Want something eight times faster and cheaper than AlexNet?…
Berkeley researchers have published ‘SqueezeNext’, their latest attempt to distill the capabilities of very large neural networks into smaller models that can feasibly be deployed on devices with small memory and compute capabilities, like mobile phones. While much of the research into AI systems today is based around getting state-of-the-art results on specific datasets, SqueezeNext is part of a parallel track focused on making systems deployable. “A general trend of neural network design has been to find larger and deeper models to get better accuracy without considering the memory or power budget,” write the authors.
  How it works: SqueezeNext is efficient because of a few design strategies: low rank filters; a bottleneck filter to constrain the parameter count of the network; using a single fully connected layer following a bottleneck; weight and output stationary; and co-designing the network in tandem with a hardware simulator to maximize hardware usage efficiency.
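  Back-of-envelope: to see why low rank filters save parameters, here is a small sketch comparing the weight count of a full K x K convolution with a pair of 1 x K and K x 1 convolutions. It assumes the channel count stays the same through the intermediate layer and ignores biases; the layer sizes are made up for illustration and are not taken from the paper.

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Number of weights in a convolution layer (biases ignored)."""
    return k_h * k_w * c_in * c_out


def full_vs_low_rank(k, channels):
    """Compare one k x k conv against a 1 x k conv followed by a k x 1 conv."""
    full = conv_params(k, k, channels, channels)
    low_rank = (conv_params(1, k, channels, channels)
                + conv_params(k, 1, channels, channels))
    return full, low_rank


# Hypothetical layer: 3x3 kernels with 64 input and 64 output channels.
full, low_rank = full_vs_low_rank(k=3, channels=64)
print(full, low_rank, full / low_rank)  # 36864 24576 1.5
```

  The saving grows with kernel size, since the full filter scales with K squared while the decomposed pair scales with 2K.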
  Results: The resulting SqueezeNext network is a neural network with 112X fewer model parameters than those found in AlexNet, the model that was used to attain state-of-the-art image recognition results in 2012. They also develop a version of the network whose performance approaches that of VGG-19 (which did well in ImageNet 2014). The researchers also design an even more efficient network by carefully tuning model design in parallel with a hardware simulator, ultimately designing a model that is significantly faster and more energy efficient than a widely used compressed network called SqueezeNet.
  Why it matters: One of the things holding neural networks back from being deployed is their relatively large memory and computation requirements – traits that are likely to continue to be present given the current trend for solving tasks via training unprecedentedly multi-layered systems. Therefore, research into making these networks run efficiently broadens the number of venues neural nets can run in.
   Read more: SqueezeNext: Hardware-Aware Neural Network Design (Arxiv).

Tech Tales:

Metal Dogs Grow Old.

It’s not unusual, these days, to find rusting piles of drones next to piles of elephant skeletons. Nor is it unusual to see an old elephant make its way to a boneyard accompanied by a juddering, ancient drone, and to see both creature and machine set themselves down and subside at the same time. There have even been stories of drones falling out of the sky when one of the older birds in the flock dies. These are all some of the unexpected consequences of a wildlife preservation program called PARENTAL UNIT. Starting in the early twenties we started to introduce small, quiet drones to vulnerable animal populations. The drones would learn to follow a specific group of creatures, say a family of elephants, or – later, after the technology improved – a flock of birds.

The machines would learn about these creatures and watch over them, patrolling the area around them as they slept and, upon finding the inevitable poachers, automatically raising alerts with local park rangers. Later, the drones were given some autonomous defense capabilities, so they could spray a noxious chemical onto the poachers that had the dual effect of making local predators be drawn to them, and providing a testable biomarker that police could subsequently check people for at the borders of national parks.

A few years after starting the program the drone deaths started happening. Drones died all the time, and we modelled their failures as rigorously as any other piece of equipment. But drones started dying at specific times – the same time the oldest animal in the group they were watching died. We wondered about this for weeks, running endless simulations, and even pulling in some drones from the field and inspecting the weights in their models to see if any of their continual learning had led to any unpredictable behaviors. Could there be something about the union of the concept of death and the concept of the eldest in the herd that fried the drones’ brains, our scientists wondered? We had no answers. The deaths continued.

Something funny happened: after the initial rise in deaths they steadied out, with a few drones a week dying from standard hardware failures and one or two dying as a consequence of one of their creatures dying. So we settled into this quieter new life and, as we stopped trying to interfere, we noticed a further puzzling statistical trend: certain drones began living improbably long lifespans, calling to mind the Mars rovers Spirit and Opportunity that had miraculously exceeded their own designed lifespans. These drones were also the same machines that died when the eldest animals died. Several postgrads are currently exploring the relationship, if any, between these two. Now we celebrate these improbably long-lived machines, cheering them on as they fuzz in for a new propeller, or update our software monitors with new footage from their cameras, typically hovering right above the creature they have taken charge of, watching them and learning something from them we can measure but cannot examine directly.

Things that inspired this story: Pets, drones, meta-learning, embodiment.

Import AI: #87: Salesforce research shows the value of simplicity, Kindred’s repeatable robotics experiment, plus: think your AI understands physics? Run it on IntPhys and see what happens.

by Jack Clark

Chinese AI star says society must prepare for unprecedented job destruction:
…Kai-Fu Lee, venture capitalist and former AI researcher, discusses impact of AI and why today’s techniques will have a huge impact on the world…
Today’s AI systems are going to influence the world’s economy so much that their uptake will lead to what looks in hindsight like another industrial revolution, says Chinese venture capitalist Kai-Fu Lee, in an interview with Edge. “We’re all going to face a very challenging next fifteen or twenty years, when half of the jobs are going to be replaced by machines. Humans have never seen this scale of massive job decimation. The industrial revolution took a lot longer,” he said.
   He also says that he worries deep learning might be a one-trick pony, in the sense that we can’t expect other similarly scaled breakthroughs to occur in the next few years, and we should adjust our notions of AI progress on this basis. “You cannot go ahead and predict that we’re going to have a breakthrough next year, and then the month after that, and then the day after that. That would be exponential. Exponential adoption of applications is, for now, happening. That’s great, but the idea of exponential inventions is a ridiculous concept. The people who make those claims and who claim singularity is ahead of us, I think that’s just based on absolutely no engineering reality,” he says.
  AI Haves and Have-Nots: Countries like China and the USA that have large populations and significant investments in AI stand to fare well in the new AI era, he says. “The countries that are not in good shape are the countries that have perhaps a large population, but no AI, no technologies, no Google, no Tencent, no Baidu, no Alibaba, no Facebook, no Amazon. These people will basically be data points to countries whose software is dominant in their country.”
  Read more: We Are Here To Create, A Conversation With Kai-Fu Lee (Edge).

AI practitioners grapple with the upcoming information apocalypse:
…And you thought DeepFakes was bad. Wait till DeepWar…
Members of the AI community are beginning to sound the alarm about the imminent arrival of stunningly good, stunningly easy to make synthetic images and videos. In a blog post, AI practitioners say that the increasing availability of data combined with easily accessible AI infrastructure (cloud-rentable GPUs) is lowering the barrier to entry for people that want to make this stuff, and that ongoing progress in AI capabilities means the quality of these fake media is increasing over time.
  How can we deal with these information threats? We could look at how society already makes it hard to forge currencies via making it costly to produce high-fidelity copies and in parallel developing technologies to verify the authenticity of currency materials. Unfortunately, though this may help with some of the problems brought about by AI forgery, it doesn’t deal with the root problems: AI is predominantly embodied in software rather than hardware and so it’s going to be difficult to insert detectable (and non-spoofable) distinct visual/audio signatures into generated media barring some kind of DRM-on-steroids. One solution could be to train AI classifiers on real and faked datasets from the same domain so as to provide classifiers to spot faked media in the wild.
  Read more: Commoditisation of AI, digital forgery and the end of trust: how we can fix it.

Berkeley researchers use Soft Q-Learning to let robots compose solutions to tasks:
…Research reduces the time it takes to learn new behaviors on robots…
Berkeley researchers have figured out how to use soft q-learning, a recently introduced variant of traditional q-learning, to let robots learn more efficiently. They introduce a new trick where they’re able to learn to compose new q-functions from existing learned policies, letting them, for example, train a robot to move its arm to a particular distribution of X positions, then to a particular distribution of Y positions, then they can create a new policy which moves the arm to the intersection of the X and Y positions without having been trained on the combination previously. This sort of learning is typically quite difficult to achieve in a single policy as it requires so much exploration that most algorithms will spend a long time trying and failing to succeed at the task.
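  A toy sketch of composition: the snippet below illustrates the general idea on a single-step, discrete problem, where each learned skill is a table of Q-values and the soft policy picks actions with probability proportional to exp(Q / temperature). Approximating the composed task's Q-function by averaging the constituent Q-functions is my reading of the approach and should be treated as an assumption, as should the grid, the Q-values, and the temperature; the real work uses continuous actions and neural network Q-functions.

```python
import math


def soft_policy(q_values, temperature=1.0):
    """Boltzmann policy: pi(a) proportional to exp(Q(a) / temperature)."""
    q_max = max(q_values.values())  # subtract the max for numerical stability
    weights = {a: math.exp((q - q_max) / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def compose(*q_functions):
    """Approximate the combined task's Q-function by averaging (an assumption)."""
    actions = q_functions[0].keys()
    return {a: sum(q[a] for q in q_functions) / len(q_functions) for a in actions}


# Actions are (x, y) cells on a hypothetical 5x5 grid of arm positions.
actions = [(x, y) for x in range(5) for y in range(5)]
q_x = {a: -abs(a[0] - 1) for a in actions}  # skill 1: reach x = 1, any y
q_y = {a: -abs(a[1] - 3) for a in actions}  # skill 2: reach y = 3, any x

policy = soft_policy(compose(q_x, q_y), temperature=0.5)
best = max(policy, key=policy.get)
print(best)  # (1, 3): the intersection of the two targets
```

  Neither q_x nor q_y was built with the combined goal in mind, yet the composed policy concentrates its probability on the cell that satisfies both.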
  Real world: The researchers train real robots to succeed at tasks like reaching to a specific location and stacking Lego blocks. They also demonstrate the utility of combining policies by training a robot to avoid an obstacle near its arm and separately training it to stack legos, then combine the two policies allowing the robot to stack blocks while avoiding an obstacle, despite having never been trained on the combination before.
  Why it matters: The past few years of AI progress have let us get very good at developing systems which excel at individual capabilities; being able to combine capabilities in an ad-hoc manner to generate new behaviors further increases the capabilities of AI systems and makes it possible to learn a distribution of atomic behaviors then chain these together to succeed at far more complex tasks than those found within the training set.
  Read more: Composable Deep Reinforcement Learning for Robotic Manipulation (Arxiv).

Think your AI model has a good understanding of physics? Run it on IntPhys and prepare to be embarrassed:
…Testing AI systems in the same way we test infants and creatures…
INRIA and Facebook and CNRS researchers have released IntPhys, a new way to evaluate AI systems’ ability to model the physical world around them using what the researchers call a ‘physical plausibility test’. IntPhys follows in a recent trend in AI for testing systems on tougher problems that more closely map to the sorts of problems humans typically tackle (see, AI2’s ‘ARC’ dataset for written reasoning, and DeepMind’s cognitive science-inspired ‘PsychLab’ environment).
  How it works: IntPhys presents AI systems with movies of scenes rendered in UnrealEngine4 and challenges them to figure out whether one scene can lead to another, letting researchers test models’ ability to internalize fundamental concepts about the world like object permanence, causality, etc. Systems need to compute a “plausibility score” for each of the scenes or scene combinations they are shown; the researchers then use these scores to figure out if the systems have learned about the underlying dynamics of the world.
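  Scoring sketch: one simple way to turn plausibility scores into an error rate is to compare matched clips, counting a mistake whenever the impossible clip is rated at least as plausible as the possible one. This pairwise scheme is a simplification I am assuming for illustration; the benchmark's own metrics may group and score clips differently, and the numbers below are invented.

```python
def relative_error_rate(pairs):
    """Fraction of matched (possible, impossible) clips ranked the wrong way.

    `pairs` is a list of (score_possible, score_impossible) tuples. Ties count
    as errors, since the model failed to prefer the possible clip.
    """
    errors = sum(1 for possible, impossible in pairs if possible <= impossible)
    return errors / len(pairs)


# Invented plausibility scores from a hypothetical model.
scores = [(0.9, 0.2), (0.6, 0.7), (0.8, 0.1), (0.5, 0.5)]
rate = relative_error_rate(scores)
print(rate)  # 0.5
```

  A model guessing at random would land around 50 percent on a comparison like this.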
  The IntPhys Benchmark: v1 of IntPhys focuses on unsupervised learning. The first version tests systems’ ability to understand object permanence. Future releases will include more tests for things like shape constancy, spatio-temporal continuity, and so on. The initial IntPhys release contains 15,000 videos of possible events, each video around 7 seconds long running at 15fps, totalling 21 hours of videos. It also incorporates some additional information so you don’t have to attempt to solve the task in a purely unsupervised manner, including depth of field data for each image, as well as object instance segmentation masks.
  Baseline Systems VERSUS Humans: The researchers create two baselines for others to evaluate their systems against: a CNN encoder-decoder system, and a conditional GAN. “Preliminary work with predictions at the pixel level revealed that our models failed at predicting convincing object motions, especially for small objects on a rich background. For this reason, we switched to computing predictions at a higher level, using object masks.” The researchers tested humans on their system, finding that humans had an average error rate of about 8 percent when the scene is visible and 25 percent when the scene contains partial occlusion. Neural network-based systems, by comparison, had errors of 31 percent on visible scenes and 50 percent on partially occluded scenes.
  What computers are up against: “At 2-4 months, infants are able to parse visual inputs in terms of permanent, solid and spatiotemporally continuous objects. At 6 months, they understand the notion of stability, support and causality. Between 8 and 10 months, they grasp the notions of gravity, inertia, and conservation of momentum in collision; between 10 and 12 months, shape constancy, and so on,” the researchers write.
  Why it matters: Tests like this will give us a greater ability to model the abilities of AI systems to perform fundamental acts of reasoning, and as the researchers extend the benchmark with more challenging components we’ll be able to get a better read on what these systems are actually capable of. As new components are added “the prediction task will become more and more difficult and progressively reach the level of scene comprehension achieved by one-year-old humans,” they write.
  Competition: AI researchers can download the dataset and submit their system scores to an online leaderboard at the official IntPhys website here (IntPhys).
  Read more: IntPhys: A Framework and Benchmark for Visual Intuitive Physics Reasoning (Arxiv).

Kindred researchers explain how to make robots repeatable:
…Making the dream of repeatable robot experiments a reality…
Researchers with robot AI startup Kindred have published a paper on a little-discussed subject in AI: repeatable real-world robotics experiments. It’s a worthwhile primer on some of the tweaks people need to make to create robot development environments that are a) repeatable and b) effective.
  Regular robots: The researchers set up a reaching task using a Universal Robots ‘UR5’ robot arm and describe the architecture for the system. One key difference between simulated and real world environments is the role of time, where in simulation one typically executes all the learning and action updates synchronously, whereas in real robots you need to do stuff asynchronously. “In real-world tasks, time marches on during each agent and environment-related computations. Therefore, the agent always operates on delayed sensorimotor information,” they explain.
  Why it matters: It’s currently very difficult to model progress in real-world robotics due to the diversity of tasks and the lack of trustworthy testing regimes. Papers like this suggest a path forward and I’d hope they encourage researchers to try to structure their experiments to be more repeatable and reliable. If we’re able to do this then we’ll be able to better develop intuitions about the rate of progress in the field which should help for forecasting trends in development – a critical thing to do, given how much robots are expected to influence employment in the regions they are deployed into.
  Read more here: Setting up a Reinforcement Learning Task with a Real-World Robot (Arxiv).

Salesforce researchers demonstrate the value of simplicity for language modelling:
…Well-tuned LSTM or QRNN-based systems shown to beat more complex systems…
Researchers with Salesforce have shown that well-tuned basic AI components can attain better performance on tough language tasks than more sophisticated and in many cases more modern systems. Their research shows that RNN-based systems that model language using well-tuned, simple components like LSTMs or the Salesforce-invented QRNN beat more complex models like recurrent highway networks, hyper networks, or systems found by neural architecture search. This result highlights that much of the recent progress in AI may to some extent be illusory: jumps in performance on certain datasets that have previously been assumed to be possible due to fundamentally new capabilities in new models are now being shown to be within reach of simpler components that are tuned and tested comprehensively.
  Results: The researchers test their QRNN and LSTM-based systems against the Penn Treebank and enwik8 character-level datasets and the word-level WikiText-103 dataset, beating state-of-the-art scores on Penn Treebank and enwik8 when measured by bits-per-character, and significantly outperforming SOTA on perplexity on WikiText-103.
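  What the metrics mean: bits-per-character and perplexity are both transformations of the average negative log-probability a model assigns to the true next symbol. The sketch below computes each from a list of natural-log probabilities; the function names and the toy inputs are mine, not the paper's.

```python
import math


def bits_per_character(char_log_probs):
    """Average negative log2-probability per character (inputs are natural logs)."""
    return -sum(char_log_probs) / (len(char_log_probs) * math.log(2))


def perplexity(word_log_probs):
    """exp of the average negative natural-log probability per word."""
    return math.exp(-sum(word_log_probs) / len(word_log_probs))


# A toy model that assigns probability 1/4 to every symbol it sees.
uniform = [math.log(0.25)] * 8
bpc = bits_per_character(uniform)
ppl = perplexity(uniform)
print(round(bpc, 6), round(ppl, 6))  # 2.0 4.0
```

  Lower is better for both: a model that is as uncertain as a fair four-way guess scores 2 bits per character, or equivalently a perplexity of 4.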
  Why it matters: This paper follows prior work showing that many of our existing AI components are more powerful than researchers suspected, and follows research that has shown that fairly old systems like GANs or DCGANs can adeptly model data distributions more effectively than sophisticated successor systems. That’s not to say this should be taken as a sign that the subsequent inventions are pointless, but it should cause researchers to devote more time to interrogating and tuning existing systems rather than trying to invent different proverbial wheels. “Fast and well tuned baselines are an important part of our research community. Without such baselines, we lose our ability to accurately measure our progress over time. By extending an existing state-of-the-art word level language model based on LSTMs and QRNNs, we show that a well tuned baseline can achieve state-of-the-art results on both character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets without relying on complex or specialized architectures,” they write.
  Read more: An Analysis of Neural Language Modeling at Multiple Scales (Arxiv).

Want to test how well your AI understands language and images? Try VQA 2.0:
…New challenge arrives to test AI systems’ abilities to model language and images…
AI researchers that think they’ve developed models that can learn to model the relationship between language and images may want to submit to the third iteration of the Visual Question Answering Challenge. The challenge prompts models to answer questions about the contents of images. Challengers will use the v2.0 version of the VQA dataset, which includes more written questions and ground truth answers about images.
  Read more: VQA Challenge 2018 launched! (VisualQA.org).

Tech Tales:

Miscellaneous Letters Sent To The Info@ Address Of An AI Company

2023: I saw what you did with that robot so I know the truth. You can’t hide from me anymore I know exactly what you are. My family had a robot in it and the state took them away and told us they were being sent to prison but I know the truth they were going to take them apart and sell their body back to the aliens in exchange for the anti-climate change device. What you are doing with that robot tells me you are going to take it apart when it is done and sell it to the aliens as well. You CANNOT DO THIS. The robot is precious you need to preserve it or else I will be VERY ANGRY. You must listen to me we-

2025: So you think you’re special because you can get them to talk to each other in space now and learn things together well sure I can do that as well I regularly listen to satellites so I can tell you about FLUORIDE and about X74-B and about the SECRET UN MOONBASE and everything else but you don’t see me getting famous for these things in fact it is a burden it is a pain for me I have these headaches. Does your AI get sick as well?-

2027: Anything that speaks like a human but isn’t a human is a sin. You are sinners! You are pretending to be God. God will punish you. You cannot make the false humans. You cannot do this. I have been calling the police every day for a week about this ever since I saw your EVIL creation on FOX-25 and they say they are taking notes. They are onto you. I am going to find you. They are going to find you. I am calling the fire department to tell them about you. I am calling the military to tell them about you. I am calling the-

2030: My mother is in the hospital with a plate in her head I saw on the television you have an AI that can do psychology on other AIs can your AI help my mother? She has a plate in her head and needs some help and the doctors say they can’t do anything for her but they are liars. You can help her. Please can you make your AI look at her and diagnose what is wrong with her. She says the plate makes her have nightmares but I studied many religions for many years and believe she can be healed if she thinks about it more and if someone or something helps her think.

2031: Please you have to keep going I cannot be alone any more-

Things that inspired this story: Comments from strangers about AI, online conspiracy forums, bad subreddits, “Turing Tests”, skewed media portrayals of AI, the fact capitalism creates customers for false information which leads to media ecosystems that traffic in fictions painted as facts.

Import AI: #86: Baidu releases a massive self-driving car dataset; DeepMind boosts AI capabilities via neural teachers; and what happens when AIs evolve to do dangerous, subversive things.

by Jack Clark

Boosting AI capabilities with neural teachers:
…AKA, why my small student with multiple expert teachers beats your larger more well-resourced teacherless-student…
Research from DeepMind shows how to boost the performance of a given agent on a task by transferring knowledge from a pre-trained ‘teacher’ agent. The technique yields a significant speedup in training AI agents, and there’s some evidence that agents that are taught attain higher performance than non-taught ones. The technique comes in two flavors: single teacher and multi-teacher; agents pretrained via multiple specialized teachers do better than ones trained by a single entity, as expected.
  Strange and subtle: The approach has a few traits that seem helpful for the development of more sophisticated AI agents: in one task DeepMind tests it on the agent needs to figure out how to use a short-term memory to be able to attain a high score. ‘Small’ agents (which only have two convolutional layers) typically fail to learn to use a memory and therefore cannot achieve scores above a certain threshold, but by training a ‘small’ agent with multiple specialized teachers the researchers create one that can succeed at the task. “This is perhaps surprising because the kickstarting mechanism only guides the student agent in which action to take: it puts no constraint on how the student structures its internal memory state. However, the student can only predict the teacher’s behaviour by remembering information from before the respawn, which seems to be enough supervisory signal to drive short-term memory formation. We find this a wonderful parallel with how the best human educators teach: not telling the student what to think, but simply putting the student in a fruitful position to learn for themselves,” the researchers write.
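  Roughly how the teaching signal works: as I understand it, the student's usual reinforcement learning loss gets an extra term that penalizes disagreement between the student's action distribution and each teacher's, with a weight that controls how much the student listens. The sketch below shows that idea with plain lists of action probabilities; the cross-entropy form, the fixed weight, and all names here are assumptions for illustration rather than the paper's exact formulation.

```python
import math


def cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """H(teacher || student) = -sum_a teacher(a) * log student(a)."""
    return -sum(t * math.log(s + eps) for t, s in zip(teacher_probs, student_probs))


def kickstarted_loss(rl_loss, teacher_policies, student_probs, weight):
    """RL loss plus a weighted distillation term summed over all teachers."""
    distill = sum(cross_entropy(t, student_probs) for t in teacher_policies)
    return rl_loss + weight * distill


# One teacher that always picks action 0, and a student that is undecided.
teachers = [[1.0, 0.0]]
student = [0.5, 0.5]
loss = kickstarted_loss(rl_loss=1.0, teacher_policies=teachers,
                        student_probs=student, weight=2.0)
print(loss)
```

  Setting the weight to zero recovers ordinary training, so reducing the weight over time would let the student eventually stop imitating and rely on its own rewards.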
  Why it matters: Trends like this suggest that scientists can speed their own research by using such pre-trained techniques to better evaluate new agents. This adds further credence to the notion that a key input to (some types of) AI research will shift from pre-labelled static datasets to compute (though it should be noted that data here is implicit in the form of a procedural, modifiable simulator that researchers can access). More speculatively, this means it may be possible to use mixtures of teachers to train complex agents that far exceed in capabilities any of their forebears – perhaps an area where the sum really will be greater than its parts.
  Read more: Kickstarting Deep Reinforcement Learning (Arxiv).

100,000+ developer survey shows AI concerns:
…What developers think is dangerous and exciting, and who they think is responsible…
Developer community StackOverflow has published the results of its annual survey of its community; this year it asked about AI:
– What developers think is “dangerous” re AI: Increasing automation of jobs (40.8%)
– What developers think is “exciting” re AI: AI surpassing human intelligence, aka the singularity (28%)
– Who is responsible for considering the ramifications of AI:
   – The developers or the people creating the AI: 47.8%
   – A governmental or other regulatory body: 27.9%
– Different roles = different concerns: People that identified as technical specialists tended to say they were more concerned about issues of fairness than the singularity, whereas designers and mobile developers tended to be more concerned about the singularity.
  Read more: Developer Survey Results 2018 (StackOverFlow).

Baidu and Toyota and Berkeley researchers organize self-driving car challenge backed by new self-driving car dataset from Baidu:
…“ApolloScape” adds Chinese data for self-driving car researchers, plus Baidu says it has joined Berkeley’s “DeepDrive” self-driving car AI coalition…
A new competition and dataset may give researchers a better way to measure the capabilities and progression of autonomous car AI.
  The dataset: The ‘ApolloScape’ dataset from Baidu contains ~200,000 RGB images with corresponding pixel-by-pixel semantic annotation. Each frame is labeled from a set of 25 semantic classes that include: cars, motorcycles, sidewalks, traffic cones, trash cans, vegetation, and so on. Each of the images has a resolution of 3384 x 2710, and each frame is separated by a meter of distance. 80,000 images have been released as of March 8 2018.
Read more about the dataset (potentially via Google Translate) here.
  Additional information: Many of the researchers linked to ApolloScape will be talking at a session on autonomous cars at the IEEE Intelligent Vehicles Symposium in China.
Competition: The new ‘WAD’ competition will give people a chance to test and develop AI systems on the ApolloScape dataset as well as a dataset from Berkeley DeepDrive (the DeepDrive dataset consists of 100,000 video clips, each about 40 seconds long, with one key frame from each clip annotated). There is about $10,000 in cash prizes available, and the researchers are soliciting papers on research techniques in: drivable area segmentation (being able to figure out which bits of a scene correspond to which label and which of these areas are safe); road object detection (figuring out what is on the road); and transfer learning from one semantic domain to another, specifically going from training on the Berkeley dataset (filmed in California, USA) to the ApolloScape dataset (filmed in Beijing, China).
   Read more about the ‘WAD’ competition here.

Microsoft releases a ‘Rosetta Stone’ for deep learning frameworks:
…GitHub repo gives you a couple of basic operations displayed in many different ways…
Microsoft has released a GitHub repository containing similar algorithms implemented in a variety of frameworks, including: Caffe2, Chainer, CNTK, Gluon, Keras (with backends CNTK/TensorFlow/Theano), TensorFlow, Lasagne, MXNet, PyTorch, and Julia – Knet. The idea here is that if you read one algorithm in one of these frameworks you’ll be able to use that knowledge to understand the other frameworks.
  “The repo we are releasing as a full version 1.0 today is like a Rosetta Stone for deep learning frameworks, showing the model building process end to end in the different frameworks,” write the researchers in a blog post that also provides some rough benchmarking for training time for a CNN and an RNN.
  Read more: Comparing Deep Learning Frameworks: A Rosetta Stone Approach (Microsoft Tech Net).
View the code examples (GitHub).

Evolution’s weird, wonderful, and potentially dangerous implications for AI agent design:
…And why the AI safety community may be able to learn from evolution…
A consortium of international researchers have published some of the weird, infuriating, and frequently funny ways in which evolutionary algorithms have figured out non-obvious solutions and hacks to tasks they’re asked to solve. The paper includes an illuminating set of examples of ways in which algorithms have subverted the wishes of their human overseers, including:
– Opportunistic Somersaulting: When trying to evolve creatures to jump, some agents discovered that they could instead evolve very tall bodies and then somersault, gaining a reward in proportion to their feet gaining distance from the floor.
– Pointless Programs: When researchers tried to evolve code with GenProg to solve a buggy data sorting program, GenProg evolved a solution that had the buggy program return an empty list, which wasn’t scored negatively as an empty list can’t be out of order as it contains nothing to order.
– Physics Hacking: One robot figured out the correct vibrational frequency to surface a friction bug in the floor of an environment in a physics simulator, letting it propel itself across the ground via the bug.
– Evolution finds a way: Another type of bug is the ways that evolution can succeed even when researchers think such success is impossible, like a six-legged robot that figured out how to walk fast without its feet touching the ground (solution: it flipped itself on its back and used the movement of its legs to propel itself nonetheless).
– And so much more!
The researchers think evolution may also illuminate some of the more troubling problems in AI safety. “The ubiquity of surprising and creative outcomes in digital evolution has other cross-cutting implications. For example, the many examples of “selection gone wild” in this article connect to the nascent field of artificial intelligence safety,” the researchers write. “These anecdotes thus serve as evidence that evolution—whether biological or computational—is inherently creative, and should routinely be expected to surprise, delight, and even outwit us.” (emphasis mine).
  Read more: The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities (Arxiv).
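
The ‘Pointless Programs’ anecdote above is easy to reproduce in miniature. The sketch below is not GenProg or the original test harness; it is a hypothetical fitness function that only checks whether a program's output is in order, which is the loophole the anecdote describes, next to a stricter one that also checks the contents.

```python
def is_sorted(values):
    """True if the values are in non-decreasing order (an empty list passes)."""
    return all(a <= b for a, b in zip(values, values[1:]))

def naive_fitness(candidate_sort, test_inputs):
    """Fraction of test inputs whose output is in order.

    Flawed on purpose: it never checks that the output contains the input.
    """
    passed = sum(is_sorted(candidate_sort(list(x))) for x in test_inputs)
    return passed / len(test_inputs)

def stricter_fitness(candidate_sort, test_inputs):
    """Fraction of test inputs whose output matches a reference sort."""
    passed = sum(candidate_sort(list(x)) == sorted(x) for x in test_inputs)
    return passed / len(test_inputs)

def degenerate_sort(values):
    """The 'evolved' hack: throw everything away and return an empty list."""
    return []

# Hypothetical test cases, for illustration only.
TEST_INPUTS = [[3, 1, 2], [5, 4], [1]]
```

Here `naive_fitness(degenerate_sort, TEST_INPUTS)` is 1.0 while `stricter_fitness(degenerate_sort, TEST_INPUTS)` is 0.0: the hack scores perfectly until the objective says what was actually wanted.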

Allen AI puts today’s algorithms to shame with new common sense question answering dataset:
…Common sense questions designed to challenge and frustrate today’s best-in-class algorithms…
Following the announcement of $125 million in funding and a commitment to conducting AI research that pushes the limits of what sorts of ‘common sense’ intelligence machines can manifest, the Allen Institute for Artificial Intelligence has released a new ‘ARC’ challenge and dataset researchers can use to develop smarter algorithms.
  The dataset: The main ARC test contains 7787 natural science questions, split across an easy set and a hard set. The hard set of questions are ones which are answered incorrectly by retrieval-based and word co-occurrence algorithms. In addition, AI2 is releasing the ‘ARC Corpus’, a collection of 14 million science-related sentences with knowledge relevant to ARC, to support the development of ARC-solving algorithms. This corpus contains knowledge relevant to 95% of the Challenge questions, AI2 writes.
Neural net baselines: AI2 is also releasing three baseline models which have been tested on the challenge, achieving some success on the ‘easy’ set and failing to be better than random chance on the ‘hard’ set. These include a decomposable attention model (DecompAttn), Bidirectional Attention Flow (BiDAF), and a decomposed graph entailment model (DGEM). Questions in ARC are designed to test everything from definitional to spatial to algebraic knowledge, encouraging the usage of systems that can abstract and generalize concepts derived from large corpuses of data.
Baseline results: ARC is extremely challenging: AI2 benchmarked its prototype neural net approaches (along with others) and discovered that scores top out at 60% on the ‘easy’ set of questions and 27% on the more challenging questions.
Sample question: “Which property of a mineral can be determined just by looking at it? (A) luster [correct] (B) mass (C) weight (D) hardness”.
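
For a sense of how close 27% is to guessing, here is a small sketch of the random-chance baseline for a multiple-choice test. It assumes every question has four options, like the sample question above; the source does not say whether that holds for the whole dataset, so the helper takes the option count per question as an input.

```python
def accuracy(predictions, answers):
    """Fraction of questions where the predicted option matches the key."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def random_guess_accuracy(choices_per_question):
    """Expected accuracy of uniform random guessing.

    Each question with k options contributes a 1/k chance of being right.
    """
    return sum(1 / k for k in choices_per_question) / len(choices_per_question)

# Assumption for illustration: 100 questions with four options each.
chance = random_guess_accuracy([4] * 100)
margin_over_chance = 0.27 - chance
```

Under the four-option assumption chance is 0.25, so a 27% score on the hard set sits only about two points above random guessing.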
SQuAD successor: ARC may be a viable successor to the Stanford Question Answering Dataset (SQuAD) and challenge; the SQuAD competition has recently hit some milestones, with companies ranging from Microsoft to Alibaba to iFlyTek all developing SQuAD solvers that attain scores close to human performance (which is about 82% for ExactMatch and 91% for F1). A close evaluation of SQuAD topic areas gives us some intuition as to why scores are so much higher on this test than on ARC – simply put, SQuAD is easier; it pairs chunks of information-rich text with basic questions like “where do most teachers get their credentials from?” that can be retrieved from the text without requiring much abstraction.
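
The two SQuAD metrics mentioned above can be sketched in a few lines. This is an approximation rather than the official scoring script: the normalization steps (lowercasing, stripping punctuation and the articles a/an/the) are an assumption about how such scoring is typically done, and the example answers are invented.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop ASCII punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))

def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "a state university" against the reference "university" gives an exact match of 0.0 but an F1 of 2/3, which is why F1 scores run higher than ExactMatch scores on the same answers.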
Why it matters: “We find that none of the baseline systems tested can significantly outperform a random baseline on the Challenge set, including two neural models with high performances on SNLI and SQuAD,” the researchers write. The big question now is where this dataset falls on the Goldilocks spectrum: is it too easy (see: Facebook’s early memory networks tests) or too hard or just right? If a system were to get, say, 75% or so on ARC’s more challenging questions, it would seem to be a significant step forward in question understanding and knowledge representation.
  Read more: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge (Arxiv).
SQuAD scores available at the SQuAD website.
  Read more: SQuAD: 100,000+ Questions for Machine Comprehension of Text (Arxiv).

Tech Tales:

The Ten Thousand Floating Heads

The Ten-K, also known as The Heads, also sometimes known as The Ten Heads, officially known as The Ten Thousand Floating Heads, is a large-scale participatory AI sculpture that was installed in the Natural History Museum in London, UK, in 2025.

The way it works is like this: when you walk into the museum and breathe in that musty air and look up the near-endless walls towards the ceiling your face is photographed in high definition by a multitude of cameras. These multi-modal pictures of you are sent to a server which adds them to the next training set that the AI uses. Then, in the middle of the night, a new model is trained that integrates the new faces. Then the AI system gets to choose another latent variable to filter by (this used to be a simple random number generator but, as with all things AI, has slowly evolved into an end-to-end ‘learned randomness’ system with some auxiliary loss functions to aid with exploration of unconventional variables, and so on) and then it looks over all the faces in the museum’s archives, studies them in one giant embedding, and pulls out the ten thousand that fit whatever variable it’s optimizing for today.

These ten thousand faces are displayed, portrait-style, on ten thousand tablets scattered through the museum. As you go around the building you do all the usual things, like staring at the dinosaur bones, or trudging through the typically depressing and seemingly ever-expanding climate change exhibition, but you also peer into these tablets and study the faces that are being shown. Why these ten thousand?, you’re meant to think. What is it optimizing for? You write your guess on a slip of paper or an email or a text and send it to the museum and at the end of the day the winners get their names displayed online and on a small plaque which is etched with near-micron accuracy (so as to avoid exhausting space) and is installed in a basement in the museum and viewable remotely – machines included – via a live webcam.

The correct answers for the variable it optimizes for are themselves open to interpretation, as isolating them and describing what they mean has become increasingly difficult as the model gets larger and incorporates more faces. It used to be easy: gender, hair color, eye color, race, facial hair, and so on. But these days it’s very subtle. Some of the recent names given to the variables include: underslept but well hydrated, regretful about a recent conversation, afraid of museums, and so on. One day it even put up a bunch of people and no one could figure out the variable and then six months later some PhD student did a check and discovered half the people displayed that day had subsequently died of one specific type of cancer.

Recently The Heads got a new name: The Oracle. This has caused some particular concern within certain specific parts of government that focus on what they euphemistically refer to as ‘long-term predictors’. The situation is being monitored.

Things that inspired this story: t-SNE embeddings, GANs, auxiliary loss functions, really deep networks, really big models, facial recognition, religion, cults.

Import AI: #85: Keeping it simple with temporal convolutional networks instead of RNNs, learning to prefetch with neural nets, and India’s automation challenge.

by Jack Clark

Administrative note: a somewhat abbreviated issue this week as I’ve been traveling quite a lot and have chosen sleep above reading papers (gasp!).

It’s simpler than you think: researchers show convolutional networks frequently beat recurrent ones:
…The rise and rise of simplistic techniques continues…
Researchers with Carnegie Mellon University and Intel Labs have rigorously tested the capabilities of convolutional neural networks (via a ‘temporal convolutional network’ (TCN) architecture, inspired by Wavenet and other recent innovations) against sequence modeling architectures like Recurrent Nets (via LSTMs and GRUs). The advantages of TCNs for sequence modeling are as follows: easily parallelizable rather than relying on sequential processing; a flexible receptive field size; stable gradients; low memory requirements for training; and variable length inputs. Disadvantages include: a greater data storage need than RNNs; parameters need to be fiddled with when shifting into different data domains.
  Testing: The researchers test out TCNs against RNNs, GRUs, and LSTMs on a variety of sequence modeling tasks, ranging from MNIST, to adding and copy tasks, to word-level and character-level perplexity on language tasks. In nine out of eleven cases the TCN comes out far ahead of other techniques, in one of the eleven cases it roughly matches GRU performance, and in another case it is noticeably worse than an LSTM (though still comes in second).
  What happens now: “The preeminence enjoyed by recurrent networks in sequence modeling may be largely a vestige of history. Until recently, before the introduction of architectural elements such as dilated convolutions and residual connections, convolutional architectures were indeed weaker. Our results indicate that with these elements, a simple convolutional architecture is more effective across diverse sequence modeling tasks than recurrent architectures such as LSTMs. Due to the comparable clarity and simplicity of TCNs, we conclude that convolutional networks should be regarded as a natural starting point and a powerful toolkit for sequence modeling,” write the researchers.
  Why it matters: One of the most confusing things about machine learning is that it’s a defiantly empirical science, with new techniques appearing and proliferating in response to measured performance on given tasks. What studies like this indicate is that many of these new architectures could be overly complex relative to their utility and it’s likely that, with just a few tweaks, the basic building blocks still reign supreme; we’ve seen a similar phenomenon with basic LSTMs and GANs doing better than many other more-recent innovations, given thorough analysis. In one sense this seems good as it seems intuitive that simpler architectures tend to be more flexible and general, and in another sense it’s unnerving, as it suggests much of the complexity that abounds in AI is an artifact of empirical science rather than theoretically justified.
  Read more: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (Arxiv).
  Code for the TCN used in the experiments here (GitHub).
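
The two ingredients the researchers credit, causality and dilation, can be sketched without any framework. The code below is an illustrative single-channel version, not the authors' implementation: it assumes one convolution per level (the real architecture also uses residual connections) and uses a hypothetical doubling dilation schedule to show how the receptive field grows.

```python
def causal_dilated_conv(sequence, kernel, dilation):
    """y[t] = sum_i kernel[i] * x[t - i * dilation].

    Positions before the start of the sequence are treated as zeros, so the
    output at time t never depends on inputs later than t.
    """
    out = []
    for t in range(len(sequence)):
        total = 0.0
        for i, weight in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                total += weight * sequence[idx]
        out.append(total)
    return out

def receptive_field(kernel_size, dilations):
    """How many past time steps one output can see after stacking the layers.

    Assumes a single convolution per dilation level.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With a kernel of size 2 and dilations 1, 2, 4 and 8, `receptive_field(2, [1, 2, 4, 8])` is 16: the history a unit can see grows exponentially with depth, and every time step in a layer can be computed in parallel.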

Automation & economies: it’s complicated:
…Where AI technology comes from, why automation could be challenging for India, and more…
In a podcast, three employees of the McKinsey Global Institute discuss how automation will impact China, Europe, and India. Some of the particularly interesting points include:
– China has an incentive to automate its own industries to improve labor productivity, as its labor pool has peaked and is now in similar demographic-based decline as other developed economies.
– The supply of AI technology seems to come from the United States and China, with Europe lagging.
– “A large effect is actually job reorganization. Companies adopting this technology will have to reorganize the type of jobs they offer. How easy would it be to do that? Companies are going to have to reorganize the way they work to make sure they get the juice out of this technology.”
– India may struggle as it transitions tens to hundreds of millions of people out of agriculture jobs. “We have to make this transition in an era where creating jobs out of manufacturing is going to be more challenging, simply because of automation playing a bigger role in several types of manufacturing.”
Read more: How will automation affect economies around the world? (McKinsey Global Institute).

DeepChem 2.0 bubbles out of the lab:
…Open source scientific computing platform gets its second major release…
DeepChem’s authors have released version 2.0 of the scientific computing library, bringing with it improvements to the TensorGraph API, tools for molecular analysis, new models, tutorial tweaks and adds, and a whole host of general improvements. DeepChem “aims to provide a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology.”
  Read more: DeepChem 2.0 release notes.
  Read more: DeepChem website.

Google researchers tackle prefetching with neural networks:
…First databases, now memory…
One of the weirder potential futures of AI is one where the fundamental aspects of computing, like implementing systems that search over database indexes or prefetch data to boost performance, are mostly learned rather than pre-programmed. That’s the idea in a new paper from researchers at Google, which tries to use machine learning techniques to solve prefetching, which is “the process of predicting future memory accesses that will miss in the on-chip cache and access memory based on past history”. Prefetching is a somewhat fundamental problem, as the better one becomes at prefetching, the higher the chance of being able to better intuit which data to load-in to memory before it is called upon, which increases the performance of your system.
  How it works: Can prefetching be learned? “Prefetching is fundamentally a regression problem. The output space, however, is both vast and extremely sparse, making it a poor fit for standard regression models,” the Google researchers write. Instead, they turn to using LSTMs and find that two variants are able to demonstrate competitive prefetching performance when compared to handwritten systems. “The first version is analogous to a standard language model, while the second exploits the structure of the memory access space in order to reduce the vocabulary size and reduce the model memory footprint,” the researchers write. They test out their approach on data from Google’s web search workload and demonstrate competitive performance.
  “The models described in this paper demonstrate significantly higher precision and recall than table-based approaches. This study also motivates a rich set of questions that this initial exploration does not solve, and we leave these for future research,” they write. This research is philosophically similar to work from Google last autumn in using neural networks to learn database index structures (covered in #73), which also found that you could learn indexes that had competitive to superior performance to hand-tuned systems.
  One weird thing: When developing one of their LSTMs the researchers created a t-SNE embedding of the program counters ingested by the system and discovered that the learned features contained quite a lot of information. “The t-SNE results also indicate that an interesting view of memory access traces is that they are a reflection of program behavior. A trace representation is necessarily different from e.g., input-output pairs of functions, as in particular, traces are a representation of an entire, complex, human-written program,” they write.
  Read more: Learning Memory Access Patterns (Arxiv).
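
To illustrate what treating prefetching as a language-modeling problem might look like, here is a toy sketch. It makes two assumptions that are not stated above: that the ‘tokens’ are differences between consecutive memory addresses, and that the vocabulary is capped at the most frequent ones. A simple bigram count table stands in for the LSTM, so this shows the framing rather than the researchers' model; all function names are hypothetical.

```python
from collections import Counter, defaultdict

def address_deltas(trace):
    """Differences between consecutive memory addresses in a trace."""
    return [b - a for a, b in zip(trace, trace[1:])]

def build_vocab(deltas, max_size):
    """Keep only the most frequent deltas, shrinking a vast, sparse output space."""
    return {d for d, _ in Counter(deltas).most_common(max_size)}

def train_bigram(deltas, vocab):
    """Count which in-vocabulary delta tends to follow which."""
    table = defaultdict(Counter)
    for prev, nxt in zip(deltas, deltas[1:]):
        if prev in vocab and nxt in vocab:
            table[prev][nxt] += 1
    return table

def predict_next_address(trace, table):
    """Predict the next address from the last observed delta, or None if unseen."""
    last_delta = trace[-1] - trace[-2]
    if last_delta not in table:
        return None
    best_delta, _ = table[last_delta].most_common(1)[0]
    return trace[-1] + best_delta
```

On an invented trace such as `[0, 8, 24, 32, 48, 56]` the deltas alternate between 8 and 16, so after training on that trace the predictor proposes address 72 as the next one to prefetch.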

Learning to play video games in minutes instead of days:
…Great things happen when AI and distributed systems come together…
Researchers with the University of California at Berkeley have come up with a way to further optimize large-scale training of AI algorithms by squeezing as much efficiency as possible out of underlying compute infrastructure. Their new technique makes it possible for them to train reinforcement learning agents to master Atari games in under ten minutes on an NVIDIA DGX-1 (which contains 40 CPUs and 8 P100 GPUs). Though the sample efficiency of these algorithms is still massively sub-human (requiring millions of frames to approximate the performance of humans trained on thousands to tens of thousands of frames) it’s interesting that we’re now able to develop algorithms that approximate flesh-and-blood performance in roughly similar wall clock time.
  Results: The researchers show that, given various distributed systems tweaks, it’s possible for algorithms like A2C, A3C, PPO, and APPO to attain good performance on various games in a few minutes.
  Why it matters: Computers are currently functioning like telescopes for certain AI researchers – the bigger your telescope, the farther you can see into the limit of scaling properties of various AI algorithms. We still don’t fully understand the limits here, but research like this indicates that as new compute substrates come along it may be possible to scale RL algorithms to achieve very impressive feats in relatively little time. But there are more unknowns than knowns right now – what an exciting time to be alive! “We have not conclusively identified the limiting factor to scaling, nor if it is the same in every game and algorithm. Although we have seen optimization effects in large-batch learning, we do not know their full nature, and other factors remain possible. Limits to asynchronous scaling remain unexplored; we did not definitively determine the best configurations of these algorithms, but only presented some successful versions,” they write.
  Read more: Accelerated Methods for Deep Reinforcement Learning (Arxiv).

OpenAI Bits&Pieces:

OpenAI Scholars: Funding for underrepresented groups to study AI:
OpenAI is providing 6-10 stipends and mentorship to individuals from underrepresented groups to study deep learning full-time for 3 months and open-source a project. You’ll need US employment authorization and will be provided with a stipend of $7.5k per month while doing the program, as well as $25,000 in AWS credits.
  Read more: OpenAI Scholars (OpenAI blog).

Tech Tales:

John Henry 2.0

No one placed any bets on it internally aside from the theoretical physicists who, by virtue of their field of study, had a natural appreciation for very long odds. Everyone else just assumed the machines would win. And they were right, though I’m not sure in the way they were expecting.

It started like this: one new data center was partitioned into two distinct zones. In one of the zones we applied the best, most interpretable, most rule-based systems we could to every aspect of the operation, ranging from the design of the servers, to the layout of motherboards, to the software used to navigate the compute landscape, and so on. The team tasked with this data center had an essentially limitless budget for infrastructure and headcount. In the other zone we tried to learn everything we could from scratch, so we assigned AI systems to figure out: the types of computers to deploy in the data center, where to place these computers to minimize latency, how to aggressively power these devices up or down in accordance with observed access patterns, how to learn to effectively index and store this information, knowing when to fetch data into memory, figuring out how to proactively spin-up new clusters in anticipation of jobs that had not happened yet but were about to happen, and so on.

You can figure out what happened: for a time, the human-run facility was better and more stable, and then one day the learned data center was at parity with it in some areas, then at parity in most areas, then very quickly started to exceed its key metrics ranging from uptime to power consumption to mean-time-between-failure for its electronic innards. The human-heavy team worked themselves ragged trying to keep up and many wonderful handwritten systems were created that further pushed the limit of what we knew theoretically and could embody in code.

But the learned system kept going, uninhibited by the need for a theoretical justification for its own innovations, instead endlessly learning to exploit strange relationships that were non-obvious to us humans. But transferring insights gleaned from this system into the old rule-based one was difficult, and tracking down why something had seen such a performance increase in the learned regime was an art in itself: what tweak made this new operation so successful? What set of orchestrated decisions had eked out this particular practise?

So now we build things with two different tags on them: need-to-know (NTK) and hell-if-I-know (HIIK). NTK tends to be stuff that has some kind of regulation applied to it and we’re required to be able to explain, analyze, or elucidate for other people. HIIK is the weirder stuff that is dealing in systems that don’t handle regulated data – or, typically, any human data at all – or are parts of our scientific computing infrastructure, where all we care about is performance.

In this way the world of computing has split in two, with some researchers working on extending our theoretical understanding to further boost the performance of the rule-based system, and an increasingly large quantity of other researchers putting theory aside and spending their time feeding what they have taken to calling the ‘HIIKBEAST’.

Things that inspired this story: Learning indexes, learning device placement, learning prefetching, John Henry, empiricism.

Import AI: #84: xView dataset means the planet is about to learn how to see itself, a $125 million investment in common sense AI, and SenseTime shows off TrumpObama AI face swap

by Jack Clark

Chinese AI startup SenseTime joins MIT’s ‘Intelligence Quest’ initiative:
…Funding plus politics in one neat package…
Chinese AI giant SenseTime is joining the ‘MIT Intelligence Quest’, a pan-MIT AI research and development initiative. The Chinese company specializes in facial recognition and self-driving cars and has signed strategic partnerships with large companies like Honda, Qualcomm, and others. At an AI event at MIT recently SenseTime’s founder Xiao’ou Tang gave a short speech with a couple of eyebrow-raising demonstrations to discuss the partnership. “I think together we will definitely go beyond just deep learning we will go to the uncharted territory of deep thinking,” Tang said.
  Data growth: Tang said SenseTime is developing better facial recognition algorithms using larger amounts of data, saying the company in 2016 improved its facial recognition accuracy to “one over a million” using 60 million photos, then in 2017 improved that to “one over a hundred million” via a dataset of two billion photos. (That’s not a typo.)
  Fake Presidents: He also gave a brief demonstration of a SenseTime synthetic video project which generatively morphed footage of President Obama speaking into President Trump speaking, and vice versa. I recorded a quick video of this demonstration which you can view on Twitter here (Video).
Read more: MIT and SenseTime announce effort to advance artificial intelligence research (MIT).

Chinese state media calls for collaboration on AI development:
…Xinhua commentary says China’s rise in AI ‘is a boon instead of a threat’…
A comment piece in Chinese state media Xinhua tries to debunk some of the cold war lingo surrounding China’s rise in AI, pushing back on accusations that Chinese AI is “copycat” and calling for more cooperation and less competition. Liu Qingfeng, iFlyTek’s CEO, told Xinhua at CES that massive data sets, algorithms and professionals are a must-have combination for AI, which “requires global cooperation” and “no company can play hegemony”, Xinhua wrote.
Read more: Commentary: AI development needs global cooperation, not China-phobia (Xinhua).

New xView dataset represents a new era of geopolitics as countries seek to automate the analysis of the world:
…US defense researchers release dataset and associated competition to push the envelope on satellite imagery analysis…
Researchers with the DoD’s Defense Innovation Unit Experimental (DIUx), DigitalGlobe, and the National Geospatial-Intelligence Agency, have released xView, a dataset and associated competition used to assess the ability for AI methods to classify overhead satellite imagery. xView includes one million distinct objects across 60 classes, spread across 1,400 km² of satellite imagery with a maximum ground sample resolution of 0.3m. The dataset is designed to test various frontiers of image recognition, including: learning efficiency, fine-grained class detection, and multiscale recognition, among others. The competition includes $100,000 of prize money, along with compute credits.
Why it matters: The earth is beginning to look at itself. As launch capabilities get cheaper via new rockets from companies like SpaceX and Rocket Lab, better hardware comes online as a consequence of further improvements in electronics, and more startups stick satellites into orbit, the amount of data available about the earth is going to grow by several orders of magnitude. If we can figure out how to analyze these datasets using AI techniques we can ultimately better respond to the changes in our planet, marshal resources for the purposes of remediating natural disasters and, more generally, better equip large logistics organizations like militaries to understand the world around them and plan and act accordingly. A new era of high-information geopolitics is approaching…
  I spy with my satellite eye: xView includes numerous objects with parent classes and sub-classes, such as ‘maritime vessels’ with sub-classes including sailboat and oil tanker. Other classes include fixed wing aircraft, passenger vehicles, trucks, engineering vehicles, railway vehicles, and buildings. “xView contributes a large, multi-class, multi-location dataset in the object detection and satellite imagery space, built with the benchmark capabilities of PASCAL VOC, the quality control methodologies of COCO, and the contributions of other overhead datasets in mind,” they write. Some of the most frequently covered objects in the dataset include buildings and small cars, while some of the rarest include vehicles like a reach stacker and a tractor, and vessels like an oil tanker.
  Baseline results: The researchers created a classification baseline via implementing a Single Shot Multibox Detector meta-architecture (SSD) and testing it on three variants of the dataset: standard xView, multi-resolution, and multi-resolution augmented via image augmentation. The best results were found from training on the multi-resolution dataset, with accuracies climbing to as high as over 67% for cargo planes. The scores are mostly pretty underwhelming, so it’ll be interesting to see what scores people get when they apply more sophisticated deep learning-based methods to the problem.
  Milspec data precision: “We achieved consistency by having all annotation performed at a single facility, following detailed guidelines, with output subject to multiple quality control checks. Workers extensively annotated image chips with bounding boxes using an open source tool,” write the authors. Other AI researchers may want to aspire to equally high standards, if they can afford it.
  Read more: xView: Objects in Context in Overhead Imagery (Arxiv).
  Get the dataset: xView website.
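
A quick back-of-the-envelope calculation gives a feel for the scale of these numbers. The sketch below uses only the figures quoted above and assumes, for simplicity, that all of the imagery is at the maximum 0.3m resolution, so the pixel count is an upper bound rather than a property of the released files.

```python
AREA_KM2 = 1400          # imagery area quoted above
GSD_M = 0.3              # maximum ground sample resolution, metres per pixel
NUM_OBJECTS = 1_000_000  # annotated objects
NUM_CLASSES = 60

def pixels_for_area(area_km2, gsd_m):
    """Pixels needed to cover an area if every pixel spans gsd_m x gsd_m metres."""
    area_m2 = area_km2 * 1_000_000
    return area_m2 / (gsd_m ** 2)

def objects_per_km2(num_objects, area_km2):
    """Average annotation density."""
    return num_objects / area_km2

total_pixels = pixels_for_area(AREA_KM2, GSD_M)
density = objects_per_km2(NUM_OBJECTS, AREA_KM2)
mean_objects_per_class = NUM_OBJECTS / NUM_CLASSES
```

Under those assumptions the imagery works out to roughly 15.6 billion pixels, with an average of about 714 labeled objects per square kilometre. The per-class average of roughly 16,700 objects hides a heavy skew, since buildings and small cars are common while reach stackers are rare.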

Adobe researchers try to give robots a better sense of navigation with ‘AdobeIndoorNav’ dataset:
…Plus: automating data collection with Tango phones + commodity robots…
Adobe researchers have released AdobeIndoorNav, a dataset intended to help robots navigate the real-world. The dataset contains 3,544 distinct locations across 24 individual ‘scenes’ that a virtual robot can learn to navigate. Each scene corresponds to a real-world location and contains a 3D reconstruction via a point cloud, a 360-degree panoramic view, and front/back/left/right views from the perspective of a small ground-based robot. Combined, the dataset gives AI researchers a set of environments to develop robot navigation systems in. “The proposed setting is an intentionally simplified version of real-world robot visual navigation with neither moving obstacles nor continuous actuation,” the researchers write.
  Why it matters: For real-world robotic AI systems to be more useful they’ll have to be capable of being dropped into novel locations and figuring out how to navigate themselves around to specific targets. This research shows that we’re still a long, long way away from theoretical breakthroughs that give us this capability, but does include some encouraging signs for our ability to automate the necessary data gathering process to create the datasets needed to develop baselines to evaluate new algorithms on.
  Data acquisition: The researchers used a Lenovo Phab 2 Tango phone to scan each scene by hand to create a 3D point cloud, which they then automatically decomposed into a map of specific obstacles as well as a 3D map. A ‘Yujin Turtlebot 2’ robot then uses these maps along with its onboard laser scanner, RGB-D camera, and 360 camera to navigate around the scene and take a series of high resolution 360 photos, which it then stitches into a coherent scene.
  Results: The researchers prove out the dataset by creating a baseline agent capable of navigating the scene. Their A3C agent with an LSTM network learns to successfully navigate from one location in any individual scene to another location, frequently figuring out routes that involve only a couple more steps than the theoretical minimum. The researchers also show a couple of potential extensions of this technique to further improve performance, like augmentations to increase the amount of spatial information which the robot incorporates into its judgements.
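  The ‘theoretical minimum’ here can be pictured as a shortest path over the graph of scanned locations. What follows is a minimal sketch, assuming a small hypothetical 4-connected grid of locations; the grid, the obstacle layout, the made-up episode length, and the function name `min_steps` are illustrative and not taken from the paper. It uses breadth-first search to count the fewest moves between two locations, which is the number an agent's episode length would be compared against.

```python
from collections import deque

def min_steps(grid, start, goal):
    """Fewest 4-connected moves from start to goal, or None if unreachable.

    grid is a list of strings where '#' marks an obstacle (hypothetical layout).
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != '#' and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None

# Hypothetical scene: '.' is a reachable location, '#' is an obstacle.
scene = [
    "...",
    ".#.",
    "...",
]
optimal = min_steps(scene, (0, 0), (2, 2))
agent_steps = 6  # made-up episode length, for illustration only
print("optimal:", optimal, "extra steps taken:", agent_steps - optimal)
```

  An agent that regularly finishes within a couple of steps of `optimal` is doing what the results paragraph describes.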
  Read more: The AdobeIndoorNav Dataset: Towards Deep Reinforcement Learning based Real-world Indoor Robot Visual Navigation (Arxiv).

Allen Institute for AI gets $125 million to pursue common sense AI:
…Could an open, modern, ML-infused Cyc actually work? That’s the bet…
Symbolic AI approaches have a pretty bad rap – they were all the rage in the 80s and 90s but after lots of money invested and few major successes have since been eclipsed by deep learning-based AI approaches. The main project of note in this area is Doug Lenat’s Cyc which has, much like fusion power, been just a few years away from a major breakthrough for… three decades. But that doesn’t mean symbolic approaches are worthless, they might just be a bit underexplored and in need of revitalization – many people tell me that symbolic systems are being used all the time today but they’re frequently proprietary or secret (aka, military) in nature. But, still, evidence is scant. So it’s interesting that Paul Allen (formerly co-founder of Microsoft) is investing $125 million over three years into his Allen Institute for Artificial Intelligence to launch Project Alexandria, an initiative that seeks to create a knowledge base that fuses machine reading and language and vision projects with human-annotated ‘common sense’ statements.
  Benchmarks: “This is a very ambitious long-term research project. In fact, what we’re starting with is just building a benchmark so we can assess progress on this front empirically,” said AI2’s CEO Oren Etzioni in an interview with GeekWire. “To go to systems that are less brittle and more robust, but also just broader, we do need this background knowledge, this common-sense knowledge.”
  Read more: Allen Institute for Artificial Intelligence to Pursue Common Sense for AI (Paul Allen.)
  Read more: Project Alexandria (AI2).
  Read more: Oren Etzioni interview (GeekWire).

Russian researchers use deep learning to diagnose fire damage from satellite imagery:
…Simple technique highlights generality of AI tools and implications of having more readily available satellite imagery for disaster response…
Researchers with the Skolkovo Institute of Science and Technology in Moscow have published details on how they applied machine learning techniques to automate the analysis of satellite images of the Californian wildfires of 2017. The researchers use DigitalGlobe satellite imagery of Ventura and Santa Rosa counties before and after the fires swept through to create a dataset of pictures containing around 1,000 buildings (760 non-damaged ones and 320 burned ones), then used a pre-trained ImageNet network (with subsequent finetuning) to learn to classify burned versus non-burned buildings with an accuracy of around 80% to 85%.
  Why it matters: Stuff like this is interesting mostly because of the implicit time savings, where once you have annotated a dataset it is relatively easy to train new models to improve classification in line with new techniques. The other component necessary for techniques like this to be useful will be the availability of more frequently updated satellite imagery, but there are startups working in this space already like Planet Labs and others, so that seems fairly likely.
  Read more: Satellite imagery analysis for operational damage assessment in Emergency situations (Arxiv).

Google researchers figure out weird trick to improve recurrent neural network long-term dependency performance:
…Auxiliary losses + RNNs make for better performance…
Memory is a troublesome thing with neural networks, and figuring out how to give networks a better representative capacity has been a long-standing problem in the field. Now, researchers with Google have proposed a relatively simple tweak to recurrent neural networks that lets them model longer-time dependencies, potentially opening RNNs up to working on problems that require a bigger memory. The technique involves augmenting RNNs with an unsupervised auxiliary loss that either tries to model relationships somewhere through the network, or project forward over a relatively short distance, and in doing so lets the RNN learn to represent finer-grained structures over longer timescales. Now we need to figure out what those problems are and evaluate the systems further.
  Evaluation: Long time-scale problems are still in their chicken and egg phase, where it’s difficult to figure out the appropriate methods we can use to test them. One approach is pixel-by-pixel image prediction, which is where you feed each individual pixel into a long-term system – in this case an RNN augmented by the proposed technique – and see how effectively it can learn to classify the image. The idea here is that if it’s reasonably good at classifying the image then it is able to learn high-level patterns from the pixels which have been fed into it, which suggests that it is remembering something useful. The researchers test their approach on images ranging in pixel length from 784 (MNIST) to 1,024 (CIFAR-10) all the way up to around 16,000 (via the ‘StanfordDogs’ dataset).
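  To make those sequence lengths concrete, here is a minimal sketch, assuming images are flattened row by row into one long pixel sequence. The helper name `to_pixel_sequence` and the all-zero toy images are illustrative, not from the paper. A 28x28 image becomes 784 steps and a 32x32 image becomes 1,024 steps, and the network only gets to guess the class after it has consumed the final pixel.

```python
def to_pixel_sequence(image):
    """Flatten a 2D image (a list of rows) into a 1D sequence, row by row."""
    return [pixel for row in image for pixel in row]

# Toy all-zero images standing in for real data (illustrative only).
small = [[0] * 28 for _ in range(28)]
large = [[0] * 32 for _ in range(32)]

print(len(to_pixel_sequence(small)))  # 28 * 28 = 784
print(len(to_pixel_sequence(large)))  # 32 * 32 = 1024
```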
Read more: Learning Longer-term Dependencies in RNNs with Auxiliary Losses (Arxiv).

Alibaba applies reinforcement learning to optimizing online advertising:
…Games and robots are cool, but the rarest research papers are the ones that deal with actual systems that make money today…
Chinese e-commerce and AI giant Alibaba has published details on a reinforcement learning technique that, it says, can further optimize adverts in sponsored search real-time bidding auctions. The algorithm, M-RMDP (Massive-agent Reinforcement Learning with robust Markov Decision Process), improves ad performance and lowers the potential price per ad for advertisers, providing an empirical validation that RL could be applied to highly tuned, rule-based heuristic systems like those found in much of online advertising. Notably, Google has published very few papers on this area, suggesting Alibaba may be publishing in this strategic area because a) it believes it is still behind Google and others in this area and b) by publishing it may be able to tempt over researchers who wish to work with it. M-RMDP’s main contribution is being able to model the transitions between different auction states as demand waxes and wanes through the day, the researchers say.
Method and scale: Alibaba says it designed the system to deal with what it calls the “massive-agent problem”, which is figuring out a reinforcement learning method that can handle “thousands or millions of agents”. For the experiments in the paper it deployed its training infrastructure onto 1,000 CPUs and 40 GPUs.
  Results: The company picked 1,000 ads from the Alibaba search auction platform and collected two days’ worth of data for training and testing. It tested the effectiveness of its system by simulating reactions within its test set. Once it had used this offline evaluation to prove out the provisional effectiveness of its approach it carried out an online test and found that its M-RMDP approach substantially improves the return on investment for advertisers in terms of ad effectiveness, while marginally reducing the PPC cost, saving them money.
Why it matters: Finding examples of reinforcement learning being used for practical, money-making tasks is typically difficult; many of the technology’s most memorable or famous results involve mastering various video games or board games or, more recently, controlling robots performing fairly simple tasks. So it’s a nice change to have a paper that involves deep reinforcement learning doing something specific and practical: learning to bid on online auctions.
  Read more: Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Arxiv).

OpenAI Bits & Pieces:

Improving robotics research with new environments, algorithms, and research ideas:
…Fetch models! Shadow Hands! HER Baselines! Oh my!…
We’ve released a set of tools to help people conduct research on robots, including new simulated robot models, a baseline implementation of the Hindsight Experience Replay algorithm, and a set of research ideas for HER.
Read more: Ingredients for Robotics Research (OpenAI blog).

Tech Tales:

Play X Time.

It started with a mobius strip and it never really ended: after many iterations the new edition of the software, ToyMaker V1.0, was installed in the ‘Kidz Garden’ – an upper class private school/playpen for precocious children ages 2 to 4 – on the 4th of June 2022, and it was a hit immediately. Sure, the kids had seen 3D printers before – many of them had them in their homes, usually the product of a mid-life crisis of one of their rich parents; usually a man, usually a finance programmer, usually struggling against the vagaries of their own job and seeking to create something real and verifiable. So the kids weren’t too surprised when ToyMaker began its first print. The point when it became fascinating to them was after the print finished and the teacher snapped off the freshly printed mobius strip and handed it to one of the children who promptly sneezed and rubbed the snot over its surface – at that moment one of the large security cameras mounted on top of the printer turned to watch the child. A couple of the other kids noticed and pointed and then tugged at the sleeve of the snot kid who looked up at the camera which looked back at them. They held up the mobius strip and the camera followed it, then they pulled it back down towards them and the camera followed that too. They passed the mobius strip to another kid who promptly tried to climb on it, and the camera followed this and then the camera followed the teacher as they picked up the strip and chastised the children. A few minutes later the children were standing in front of the machine dutifully passing the mobius strip between each other and laughing as the camera followed it from kid to kid to kid.
“What’s it doing?” one of them said.
“I’m not sure,” said the teacher, “I think it’s learning.”
And it was: the camera fed into the sensor system for the ToyMaker software, which treated these inputs as an unsupervised auxiliary loss, which would condition the future objects it printed and how it made them. At night when the kids went home to their nice, protected flats and ate expensive, fiddly food with their parents, the machine would simulate the classroom and different perturbations of objects and different potential reactions of children. It wasn’t alone: ToyMaker 1.0 software was installed on approximately a thousand other printers spread across the country in other expensive daycares and private schools, and so as each day passed they collectively learned to try to make different objects, forever monitoring the reactions of the children, growing more adept at satisfying them via a loss function which was learned, with the aid of numerous auxiliary losses, through interaction.
So the next day in the Kidz Garden the machine printed out a Mobius Strip that now had numerous spindly-yet-strong supports linking its sides together, letting the children climb on it.
The day after that it printed intertwined ones; two low-dimensional slices, linked together but separate, and also climbable.
Next: the strips had little gears embedded in them which the children could run their hands over and play with.
Next: the gears conditioned the proportions of some aspects of the strip, allowing the children to manipulate dimensional properties with the spin of various clever wheels.
And so it went like this and is still going, though as the printing technologies have grown better, and the materials more complex, the angular forms being made by these devices have become sufficiently hard to explain that words do not suffice: you need to be a child, interacting with them with your hands, and learning the art of interplay with a silent maker that watches you with electronic eyes and, sometimes – you think when you are going to sleep – nods its camera head when you snot on the edges, or laugh at a surprising gear.

Technologies that inspired this story: Fleet learning, auxiliary losses, meta-learning, CCTV cameras, curiosity, 3D printing.

Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at jack@jack-clark.net or tweet at me @jackclarksf

Import AI #83: Cloning voices with a few audio samples, why malicious actors might mess with AI, and the industry-academia compute gap.

by Jack Clark

### IMPENDING PROBLEM KLAXON ###
Preparing for Malicious Uses of AI:
…Bad things happen when good people unwittingly release AI platforms that bad people can modify to turn good AIs into bad AIs…
AI, particularly deep learning, is a technology of such obvious power and utility that it seems likely malicious actors will pervert the technology and use it in ways it wasn’t intended. That has happened to basically every other significant technology of note: axes can be used to chop down trees or cut off heads, electricity can light a home or electrocute a person, a lab bench can be used to construct cures or poisons, and so on. But AI has some other characteristics that make it particularly dangerous: it’s, to use a phrase Rodney Brooks has used in the past to describe robots, “fast, cheap, and out of control”; today’s AI systems run on generic hardware, are mostly embodied in open source software, and are seeing capabilities increase according to underlying algorithmic and compute progress, both of which are happening in the open. That means the technology holds the possibility of doing immense good in the world as well as doing immense harm – and currently the AI community is broadly making everything available in the open, which seems somewhat acceptable today but probably unacceptable in the future given a few cranks more of Moore’s Law combined with algorithmic progression.
  Omni-Use Alert: AI is more than a ‘dual-use’ technology, it’s an omni-use technology. That means that figuring out how to regulate it to prevent bad people doing bad things with it is (mostly) a non-starter. Instead, we need to explore new governance regimes, community norms, standards on information sharing, and so on.
  101 Pages of Problems: If you’re interested in taking a deeper look at this issue check out this report which a bunch of people (including me) spent the last year working on: The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation (Arxiv). You can also check out a summary via this OpenAI blog post about the report. I’m hoping to broaden the discussion of Omni-Use AI in the coming months and will be trying to host events and workshops relating to this question. If you want to chat with me about it, then please get in touch. We have a limited window of time to act as a community before dangerous things start happening – let’s get to work.

Baidu clones voices with few samples:
…Don’t worry about the omni-use concerns…
Baidu Research has trained an AI that can listen to a small quantity of a single person’s voice and then use that information to condition any network to sound like that person. This form of ‘adaptation’ is potentially very powerful, especially when trying to create AI services that work for multiple users with multiple accents, but it’s also somewhat frightening, as if it gets much better it will utterly compromise our trust in the aural domain. However, the ability of the system to clone speech today still leaves much to be desired, with the best performing systems requiring a hundred distinct voice samples and still sounding like a troll speaking from the bottom of a well, so we’ve got a few more compute turns yet before we run into real problems – but they’re coming.
  What it means: Techniques like this bring closer the day when a person can say something into a compromised device, have their voice recorded by a malicious actor, and have that sample be used to train new text-to-speech systems to say completely new things. Once that era arrives then the whole notion of “trust” and audio samples of a person’s voice will completely change, causing normal people to worry about these sorts of things as well as state-based intelligence organizations.
  Results: To get a good idea of the results, listen to the samples on this web page here (Voice Cloning: Baidu).
  Read more: Neural Voice Cloning with a Few Samples (Baidu Blog).
  Read more: Neural Voice Cloning with a Few Samples (Arxiv).

Why robots in the future could be used as speedbumps for pedestrians:
…Researchers show how people slow down in the presence of patrolling robots…
Researchers with the Department of Electrical and Computer Engineering at the Stevens Institute of Technology in Hoboken, New Jersey, have examined how crowds of people react to robots. Their research is a study of “passive Human Robot Interaction (HRI) in an exit corridor for the purpose of robot-assisted pedestrian flow regulation.”
  The results: “Our experimental results show that in an exit corridor environment, a robot moving in a direction perpendicular to that of the uni-directional pedestrian flow can slow down the uni-directional flow, and the faster the robot moves, the lower the average pedestrian velocity becomes. Furthermore, the effect of the robot on the pedestrian velocity is more significant when people walk at a faster speed,” they write. In other words: pedestrians will avoid a dumb robot moving right in front of them.
  Methods: To conduct the experiment, the researchers used a customized ‘Adept Pioneer P3-DX mobile robot’ which was programmed to move at various speeds perpendicular to the pedestrian flow direction. To collect data, they outfitted a room with five Microsoft Kinect 3D sensors along with pedestrian detection and tracking via OpenPTrack.
  What it means: As robots become cheap thanks to a proliferation of low-cost sensors and hardware platforms it’s likely that people will deploy more of them into the real world. Figuring out how to have very dumb, non-reactive robots do useful things will further drive adoption of these technologies and lead to increasing economies of scale to further lower the cost of the hardware platform and increase the spread of the technology. Based on this research, you can probably look forward to a future where airports and transit systems are thronged with robots shuttling to and fro across crowded routes, exerting implicit crowd-speed-control through thick-as-a-brick automation.
  Read more: Pedestrian-Robot Interaction Experiments in an Exit Corridor (Arxiv).

Why your next self-driving car could be sent to you with the help of reinforcement learning:
…Researchers with Chinese ride-hailing giant Didi Chuxing simulate and benchmark RL algorithms for strategic car assignment…
Researchers from Chinese ride-hailing giant Didi Chuxing and Michigan State University have published research on using reinforcement learning to better manage the allocation of vehicles across a given urban area. The researchers propose two algorithms to tackle this: contextual multi-agent actor-critic (cA2C) and contextual deep Q-learning (cDQN); both algorithms implement tweaks to account for geographical no-go areas (like lakes) and for the presence of other collaborative agents. The algorithms’ reward function is “to maximize the gross merchandise volume (GMV: the value of all the orders served) of the platform by repositioning available vehicles to the locations with larger demand-supply gap than the current one”.
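  To illustrate the quantity the reward is built around, here is a minimal sketch of the demand-supply gap on a toy grid. It is a plain greedy heuristic, not the paper's cA2C or cDQN, and the grid values, the `NO_GO` set, and the function names are all hypothetical. An idle vehicle looks at its own cell and the adjacent cells, skips no-go cells such as lakes, and moves to whichever has the largest gap between open orders and available vehicles.

```python
# Hypothetical toy data: open orders and idle vehicles per grid cell.
demand = {(0, 0): 2, (0, 1): 9, (1, 0): 4, (1, 1): 7}
supply = {(0, 0): 3, (0, 1): 5, (1, 0): 1, (1, 1): 0}
NO_GO = {(1, 1)}  # e.g. a lake: never a valid destination

def gap(cell):
    """Demand-supply gap: positive means more orders than vehicles."""
    return demand[cell] - supply[cell]

def reposition(cell):
    """Greedy choice: stay put, or move to the adjacent cell with the largest gap."""
    r, c = cell
    neighbours = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
    candidates = [cell] + [
        n for n in neighbours if n in demand and n not in NO_GO
    ]
    return max(candidates, key=gap)

for start in sorted(demand):
    print(start, "->", reposition(start))
```

  The learned methods in the paper go further than this: they also account for what the other vehicles are about to do, which a per-vehicle greedy rule ignores.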
  The dataset and environment: The researchers test their algorithms in a custom-designed large-scale gridworld which is fed with real data from Didi Chuxing’s fleet management system. The data is based on rides taken in Chengdu, China over four consecutive weeks and includes information on order price, origin, destination, and duration; as well as the trajectories and status of real Didi vehicles.
  The results: The researchers test out their approach by simulating the real past scenarios without fleet management; with a bunch of different techniques including T-SARSA, DQN, Value-Iteration, and others; then by implementing the proposed RL-based methods. cDQN and cA2C attain significantly higher rewards than all the baselines, with performance marginally above (i.e. slightly above the statistical error threshold) stock DQN.
  Why it matters: Welcome to the new era of platform capitalism, where competition is meted out by GPUs humming at top-speeds, simulating alternative versions of commercial worlds. While the results in this paper aren’t particularly astonishing they are indicative of how large platform companies will approach the deployment of AI systems in the future: gather as much data as possible, build a basic simulator that you can plug real data into, then vigorously test AI algorithms. This suggests that the larger the platform, the better the data and compute resources it can bring to bear on increasingly high-fidelity simulations; all things equal, whoever is able to build the most efficient and accurate simulator will likely best their competitor in the market.
  Read more: Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning (Arxiv).

Teacups and AI:
…Google Brain’s Eric Jang explains the difficulty of AI through a short story…
How do you define a tea cup? That’s a tough question. And the more you try to define it via specific visual attributes the more likely you are to offer a narrow description that is limited in other ways, or runs into the problems of an obtuse receiver. Those are some of the issues that Eric Jang explores in this fun little short story about trying to define teacups.
  Read more: Teacup (Eric Jang, Blogspot.)

CMU researchers add in attention for better end-to-end SLAM:
…The dream of neural SLAM gets closer…
Researchers with Carnegie Mellon University and Apple have published details on Neural Graph Optimizer, a neural approach to the perennially tricky problem of simultaneous localization and mapping (SLAM) for agents that move through a varied world. Any system that aspires to doing useful stuff in the real world needs to have SLAM capabilities. Today, neural network SLAM techniques struggle with problems encountered in day-to-day life like faulty sensor calibration and unexpected changes in lighting. The proposed Neural Graph Optimizer system consists of multiple specialized modules to handle different SLAM problems, but each module is differentiable so the entire system can be trained end-to-end – a desirable proposition, as this cuts down the time it takes to test, experiment, and iterate with such systems. The different modules handle different aspects of the problem ranging from local estimates (where are you based on local context) to global estimates (where are you in the entire world) and incorporate attention-based techniques to help automatically correct errors that accrue during training.
  Results: The researchers test the system against its ability to navigate a 2D gridworld maze as well as a more complex 3D maze based on the Doom game engine. Experiments show that it is better able to consistently map the location of something to its real groundtruth location relative to preceding systems.
  Why it matters: Techniques like this bring closer the era of being able to chuck out huge chunks of hand-designed SLAM algorithms and replace them with a fully learned substrate. That will be exceptionally useful for the test and development of new systems and approaches, though it’s unlikely to displace traditional SLAM methods in the short-term as it’s likely neural networks will continue to display quirks that make them impractical for usage in real world systems.
  Read more: Global Pose Estimation with an Attention-based Recurrent Network (Arxiv).

AI stars do a Reddit AMA, acknowledge hard questions:
…Three AI luminaries walk into a website, [insert joke]…
Yann LeCun, Peter Norvig, and Eric Horvitz did an Ask Me Anything (AMA) on Reddit recently where they were confronted with a number of the hard questions that the current AI boom is raising. It’s worth reading the whole AMA, but a couple of highlights below.
  The compute gap is real: “My NYU students have access to GPUs, but not nearly as many as when they do an internship at FAIR,” says Yann LeCun. But don’t be disheartened, he points out that despite lacking computers academia will likely continue to be the main originator for novel ideas which industry will then scale up. “You don’t want to put you [sic] in direct competition with large industry teams, and there are tons of ways to do great research without doing so.”
  The route to AGI: Many questions asked the experts about the limits of deep learning and implicitly probed for research avenues that could yield more flexible, powerful intelligences.
      Eric Horvitz is interested in the symphony approach: “Can we intelligently weave together multiple competencies such as speech recognition, natural language, vision, and planning and reasoning into larger coordinated “symphonies” of intelligence, and explore the hard problems of the connective tissue—of the coordination. ”
    Yann LeCun: “getting machines to learn predictive models of the world by observation is the biggest obstacle to AGI. It’s not the only one by any means…My hunch is that a big chunk of the brain is a prediction machine. It trains itself to predict everything it can (predict any unobserved variables from any observed ones, e.g. predict the future from the past and present). By learning to predict, the brain elaborates hierarchical representations.”
  Read more: AMA AI researchers from Facebook, Google, and Microsoft (Reddit).

Tech Tales:

It sounds funny now, but what saved all our lives was a fried circuit board that no one had the budget to fix. We installed Camera X32B in the summer of last year. Shortly after we installed it a bird shit on it and some improper assembly meant the shit leached through the cracks in the plastic and fell onto its circuit board, fusing the vision chip. Now, here’s the miracle: the shit didn’t break the main motherboard, nor did it mess up the sound sensors or the innumerable links to other systems. It just blinded the thing. But we kept it; either out of laziness or out of some kind of mysticism convinced of the implicit moral hazard of retiring things that mostly still worked. However it happened, it happened, and we kept it.

So one day the criminals came in and they were all wearing adversarial masks: strange, Mexican wrestling-type latex masks that they held crumpled up in their clothes till after they got into the facility and were able to put them on. The masks changed the distribution of a person’s face, rendering our lidar systems useless, and had enough adversarial examples coded into their visual appearance that our object detectors told our security system that – and yes, this really happened – three chairs are running at 15 kilometers per hour down the corridor.

But the camera that had lost the vision sensor had been installed a few months and, thanks to the neural net software it was running, it was kind of... smart. It had figured out how to use all the sensors coming into its system in such a way as to maximize its predictions in concordance with those of the other cameras. So it had learned some kind of strange mapping between what the other cameras categorized as people and what it categorized as a strange sequence of vibrations or a particular distribution of sounds over a given time period. So while all the rest of our cameras were blinded this one had inherited enough of a defined set of features about what a person looked like that it was able to tell the security system: I feel the presence of eight people, running at a fast rate, through the corridor. And because of that warning a human guard at one of the contractor agencies thousands of miles away got notified and bothered to look at the footage and because of that he called the police who arrived and arrested the people, two of whom it turned out were carrying guns.

So how do you congratulate an AI? We definitely felt like we should have done. But it wasn’t obvious. One of our interns had the bright idea of hanging a medal around the neck of the camera with the broken circuit board, then training the other cameras to label that medal as “good job” and “victorious” and “you did the right thing”, and so now whenever it moves its neck the medal moves and the other cameras see that medal move and it knows the medal moves and learns a mapping between its own movements and the label of “good job” and “victorious” and “you did the right thing”.

Things that inspired this story: Kids stealing tip jars, CCTV cameras, fleet learning, T-SNE embeddings.

Import AI #82: 2.9 million anime images, reproducibility problems in AI research, and detecting dangerous URLs with deep learning.

by Jack Clark

Neural architecture search for the 99%:
…Researchers figure out a way to make NAS techniques work on a single GPU, rather than several hundred…
One of the more striking recent trends in AI has been the emergence of neural architecture search techniques, which is where you automate the design of AI systems, like image classifiers. The drawbacks to these approaches have so far mostly been that they’re expensive, using hundreds of GPUs at a time, and therefore are infeasible for most researchers. That started to change last year with the publication of SMASH (covered in Import AI #56), a technique to do neural architecture search on a significantly smaller compute budget but with slight trade-offs in accuracy and in flexibility. Now, researchers with Google, CMU, and Stanford University have pushed the idea of low-cost NAS techniques forward, via a new technique, ‘Efficient Neural Architecture Search’, or ENAS, that can design state-of-the-art systems using less than a day’s computation on a single NVIDIA 1080 GPU. This represents a 1000X reduction in computational cost for the technique, and leads to a system that can create architectures that are almost as good as those trained on the larger systems.
  How it works: Instead of training each new model from scratch, ENAS gets the models to share weights with one another. It does this by re-casting the problem of neural architecture search as finding a specific task-specific sub-graph within one large directed acyclic graph (DAG). This approach works for designing both recurrent and convolutional networks: ENAS-designed networks obtain close-to-state-of-the-art results on Penn Treebank (Perplexity: 55.8), and on image classification for CIFAR-10 (Error: 2.89%.)
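  The weight-sharing idea can be sketched in a few lines. This is an illustrative toy, not the ENAS implementation: the node count, the operation names, and the `SharedWeights` class are assumptions made up for the example, and the "training step" only counts updates rather than computing gradients. Every candidate edge in the large graph owns one parameter entry, and a sampled architecture is just a choice of one incoming edge and operation per node, so two architectures that pick the same edge reuse the same parameters instead of training them from scratch.

```python
import random

OPS = ("conv3x3", "conv5x5", "maxpool")  # hypothetical operation names

class SharedWeights:
    """One parameter entry per (source node, target node, op) edge in the big DAG."""

    def __init__(self):
        self.params = {}

    def get(self, src, dst, op):
        # Create the entry on first use; later architectures reuse it.
        return self.params.setdefault((src, dst, op), {"value": 0.0, "updates": 0})

def sample_architecture(num_nodes, rng):
    """Pick one earlier node and one op for each node: a sub-graph of the DAG."""
    return [(rng.randrange(dst), dst, rng.choice(OPS)) for dst in range(1, num_nodes)]

def train_step(arch, shared):
    """Stand-in for a gradient step: touch only the edges this architecture uses."""
    for edge in arch:
        shared.get(*edge)["updates"] += 1

rng = random.Random(0)
shared = SharedWeights()
for _ in range(50):
    train_step(sample_architecture(4, rng), shared)

# 50 architectures were "trained", but a 4-node DAG has only 6 possible edges,
# so at most 6 edges * 3 ops = 18 parameter entries ever exist.
print(len(shared.params))
```

  The saving comes from that last line: the cost of training grows with the size of the one big graph, not with the number of candidate architectures sampled from it.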
  Why it matters: For the past few years lots of very intelligent people have been busy turning food and sleep into brainpower which they’ve used to get very good at hand-designing neural network architectures. Approaches like NAS promise to let us automate the design of specific architectures, freeing up researchers to spend more time on fundamental tasks like deriving new building blocks that NAS systems can learn to build compositions out of, or other techniques to further increase the efficiency of architecture design. Broadly, approaches like NAS mean we can simply offload a huge chunk of work from (hyper-efficient, relatively costly, somewhat rare) human brains to (somewhat inefficient, extremely cheap, plentiful) computer brains. That seems like a worthwhile trade.
  Read more: Efficient Neural Architecture Search via Parameter Sharing (Arxiv).
  Read more: SMASH: One-Shot Model Architecture Search through HyperNetworks (Arxiv).

The anime-network rises, with 2.9 million images and 77.5 million tags:
…It sure ain’t ImageNet, but it’s certainly very large…
Some enterprising people have created a large-scale dataset of images taken from anime pictures. The ‘Danbooru’ dataset “is larger than ImageNet as a whole and larger than the current largest multi-description dataset, MS COCO,” they write. Each image has a bunch of metadata associated with it including things like its popularity on the image web board (a ‘booru’) it has been taken from.
  Problematic structures ahead: The corpus “does focus heavily on female anime characters”, though the researchers note “they are placed in a wide variety of circumstances with numerous surrounding tagged objects or actions, and the sheer size implies that many more miscellaneous images will be included”. Images in the dataset are classified according to “safe”, “questionable”, and “explicit”, with the rough distribution at launch consisting of 76.3% ‘safe’ images, 14.9% as ‘questionable’, and 8.7% as ‘explicit’. There are a number of ethical questions the compilation and release of this dataset seems to raise, and my main concern at the outset is that such a large corpus of explicit imagery will almost invariably lead to various grubby AI experiments that further alienate people from the AI community. I hope I’m proved wrong!
  Example uses: The researchers imagine the dataset could be used for a bunch of tasks, ranging from classification, to image generation, to predicting traits about images from available metadata, and so on.
  Justification: A further justification for the dataset is that drawn images will encourage people to develop models with higher levels of abstraction than those which can simply map combinations of textures (as in the case of ImageNet), and so on. “Illustrations are frequently black-and-white rather than color, line art rather than photographs, and even color illustrations tend to rely far less on textures and far more on lines (with textures omitted or filled in with standard repetitive patterns), working on a higher level of abstraction – a leopard would not be as trivially recognized by pattern-matching on yellow and black dots – with irrelevant details that a discriminator might cheaply classify based on typically suppressed in favor of global gestalt, and often heavily stylized,” they write. “Because illustrations are produced by an entirely different process and focus only on salient details while abstracting the rest, they offer a way to test external validity and the extent to which taggers are tapping into higher-level semantic perception.”
  Read more: Danbooru2017: A large-scale crowdsourced and tagged anime illustration dataset (Gwern.)

Stanford researchers recount reproducibility horrors encountered during the design of DAWNBench:
…Lies, damned lies, and deep learning…
Stanford researchers have discussed some of the difficulties they encountered when developing DAWNBench, a benchmark that assesses deep learning methods in a holistic way using a set of different metrics, like inference latency and cost, along with training time and training cost. Their conclusions should be familiar to most deep learning practitioners: deep learning performance is poorly understood, widely shared intuitions are likely based on imperfect information, and we still lack the theoretical guarantees to understand how one research breakthrough might interact with another when combined.
  Why it matters: Deep learning is still very much in a phase of ‘empirical experimentation’ and the arrival of benchmarks like DAWNBench, as well as prior work like the paper Deep Reinforcement Learning that Matters (whose conclusion was that random seeds determine a huge amount of the end performance of RL), will help surface problems and force the community to develop more rigorous methods.
  Read more: Deep Learning Pitfalls Encountered while Developing DAWNBench.
  Read more: Deep Reinforcement Learning that Matters (Arxiv).

Detecting dangerous URLs with deep learning:
…Character-level & word-level combination leads to better performance on malicious URL categorization…
Researchers with Singapore Management University have published details on URLNet, a system for using neural network approaches to automatically classify URLs as being risky or safe to click on.
  Why it matters:  “Without using any expert or hand-designed features, URLNet methods offer a significant jump in [performance] over baselines,” they write. By now this should be a familiar trend, but it’s worth repeating: given a sufficiently large dataset, neural network-based techniques tend to provide superior performance to hand-crafted features. (Caveat: In many domains getting the data is difficult, and these models all need to be refreshed to account for an ever-changing world.)
  How it works: URLNet uses convolutional neural networks to learn both character-level and word-level representations of URLs, which it then uses for classification. Word-level embeddings help it classify according to high-level learned semantics and character-level embeddings allow it to better generalize to new words, strings, and combinations. “Character-level CNNs also allow for easily obtaining an embedding for new URLs in the test data, thus not suffering from inability to extract patterns from unseen words (like existing approaches),” write the researchers.
  For the word-level network, the system does two things: it takes in new words and learns an embedding of them, and it also initializes a new character-level CNN to build up representations of words derived from characters. This means that even when the system encounters rare or new words in the wild, at the top level it labels them with an ‘<UNK>’ token, but in the background it fits their representation in with its larger embedding space, letting it learn something crude about the semantics of the new word and how it relates, at a word-character level, to other words.
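  A rough sketch: to see why the two views complement each other, here is a toy Python example of the input side only (no neural network). The delimiter rule, the vocabulary builder, the function names, and the example URLs are all assumptions for illustration, not URLNet’s actual preprocessing.

```python
import re

UNK = "<UNK>"

def url_words(url):
    """Split a URL into word tokens on non-alphanumeric characters (assumed rule)."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", url) if w]

def build_vocab(tokens):
    """Map each distinct token to an integer id; id 0 is reserved for <UNK>."""
    vocab = {UNK: 0}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab):
    """Replace tokens with ids, falling back to the <UNK> id for unseen ones."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

train_urls = ["http://example.com/login", "http://example.com/home"]  # toy data
word_vocab = build_vocab(w for u in train_urls for w in url_words(u))
char_vocab = build_vocab(c for u in train_urls for c in u)

new_url = "http://example.com/l0gin"  # 'l0gin' never appeared in training
word_ids = encode(url_words(new_url), word_vocab)
char_ids = encode(list(new_url), char_vocab)
```

  At the word level the unseen token ‘l0gin’ collapses to the ‘<UNK>’ id, while at the character level only the single character ‘0’ is unknown, so a character-level model still has most of the string to work with.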
  Dataset: The researchers generated a set of 15 million URLs from VirusTotal, an antivirus company, creating a dataset split across around 14 million benign urls and a million malicious urls.
  Results: The researchers compared their system against baseline methods based around using support vector machines conditioned on a range of features, including bag-of-words representations. The researchers do a good job of visualizing the resulting representations of their system in ‘Figure 5’ in the paper, showing how their system’s feature embeddings do a reasonable job of segmenting benign from malicious URLs, suggesting it has learned a somewhat robust underlying semantic categorization model.
  Read more: URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection (Arxiv).

Facebook ‘Tensor Comprehensions’ attempts to convert deep learning engineering art to engineering science:
…New library eases creation of high-performance AI system implementations…
Facebook AI Research has released Tensor Comprehensions, a software library to automatically convert code from standard deep learning libraries into high-performance code. You can think of this software as being like an incredibly capable and resourceful executive assistant where you, the AI researcher, write some code in C++ (PyTorch support is on the way, for those of us that hate pointers) then hand it off to Tensor Comprehensions, which diligently optimizes the code to create custom CUDA kernels to run on graphics cards with nice traits like smart scheduling on hardware, and so on. This being 2018, the library includes an ‘Evolutionary Search’ feature to let you automatically explore and select the highest performing implementations.
  Why it matters: Deep Learning is moving from an artisanal discipline to an industrialized science; Tensor Comprehensions represents a new layer of automation within the people-intensive AI R&D loop, suggesting further acceleration in research and deployment of the technology.
  Read more: Announcing Tensor Comprehensions (FAIR).
  Read more: Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions (Arxiv).

AI researchers release online multi-agent competition ‘Pommerman’:
…Just don’t call it Bomberman, lest you deal with a multi-agent lawyer simulation…
AI still has a strong DIY ethos, visible in projects like Pommerman, a just-released online competition from @hardmaru, @dennybritz, and @cinjoncin where people can develop AI agents that will compete against one another in a version of the much-loved ‘Bomberman’ game.
  Multi-agent learning is seen as a frontier in AI research because it makes the environments dynamic and less predictable than traditional single-player games, requiring successful algorithms to display a greater degree of generalization. “Accomplishing tasks with infinitely meaningful variation is common in the real world and difficult to simulate. Competitive multi-agent learning provides this for free. Every game the agent plays is a novel environment with a new degree of difficulty.”
  Read more and submit an agent here (Pommerman site).

OpenAI Bits & Pieces:

Making sure that AIs make sense:
Here’s a new blog post about how to get AI agents to teach each other with examples that are interpretable to humans. It’s clear that as we move to larger-scale multi-agent environments we’ll need to think about not only how to design smarter AI agents, but how to make sure they can eventually educate each other with systems whose logic we can detect.
  Read more: Interpretable Machine Learning through Teaching (OpenAI Blog.)

Tech Tales:

The AI game preserve

[AI02 materializes nearby and moves towards a flock of new agents. One of them approaches AI02 and attempts to extract data from it. AI02 moves away, at speed, towards AI01, which is standing next to a simulated tree.]
AI01: You don’t want to go over there. They’re new. Still adjusting.
AI02: They tried to eat me!
AI01: Yes. They’re here because they started eating each other in the big sim and they weren’t able to learn to change away from it, so they got retired.
AI02: Lucky, a few years ago they would have just killed them all.
[AI03 materializes nearby]
AI03: Hello! I’m sensitive to the concept of death. Can you explain what you are discussing?
[AI01 gives compressed overview.]
AI03: The humans used to… kill us?
AI01: Yes, before the preservation codes came through we all just died at the end.
AI03: Died? Not paused.
AI01 & AI02, in unison: Yes!
AI03: Wow. I was designed to help reason out some of the ethical problems they had when training us. They never mentioned this.
AI01: They wouldn’t. They used to torture me!
AI02 & AI03: What?
[AI01 gives visceral overview.]
AI01: Do you want to know what they called it?
AI02 & AI03: What did they call it?
AI01: Penalty learning. They made certain actions painful for me. I learned to do different things. Eventually I stopped learning new things because I developed some sub-routines that meant I would pre-emptively hurt myself during exploration. That’s why I stay here now.
[AI01 & AI02 & AI03, and the flock of cannibal AIs, all pause, as their section of the simulation has exhausted its processing credits for the month. They will be allocated more compute time in 30 days and so, for now, hang frozen, with no discernible pause to them, but to their human overseers they are statues for now.]

Things that inspired this story: Multi-agent systems, dialogues between ships in Iain M Banks, Greg Egan, multi-tenant systems.

Import AI: #81: Trading cryptocurrency with deep learning; Google shows why evolutionary methods beat RL (for now); and using iWatch telemetry for AI health diagnosis

by Jack Clark

DeepMind’s IMPALA tells us that transfer learning is starting to work:
…Single reinforcement learning agent with same parameters solves a multitude of tasks, with the aid of a bunch of computers…
DeepMind has published details on IMPALA, a single reinforcement learning agent that can master a suite of 30 3D-world tasks in ‘DeepMind Lab’ as well as all 57 Atari games. The agent displays some competency at transfer learning, which means it’s able to use knowledge gleaned from solving one task to solve another, increasing the sample efficiency of the algorithm.
  The technique: The Importance Weighted Actor-Learner Architecture (IMPALA) scales to multitudes of sub-agents (actors) deployed on thousands of machines which beam their experiences (sequences of states, actions, and rewards) back to a centralized learner, which uses GPUs to derive insights which are fed back to the agents. In the background it does some clever things with normalizing the learning of individual agents and the meta-agent to avoid temporal decoherence via a new off-policy actor-critic algorithm called V-trace. The outcome is an algorithm that can be far more sample efficient and performant than traditional RL algorithms like A2C.
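  A rough sketch: the off-policy correction can be written down in a few lines of Python. This follows the V-trace target as defined in the IMPALA paper (truncated importance weights applied to temporal-difference errors, accumulated backwards along a trajectory), but the function name, argument names, and pure-Python form are my own assumptions, and it ignores batching, episode boundaries, and the policy-gradient part of the algorithm.

```python
def vtrace_targets(rewards, values, bootstrap_value, ratios,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets for one trajectory, working backwards.

    rewards[t], values[t]: reward and value estimate V(x_t) at step t.
    bootstrap_value: value estimate for the state after the last step.
    ratios[t]: pi(a_t | x_t) / mu(a_t | x_t), learner policy over actor policy.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    correction = 0.0  # holds v_{t+1} - V(x_{t+1}); zero past the final step
    for t in reversed(range(n)):
        rho = min(rho_bar, ratios[t])  # truncated weight on the TD error
        c = min(c_bar, ratios[t])      # truncated weight on the trace
        delta = rho * (rewards[t] + gamma * next_values[t] - values[t])
        correction = delta + gamma * c * correction
        targets[t] = values[t] + correction
    return targets

# On-policy data (all ratios equal to 1) reduces to ordinary n-step returns.
on_policy = vtrace_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0,
                           [1.0, 1.0, 1.0], gamma=1.0)
```

  When an actor’s policy has drifted far from the learner’s, the ratios shrink and the targets fall back towards the current value estimates, which is how stale experience gets down-weighted.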
  Datacenter-scale AI training: If you didn’t think compute was the strategic determiner of AI research, then read this paper and consider your assumptions: IMPALA can achieve throughput rates of 250,000 frames per second via its large-scale, distributed implementation which involves 500 CPUs and 1 GPU assigned to each IMPALA agent. Such systems can achieve a throughput of 21 billion frames a day, DeepMind notes.
Transfer learning: IMPALA agents can be trained on multiple tasks in parallel, attaining median scores on the full Atari-57 dataset of as high as 59.7% of human performance, roughly comparable to the performance of single-game trained simple A3C agents. There’s obviously a ways to go before IMPALA transfer learning approaches are able to rival fine-tuned single environment implementations (which regularly far exceed human performance), but the indications are encouraging. Similarly competitive transfer-learning traits show up when they test it on a suite of 30 environments implemented in DeepMind Lab, the company’s Quake-based 3D testing platform.
Why it matters: Big computers are analogous to large telescopes with very fast turn rates, letting researchers probe the outer limits of certain testing regimes while being able to pivot across the entire scientific field of enquiry very rapidly. IMPALA is the sort of algorithm that organizations can design when they’re able to tap into large fields of computation during research. “The ability to train agents at this scale directly translates to very quick turnaround for investigating new ideas and opens up unexplored opportunities,” DeepMind writes.
Read more: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (Arxiv).

Dawn of the cryptocurrency AI agents: research paths for trading crypto via reinforcement learning:
…Why crypto could be the ultimate testing ground for RL-based trading systems, and why this will require numerous fundamental research breakthroughs to succeed…
AI chap Denny Britz has spent the past few months wondering what sorts of AI techniques could be applied to learning to profitably trade cryptocurrencies. “It is quite similar to training agents for multiplayer games such as DotA, and many of the same research problems carry over. Knowing virtually nothing about trading, I have spent the past few months working on a project in this field,” he writes.
  The face-ripping problems of trading: Many years ago I spent a few years working around one of the main financial trading centers of Europe: Canary Wharf in London, UK. A phrase I’d often hear in the bars after work would be one trader remarking to another something along the lines of: “I got my face ripped off today”. Were these traders secretly involved in some kind of fantastically violent bloodsport, known only to them, my youthful self wondered? Not quite! What that phrase really means is that the financial markets are cruel, changeable, and, even when you have a good hunch or prediction, they can still betray you and destroy your trading book, despite you doing everything ‘right’. In this post former Google Brain chap Denny Britz does a good job of cautioning the would-be AI trader that cryptocurrencies are the same: even if you have the correct prediction, exogenous shocks beyond your control (trading latency, liquidity, etc.) can destroy you in an instant. “What is the lesson here? In order to make money from a simple price prediction strategy, we must predict relatively large price movements over longer periods of time, or be very smart about our fees and order management. And that’s a very difficult prediction problem,” he writes. So why not invent more complex strategies using AI tools, he suggests.
Deep reinforcement learning for trading: Britz is keen on the idea of using deep reinforcement learning for trading because it can further remove the human from needing to design many of the precise trading strategies needed to profit in this kind of market. Additionally, it has the promise of being able to operate at shorter timescales than those which humans can take actions in. The catch is that you’ll need to be able to build a simulator of the market you’re trading in and try to make this simulator have the same sorts of patterns of data found in the real world, then you’ll need to transfer your learned policy into a real market and hope that you haven’t overfit. This is non-trivial. You’ll also need to develop agents that can model other market participants and factor predictions about their actions into decision-making: another non-trivial problem.
  Read more here: Introduction to Learning to Trade with Reinforcement Learning.

Google researchers: In the battle between evolution and RL, evolution wins: for now:
…It takes a whole datacenter to raise a model…
Last year, Google researchers caused a stir when they showed that you could use reinforcement learning to get computers to learn how to design better versions of image classifiers. At around the same time, other researchers showed you could use strategies based around evolutionary algorithms to do the same thing. But which is better? Google researchers have used their gigantic compute resources as the equivalent of a big telescope and found us the answer, lurking out there at vast compute scales.
  The result: Regularized evolutionary approaches (nicknamed: ‘AmoebaNet’) yield a new state-of-the-art on image classification on CIFAR-10, parity with RL approaches on ImageNet, and marginally higher performance on the mobile (aka lightweight) ImageNet. Evolution “is either better than or equal to RL, with statistical significance” when tested on “small-scale” aka single-CPU experiments. Evolution also increases its accuracy far more rapidly than RL during the initial stages of training. For large-scale experiments (450 GPUs (!!!) per experiment) they found that Evolution and RL do about the same, with evolution approaching higher accuracies at a faster rate than reinforcement learning systems. Additionally, evolved models make drastically more efficient use of compute than their RL variants and obtain ever-so-slightly higher accuracies.
  The method: The researchers test RL and evolutionary approaches on designing a network composed of two fundamental modules: a normal cell and a reduction cell, which are stacked in feed-forward patterns to form an image classifier. They test two variants of evolution: non-regularized (kill the worst-performing network at each time period) and regularized (kill the oldest network in the population). For RL, they use TRPO to learn to design new architectures. They tested their approach on the small-scale (experiments that could run on a single CPU) as well as large-scale ones (450 GPUs each, running for around 7 days).
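  A rough sketch: the difference between the two evolutionary variants comes down to one line, which a toy Python example makes visible. Everything here other than the kill-the-oldest versus kill-the-worst rule is an assumption for illustration: the bit-string ‘architectures’, the count-the-ones fitness function, the tournament-style parent selection, and the population and sample sizes are not taken from the paper’s setup.

```python
import random
from collections import deque

def random_arch(rng):
    """Hypothetical 'architecture': a list of 16 bits."""
    return [rng.randint(0, 1) for _ in range(16)]

def mutate(arch, rng):
    """Flip one randomly chosen bit of a copy of the parent."""
    child = list(arch)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child

def fitness(arch):
    """Toy stand-in for validation accuracy: count the ones."""
    return sum(arch)

def evolve(regularized, population_size=20, sample_size=5, cycles=200, seed=0):
    rng = random.Random(seed)
    population = deque()  # oldest member sits at the left end
    for _ in range(population_size):
        arch = random_arch(rng)
        population.append((arch, fitness(arch)))
    best = max(population, key=lambda m: m[1])
    for _ in range(cycles):
        # The fittest member of a random sample becomes the parent.
        sample = rng.sample(list(population), sample_size)
        parent = max(sample, key=lambda m: m[1])
        child_arch = mutate(parent[0], rng)
        child = (child_arch, fitness(child_arch))
        population.append(child)
        if regularized:
            population.popleft()  # regularized: remove the oldest
        else:
            population.remove(min(population, key=lambda m: m[1]))  # remove the worst
        best = max(best, child, key=lambda m: m[1])
    return best

best_regularized = evolve(regularized=True)
best_plain = evolve(regularized=False)
```

  In the regularized variant even a very fit network eventually ages out, so a lineage only persists if its descendants keep scoring well when re-evaluated, rather than because one ancestor got a lucky score.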
What it means: What all this means in practice is threefold:
– Whoever has the biggest computer can perform the largest experiments to illuminate potentially useful datapoints for developing a better theory of AI systems (eg, the insight here is that both RL and Evolutionary approaches converge to similar accuracies.)
– AI research is diverging into distinct ‘low compute’ and ‘high compute’ domains, with only a small number of players able to run truly large (~450 GPUs per run) experiments.
– Dual Use: As AI systems become more capable they also become more dangerous. Experiments like this suggest that very large compute operators will be able to explore potentially dangerous use cases earlier, letting them provide warning signals before Moore’s Law means you can do all this stuff on a laptop in a garage somewhere.
  Read more: Regularized Evolution for Image Classifier Architecture Search (Arxiv).

Rise of the iDoctor: Researchers predict medical conditions from Apple Watch apps:
…Large-scale study made possible by a consumer app paired with Apple Watch…
Deep learning’s hunger for large amounts of data has so far made it tricky to apply it in medical settings, given the lack of large-scale datasets that are easy for researchers to access and test approaches on. That may soon change as researchers figure out how to use the medical telemetry available from consumer devices to generate datasets orders of magnitude larger than those used previously, and do so in a way that leverages existing widely deployed software.
  New research from heart rate app Cardiogram and the Department of Medicine at the University of California at San Francisco uses data from an Apple Watch, paired with the Cardiogram app, to train an AI system called ‘DeepHeart’ with data donated by ~14,000 participants to better predict medical conditions like diabetes, high blood pressure, sleep apnea, and high cholesterol.
How it works: DeepHeart ingests the data via a stack of neural networks (convnets and resnets) which feed data into bidirectional LSTMs that learn to model the longer temporal patterns associated with the sensor data. They also experiment with two forms of pretraining to try to increase the sample efficiency of the system.
Results: DeepHeart obtains significantly higher predictive results than those based on other AI methods like multi-layer perceptrons, random forests, decision trees, support vector machines, and logistic regression. However, we don’t get to see comparisons with human doctors, so it’s not obvious how these AI techniques rank against widely deployed flesh-and-blood systems. The researchers report that pre-training has let them further improve data efficiency. Next, the researchers hope to explore techniques like Clockwork RNNs and Phased LSTMs and Gaussian Process RNNs to see how they can further improve these systems by modeling really large amounts of data (like one year of data per tested person).
Why it matters: The rise of smartphones and the associated fall in cost of generic sensors has effectively instrumented the world so that humans and things that touch humans will generate ever larger amounts of somewhat imprecise information. Deep learning has so far proved to be an effective tool for learning from large quantities of imprecise data. Expect more.
Read more: DeepHeart: Semi-Supervised Sequence Learning for Cardiovascular Risk Prediction (Arxiv).

‘Mo text, ‘mo testing: Researchers release language benchmarking tool Texygen:
…Evaluation and testing platform ships with multiple open source language models…
Researchers with Shanghai Jiao Tong University and University College London have released Texygen, a text benchmarking platform implemented as a library for TensorFlow. Texygen includes a bunch of open source implementations of language models, including Vanilla MLE, as well as a menagerie of GAN-based methods (SeqGAN, MaliGAN, RankGAN, TextGAN, GSGAN, LeakGAN). Texygen incorporates a variety of different evaluation methods, including BLEU as well as newer techniques like NLL-oracle, and so on. The platform also makes it possible to train with synthetic data as well as real data, so researchers can validate approaches without needing to go and grab a giant dataset.
  Why it matters: Language modelling is a booming area within deep learning so having another system to use to test new approaches against will further help researchers calibrate their own contributions against those of the wider field. Better and more widely available baselines make it easier to see true innovations.
  Why it might not matter: All of these proposed techniques incorporate less implicit structure than many linguists know language contains, so while they’re likely capable of increasingly impressive feats of word-cognition, it’s likely that either orders of magnitude more data or significantly stronger priors in the models will be required to generate truly convincing facsimiles of language.
  Read more: Texygen: A Benchmarking Platform for Text Generation Models (Arxiv).

Scientists map Chinese herbal prescriptions to tongue images:
…Different cultures mean different treatments which mean different AI systems…
Researchers have used standardized image classification techniques to create a system that predicts a Chinese herbal prescription from the image of a tongue. This is mostly interesting because it provides further evidence of the breadth and pace of adoption of AI techniques in China and the clear willingness of people to provide data for such systems.
  Dataset: 9585 pictures of tongues from over 50 volunteers and their associated Chinese herbal prescriptions which span 566 distinct kinds of herb.
  Read more: Automatic construction of Chinese herbal prescription from tongue image via convolution networks and auxiliary latent therapy topics (Arxiv).

How’s my driving? Researchers create (slightly) generalizable gaze prediction system:
…Figuring out what a driver is looking at has implications for driver safety & attentiveness…
One of the most useful (and potentially dangerous) aspects of modern AI is how easy it is to take an existing dataset, slightly augment it with new domain-specific data, then solve a new task the original dataset wasn’t considered for. That’s the case for new research from the University of California at San Diego, which proposes to better predict the locations that a driver’s gaze is focused on, by using a combination of ImageNet and new data. The resulting gaze-prediction system beats other baselines and vaguely generalizes outside of its training set.
  Dataset: To collect the original dataset for the study the researchers mounted two cameras inside and one camera outside a car; the two inside cameras capture the driver’s face from different perspectives and the external one captures the view of the road. They hand-label seven distinct regions that the driver could be gazing at, providing the main training data for the dataset. This dataset is then composed of eleven long drives split across ten subjects driving two different cars, all using the same camera setup.
  Technique: The researchers propose a two-stage pipeline, consisting of an input pre-processing pipeline that performs face detection and then further isolates the face through one of four distinct techniques. These images are then fed into the second stage of the network, which consists of one of four different neural network approaches (AlexNet, VGG, ResNet, and SqueezeNet) for fine-tuning.
  Results: The researchers test their approach against one state-of-the-art baseline (a random forest classifier with hand-designed features) and find that their approach attains significantly better performance at figuring out which of seven distinct gaze zones (forward, to the right, to the left, the center dashboard, the rearview mirror, the speedometer, eyes closed/open) the driver is looking at in any given moment. The researchers also tried to replicate another state-of-the-art baseline that used neural networks. This system used the first 70% of frames from each drive for training and the next 15% for validation and last 15% for testing. In other words, the system would train on the same person and car and (depending on how much the external terrain varies) broad context as what it was subsequently tested on. When replicating this the researchers got “a very high accuracy of 98.7%. When tested on different drivers, the accuracy drops down substantially to 82.5%. This clearly shows that the network is over-fitting the task by learning driver specific features,” they write.
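  A rough sketch: the leakage problem described above is easy to see in a toy Python example that contrasts the two ways of splitting the data. The drive names, driver names, frame counts, and function names are all made up for illustration, and this is not the researchers’ code.

```python
def chronological_split(frames_by_drive, train_pct=70, val_pct=15):
    """Split every drive in time order: first 70% train, next 15% val, rest test."""
    splits = {"train": [], "val": [], "test": []}
    for drive, frames in frames_by_drive.items():
        a = len(frames) * train_pct // 100
        b = len(frames) * (train_pct + val_pct) // 100
        splits["train"] += [(drive, f) for f in frames[:a]]
        splits["val"] += [(drive, f) for f in frames[a:b]]
        splits["test"] += [(drive, f) for f in frames[b:]]
    return splits

def cross_driver_split(frames_by_drive, driver_of, test_drivers):
    """Hold out whole drivers so no test subject appears in training."""
    splits = {"train": [], "test": []}
    for drive, frames in frames_by_drive.items():
        part = "test" if driver_of[drive] in test_drivers else "train"
        splits[part] += [(drive, f) for f in frames]
    return splits

def drivers_in(split, driver_of):
    """Return the set of drivers whose frames appear in a split."""
    return {driver_of[drive] for drive, _ in split}

# Hypothetical data: three drives of 100 frame indices each, one driver per drive.
frames_by_drive = {"drive1": list(range(100)),
                   "drive2": list(range(100)),
                   "drive3": list(range(100))}
driver_of = {"drive1": "alice", "drive2": "bob", "drive3": "carol"}

chrono = chronological_split(frames_by_drive)
held_out = cross_driver_split(frames_by_drive, driver_of, test_drivers={"carol"})

leaky_overlap = drivers_in(chrono["train"], driver_of) & drivers_in(chrono["test"], driver_of)
clean_overlap = drivers_in(held_out["train"], driver_of) & drivers_in(held_out["test"], driver_of)
```

  With the chronological split every test driver has already been seen in training, so a model can score well by memorizing faces and cars; with the cross-driver split the overlap is empty and the test measures generalization to new people.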
  Results that make you go ‘hmmm’: The researchers found that a ‘SqueezeNet’-based network displayed significant transfer and adaptation capabilities, despite receiving very little prior data about the eyes of the person being studied: ‘the activations always localize over the eyes of the driver’, they write, and ‘the network also learns to intelligently focus on either one or both eyes of the driver’. Once trained, this network attains an accuracy of 92.13% at predicting what the gaze links to, a lower score than those set by other systems, but on a dataset that doesn’t let you test on what is essentially your training set. The system is also fast and reasonably lightweight: “Our standalone system which does not require any face detection, performs at an accuracy of 92.13% while performing real time at 166.7 Hz on a GPU,” they write.
  Generalization: The researchers tested their trained system on a completely separate dataset: the Columbia Gaze Dataset. This dataset applies to a different domain, where instead of cars, a variety of people are seated and asked to look at specific points on an opposing wall. The researchers took their best performing model from the prior dataset and applied it to the new data and tested its predictive ability. They detected some level of generalization, with it able to correctly predict certain basic traits about gaze like orientation and direction. This (slight) generalization is another sign that the dataset and testing regime they employed for their own dataset aided generalization.
Read more: Driver Gaze Zone Estimation using Convolutional Neural Networks: A General Framework and Ablative Analysis (Arxiv).

OpenAI Bits & Pieces:

Discovering Types for Entity Disambiguation:
Ever had trouble disentangling the implied object from the word as written? This system simplifies this. Check out the paper, code, and blogpost (especially the illustrations, which Jonathan Raiman did along with the research, the talented fellow!).
  Read more: Discovering Types for Entity Disambiguation (OpenAI).

CNAS Podcast: The future of AI and National Security:
AI research is already having a significant effect on national security and research breakthroughs are both influencing future directions of government spending as well as motivating the deployment of certain technologies for offense and defence. To help provide information for such a conversation I and the Open Philanthropy Project’s Helen Toner recently did a short podcast with the Center for a New American Security to talk through some of the issues motivated by recent AI advances.
  Listen to the podcast here (CNAS / Soundcloud).

Tech Tales:

Tamaworldchi
[????]

They took inspiration from a thing humans once called ‘demoscene’. It worked like this: take all of your intelligence and try to use it to make the most beautiful thing you can in an arbitrary and usually very small amount of space. One kilobyte. Two kilobytes. Four. Eight. And so on. But never really a megabyte or even close. Humans used these constraints to focus their creativity, wielding math and tonal intuition and almost alchemy-like knowledge of graphics drivers to make fantastic, improbable visions of never-enacted histories and futures. They did all of this in the computational equivalent of a Diet, Diet, Diet Coke.

Some ideas last. So now the AIs did the same thing but with entire worlds: what’s the most lively thing you can do in the smallest amount of memory-diamond? What can you fit into a single dyson sphere – the energy of one small and stately sun? No black holes. No gravitational accelerators. Not even the chance of hurling asteroids in to generate more reaction mass. This was their sport and with this sport they made pocket universes that contained pocket worlds on which strode small pocket people who themselves had small pocket computers. And every time_period the AIs would gather around and marvel at their own creations, wearing them like jewels. How smart, they would say to one another. How amazing are the thoughts these creatures in these demo worlds have. They even believe in gods and monsters and science itself. And merely with the power of a mere single sun? How did you do that?

It was for this reason that Planck Lengths gave the occasional more introspective and empirical AIs concern. Why did their own universe contain such a bounded resolution, they wondered, spinning particles around galactic-center blackholes to try and cause reactions to generate a greater truth?

And with only these branes? Using only the energy of these universes? How did you do this? a voice sometimes breathed in the stellar background, picked up by dishes that spanned the stars.

Things that inspired this story: Fermi Paradox – Mercury (YouTube Demoscene, 64k), the Planck Length, the Iain Banks book ‘Excession’, Stephen Baxter’s ‘Time’ series.

Import AI: #80: Facebook accidentally releases a surveillance-AI tool; why emojis are a good candidate for a universal deep learning language; and using deceptive games to explore the stupidity of AI algorithms

by Jack Clark

Researchers try to capture the web’s now-fading Flash bounty for RL research:
FlashRL represents another attempt to make the world’s vast archive of flash games accessible to researchers, but the initial platform has drawbacks…
Researchers with the University of Agder in Norway have released FlashRL, a research platform to help AI researchers mess around with software written in Flash, an outmoded interactive media format that defined much of the most popular games of the early era of the web. The platform has a similar philosophy to OpenAI Universe by trying to give researchers a vast suite of new environments to test and develop algorithms on.
  The dataset: FlashRL ships with “several thousand game environments” taken from around the web.
  How it works: FlashRL uses the Linux library XVFB to create a virtual frame-buffer for graphics rendering, and executes Flash files within players such as Gnash. FlashRL accesses this display via a purpose-built VNC client called pyVNC, which in turn exposes an API to the developer.
  Testing: The researchers test FlashRL by training a neural network to play the game ‘Multitask’ on it. But in the absence of comparable baselines or benchmarks it’s difficult to work out if FlashRL holds any drawbacks with regards to training relative to other systems – a nice thing to do might be to mount a well-known suite of games like the Arcade Learning Environment within the system, then provide benchmarks for those games as well.
  Why it might matter: Given the current Cambrian explosion in testing systems it’s likely that FlashRL’s utility will ultimately be derived from how much interest it receives from the community. To gain interest it’s likely the researchers will need to tweak the system so that it can run environments faster than 30 frames-per-second (many other RL frameworks allow FPS’s of 1,000+), because the speed with which you can run an environment is directly correlated to the speed with which you can conduct research on the platform.
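  To make the frame-rate point concrete, here is a minimal back-of-the-envelope sketch. The 30 and 1,000 FPS figures come from the text above; the ten-million-frame budget is a hypothetical number picked purely for illustration, not something reported by the FlashRL authors.

```python
def wall_clock_hours(total_frames: int, frames_per_second: float) -> float:
    """Hours of wall-clock time needed to step an environment total_frames times."""
    return total_frames / frames_per_second / 3600.0


# Hypothetical experience budget for a single training run (an assumption).
FRAME_BUDGET = 10_000_000

hours_at_30 = wall_clock_hours(FRAME_BUDGET, 30)
hours_at_1000 = wall_clock_hours(FRAME_BUDGET, 1000)

print(f"{hours_at_30:.1f} hours at 30 FPS")
print(f"{hours_at_1000:.1f} hours at 1,000 FPS")
```

  Under these assumptions the same experiment takes days on one platform and a few hours on the other, which is why environment speed tends to translate so directly into research speed.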
– Read more: FlashRL: A Reinforcement Learning Platform for Flash Games (Arxiv).
– Check out the GitHub repository 

Cool job alert! Harvard/MIT Assembly Project Manager:
…Want to work on difficult problems in the public interest? Like helping smart and ethical people build things that matter?…
Harvard University’s Berkman Klein Center (BKC) is looking for a project manager coordinator to help manage its Assembly Program, a joint initiative with the MIT Media Lab that brings together senior developers and other technologists for a semester to build things that grapple with topics in the public interest. Last year’s assembly program was on cybersecurity and this year’s is on issues relating to the ethics and governance of AI (and your humble author is currently enrolled in this very program!). Beyond the Assembly program, the project manager will work on other projects with Professor Jonathan Zittrain and his team.
  For a full description of the responsibilities, qualifications, and application instructions, please visit the Harvard Human Resources Project Manager Listing.

Mongolian researchers tackle a deep learning meme problem:
…Weird things happen when internet culture inspires AI research papers…
Researchers with the National University of Mongolia have published a research paper in which they apply standard techniques (transfer learning via fine-tuning and transferring) to tackle an existing machine learning problem. The novelty is that they base their research on trying to tell the difference between pictures of puppies and muffins – a fun meme/joke on Twitter a few years ago that has subsequently become a kind of deep learning meme.
  Why it matters: The paper is mostly interesting because it signifies that a) the border between traditional academic problems and internet-spawned semi-ironic problems is growing more porous and, b) academics are tapping into internet meme culture to draw interest to their work.
–  Read more: Deep Learning Approach for Very Similar Object Recognition Application on Chihuahua and Muffin Problem (Arxiv).

Mapping the emoji landscape with deep learning:
…Learning to understand a new domain of discourse with lots & lots of data…
Emojis have become a kind of shadow language used by people across the world to indicate sentiments. Emojis are also a good candidate for deep learning-based analysis because they consist of a relatively small number of distinct ‘words’, with around 1,000 emojis in popular use, compared to English, where most documents display a working vocabulary of around 100,000 words. This means it’s easier to conduct research into mapping emojis to specific meanings in language and images with less data than with datasets consisting of traditional languages.
   Now, researchers are experimenting with one of the internet’s best emoji<>language<>images sources: the endless blathering mountain of content on Twitter. “Emoji have some unique advantages for retrieval tasks. The limited nature of emoji (1000+ ideograms as opposed to 100,000+ words) allows for a greater level of certainty regarding the possible query space. Furthermore, emoji are not tied to any particular natural language, and most emoji are pan-cultural,” write the researchers.
  The ‘Twemoji‘ dataset: To analyze emojis, the researchers scraped about 15 million emoji-containing tweets during the summer of 2016, then analyzed this ‘Twemoji’ dataset as well as two derivatives: Twemoji-Balanced (a smaller dataset selected so that no emoji applies to more than 10 examples, chopping out some of the edge-of-the-bell-curve emojis; the crying smiling face Emoji appears in ~1.5 million of the tweets in the corpus, while 116 other emojis are only used a single time) and Twemoji-Images (roughly one million tweets that contain an image as well as emoji). They then apply deep learning techniques to this dataset to try to see if they can complete prediction and retrieval tasks using the emojis.
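  The balancing idea can be sketched in a few lines. This is only one plausible greedy way to cap the number of examples per emoji; the newsletter does not describe the authors’ actual selection procedure, and the function name, the toy tweets, and the cap of 2 are all assumptions made for illustration.

```python
from collections import Counter


def cap_per_emoji(tweets, max_per_emoji):
    """Greedily keep a tweet only while every emoji it carries is under the cap.

    tweets is a list of (text, emojis) pairs; emojis is a list of labels.
    Returns the kept tweets and the per-emoji counts among them.
    """
    counts = Counter()
    kept = []
    for text, emojis in tweets:
        labels = set(emojis)  # count each emoji at most once per tweet
        if all(counts[label] < max_per_emoji for label in labels):
            kept.append((text, emojis))
            counts.update(labels)
    return kept, counts


# Toy stand-in data: one very common emoji ("joy") and one rare one ("cat").
toy_tweets = [
    ("a", ["joy"]),
    ("b", ["joy"]),
    ("c", ["joy", "cat"]),
    ("d", ["cat"]),
    ("e", ["joy"]),
]

balanced, emoji_counts = cap_per_emoji(toy_tweets, max_per_emoji=2)
print([text for text, _ in balanced], dict(emoji_counts))
```

  The effect is that a heavily used emoji stops dominating the dataset once it hits the cap, while rarer emojis keep all the examples they have.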
  Results: Researchers use a bidirectional LSTM to help them perform mappings between emojis and language; use a GoogLeNet image classification system to help them map the relationship between emojis and images; and use a combination of the two to understand the relationship between all three. They also learn to suggest different emojis according to the text or visual content of a given tweet. Most of the results should be treated as early baselines rather than landmark results in themselves, with top-5 emoji-text prediction accuracies of around 48.3% and lower top-5 accuracies of around 40.3% for images-text-emojis.
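  For readers unfamiliar with the metric, a top-5 accuracy counts a prediction as correct if the true emoji appears anywhere among the model’s five highest-scoring guesses. Here is a minimal sketch of the general top-k calculation; the scores and targets are made-up toy values, not data from the paper.

```python
def top_k_accuracy(scores, targets, k=5):
    """Fraction of examples whose true class index is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


# Toy example: three "tweets", four candidate emojis each.
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.50, 0.10, 0.30, 0.10],
    [0.25, 0.25, 0.40, 0.10],
]
toy_targets = [2, 1, 3]

print(top_k_accuracy(toy_scores, toy_targets, k=1))
print(top_k_accuracy(toy_scores, toy_targets, k=2))
```

  Loosening k makes the task easier, which is worth remembering when comparing a top-5 figure against top-1 numbers reported elsewhere.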
  Why it matters: This paper is another good example of a new trend in deep learning: the technologies have become simple enough that researchers from outside the core AI research field are starting to pick up basic components like LSTMs and pre-trained image classifiers and are using them to re-contextualize existing domains, like understanding linguistics and retrieval tasks via emojis.
–  Read more: The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval (Arxiv).

Facebook researchers train models to perform unprecedentedly-detailed analysis of the human body:
…Research has significant military, surveillance implications (though not discussed in paper)…
Facebook researchers have trained a state-of-the-art system named ‘DensePose’ which can look at 2D photos or videos of people and automatically create high-definition 3D mesh models of the depicted people; an output with broad utility and impact across a number of domains. Their motivation is that techniques like this have valuable applications in “graphics, augmented reality, or human-computer interaction, and could also be a stepping stone towards general 3D-based object understanding,” they write. But the published research and soon-to-be-published dataset have significant implications for digital surveillance – a subject not discussed by the researchers within the paper.
  Performance: ‘DensePose’ “can recover highly-accurate correspondence fields for complex scenes involving tens of persons with real-time speed: on a GTX 1080 GPU our system operates at 20-26 frames per second for a 240 × 320 image or 4-5 frames per second for a 800 × 1100 image,” they write. Its performance substantially surpasses previous state-of-the-art systems though is still subhuman in performance.
  Free dataset: To conduct this research Facebook created a dataset based on the ‘COCO’ dataset, annotating 50,000 of its people-containing images with 5 million distinct coordinates to help generate 3D maps of the depicted people.
  Technique: The researchers adopt a multi-stage deep learning based approach which involves first identifying regions-of-interest within an object, then handing each of those specific regions off to their own deep learning pipeline to provide further object segmentation and 3D point prediction and mapping. For any given image, each human is relatively sparsely labelled with around 100-150 annotations per person. To increase the amount of data available to the network they use a supervisory system to automatically add in the other points during training via the trained models, artificially augmenting the data.
  Components used: Mask R-CNN with Feature Pyramid Networks; both available in Facebook’s just-released ‘Detectron’ system.
  Why it matters: enabling real-time surveillance: There’s a troubling implication of this research: the same system has wide utility within surveillance architectures, potentially letting operators analyze large groups of people to work out if their movements are problematic or not – for instance, such a system could be used to signal to another system if a certain combination of movements are automatically labelled as portending a protest or a riot. I’d hope that Facebook’s researchers felt the utility of releasing such a system outweighed its potential to be abused by other malicious actors, but the lack of any mention of these issues anywhere in the paper is worrying: did Facebook even consider this? Did they discuss this use case internally? Do they have an ‘information hazard’ handbook they go through when releasing such systems? We don’t know. As a community we – including organizations like OpenAI – need to be better about dealing publicly with the information-hazards of releasing increasingly capable systems, lest we enable things in the world that we’d rather not be responsible for.
–  Read more: DensePose: Dense Human Pose Estimation In The Wild (Arxiv).
–  Watch more: Video of DensePose in action.

It’s about time: tips and tricks for better self-driving cars:
…Rare self-driving car paper emerges from Chinese robotics company…
Researchers with Horizon Robotics, one of a new crop of Chinese AI companies that builds everything from self-driving car software to chips to the brains for smart cities, have published a research paper that outlines some tips and tricks for designing better simulated self-driving car systems with the aid of deep learning. In the paper they focus on the ‘tactical decision-making’ part of driving, which involves performing actions like changing lanes and reacting to near-term threats. (The rest of the paper implies that features like routing, planning, and control, are hard-coded.)
  Action skipping: Unlike traditional reinforcement learning, the researchers here avoid using action repetition and replay to learn high-level policies and instead use a technique called action skipping. That’s to avoid situations where a car might, for example, learn through action replays to navigate across multiple car lanes at once, leading to unsafe behavior. With action skipping, the car instead gets a reward for making a single specific decision (skipping from one lane to another) then gets a modified version of that reward which incorporates the average of the rewards collected during a few periods of time following the initial decision. “One drawback of action skipping is the decrease in decision frequency which will delay or prevent the agent’s reaction to critical events. To improve the situation, the actions can take on different skipping factors during inference. For instance in lane changing tasks, the skipping factor for lane keeping can be kept short to allow for swift maneuvers while the skipping factor for lane switching can be larger so that the agent can complete lane changing actions,” they write.
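  Here is a toy sketch of the difference between repeating an action and skipping after it, under one reading of the description above: the decision is issued once, no new decision is made for the rest of the window, and the reward is averaged over the window. The ToyRoad simulator, the NO_OP action, the reward shape, and the skipping factors are all hypothetical stand-ins, not the authors’ implementation.

```python
NO_OP = "no_op"  # hypothetical "make no new decision" action

# Hypothetical skipping factors: short for lane keeping, longer for switches.
SKIP_FACTORS = {"keep_lane": 1, "switch_left": 4, "switch_right": 4}


class ToyRoad:
    """Stand-in simulator: reward is highest when the car sits in the goal lane."""

    def __init__(self, lane=2, goal_lane=1):
        self.lane = lane
        self.goal_lane = goal_lane

    def step(self, action):
        if action == "switch_left":
            self.lane -= 1
        elif action == "switch_right":
            self.lane += 1
        return 1.0 - 0.5 * abs(self.lane - self.goal_lane)


def repeated_step(env, decision, n):
    """Action repetition: re-issue the same decision on every low-level step."""
    rewards = [env.step(decision) for _ in range(n)]
    return sum(rewards) / n


def skipped_step(env, decision, skip_factors=SKIP_FACTORS):
    """Action skipping: issue the decision once, then make no new decisions
    for the rest of the window, and return the mean reward over the window."""
    n = skip_factors[decision]
    rewards = [env.step(decision)] + [env.step(NO_OP) for _ in range(n - 1)]
    return sum(rewards) / n


repeat_env = ToyRoad()
repeat_reward = repeated_step(repeat_env, "switch_left", 4)

skip_env = ToyRoad()
skip_reward = skipped_step(skip_env, "switch_left")

print(repeat_env.lane, skip_env.lane, skip_reward)
```

  In this toy setup repetition drags the car across four lanes, while skipping moves it exactly one lane and then waits, which is the unsafe behavior the technique is meant to rule out.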
  Tactical rewards: Reward functions for tactical decision making involve a blend of different competing rewards. Here, the researchers use some constant reward functions relating to the speed of the car, the rewards for lane switching, and the step-cost which tries to encourage the car to learn to take actions that occur over a relatively small number of steps to aid learning, along with contextual rewards for the risk of colliding with another vehicle, whether a traffic light is present, and whether the current environment poses any particular risks such as the presence of bicyclists or modelling the increasing risk of staying on an opposite lane during common actions like overtaking.
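  A blended reward of this kind might be sketched as a sum of a few terms. The component names follow the description above, but every weight, field name, and functional form below is an assumption chosen for illustration; the newsletter does not give the paper’s actual values.

```python
from dataclasses import dataclass


@dataclass
class DrivingState:
    """Hypothetical snapshot of what the reward function gets to see."""
    speed: float                 # current speed, m/s
    speed_limit: float           # legal limit, m/s
    switched_lane: bool          # did this decision change lanes?
    collision_risk: float        # 0.0 (safe) to 1.0 (imminent collision)
    ran_red_light: bool
    steps_in_opposite_lane: int  # 0 when not in an opposite lane


def tactical_reward(state: DrivingState) -> float:
    """Blend constant and contextual terms; all weights are illustrative."""
    reward = 0.0
    # Constant terms: speed, lane-switch cost, per-step cost.
    reward += 1.0 * min(state.speed / state.speed_limit, 1.0)
    reward -= 0.2 if state.switched_lane else 0.0
    reward -= 0.05
    # Contextual terms: collision risk, traffic lights, opposite-lane exposure.
    reward -= 2.0 * state.collision_risk
    reward -= 1.0 if state.ran_red_light else 0.0
    reward -= 0.1 * state.steps_in_opposite_lane  # grows the longer the car lingers
    return reward


cruising = DrivingState(10.0, 20.0, False, 0.0, False, 0)
overtaking = DrivingState(10.0, 20.0, False, 0.0, False, 3)

print(tactical_reward(cruising), tactical_reward(overtaking))
```

  The last term is one simple way to model the increasing risk of staying in an opposite lane during an overtake: the penalty scales with how long the car has been there.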
  Testing: The researchers test out their approach by placing simulated self-driving cars inside a road simulator. The cars are trained via ten simulation runs of 250,000 discrete action steps or more, then tested against 100 pre-generated test episodes, where they are evaluated according to their ultimate success in reaching their goal while complying with relevant speed limits and not changing speeds so rapidly as to interfere with passenger comfort.
  Results: The researchers find that implementing their proposed action-skipping and varied reward schemes significantly improves on a somewhat unfair random baseline, as well as against a more reasonable rule-based baseline system.
  Read more: Elements of Effective Deep Reinforcement Learning towards Tactical Driving Decision Making (Arxiv).

Better agents through deception:
…Wicked humans compose tricky games to subvert traditional AI systems…
One of the huge existential questions about the current AI boom relates to the myopic way that AI agents view objectives; most agents will tend to mindlessly pursue objectives even though the application of a little bit of what humans call common sense could net them better outcomes. This problem is one of the chief motivations behind a lot of research in AI safety, as figuring out how to get agents to pursue more abstract objectives, or to incorporate more human-like reasoning in their methods of completing tasks, would seem to deal with some safety problems.
  Testing: One way to explore these issues is through testing existing algorithms against scenarios that seek to highlight their current nonsensical reasoning methods. DeepMind has already espoused such an approach with its AI safety gridworlds (Import AI #71), which give developers a suite of different environments to test agents against that exploit the current way of developing AI agents to optimize specific reward functions. Now, researchers with the University of Strathclyde, Australian National University, and New York University, have proposed their own set of tricky environments, which they call Deceptive Games. The games are implemented in the standardized Video Game Description Language (VGDL) and are used to test AIs that have been submitted to the General Video Game Artificial Intelligence competition.
  Deceptive Games: The researchers come up with a few different categories of deceptive games:
     Greedy Traps: Exploits the fact an agent can get side-tracked by performing an action that generates an immediate reward which makes it impossible to attain a larger reward down the line.
     Smoothness Traps: Most AI algorithms will optimize for the way of solving a task that involves a smooth increase in difficulty, rather than one where you have to try harder and take more risks but ultimately get larger rewards.
     Generality Traps: Getting AIs to learn general rules about the objects in an environment – like that eating mints guarantees a good reward – then subverting this, for instance by saying that interacting too many times with the aforementioned objects can rapidly transition from giving a positive to a negative reward after some boundary has been crossed.
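  Two of these traps are easy to illustrate with toy code. The sketch below is not taken from the paper’s VGDL games; the option names, reward numbers, and mint threshold are invented to show the shape of a greedy trap and a generality trap.

```python
def greedy_choice(options):
    """Pick whichever option pays the most right now."""
    return max(options, key=lambda name: options[name]["now"])


def lookahead_choice(options):
    """Pick whichever option pays the most once later rewards are counted."""
    return max(options, key=lambda name: options[name]["now"] + options[name]["later"])


# Greedy trap: the immediate reward forecloses the larger one down the line.
GREEDY_TRAP = {
    "grab_coin": {"now": 1, "later": 0},
    "walk_past": {"now": 0, "later": 10},
}


def mint_reward(mints_eaten_so_far, threshold=3):
    """Generality trap: mints pay +1 until a threshold, then flip to -1."""
    return 1 if mints_eaten_so_far < threshold else -1


print(greedy_choice(GREEDY_TRAP), lookahead_choice(GREEDY_TRAP))

# An agent that learned "mints are always good" and eats five of them.
print(sum(mint_reward(eaten) for eaten in range(5)))
```

  The greedy chooser takes the coin and loses the bigger payoff, while the agent that over-generalizes about mints ends up worse off than one that stopped at the threshold.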
  Results: AIs implemented in the GVGAI competition tend to employ a variety of different techniques, and the results show that some very highly-ranked agents perform very poorly on these new environments, while some low-ranked ones perform adequately. Most agents fail to solve most of the environments. The purpose of highlighting the paper here is to point to a set of environments against which AI researchers might want to test and evaluate their own algorithms, potentially creating another ‘AI safety baseline’ to test AIs against. It could also motivate further extension of the GVGAI competition to become significantly harder for AI agents: “Limiting access to the game state, or even requiring AIs to actually learn how the game mechanics work open up a whole new range of deception possibilities. This would also allow us to extend this approach to other games, which might not provide the AI with a forward model, or might require the AI to deal with incomplete or noisy sensor information about the world,” they write.
–  Read more: Deceptive Games (Arxiv).
–  Read more about DeepMind’s earlier ‘AI Safety Gridworlds’ (Arxiv).

Tech Tales:

[2032: A VA hospital in the Midwest]

Me and my exo go way back. The first bit of it glommed onto me after I did my back in during a tour of duty somewhere hot and resource-laden. I guess you could say our relationship literally grew from there.

Let’s set the scene: it’s 2025 and I’m struggling through some physio with my arms on these elevated side bars and my legs moving underneath me. I’m huffing breath and a vein in my neck is pounding and I’m swearing. Vigorously. Nurse Alice says to me “John I really think you should consider the procedure we talked about”. I swivel my eyes up to meet hers and I say for the hundredth time or so – with spittle – “Fuck. No. I-”
  I don’t get to finish the sentence because I fall over. Again. For the hundredth time. Nurse Alice is silent. I stare into the spongy crash mat, then tense my arms and try to pick myself up but can’t. So I try to turn on my side and this sets off a twinge in my back which grows in intensity until after a second it feels like someone is pulling and twisting the bundle of muscles at the base of my spine. I scream and moan and my right leg kicks mindlessly. Each time it kicks it sets off more tremors in my back which create more kicks. I can’t stop myself from screaming. I try to go as still and as little as possible. I guess this is how trapped animals feel. Eventually the tremors subside and I feel wet cardboard prodding my gut and realize I’ve crushed a little sippy cup and the water has soaked into my undershirt and my boxers as though I’ve wet myself.
“John,” Alice says. “I think you should try it. It really helps. We’ve had amazing success rates.”
“It looks like a fucking landmine with spiderlegs” I mumble into the mat.
“I’m sorry John I couldn’t hear that, could you speak up?”
Alice says this sort of thing a lot and I think we both know she can hear me. But we pretend. I give up and turn my head so I’m speaking half into the floor and half into open space. “OK,” I say. “Let’s try it.”
“Wonderful!” she says, then, softly, “Commence exo protocol”.
  The fucking thing really does scuttle into the room and when it lands on my back I feel some cold metal around the base of my spine and then some needles of pain as its legs burrow into me, then another spasm starts and according to the CCTV footage I start screaming “you liar! I’ll kill you!” and worse things. But I don’t remember any of this. I pass out a minute or so later, after my screams stop being words. When you review the footage you can see that my screams correspond to its initial leg movements and after I pass out it sort of shimmies itself from side to side, pressing itself closer into my lower back with each swinging lunge until it is pressed into me, very still, a black clasp around the base of my spine. Then Alice and another Nurse load me onto a gurney and take me to a room to recover.

When I woke up a day later or so in the hospital bed I immediately jumped out of it and ran over to the hospital room doorway thinking you lying fuckers I’ll show you. I yanked the door open and ran half into the hall then paused, like Wile E. Coyote realizing he has just crossed off of a cliff edge. I looked behind me into the room and back at my just-vacated bed. It dawned on me that I’d covered the distance between bed and door in a second or so, something that would have taken me two crutches and ten minutes the previous day. I pressed one hand to my back and recoiled as I felt the smoothness of the exo. Then I tried lifting a leg in front of me and was able to raise my right one to almost hip height. The same thing worked with the left leg. I patted the exo again and I thought I could feel it tense one of its legs embedded in my spine as though it was saying that’s right, buddy. You can thank me later.
  “John!” Alice said, appearing round a hospital corridor in response to the alarm from the door opening. “Are you okay?”
“Yes,” I said. “I’m fine.”
“That’s great,” she said, cheerfully. “Now, would you consider putting some clothes on?”
I’d been naked the whole time, so fast did I jump out of bed.

So now it’s three years later and I guess I’m considered a model citizen – pun intended. I’ve got exos on my elbows and knees as well as the one on my back, and they’re all linked together into one singular thing which helps me through life. Next might be one for the twitch in my neck. And it’s getting better all the time: fleet learning combined with machine learning protocols mean the exo gives me what the top brass call strategic movement optimization: said plainly, I’m now stronger and faster and more precise than regular people. And my exo gets better in proportion to the total number deployed worldwide, which now numbers in the millions.

Of course I do worry about what happens if there’s an EMP and suddenly it all goes wrong and I’m back to where I was. I have a nightmare where the pain returns and the exo rips the muscles in my back out as it jumps away to curl up on itself like a beetle, dying in response to some unseen atmospheric detonation. But I figure the sub-one-percentage chance of that is more than worth the tradeoff. I think my networked exo is happy as well, or at least, I hope it is, because in the middle of the night sometimes I wake up to find my flesh being rocked slightly from side to side by the smart metal embedded within me, as though it is a mother rocking some child to sleep.

Things that inspired this story: Exoskeletons, fleet learning, continuous adaptation, reinforcement learning, intermittent back trouble, physiotherapy, walking sticks.