Import AI: #89: Chinese facial recognition startup raises $600 million; why GPUs could alter AI progress; and using context to deal with language ambiguity

by Jack Clark

Beating Moore’s Law with GPUs:
…Could a rise in GPU and other novel AI-substrates help deal with the decline of Moore’s Law?…
CPU performance has been stagnating for several years as it has become harder to improve linear execution pipelines across whole chips in step with the reduction in transistor sizes, and because of the related problems which come from having an increasingly large number of components needing to work in lock-step with one another at minute scales. Could GPUs give us a way around this performance impasse? That’s the idea in a new blog post from AI researcher Bharath Ramsundar, who thinks that increases in GPU capabilities and the arrival of semiconductor substrates specialized for deep learning mean we can expect the performance of AI applications to increase in coming years faster than typical computing jobs running on typical processors. He might be right – one of the weird things about deep learning is that its most essential elements, like big blocks of neural networks, can be scaled up to immense sizes without terrible scaling tradeoffs, because their innards consist of relatively simple and parallel tasks like matrix multiplication, so new chips can easily be networked together to further boost base capabilities. Plus, standardization on a few software libraries, like NVIDIA’s cuDNN and CUDA GPU interfaces, or the rise of TensorFlow for AI programming, means that some applications are getting faster over time purely as a consequence of software updates to these underlying layers.
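  Code sketch: As a rough illustration of why deep learning maps so well onto GPUs, here’s a minimal timing comparison of the workload at the heart of most neural networks – a large matrix multiplication – on CPU versus GPU. It assumes PyTorch (not mentioned in the post; any framework that dispatches to NVIDIA’s GPU libraries would do) and an optional CUDA device; treat it as a sketch rather than a benchmark.

```python
# Minimal sketch: the core deep learning primitive (a big matrix multiply)
# is embarrassingly parallel, so moving it from CPU to GPU typically gives
# a large speedup without changing the surrounding code.
# Assumes PyTorch is installed; a CUDA GPU is optional.
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()   # make sure setup work has finished
    start = time.perf_counter()
    c = a @ b                      # dispatches to vendor GPU kernels (cuBLAS and friends)
    if device == "cuda":
        torch.cuda.synchronize()   # wait for the asynchronous GPU kernel
    return time.perf_counter() - start

print(f"cpu: {time_matmul('cpu'):.3f}s")
if torch.cuda.is_available():
    print(f"gpu: {time_matmul('cuda'):.3f}s")
```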
  Why it matters: Much of the recent progress in AI has occurred because around the mid-2000s processors became capable enough to easily train large neural networks on chunks of data – this underlying hardware improvement unlocked breakthroughs like the 2012 ‘AlexNet’ result for image recognition, related work in speech recognition, and subsequently significant innovations in research (AlphaGo) and application (large-scale sequence-to-sequence learning for ‘Smart Reply’, or the emergence of neural translation systems). If the arrival of things like GPUs and further software standardization and innovation has a good chance of further boosting performance, then researchers will be able to explore even larger or more complex models in the future, as well as run things like neural architecture search at a higher rate, which should combine to further drive progress.
  Read more: The Advent of Huang’s Law (Bharath Ramsundar blog post).

Microsoft launches AI training course including ‘Ethics’ segment:
…New Professional Program for Artificial Intelligence sees Microsoft get into the AI certification business…
Microsoft has followed other companies in making its internal training courses available externally via the Microsoft Professional Program in AI. This program is based on internal training initiatives the software company developed to ramp up the skills of its own staff.
  The Microsoft course is all fairly typical, teaching people about Python, statistics, and the construction and deployment of deep learning and reinforcement learning projects. It also includes a specific “Ethics and Law in Data and Analytics” course, which promises to teach developers how to ‘apply ethical and legal frameworks to initiatives in the data profession’.
  Read more: Microsoft Professional Program for Artificial Intelligence (Microsoft).
  Read more: Aiming to fill skill gaps in AI, Microsoft makes training courses available to the public (Microsoft blog).

Learning to deal with ambiguity:
…Researchers take charge of the problem of word ambiguity with a charge at including more context…
Carnegie Mellon University researchers have tackled one of the harder problems in translation: dealing with ‘homographs’ – words that are spelled the same but have different meanings in different contexts, like ‘room’ and ‘charges’. They do this in the context of neural machine translation (NMT) systems, which use machine learning techniques to accomplish translation with orders of magnitude fewer hand-specified rules than prior systems.
  Existing NMT systems struggle with homographs, with performance of word-level translation degrading as the number of potential meanings of each word climbs, the researchers show. They try to alleviate this by adding a word context vector that can be used by the NMT systems to learn the different uses of the same word. Adding this ‘context network’ into their NMT architecture leads to significantly improved BLEU scores of sentences translated by the system.
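  Code sketch: The paper’s architecture has more moving parts, but the core idea – giving each word a representation that also encodes its surrounding context, so that a homograph like ‘charges’ looks different in different sentences – can be sketched roughly as below. This is a simplified, hypothetical illustration in PyTorch, not the authors’ implementation; the module and dimension names are invented.

```python
# Simplified illustration (not the paper's exact architecture) of augmenting
# each word embedding with a context vector summarizing the rest of the
# sentence, so the same surface word gets different representations in
# different sentences. Assumes PyTorch; dimensions are arbitrary.
import torch
import torch.nn as nn

class ContextAugmentedEmbedding(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 256, ctx_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # A small "context network": here just a bag-of-words projection of
        # the sentence; the paper learns something richer.
        self.context_net = nn.Linear(emb_dim, ctx_dim)
        self.merge = nn.Linear(emb_dim + ctx_dim, emb_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len)
        embs = self.embed(token_ids)                      # (batch, seq, emb)
        sent_ctx = self.context_net(embs.mean(dim=1))     # (batch, ctx)
        sent_ctx = sent_ctx.unsqueeze(1).expand(-1, embs.size(1), -1)
        # Concatenate each word with the sentence context, then project back
        # down; the result is what a downstream NMT encoder would consume.
        return self.merge(torch.cat([embs, sent_ctx], dim=-1))

model = ContextAugmentedEmbedding(vocab_size=10000)
out = model(torch.randint(0, 10000, (2, 7)))              # (2, 7, 256)
```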
  Why it matters: It’s noteworthy that the system used by the researchers to deal with the homograph problem is itself a learned system which, rather than relying on hand-written rules, ingests more context about each word and learns from that. This is illustrative of how AI-first software systems get built: if you identify a fault you typically write a program which learns to fix it, rather than writing a rule-based program that fixes it.
  Read more: Handling Homographs in Neural Machine Translation (Arxiv).

Chinese facial recognition company raises $600 million:
…SenseTime plans to use funds for five supercomputers for its AI services…
SenseTime, a homegrown computer vision startup that provides facial recognition tools at vast scales, has raised $600 million in funding. The Chinese company supplies facial recognition services to the public and private sectors and is now, according to a co-founder, profitable and looking to expand. The company is “developing a service code-named ‘Viper’ to parse data from thousands of live camera feeds”, according to Bloomberg News.
  Strategic compute: SenseTime will use money from the financing “to build at least five supercomputers in top-tier cities over the coming year to drive Viper and other services. As envisioned, it streams thousands of live feeds into a single system that’re automatically processed and tagged, via devices from office face-scanners to ATMs and traffic cameras (so long as the resolution is high enough). The ultimate goal is to juggle 100,000 feeds simultaneously,” according to Bloomberg News.
  Read more: China Now Has the Most Valuable AI Startup in the World (Bloomberg).
…Related: Chinese startup uses AI to spot jaywalkers and send them pictures of their faces:
…Computer vision @ China scale…
Chinese startup Intellifusion is helping the local government in Shenzhen use facial recognition in combination with widely deployed urban cameras to text jaywalkers pictures of their faces along with personal information after they’ve been caught.
  Read more: China is using facial recognition technology to send jaywalkers fines through text messages (Motherboard).

Think China’s strategic technology initiatives are new? Think again:
…wide-ranging post by former Asia-focused State Department employee puts Beijing’s AI push in historical context…
Here’s an old (August 2017) but good post from the Paulson Institute at the University of Chicago about the history of Chinese technology policy in light of the government’s recent public statements about developing a national AI strategy. China’s longstanding worldview with regard to its technology strategy is that technology is a source of national power and that China needs to develop more of its own indigenous capability.
  Based on previous initiatives, it looks likely China will seek to attain frontier capabilities in AI, then package those capabilities up as products and use the resulting revenue to fund further research. “Chinese government, industry, and scientific leaders will continue to push to move up the value-added chain. And in some of the sectors where they are doing so, such as ultra high-voltage power lines (UHV) and civil nuclear reactors, China is already a global leader, deploying these technologies to scale and unmatched in this by few other markets,” writes the author. “That means it should be able to couple its status as a leading technology consumer to a new and growing role as an exporter. China’s sheer market power could enable it to export some of its indigenous technology and engineering standards in an effort to become the default global standard setter for this or that technology and system.”
  Read more: The Deep Roots and Long Branches of Chinese Technonationalism (Macro Polo).

French researchers build ‘Jacquard’ dataset to improve robotic grasping:
…11,000+ object dataset provides simulated objects with realistic depth information…
How do you solve a problem like robotic grasping? One way is to use many real world robots working in parallel for several months to learn to pick up a multitude of real world objects – that’s a route Google researchers took with the company’s ‘arm farm’ a few years ago. Another is to use people outfitted with sensors to collect demonstrations of humans grasping different objects, then learn from that – that’s the approach taken by AI startups like Kindred. A third way, and one which has drawn interest from a multitude of researchers, is to create synthetic 3D objects and train robots in a simulator to learn to grasp them – that’s what researchers at the University of California at Berkeley have done with Dex-Net, as well as organizations like Google and OpenAI; some organizations have further augmented this technique via the use of generative adversarial networks to simulate a greater range of grasps on objects.
  Jacquard: Now, French researchers have announced Jacquard, a robotic grasping dataset that contains more than 11,000 distinct objects and 50,000 images annotated with both RGB and realistic depth information. They plan to release it soon, they say, without specifying when. The researchers generate their data by sampling objects from ShapeNet which are each scaled and given different weight values, then dropped into a simulator, where they are rendered into high-resolution images via Blender, with grasp annotations generated by a three-stage automated process within the ‘pyBullet’ physics library. To evaluate their dataset, they test it in simulation by pre-training an AlexNet on their Jacquard dataset, then applying it to another, smaller, held-out dataset, where it generalizes well. The dataset supports multiple robotic gripper sizes, links several different grasps to each image, and contains one million labelled grasps in total.
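  Code sketch: For a rough sense of the simulation step described above – rescale an object, drop it into a physics engine, and let it settle before rendering and annotating grasps – here is a minimal pyBullet sketch. It uses one of pyBullet’s bundled assets as a stand-in for a ShapeNet mesh and omits the Blender rendering and grasp-annotation stages, so it illustrates the general recipe rather than the paper’s pipeline.

```python
# Minimal sketch of a dataset-generation step: drop a rescaled object into
# a physics simulator, let it settle, then record its resting pose (the
# paper additionally renders RGB-D images and annotates grasps).
# Assumes pybullet; the bundled "duck" asset stands in for a ShapeNet mesh.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                   # headless physics server
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.8)
p.loadURDF("plane.urdf")                              # the floor / table surface

scale = 0.8                                           # per-object random rescaling
obj = p.loadURDF("duck_vhacd.urdf",
                 basePosition=[0, 0, 0.5],            # drop from above the plane
                 globalScaling=scale)

for _ in range(500):                                  # simulate until the object settles
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(obj)
print("resting pose:", pos, orn)                      # would be used to place the render camera
p.disconnect()
```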
  Real robots: The researchers evaluated their approach on a real robot (a Fanuc M-20iA robotic arm), testing it on a subset of ~2,000 objects from the Jacquard dataset as well as on the full Cornell dataset. A pre-trained AlexNet tested in this way produces correct grasps about 78% of the time on the Jacquard subset, compared to 60.46% on Cornell. Both of these results are quite weak compared to results on the Dex-Net dataset, and to other attempts.
  Why it matters: Many researchers expect that deep learning could lead to significant advancement in the manipulation capabilities of robots. But we’re currently missing two key ingredients: large enough datasets, and a way to test and evaluate robots on standard platforms in standard ways. We’re currently going through a boom in the number of robot datasets available, with Jacquard representing another contribution here.
  Read more: Jacquard: A Large Scale Dataset for Robotic Grasp Detection (Arxiv).

What do StarCraft and the future of AI research have in common? Multi-agent control:
…Chinese researchers tackle StarCraft micromanagement tasks…
Researchers with the Institute of Automation at the Chinese Academy of Sciences have published research on using reinforcement learning to try to solve micromanagement tasks within StarCraft, a real-time strategy game. One of the main challenges in mastering StarCraft is to develop algorithms that can effectively train multiple units in parallel. The researchers propose what they call a parameter sharing multi-agent gradient-descent Sarsa algorithm, or PS-MAGDS. This algorithm shares the parameters of the overall policy network across multiple units while introducing methods to provide appropriate credit assignment to individual units (see the sketch below). They also carry out significant reward shaping to get the agents to learn more effectively. Their PS-MAGDS AIs are able to learn to beat the in-game AI in a variety of micromanagement control scenarios, including large-scale scenarios with more than thirty units on either side. It’s currently difficult to accurately evaluate the various techniques people are developing for StarCraft against one another due to a lack of shared baselines and experiments, as well as a split in the research community between using StarCraft 1 (this paper) as the testbed, and StarCraft 2 (efforts by DeepMind, others).
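  Code sketch: The parameter-sharing idea at the heart of the approach – one network whose weights are reused by every unit, so each unit acts on its own local observation but all of them train the same parameters – can be sketched as below. This is a simplified, hypothetical PyTorch illustration; the paper’s gradient-descent Sarsa update, credit assignment, and reward shaping are omitted, and all sizes and rewards are placeholders.

```python
# Highly simplified, hypothetical sketch of parameter sharing: every unit
# runs the *same* value network on its own local observation, so experience
# from all units updates a single set of weights. The paper's full Sarsa
# update, credit assignment, and reward shaping are omitted.
import torch
import torch.nn as nn

class SharedUnitNet(nn.Module):
    def __init__(self, obs_dim: int = 32, n_actions: int = 9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)                      # per-unit action values

policy = SharedUnitNet()
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)

# One decision step for a squad of 12 units: stack each unit's local
# observation and push the whole batch through the shared network.
unit_obs = torch.randn(12, 32)
q_values = policy(unit_obs)                       # (12, n_actions)
actions = q_values.argmax(dim=-1)                 # one action per unit

# Placeholder regression toward per-unit targets; a real implementation
# would build Sarsa-style targets from game rewards and bootstrapped values.
targets = torch.zeros(12)
q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
loss = ((targets - q_taken) ** 2).mean()
loss.backward()                                   # gradients flow into the one shared network
optimizer.step()
```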
  Still limited: “At present, we can only train ranged ground units with the same type, while training melee ground units using RL methods is still an open problem. We will improve our method for more types of units and more complex scenarios in the future. Finally, we will also consider to use our micromanagement model in the StarCraft bot to play full the game,” the researchers write.
  Read more: StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning (Arxiv).

Tech Tales:

The person was killed at five minutes past eleven the previous night. Their beaten body was found five minutes later by a passing group of women who had been dining at a nearby restaurant. By 11:15 the body was photographed and data began to be pulled from nearby security cameras, wifi routers, cell towers, and the various robot and drone companies. At 11:15:01 one of the robot companies indicated that a robot had been making a delivery nearby at the time of the attack. The robot was impounded and transported to the local police station where it was placed in a facility known to local officers as ‘the metal shop’. Here, they would try to extract data from the robot to learn what happened. But it would be a difficult task, because the robot had been far enough away from the scene that none of its traditional, easy-to-poll sensors (video, LIDAR, audio, and so on) had sufficient resolution or fidelity to tell them much.

“What did you see,” said the detective to the robot. “Tell me what you saw.”
The robot said nothing – unsurprising given that it had no speech capability and was, at that moment, unpowered. In another twelve hours the police would have to release the robot back to the manufacturer, and if they hadn’t been able to figure anything out by then, they would be out of options.
“They never prepared me for this,” said the detective – and he was right. When he was going through training they never dwelled much on the questions relating to interrogating sub-sentient AI systems, and all the laws were built around an assumption that turned out to be wrong: that the AIs would remain just dumb enough to be interrogatable via direct access into their electronic brains, and that the laws would remain just slow enough for this to be standard procedure for dealing with evidence from all AI agents. This assumption was half right: the law did stay the same, but the AIs got so smart that though you could look into their brains, you couldn’t learn as much as you’d hope.

This particular AI was based in a food delivery robot that roamed the streets of the city, beeping its way through crowds to apartment buildings, where it would notify customers that their Banh Mi, or hot ramen, or cold cuts of meat, or vegetable box, had arrived. Its role was a seemingly simple one: spend all day and night retrieving goods from different businesses and conveying them to consumers. But its job was very difficult from an AI standpoint – streets would change according to the need for road maintenance or the laying of further communication cables, businesses would lose signs or change signs or have their windows smashed, fashions would change, which would alter the profile of each person in a street scene, and climatic shocks meant the weather was becoming ever stranger and ever more unpredictable. So to save costs and increase the reliability of the robots, the technology companies behind them had been adding more sensors onto the platforms and, once those gains were built in, working out how to incorporate artificial intelligence techniques to increase efficiency further. A few years ago computational resources became cheap and widely available enough for them to begin re-training each robot based on its own data as well as data from others. They didn’t do this in a purely supervised way, either; instead they had each robot learn to simulate its own model of the world it worked in – in this case, a specific region of a city – letting it imagine the streets around itself to give it greater abilities relating to route-finding and re-orientation, adapting to unexpected events, and so on.

So now to be able to understand anything about the body that had been found the detective needed to understand the world model of the robot and see if it had significantly changed at any point during the previous day or so. Which is how he found himself staring at a gigantic wall of computer monitors, each showing a different smeary kaleidoscopic vision of a street scene. The detective had access to a control panel that let him manipulate the various latent variables that conditioned the robot’s world model, allowing him to move certain dials and sliders to figure out which things had changed, and how.

The detective knew he was onto something when he found the smear. At first it looked like an error – some kind of computer vision artifact – but as he manipulated various dials he saw that, at 11:15 the previous night, the robot had updated its own world model with a new variable that looked like a black smudge. Except this black smudge was only superimposed on certain people and certain objects in the world, and as he moved the slider around to explore the smear, he found that it had strong associations with two other variables – red three-wheeled motorcycles, and men running. The detective pulled all the information about the world model and did some further experiments and added this to the evidence log.

Later, during prosecution, the robot was physically wheeled into the courtroom where the trial was taking place, mostly as a prop for the head prosecutor. The robot hadn’t seen anything specific itself – its sensors were not good enough to have picked up anything admissible. But as it had been in the area it had learned of the presence of this death through a multitude of different factors it had sensed, ranging from groups of people running toward where the accident had occurred, to an increase in pedestrian phone activity, to the arrival of sirens, and so on. And this giant amount of new sensory information had somehow triggered strong links in its world model with three-wheeled motorcycles and running men. Armed with this highly specific set of factors the police had trawled all the nearby security cameras and sensors again and, through piecing together footage from eight different places, had found occasional shots of men running towards a three-wheeled motorcycle and speeding, haphazardly, through the streets. After building evidence further they were able to get a DNA match. The offenders went to prison and the mystery of the body was (partially) solved. Though the company that made the AI for the robot made no public statements regarding the case, it subsequently used it in private sales materials as a case study showing local law enforcement the surprising ways robots could benefit their towns.

Things that inspired this story: Food delivery robots, the notion of jurisdiction, interpretability of imagination, “World Models” by David Ha and Juergen Schmidhuber.