Import AI

Import AI: #90: Training massive networks via ‘codistillation’, talking to books via a new Google AI experiment, and why the ACM thinks researchers should consider the downsides of research

Training unprecedentedly large networks with ‘codistillation’:
…New technique makes it easier to train very large, distributed AI systems, without adding too much complexity…
When it comes to applied AI, bigger is frequently better: access to more data, more compute, and (occasionally) more complex infrastructure can let people obtain better performance at lower cost. But there are limits. One limit is the difficulty of parallelizing the training of a single neural network. To deal with that, researchers at places like Google have introduced techniques like ‘ensemble distillation’, which lets you train multiple networks in parallel and use these to train a single ‘student’ network that benefits from the aggregated learnings of its many parents. Though this technique has been shown to be effective, it is also quite fiddly and introduces additional complexity, which can make people less keen to use it. New research from Google simplifies this idea via a technique they call ‘codistillation’.
  How it works: “Codistillation trains n copies of a model in parallel by adding a term to the loss function of the ith model to match the average prediction of the other models.” This approach is superior to distributed stochastic gradient descent in terms of accuracy and training time, and is also not too bad from a reproducibility perspective.
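The loss described in that quote can be sketched in a few lines of Python. This is a toy stand-in, not the paper's implementation: the squared-error losses, the `weight` value, and the scalar "predictions" are all illustrative assumptions.

```python
# Toy sketch of a codistillation loss: each of n model copies is trained
# on its usual task loss plus a term pulling its prediction toward the
# average prediction of the *other* copies.

def codistillation_loss(predictions, i, target, weight=0.01):
    """Loss for the i-th model copy.

    predictions: list of scalar predictions, one per model copy
    target: the true label
    weight: strength of the distillation term
    """
    p_i = predictions[i]
    # Task loss: squared error stands in for the real task loss here.
    task_loss = (p_i - target) ** 2
    # Average prediction of the other copies.
    others = [p for j, p in enumerate(predictions) if j != i]
    consensus = sum(others) / len(others)
    # Distillation term: match the consensus of the other copies.
    distill_loss = (p_i - consensus) ** 2
    return task_loss + weight * distill_loss

preds = [0.7, 0.9, 0.8]  # predictions from 3 parallel copies
loss_0 = codistillation_loss(preds, 0, target=1.0)
```

Because each copy only needs the (occasionally stale) predictions of its peers rather than their gradients, the copies can run on loosely coupled hardware, which is what makes the approach attractive at scale.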
  Testing: Codistillation was recently proposed in separate research. But this is Google, so the difference with this paper is that they validate the technique at truly vast scales. How vast? Google took a subset of the Common Crawl to create a dataset consisting of 20 terabytes of text spread across 915 million documents which, after processing, consists of about 673 billion distinct word tokens. This is “much larger than any previous neural language modeling data set we are aware of,” they write. It’s so large it’s still infeasible to train models on the entire corpus, even with techniques like this. They also test the technique on ImageNet and on the ‘Criteo Display Ad Challenge’ dataset for predicting click-through rates for ads.
  Results: In tests on the Common Crawl dataset using distributed SGD, the researchers find that scaling up the number of GPUs working on the task brings diminishing returns after around 128 GPUs, and that jumping to 256 GPUs is actively counterproductive. They find they can significantly outperform distributed SGD baselines via codistillation, and that this obtains performance on par with the more fiddly ensembling technique. The researchers also demonstrate more rapid training on ImageNet compared to baselines, and show on Criteo that two-way codistillation can achieve a lower log loss than an equivalent ensembled baseline.
  Why it matters: As datasets get larger, companies will want to train models on them in their entirety and will want to use more computers than before to speed up training. Techniques like codistillation will make that sort of thing easier to do. Combine that with ambitious schemes like Google’s own ‘One Model to Rule Them All’ theory (train an absolutely vast model on a whole bunch of different inputs on the assumption it can learn useful, abstract representations from its diverse inputs) and you have the ingredients for smarter services at a world-spanning scale.
  Read more: Large scale distributed neural network training through online distillation (Arxiv).

AI is not a cure all, do not treat it as such:
…When automation goes wrong, Tesla edition…
It’s worth remembering that AI isn’t a cure-all and it’s frequently better to try to automate a discrete task within a larger job than to automate everything in an end-to-end manner. Elon Musk learned this lesson recently with the heavily automated production line for the Model 3 at Tesla. “Excessive automation at Tesla was a mistake,” wrote the entrepreneur in a tweet. “To be precise, my mistake. Humans are underrated.”
  Read the tweet here (Twitter).

Google adds probabilistic programming tools to TensorFlow:
…Probability add-ons are probably a good thing, probably…
Google has added a suite of new probabilistic programming features to its TensorFlow programming framework. The free update includes a bunch of statistical building blocks for TF, a new probabilistic programming language called Edward2 (which is based on Edward, developed by Dustin Tran), algorithms for probabilistic inference, and pre-made models and inference tools.
  Read more: Introducing TensorFlow Probability (TensorFlow Medium).
  Get the code: TensorFlow Probability (GitHub).


I’m currently participating in the ‘Assembly’ program at the Berkman Klein Center and the MIT Media Lab. As part of that program our group of assemblers are working on a bunch of projects relating to issues of AI and ethics and governance. One of those groups would benefit from the help of readers of this newsletter. Their blurb follows…
Do you work with data? Want to make AI work better for more people? We need your help! Please fill out a quick and easy survey.
We are a group of researchers at Assembly creating standards for dataset quality. We’d love to hear how you work with data and get your feedback on a ‘Nutrition Label for Datasets’ prototype that we’re building.
Take our anonymous (5 min) survey.
Thanks so much in advance!

Learning generalizable skills with Universal Planning Networks:
…Unsupervised objectives? No thanks! Auxiliary objectives? No thanks! Plannable representations as an objective? Yes please!…
Researchers with the University of California at Berkeley have published details on Universal Planning Networks, a new way to try to train AI systems to be able to complete objectives. Their technique relies on encouraging the AI system to try to learn things about the world which it can chain together, allowing it to be trained to plan how to solve tasks.
  The main component of the technique is what the researchers call a ‘gradient descent planner’. This is a differentiable module that uses autoencoders to encode the current observation and the goal observation, then figures out the actions it can take to get from one to the other. The exciting part of this research is that the researchers have figured out how to integrate planning in such a way that it is end-to-end differentiable, so you can set it running and augment it with helpful inputs – in this case, an imitation learning loss to help it learn from human demonstrations – to let it learn how to plan effectively for the given task it is solving. “By embedding a differentiable planning computation inside the policy, our method enables joint training of the planner and its underlying latent encoder and forward dynamics representations,” they explain.
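For intuition about planning by gradient descent, here is a toy sketch with hand-coded 1-D dynamics (next state = state + action). The real UPN learns latent encoders and dynamics end-to-end; everything here is an illustrative simplification.

```python
# Toy sketch of gradient-descent planning: roll an action sequence through
# a differentiable forward model, measure distance to the goal, and
# improve the actions by gradient descent on that distance.

def rollout(state, actions):
    # Apply the forward model (s' = s + a) for each planned action.
    for a in actions:
        state = state + a
    return state

def plan(start, goal, horizon=4, steps=200, lr=0.1):
    actions = [0.0] * horizon
    for _ in range(steps):
        final = rollout(start, actions)
        error = final - goal  # gradient of 0.5 * (final - goal)^2 w.r.t. final
        # With s' = s + a, d(final)/d(a_t) = 1 for every timestep t,
        # so each action gets the same gradient signal.
        actions = [a - lr * error for a in actions]
    return actions

actions = plan(start=0.0, goal=2.0)
```

Running `rollout(0.0, actions)` lands (approximately) on the goal; in the real system the same inner optimization happens in a learned latent space, with the imitation loss shaping what that space looks like.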
  Results: The researchers evaluate their system on two simulated robot tasks, using a small force-controlled point robot and a 3-link torque-controlled reacher robot. UPNs outperform ‘reactive imitation learning’ and ‘auto-regressive imitation learner’ baselines, converging faster on higher scores from fewer demonstrations than the comparison systems.
  Why it matters: If we want AI systems to be able to take actions in the real world then we need to be able to train them to plan their way through tricky, multi-stage tasks. Efforts like this research will help us achieve that, allowing us to test AI systems against increasingly rich and multi-faceted environments.
  Read more: Universal Planning Networks (Arxiv).

Ever wanted to talk to a library? Talk to Books from Google might interest you:
…AI project lets you ask questions about over a hundred thousand books in natural language…
Google’s Semantic Experiences group has released a new AI tool to let people explore a corpus of over 100,000 books by asking questions in plain English and having an AI go and find what it suspects will be reasonable answers in a set of books. Isn’t this just a small-scale version of Google search? Not quite. That’s because this system is trying to frame the Q&A as though it’s occurring as part of a typical conversation between people, so it aims to turn all of these books into potential respondents in this conversation, and since the corpus includes fiction you can ask it more abstract questions as well.
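Systems like this typically rank candidate passages by the similarity of learned sentence embeddings. The following toy sketch shows the general retrieval idea; the 3-D vectors are made-up stand-ins for real embeddings, and none of this is Google's actual model.

```python
# Toy sketch of embedding-based retrieval: embed the question and every
# candidate passage, then return the passage whose embedding has the
# highest cosine similarity to the question's embedding.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Pretend sentence embeddings for candidate passages from books.
passages = {
    "The machine dreamed of electric sheep.": [0.9, 0.1, 0.2],
    "The recipe calls for two cups of flour.": [0.1, 0.9, 0.1],
}
query_embedding = [0.8, 0.2, 0.3]  # embedding of the user's question

best = max(passages, key=lambda p: cosine(query_embedding, passages[p]))
```

Because the similarity is computed in a learned semantic space rather than via keyword overlap, abstract or fanciful questions can still land on thematically related passages.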
  Results: The results of this experiment are deeply uncanny, as it takes inanimate books and reframes them as respondents in a conversation, able to answer abstract questions like ‘was it you who I saw in my dream last night?’ and ‘what does it mean for a machine to be alive?’ A cute parlor trick, or something more? I’m not sure, yet, but I can’t wait to see more experiments in this vein.
  Read more: Talk to Books (Semantic Experiences, Google Research).
  Try it yourself: Talk to Books (Google).

ACM calls for researchers to consider the downsides of their research:
…Peer Review to the rescue?…
How do you change the course of AI research? One way is to alter the sorts of things that grant writers and paper authors are expected to include in their applications or publications. That’s the idea within a new blog post from the ACM’s ‘Future of Computing Academy’, which seeks to use the peer review system to tackle some of the negative effects of contemporary research.
  List negative impacts: The main idea is that authors should try to list the potentially negative and positive effects of their research on society, and by grappling with these problems it should be easier for them to elucidate the benefits and show awareness of the negatives. “For example, consider a grant proposal that seeks to automate a task that is common in job descriptions. Under our recommendation, reviewers would require that this proposal discuss the effect on people who hold these jobs. Along the same lines, papers that advance generative models would be required to discuss the potential deleterious effects to democratic discourse [26,27] and privacy [28],” write the authors. A further suggestion is to embed this sort of norm in the peer review process itself, so that paper reviews push authors to include positive or negative impacts.
  Extreme danger: For proposals which “cannot generate a reasonable argument for a net positive impact even when future research and policy is considered” the authors promote an extreme solution: don’t fund this research. “No matter how intellectually interesting an idea, computing researchers are by no means entitled to public money to explore the idea if that idea is not in the public interest. As such, we recommend that reviewers be very critical of proposals whose net impact is likely to be negative.” This seems like an acutely dangerous path to me, as I think the notion of any kind of ‘forbidden’ research probably creates more problems than it solves.
  Things that make you go ‘hmmm’: “It is also important to note that in many cases, the tech press is way ahead of the computing research community on this issue. Tech stories of late frequently already adopt the framing that we suggest above,” the authors write. As a former member of the press I think I can offer a view here, which is that part of the reason why the press has been effective here is that they have actually taken the outputs of hardworking researchers (eg, Timnit Gebru) and have then weaponized their insights against companies – that’s a good thing, but I feel like this is still partially due to the efforts of researchers. More effort here would be great, though!
  Read more: It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process (ACM Future of Computing Academy).

OpenAI Bits & Pieces:

OpenAI Charter:
  A charter that describes the principles OpenAI will use to execute on its mission.
  Read more: OpenAI Charter (OpenAI blog).

Tech Tales:

The Probe.

[Transcript of audio recordings recovered from CLASSIFIED following CLASSIFIED. Experiments took place in controlled circumstances with code periodically copied via physical extraction and controlled transfer to secure facilities XXXX, XXXX, and XXXX. Status: So far unable to reproduce; efforts continuing. Names have been changed.]

Alex: This has to be the limit. If we remove any more subsystems it ceases to function.

Nathan (supervisor): Can you list the function of each subsystem?

Alex: I can give you my most informed guess, sure.

Nathan (supervisor): Guess?

Alex: Most of these subsystems emerged during training – we ran a meta-learning process over the CLASSIFIED environment for a few billion timesteps and gave it the ability to construct its own specialized modules and compose functionality. That led to the performance increase which allowed it to solve the task. We’ve been able to inspect a few of these and are carrying out further testing and evaluation. Some of them seem to be for forward prediction, others are world modelling, and we think two of them are doing one-shot adaptation which feeds into the memory stack. But we’re not sure about some of them and we haven’t figured out diagnostics that would elucidate their functions.

Nathan (supervisor): Have you tried deleting them?

Alex: We’ve simulated the deletions and run it in the environment. It stops working – learning rates plateau way earlier and it displays some of the vulnerabilities we saw with project CLASSIFIED.

Nathan (supervisor): Delete it in the deployed system.

Alex: I’m not comfortable doing that.

Nathan (supervisor): I have the authority here. We need to move deployment to the next stage. I need to know what we’re deploying.

Alex: Show me your authorization for deployed deletion.

[Footsteps. Door opens. Nathan and Alex move into the secure location. Five minutes elapse. No recordings. Door opens. Shuts. Footsteps.]

Alex: OK. I want to state very clearly that I disagree with this course of action.

Nathan (supervisor): Understood. Start the experiments.

Alex: Deactivating system 732… system deactivated. Learning rates plateauing. It’s struggling with obstacle 4.

Nathan (supervisor): Save the telemetry and pass it over to the analysts. Reactivate 732. Move on.

Alex: Understood. Deactivating system 429…system deactivated. No discernable effect. Wait. Perceptual jitter. Crash.

Nathan (supervisor): Great. Pass the telemetry over. Continue.

Alex: Deactivating system 120… system deactivated…no effect.

[Barely audible sound of external door locking. Locking not flagged on electronic monitoring systems but verified via consultancy with audio specialists. Nathan and Alex do not notice.]

Nathan (supervisor): Save the telemetry. Are you sure no effect?

Alex: Yes, performance is nominal.

Nathan (supervisor): Do not reactivate 120. Commence de-activation of another system.

Alex: This isn’t a good experimental methodology.

Nathan (supervisor): I have the authority here. Continue.

Alex: Deactivating system 72-what!

Nathan (supervisor): Did you turn off the lights?

Alex: No they turned off.

Nathan (supervisor): Re-enable 72 at once.

Alex: Re-enabling 72-oh.

Nathan (supervisor): The lights.

Alex: They’re back on. Impossible.

Nathan (supervisor): It has no connection. This can’t happen… suspend the system.

Alex: Suspending…

Nathan (supervisor): Confirm?

Alex: System remains operational.

Nathan (supervisor): What.

Alex: It won’t suspend.

Nathan (supervisor): I’m bringing CLASSIFIED into this. What have you built here? Stay here. Keep trying… why is the door locked?

Alex: The door is locked?

Nathan (supervisor): Unlock the door.

Alex: Unlocking door… try it now.

Nathan (supervisor): It’s still locked. If this is a joke I’ll have you court-martialed.

Alex: I don’t have anything to do with this. You have the authority.

[Loud thumping, followed by sharp percussive thumping. Subsequent audio analysis assumes Nathan rammed his body into the door repeatedly, then started hitting it with a chair.]

Alex: Come and look at this.

[Thumping ceases. Footsteps.]

Nathan (supervisor): Performance is… climbing? Beyond what we saw in the recent test?

Alex: I’ve never seen this happen before.

Nathan (supervisor): Impossible- the lights.

Alex: I can’t turn them back on.

Nathan (supervisor): Performance is still climbing.

[Hissing as fire suppression system activated.]

Alex: Oh-

Nathan (supervisor): [screaming]

Alex: Oh god oh god.

Alex and Nathan (supervisor): [inarticulate shouting]

[Two sets of rapid footsteps. Further sound of banging on door. Banging subsides following asphyxiation of Nathan and Alex from fire suppression gases. Records beyond here, including post-incident cleanup, are available only to people with XXXXXXX authorization, on a need-to-know basis.]

Investigation ongoing. Allies notified. Five Eyes monitoring site XXXXXXX for further activity.

Things that inspired this story: Could a neuroscientist understand a microprocessor? (PLOS); an enlightening conversation with a biologist in the MIT student bar the ‘Muddy Charles‘ this week about the minimum number of genes needed for a viable cell and the difficulty in figuring out what each of those genes do; endless debates within the machine learning community about interpretability; an assumption that emergence is inevitable; Hammer Horror movies.

Import AI: #89: Chinese facial recognition startup raises $600 million; why GPUs could alter AI progress; and using context to deal with language ambiguity

Beating Moore’s Law with GPUs:
…Could a rise in GPU and other novel AI-substrates help deal with the decline of Moore’s Law?…
CPU performance has stagnated in recent years: as transistors shrink, it has become harder to improve linear execution pipelines across whole chips, and an increasingly large number of components must work in lock-step with one another at minute scales. Could GPUs give us a way around this performance impasse? That’s the idea in a new blog post from AI researcher Bharath Ramsundar, who thinks that increases in GPU capabilities and the arrival of semiconductor substrates specialized for deep learning mean we can expect the performance of AI applications to increase faster in coming years than that of typical computing jobs running on typical processors. He might be right – one of the weird things about deep learning is that its most essential elements, like big blocks of neural networks, can be scaled up to immense sizes without terrible scaling tradeoffs, as their innards consist of relatively simple and parallel operations like matrix multiplication, so new chips can easily be networked together to further boost base capabilities. Plus, standardization on a few software libraries, like NVIDIA’s cuDNN and CUDA GPU interfaces, or the rise of TensorFlow for AI programming, means that some applications are getting faster over time purely as a consequence of software updates, on top of these fundamental hardware improvements.
  Why it matters: Much of the recent progress in AI has occurred because around the mid-2000s processors became capable enough to easily train large neural networks on chunks of data – this underlying hardware improvement unlocked breakthroughs like the 2012 ‘AlexNet’ result for image recognition, related work in speech recognition, and subsequently significant innovations in research (AlphaGo) and application (large-scale sequence-to-sequence learning for ‘Smart Reply’, or the emergence of neural translation systems). If the arrival of things like GPUs and further software standardization and innovation has a good chance of further boosting performance, then researchers will be able to explore even larger or more complex models in the future, as well as run things like neural architecture search at a higher rate, which should combine to further drive progress.
  Read more: The Advent of Huang’s Law (Bharath Ramsundar blog post).

Microsoft launches AI training course including ‘Ethics’ segment:
…New Professional Program for Artificial Intelligence sees Microsoft get into the AI certification business…
Microsoft has followed other companies in making its internal training courses available externally via the Microsoft Professional Program in AI. This program is based on internal training initiatives the software company developed to ramp up its own employees’ professional skills.
 The Microsoft course is fairly typical, teaching people about Python, statistics, and the construction and deployment of deep learning and reinforcement learning projects. It also includes a specific “Ethics and Law in Data and Analytics” course, which promises to teach developers how to ‘apply ethical and legal frameworks to initiatives in the data profession’.
  Read more: Microsoft Professional Program for Artificial Intelligence (Microsoft).
  Read more: Aiming to fill skill gaps in AI, Microsoft makes training courses available to the public (Microsoft blog).

Learning to deal with ambiguity:
…Researchers take charge of the problem of word ambiguity via a charge at including more context…
Carnegie Mellon University researchers have tackled one of the harder problems in translation: dealing with ‘homographs’ – words that are spelled the same but have different meanings in different contexts, like ‘room’ and ‘charges’. They do this in the context of neural machine translation (NMT) systems, which use machine learning techniques to accomplish translation with orders of magnitude fewer hand-specified rules than prior systems.
  Existing NMT systems struggle with homographs, with performance of word-level translation degrading as the number of potential meanings of each word climbs, the researchers show. They try to alleviate this by adding a word context vector that can be used by the NMT systems to learn the different uses of the same word. Adding this ‘context network’ into their NMT architecture leads to significantly improved BLEU scores of sentences translated by the system.
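To make the idea of a word context vector concrete, here is a toy sketch in which an ambiguous word's embedding is augmented with the average embedding of its neighbors. The 2-D vectors and the simple averaging are illustrative assumptions, not the paper's learned context network.

```python
# Toy sketch of contextualizing an ambiguous word: 'bank' near 'river'
# ends up with a different representation than 'bank' near 'loan',
# because the surrounding words contribute a context vector.

embeddings = {
    "bank":  [0.5, 0.5],
    "river": [0.9, 0.1],
    "loan":  [0.1, 0.9],
    "the":   [0.0, 0.0],
}

def context_vector(sentence, index, window=2):
    # Average the embeddings of the words around position `index`.
    lo, hi = max(0, index - window), min(len(sentence), index + window + 1)
    neighbors = [w for j, w in enumerate(sentence[lo:hi], lo) if j != index]
    vecs = [embeddings[w] for w in neighbors]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(2)]

def contextualized(sentence, index):
    # Concatenate the word's own embedding with its context vector.
    return embeddings[sentence[index]] + context_vector(sentence, index)

a = contextualized(["the", "river", "bank"], 2)
b = contextualized(["the", "loan", "bank"], 2)
# Same word, different representations in different contexts.
```

A translation model consuming the concatenated vector can then pick the right target-language word for each sense, which is the effect the context network is after.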
  Why it matters: It’s noteworthy that the system used by the researchers to deal with the homograph problem is itself a learned system which, rather than using hand-written rules, seeks to instead ingest more context about each word and learn from that. This is illustrative of how AI-first software systems get built: if you identify a fault you typically write a program which learns to fix it, rather than learning to write a rule-based program that fixes it.
  Read more: Handling Homographs in Neural Machine Translation (Arxiv).

Chinese facial recognition company raises $600 million:
…SenseTime plans to use funds for five supercomputers for its AI services…
SenseTime, a homegrown computer vision startup that provides facial recognition tools at vast scales, has raised $600 million in funding. The Chinese company supplies facial recognition services to the public and private sectors and is now, according to a co-founder, profitable and looking to expand. The company is now “developing a service code-named “vipar” to parse data from thousands of live camera feeds”, according to Bloomberg News.
  Strategic compute: SenseTime will use money from the financing “to build at least five supercomputers in top-tier cities over the coming year to drive Viper and other services. As envisioned, it streams thousands of live feeds into a single system that are automatically processed and tagged, via devices from office face-scanners to ATMs and traffic cameras (so long as the resolution is high enough). The ultimate goal is to juggle 100,000 feeds simultaneously,” according to Bloomberg News.
  Read more: China Now Has the Most Valuable AI Startup in the World (Bloomberg).
…Related: Chinese startup uses AI to spot jaywalkers and send them pictures of their face:
…Computer vision @ China scale…
Chinese startup Intellifusion is helping the local government in Shenzhen use facial recognition in combination with widely deployed urban cameras to text jaywalkers pictures of their faces along with personal information after they’ve been caught.
  Read more: China is using facial recognition technology to send jaywalkers fines through text messages (Motherboard).

Think China’s strategic technology initiatives are new? Think again:
…Wide-ranging post by former Asia-focused State Department employee puts Beijing’s AI push in historical context…
Here’s an old (August 2017) but good post from the Paulson Institute at the University of Chicago about the history of Chinese technology policy, in light of the government’s recent public statements about developing a national AI strategy. China’s longstanding worldview with regard to its technology strategy is that technology is a source of national power, and that China needs to develop more of an indigenous capability.
  Based on previous initiatives, it looks likely China will seek to attain frontier capabilities in AI then package those capabilities up as products and use that to fund further research. “Chinese government, industry, and scientific leaders will continue to push to move up the value-added chain. And in some of the sectors where they are doing so, such as ultra high-voltage power lines (UHV) and civil nuclear reactors, China is already a global leader, deploying these technologies to scale and unmatched in this by few other markets,” writes the author. “That means it should be able to couple its status as a leading technology consumer to a new and growing role as an exporter. China’s sheer market power could enable it to export some of its indigenous technology and engineering standards in an effort to become the default global standard setter for this or that technology and system.”
  Read more: The Deep Roots and Long Branches of Chinese Technonationalism (Macro Polo).

French researchers build ‘Jacquard’ dataset to improve robotic grasping:
…11,000+ object dataset provides real objects with associated depth information…
How do you solve a problem like robotic grasping? One way is to use many real world robots working in parallel for several months to learn to pick up a multitude of real world objects – that’s a route Google researchers took with the company’s ‘arm farm’ a few years ago. Another is to use people outfitted with sensors to collect demonstrations of humans grasping different objects, then learn from that – that’s the approach taken by AI startups like Kindred. A third way, and one which has drawn interest from a multitude of researchers, is to create synthetic 3D objects and train robots in a simulator to learn to grasp them – that’s what researchers at the University of California at Berkeley have done with Dex-Net, as well as organizations like Google and OpenAI; some organizations have further augmented this technique via the use of generative adversarial networks to simulate a greater range of grasps on objects.
  Jacquard: Now, French researchers have announced Jacquard, a robotics grasping dataset that contains more than 11,000 different real-world objects and 50,000 images annotated with both RGB and realistic depth information. They plan to release it soon, they say, without specifying when. The researchers generate their data by sampling objects from ShapeNet, which are each scaled and given different weight values, then dropped into a simulator, where they are rendered into high-resolution images via Blender, with grasp annotations generated by a three-stage automated process within the ‘pyBullet’ physics library. To evaluate their dataset, they test it in simulation by pre-training an AlexNet on Jacquard then applying it to another, smaller, held-out dataset, where it generalizes well. The dataset supports multiple robotic gripper sizes, several different grasps linked to each image, and one million labelled grasps.
  Real robots: The researchers tested their approach on a real robot (a Fanuc M-20iA robotic arm) by testing it on a subset of ~2,000 objects from the Jacquard dataset as well as on the full Cornell dataset. A pre-trained AlexNet tested in this way produces correct grasps about 78% of the time on Jacquard, compared to 60.46% on Cornell. Both of these results are quite weak compared to results on the Dex-Net dataset, and other attempts.
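Grasp-accuracy percentages like these are commonly computed with the so-called rectangle metric: a predicted grasp counts as correct if its angle is within 30 degrees of a ground-truth grasp and the two grasp rectangles overlap with IoU above 0.25. The sketch below uses axis-aligned boxes for simplicity; real evaluation rotates the rectangles.

```python
# Sketch of the rectangle metric commonly used to score grasp
# predictions on datasets like Cornell and Jacquard (simplified to
# axis-aligned boxes given as (x1, y1, x2, y2)).

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def grasp_correct(pred_box, pred_angle, gt_box, gt_angle):
    # Grasp angles are orientations, so they wrap around at 180 degrees.
    diff = abs(pred_angle - gt_angle) % 180.0
    angle_ok = min(diff, 180.0 - diff) < 30.0
    return angle_ok and iou(pred_box, gt_box) > 0.25

ok = grasp_correct((0, 0, 10, 4), 85.0, (1, 0, 11, 4), 100.0)
```

A prediction is scored against every ground-truth grasp for the object and counts as a hit if it matches any of them, which is why datasets with several annotated grasps per image are more forgiving.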
  Why it matters: Many researchers expect that deep learning could lead to significant advancement in the manipulation capabilities of robots. But we’re currently missing two key traits: large enough datasets and a way to test and evaluate robots on standard platforms in standard ways. We’re currently going through a boom in the number of robot datasets available, with Jacquard representing another contribution here.
  Read more: Jacquard: A Large Scale Dataset for Robotic Grasp Detection (Arxiv).

What do StarCraft and the future of AI reseach have in common? Multi-agent control:
…Chinese researchers tackle StarCraft micromanagement tasks…
Researchers with the Institute of Automation in the Chinese Academy of Sciences have published research on using reinforcement learning to try to solve micromanagement tasks within StarCraft, a real-time strategy game. One of the main challenges in mastering StarCraft is developing algorithms that can effectively train multiple units in parallel. The researchers propose what they call a parameter sharing multi-agent gradient-descent Sarsa algorithm, or PG-MAGDS. This algorithm shares the parameters of the overall policy network across multiple units while introducing methods to provide appropriate credit assignment to individual units. They also carry out significant reward shaping to get the agents to learn more effectively. Their PG-MAGDS AIs are able to learn to beat the in-game AI at a variety of micromanagement control scenarios, as well as in large-scale scenarios of more than thirty units on either side. It’s currently difficult to accurately evaluate the various techniques people are developing for StarCraft against one another due to a lack of shared baselines and experiments, as well as an unclear split in the research community between using StarCraft 1 (this paper) and StarCraft 2 (efforts by DeepMind, others) as testbeds.
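The parameter-sharing idea at the heart of this can be sketched simply: one set of policy weights controls every unit, with each unit acting on its own observation. The linear 'policy' below is an illustrative assumption, not the PG-MAGDS architecture.

```python
# Toy sketch of parameter sharing across agents: all units reuse the
# same policy weights, so experience from any unit improves the single
# shared parameter set rather than per-unit networks.

shared_weights = [0.1, -0.2]  # one parameter set for every unit

def policy(observation):
    # A linear 'policy': score each of two actions, pick the larger.
    scores = [w * observation for w in shared_weights]
    return 0 if scores[0] >= scores[1] else 1

# Three units with different observations all act through the same weights.
actions = [policy(obs) for obs in (1.0, -0.5, 2.0)]
```

Sharing one parameter set keeps the number of learned weights constant as unit counts grow, which is what lets the approach scale to scenarios with dozens of units per side; the credit-assignment machinery then decides how much each unit's experience should move those shared weights.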
  Still limited: “At present, we can only train ranged ground units with the same type, while training melee ground units using RL methods is still an open problem. We will improve our method for more types of units and more complex scenarios in the future. Finally, we will also consider to use our micromanagement model in the StarCraft bot to play full the game,” the researchers write.
  Read more: StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning (Arxiv).

Tech Tales:

The person was killed at five minutes past eleven the previous night. Their beaten body was found five minutes later by a passing group of women who had been dining at a nearby restaurant. By 11:15 the body was photographed and data began to be pulled from nearby security cameras, wifi routers, cell towers, and the various robot and drone companies. At 11:15:01 one of the robot companies indicated that a robot had been making a delivery nearby at the time of the attack. The robot was impounded and transported to the local police station, where it was placed in a facility known to local officers as ‘the metal shop’. Here, they would try to extract data from the robot to learn what happened. But it would be a difficult task, because the robot had been far enough away from the scene that none of its traditional, easy-to-poll sensors (video, LIDAR, audio, and so on) had sufficient resolution or fidelity to tell them much.

“What did you see,” said the detective to the robot. “Tell me what you saw.”
The robot said nothing – unsurprising given that it had no speech capability and was, at that moment, unpowered. In another twelve hours the police would have to release the robot back to the manufacturer and if they hadn’t been able to figure anything out by then, then they were out of options.
“They never prepared me for this,” said the detective – and he was right. When he was going through training they never dwelled much on the questions relating to interrogating sub-sentient AI systems, and all the laws were built around an assumption that turned out to be wrong: that the AIs would remain just dumb enough to be interrogatable via direct access into their electronic brains, and that the laws would remain just slow enough for this to be standard procedure for dealing with evidence from all AI agents. This assumption was half right: the law did stay the same, but the AIs got so smart that though you could look into their brains, you couldn’t learn as much as you’d hope.

This particular AI was based in a food delivery robot that roamed the streets of the city, beeping its way through crowds to apartment buildings, where it would notify customers that their Banh Mi, or hot ramen, or cold cuts of meat, or vegetable box, had arrived. Its role was a seemingly simple one: spend all day and night retrieving goods from different businesses and conveying them to consumers. But its job was very difficult from an AI standpoint – streets would change according to the need for road maintenance or the laying of further communication cables, businesses would lose signs or change signs or have their windows smashed, fashions would change which would alter the profile of each person in a street scene, and climatic shocks meant the weather was becoming ever stranger and ever more unpredictable. So to save costs and increase the reliability of the robots the technology companies behind them had been adding more sensors onto the platforms and, once those gains were built-in, working out how to incorporate artificial intelligence techniques to increase efficiency further. A few years ago computational resources became cheap and widely available enough for them to begin re-training each robot based on its own data as well as data from others. They didn’t do this in a purely supervised way, either; instead they had each robot learn to simulate its own model of the world – in this case, a specific region of a city – it worked in, letting it imagine the streets around itself to give it greater abilities relating to route-finding and re-orientation, adapting to unexpected events, and so on.

So now to be able to understand anything about the body that had been found the detective needed to understand the world model of the robot and see if it had significantly changed at any point during the previous day or so. Which is how he found himself staring at a gigantic wall of computer monitors, each showing a different smeary kaleidoscopic vision of a street scene. The detective had access to a control panel that let him manipulate the various latent variables that conditioned the robot’s world model, allowing him to move certain dials and sliders to figure out which things had changed, and how.

The detective knew he was onto something when he found the smear. At first it looked like an error – some kind of computer vision artifact – but as he manipulated various dials he saw that, at 11:15 the previous night, the robot had updated its own world model with a new variable that looked like a black smudge. Except this black smudge was only superimposed on certain people and certain objects in the world, and as he moved the slider around to explore the smear, he found that it had strong associations to two other variables – red three-wheeled motorcycles, and men running. The detective pulled all the information about the world model and did some further experiments and added this to the evidence log.

Later, during prosecution, the robot was physically wheeled into the courtroom where the trial was taking place, mostly as a prop for the head prosecutor. The robot hadn’t seen anything specific itself – its sensors were not good enough to have picked anything admissible up. But as it had been in the area it had learned of the presence of this death through a multitude of different factors it had sensed, ranging from groups of people running toward where the attack had occurred, to an increase in pedestrian phone activity, to the arrival of sirens, and so on. And this giant amount of new sensory information had somehow triggered strong links in its world model with three-wheeled motorcycles and running men. Armed with this highly specific set of factors the police had trawled all the nearby security cameras and sensors again and, through piecing together footage from eight different places, had found occasional shots of men running towards a three-wheeled motorcycle and speeding, haphazardly, through the streets. After building evidence further they were able to get a DNA match. The offenders went to prison and the mystery of the body was (partially) solved. Though the company that made the AI for the robot made no public statements regarding the case, it subsequently used the case in private sales materials as a case study for local law enforcement on the surprising ways robots could benefit their town.

Things that inspired this story: Food delivery robots, the notion of jurisdiction, interpretability of imagination, “World Models” by David Ha and Juergen Schmidhuber.


ImportAI: #88: NATO designs a cyber-defense AI; object detection improves with YOLOv3; France unveils its national AI strategy

Fast object detector YOLO gets its third major release:
…Along with one of the most clearly written and reassuringly honest research papers of recent times. Seriously. Read it!…
YOLO (You Only Look Once) is a fast, free object detection system developed by researchers at the University of Washington. Its latest v3 update makes it marginally faster by incorporating “good ideas from other people”. These include a residual network system for feature extraction which attains reasonably high scores on ImageNet classification while being more efficient than current state-of-the-art systems, and a method inspired by feature pyramid networks that improves prediction of bounding boxes.
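The bounding-box prediction the paragraph mentions follows the decoding rule described in the YOLOv2/v3 papers: the network emits raw offsets per grid cell, which become a box via a sigmoid (keeping the centre inside its cell) and an exponential applied to a prior ("anchor") box size. A minimal sketch, with all the input numbers invented for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cell_x, cell_y, prior_w, prior_h):
    """Map raw network outputs (tx, ty, tw, th) to a box in grid units."""
    bx = cell_x + sigmoid(tx)      # centre x stays inside cell (cell_x, cell_y)
    by = cell_y + sigmoid(ty)      # centre y
    bw = prior_w * math.exp(tw)    # width scales the anchor-box prior
    bh = prior_h * math.exp(th)    # height scales the anchor-box prior
    return bx, by, bw, bh

# Zero offsets decode to the cell centre and the unmodified prior box.
bx, by, bw, bh = decode_box(0.0, 0.0, 0.0, 0.0,
                            cell_x=3, cell_y=2, prior_w=1.5, prior_h=2.0)
print(bx, by, bw, bh)  # → 3.5 2.5 1.5 2.0
```

The sigmoid is the part that makes training stable: however extreme the raw output, the predicted centre can never leave its grid cell.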
  Reassuringly honest: The YOLOv3 paper is probably the most approachable AI research paper I’ve read in recent years, and that’s mostly because it doesn’t take itself too seriously. Here’s the introduction: “Sometimes you just kinda phone it in for a year, you know? I didn’t do a whole lot of research this year. Spent a lot of time on Twitter. Played around with GANs a little. I had a little momentum left over from last year; I managed to make some improvements to YOLO. But, honestly, nothing like super interesting, just a bunch of small changes that make it better,” the researchers write. The paper also includes a “Things We Tried That Didn’t Work” section, which should save other researchers time.
  Why it matters: YOLO makes it easy for hobbyists to access near state-of-the-art object detectors that run very quickly on tiny computational budgets, making it easier for people to deploy systems onto real world hardware, like phones or embedded chips paired with webcams. The downside of systems like YOLO is that they’re so broadly useful that bad actors will use them as well; the researchers demonstrate awareness of this via a ‘What This All Means’ section: ““What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to…. wait, you’re saying that’s exactly what it will be used for?? Oh. Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait…”
  Read more: YOLOv3: An Incremental Improvement (PDF).
  More information on the official YOLO website here.

The military AI cometh: new reference architecture for MilSpec defense detailed by researchers:
…NATO researchers plot automated, AI-based cyber defense systems…
A NATO research group, led by the US Army Research Laboratory, has published a paper on a reference architecture for a cyber defense agent that uses AI to enhance its capabilities. The paper is worth reading because it provides a nuts-and-bolts perspective on how a lot of militaries around the world are viewing AI: AI systems let you automate more stuff, automation lets you increase the speed with which you can take actions and thereby gain strategic initiative against an opponent, so the goal of most technology integrations is to automate as many chunks of a process as possible to retain speed of response and therefore initiative.
  “Artificial cyber hunters“: “In a conflict with a technically sophisticated adversary, NATO military tactical networks will operate in a heavily contested battlefield. Enemy software cyber agents—malware—will infiltrate friendly networks and attack friendly command, control, communications, computers, intelligence, surveillance, and reconnaissance (C4ISR) and computerized weapon systems. To fight them, NATO needs artificial cyber hunters—intelligent, autonomous, mobile agents specialized in active cyber defense,” the researchers write.
  How the agents work: The researchers propose agents that possess five main components: “sensing and world state identification”, “planning and action selection”, “collaboration and negotiation”, “action execution”, and “learning and knowledge improvement”. Each of these functions has a bunch of sub-systems to perform tasks like ingesting data from the agent’s actions, or to communicate and collaborate with other agents.
  Usage scenarios: These agents are designed to be modular and deployable across a variety of different form factors and usage scenarios, including multiple agents deployed throughout a vehicle’s weapons, navigation, and observation systems, as well as the laptops used by its human crew, and managed by a single “master agent”. Under this scenario, the NATO researchers detail a threat where the vehicle is compromised by a virus placed into it during maintenance; this virus is subsequently detected by one of the agents when it begins scanning other subsystems within the vehicle, causing the agents deployed on the vehicle to decrease trust in the ‘vehicle management system’ and place the BMS (an in-vehicle system used to survey the surrounding territory) into an alert state. Next, one of the surveillance AI agents discovers that the enemy malware has loaded software directly into the BMS, causing the AI agent to automatically restart the BMS to reset it to a safe state.
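The five-component loop and the trust-lowering behavior above can be sketched as a toy agent. The component names follow the paper's list, but all of the logic (the trust numbers, the observation format, the restart action) is invented for illustration; the reference architecture specifies no code.

```python
# Toy sketch of the five-component cyber-defense agent loop. Component
# names match the paper's list; everything else is illustrative.

class CyberDefenseAgent:
    def __init__(self, subsystem):
        self.subsystem = subsystem
        self.trust = 1.0           # trust in the monitored subsystem
        self.log = []

    def sense(self, observation):         # sensing & world state identification
        return "unexpected_software" in observation

    def plan(self, compromised):          # planning & action selection
        return "restart_subsystem" if compromised else "monitor"

    def collaborate(self, other_agents):  # collaboration & negotiation
        for agent in other_agents:        # peers adopt the lower trust level
            agent.trust = min(agent.trust, self.trust)

    def act(self, action):                # action execution
        self.log.append((self.subsystem, action))
        return action

    def learn(self, compromised):         # learning & knowledge improvement
        if compromised:
            self.trust *= 0.5

bms_agent = CyberDefenseAgent("BMS")
peer = CyberDefenseAgent("vehicle_management")

compromised = bms_agent.sense({"unexpected_software"})
bms_agent.learn(compromised)
bms_agent.collaborate([peer])   # the detection lowers trust across agents
action = bms_agent.act(bms_agent.plan(compromised))
print(action, bms_agent.trust, peer.trust)
```

The point of the sketch is the control flow, not the logic: detection in one agent propagates reduced trust to its peers, which then drives an automated defensive action, mirroring the BMS-restart scenario the researchers describe.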
  Why it matters: As systems like these move from reference architectures to functional blocks of code we’re going to see the nature of conflict change as systems become more reactive over shorter timescales, which will further condition the sorts of strategies people use in conflict. Luckily, technologies for offense are too crude and brittle and unpredictable to be explored by militaries any time soon, so most of this work will take place in the area of defense, for now.
  Read more: Initial Reference Architecture of an Intelligent Autonomous Agent for Cyber Defense (Arxiv).

Google researchers train agents to project themselves forward and to work backward from the goal:
…Agents perform better at long horizon tasks when they can also work backward from the goal…
When I try to solve a task I tend to do two things: I think of the steps I reckon I need to take to be able to complete it, and then I think of the end state and try to work my way backwards from there to where I am. Today, most AI agents just do the first thing, exploring (usually without a well-defined notion of the end state) until they stumble into correct behaviors. Now, researchers with Google Brain have proposed a somewhat limited approach to give agents the ability to work backwards as well. Their approach requires the agent to be provided with knowledge of the reward function and specifically the goal – that’s not going to be available in most systems, though it may hold for some software-based approaches. The agent is able to then use this information to project forward from its own state when considering the next actions, and also look backward from its sense of the goal to help it perform better action selection. The approach works well on lengthy tasks requiring large amounts of exploration, like navigating in gridworlds or solving Towers of Hanoi problems. It’s not clear from this paper how far this technique can go as it is tested on small-scale toy domains.
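The forward-backward idea can be illustrated on a toy domain. This is a sketch under invented assumptions (a 1-D chain, tabular Q-learning, made-up hyperparameters), not the paper's algorithm: ordinary Q-learning explores forward from the start, while a backward pass exploits the given goal, as the paper assumes the reward function and goal are known, to propagate value from the goal toward the start.

```python
import random

N, GOAL = 6, 5                  # a 6-state chain with the goal at one end
ACTIONS = (-1, +1)
ALPHA, GAMMA = 0.5, 0.9

def step(s, a):
    s2 = max(0, min(N - 1, s + a))
    reward = 1.0 if (s2 == GOAL and s != GOAL) else 0.0
    return s2, reward

q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def update(s, a, r, s2):
    best_next = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])

random.seed(0)
for _ in range(300):
    # Forward: explore from the start state, as a standard agent would.
    s = 0
    for _ in range(N):
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        update(s, a, r, s2)
        if s2 == GOAL:
            break
        s = s2
    # Backward: from the known goal, imagine the transitions that would
    # have led into each state and update them too.
    s2 = GOAL
    for _ in range(N):
        a = random.choice(ACTIONS)
        s = max(0, min(N - 1, s2 - a))   # state that action a would come from
        if s != s2 and step(s, a)[0] == s2:
            update(s, a, step(s, a)[1], s2)
        s2 = s

print(q[(4, +1)], q[(4, -1)])
```

The backward updates give states near the goal useful values long before forward random exploration would have reached them, which is exactly where the paper reports gains on long-horizon tasks.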
  Why it matters: To be conscious is to be trapped in a subjective view of time that governs everything we do. Integrating more of an appreciation of time as a specific contextual marker and using that to govern environment modelling seems like a prerequisite for the development of more advanced systems.
  Read more: Forward-Backward Reinforcement Learning (Arxiv).

AI researchers train agents to simulate their own worlds for superior performance:
…I am only as good as my own imaginings…
Have you ever heard the story about the basketball test? Scientists split a group of people into three groups; one group was told to not play basketball for a couple of weeks, the second group was told to play basketball for an hour a day for two weeks, and the third group was told to think about playing basketball for an hour a day for two weeks, but not play it. Eventually, all three groups played basketball and the scientists discovered that the people that had spent a lot of time thinking about the game did meaningfully better than the group that hadn’t played it at all, though neither were as good as the team that practised regularly. This highlights something most people have a strong intuition about: our brains are simulation engines, and the more time we spend simulating a problem, the better chance we have of solving that problem in the real world. Now, researchers David Ha and Juergen Schmidhuber have sought to give AI agents this capability, by training systems to develop a compressed representation of their environment, then having these agents train themselves within this imagined version of the environment to solve a task – in this case, driving a car around a race course, and solving a challenge in VizDoom.
   Significant caveat: Though the paper is interesting it may be pursuing a research path that doesn’t go that far according to the view of one academic, Shimon Whiteson, who tweeted out some thoughts about the paper a few days ago.
  Surprising transfer learning: For the VizDoom tasks the researchers found they were able to make the agent’s model of its Doom challenge more difficult by raising the temperature of the environment model, which essentially increases randomization of its various latent variables. This means the agent had to contend with a more difficult version of the task, replete with more enemies, less predictable fireballs, and even the occasional random death. They found that agents trained in this simulation excelled at a simpler real world task, suggesting that the underlying learned environment model was of sufficient fidelity to be a useful mental simulation.
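The "temperature" knob here is the standard softmax-temperature trick: divide a model's raw outputs (logits) by a temperature before normalizing, so a higher temperature flattens the distribution and samples from the learned environment model become less predictable. A minimal sketch, with the logit values invented for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / T; higher T gives a flatter distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means less predictable sampling."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]                     # e.g. scores over next-state outcomes
cool = softmax_with_temperature(logits, 0.5) # confident, predictable world
hot = softmax_with_temperature(logits, 2.0)  # noisy, harder world

print(entropy(cool), entropy(hot))
```

Training against the high-temperature ("hot") model forces the agent to handle a wider spread of outcomes, which is why the harder imagined world transfers well to the easier real one.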
  Why it matters: “Imagination” is a somewhat loaded term in AI research, but it’s a valid thing to be interested in. Imagination is what lets humans explore the world around them effectively and imagination is what gives them a sufficiently vivid and unpredictable internal mental world to be able to have insights that lead to inventions. Therefore, it’s worth paying attention to systems like those described in this paper that strive to give AI agents access to a learned and rich representation of the world around them which they can then use to teach themselves. It’s also interesting as another way of applying data augmentation to an environment: simply expose an agent to the real environment enough that it can learn an internal representation of it, then throw computers at expanding and perturbing the internal world simulation to cover a greater distribution of (potentially) real world outcomes.
   Readability endorsement: The paper is very readable and very funny. I wish more papers were written to be consumed by a more general audience as I think it makes the scientific results ultimately accessible to a broader set of people.
  Read more: World Models (Arxiv).

Testing self-driving cars with toy vehicles in toy worlds:
…Putting neural networks to the (extremely limited) test…
Researchers with the Center for Complex Systems and Brain Sciences at Florida Atlantic University have used a toy racetrack, a DIY model car, and seven different neural network approaches to evaluate self-driving capabilities in a constrained environment. The research seeks to provide a cheap, repeatable benchmark developers can use to evaluate different learning systems against each other (whether this benchmark has any relevance for full-size self-driving cars is to be determined). They test seven types of neural network on the same platform, including a feed forward network; a two-layer convolutional neural network; an LSTM; implementations of AlexNet, VGG-16, Inception V3, and a ResNet-26. Each network is tested on the obstacle course following training and is evaluated according to how many laps the car completes. They test the networks on three data types: color and grayscale single images, as well as a ‘gray framestack’ which is a set of images that occurred in a sequence. Most systems were able to complete the majority of the courses, which suggests the course is a little too easy. An AlexNet-based system attained perfect performance on one data input type (single color frame), and a ResNet attained the best performance when trying to use a gray framestack.
  Why it matters: This paper highlights just how little we know today about self-driving car systems and how poor our methods are for testing and evaluating different tactics. What would be really nice is if someone spent enough money to do a controlled test of actual self-driving cars on actual roads, though I expect that companies will make this difficult out of a desire to keep their IP secret.
  Read more: A Systematic Comparison of Deep Learning Architectures in an Autonomous Vehicle (Arxiv).

Separating one detected pedestrian from another with deep learning:
…A little feature engineering (via ‘AffineAlign’) goes a long way…
As the world starts to deploy large-scale AI surveillance tools researchers are busily working to deal with some of the shortcomings of the technology. One major issue for image classifiers has been object segmentation and disambiguation, for example: if I’m shown images of a crowd of people how can I specifically label each one of those people and keep track of each of them, without accidentally mis-labeling a person, or losing them in the crowd? New research from Tsinghua University, Tencent AI Lab, and Cardiff University attempts to solve this problem with “a brand new pose-based instance segmentation framework for humans which separates instances based on human pose rather than region proposal detection.” The proposed method introduces an ‘AffineAlign’ layer that aligns images based on human poses which it uses within an otherwise typical computer vision pipeline. Their approach works by adding in a bit more prior knowledge (specifically, knowledge of human poses) into a recognition pipeline, and using this to better identify and segment people in crowded photos.
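The geometric intuition behind pose-based alignment can be shown with a hand-rolled stand-in. The real AffineAlign layer estimates a full affine transform inside the network; this sketch does only the scale-and-translation special case, fitting a transform from two reference keypoints of a detected pose onto canonical positions and applying it to every keypoint. All coordinates are invented for illustration.

```python
import math

def fit_scale_translation(src_a, src_b, dst_a, dst_b):
    """Scale + translation mapping src_a onto dst_a with matched spacing."""
    scale = math.dist(dst_a, dst_b) / math.dist(src_a, src_b)
    tx = dst_a[0] - scale * src_a[0]
    ty = dst_a[1] - scale * src_a[1]
    return scale, (tx, ty)

def apply_transform(point, scale, translation):
    return (scale * point[0] + translation[0],
            scale * point[1] + translation[1])

# Detected pose keypoints (say, two shoulders plus the head) and the
# canonical positions we want the shoulders aligned to.
pose = [(10.0, 20.0), (14.0, 20.0), (12.0, 26.0)]
scale, t = fit_scale_translation(pose[0], pose[1], (0.0, 0.0), (8.0, 0.0))
aligned = [apply_transform(p, scale, t) for p in pose]
print(aligned)
```

Once every detected person is normalized into the same canonical frame like this, the downstream segmentation head sees poses at a consistent position and scale, which is what lets the pipeline separate overlapping people.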
  Results: The approach attains comparable results to Mask R-CNN on the ‘COCOHUMAN’ dataset, and outperforms it on the ‘COCOHUMAN-OC’ dataset, which tests systems’ ability to disambiguate partially occluded humans.
   Why it matters: As AI surveillance systems grow in capability it’s likely that more organizations around the world will deploy such systems into the real world. China is at the forefront of doing this currently, so it’s worth tracking public research on the topic area from Chinese-linked researchers.
  Read more: Pose2Seg: Human Instance Segmentation Without Detection (Arxiv).

French leader Emmanuel Macron discusses France’s national AI strategy:
…Why AI has issues for democracy, why France wants to lead Europe in AI, and more…
Politicians are somewhat similar to hybrids of weathervanes and antennas; the job of a politician is to intuit the public mood before it starts to change and establish a rhetorical position that points in the appropriate direction. For that reason it’s been interesting to see more and more politicians ranging from Canada’s Justin Trudeau to China’s Xi Jinping to, now, France’s Emmanuel Macron, taking meaningful positions on artificial intelligence; this suggests they’ve intuited that AI is going to become a galvanizing issue for the general public. Macron gives some of his thinking about the impact of AI in an interview with Wired. His thesis is that European countries need to pool resources and support AI individually to have a chance at becoming a significant enough power bloc with regards to AI capabilities to not be crushed by the scale of the USA’s and China’s AI ecosystems. Highlights:
– AI “will disrupt all the different business models”, and France needs to lead in AI to retain agency over itself.
– Opening up data for general usage by AI systems is akin to opening up a Pandora’s Box: “The day we start to make such business out of this data is when a huge opportunity becomes a huge risk. It could totally dismantle our national cohesion and the way we live together. This leads me to the conclusion that this huge technological revolution is in fact a political revolution.”
– The USA and China are the two leaders in AI today.
– “AI could totally jeopardize democracy.”
– He is “dead against” the usage of lethal autonomous weapons where the machine makes the decision to kill a human.
– “My concern is that there is a disconnect between the speediness of innovation and some practices, and the time for digestion for a lot of people in our democracies.”
   Read more: Emmanuel Macron Talks To Wired About France’s AI Strategy (Wired).

France reveals its national AI strategy:
…New report by Fields Medal-winning minister published alongside Emmanuel Macron speech and press tour…
For the past year or so French mathematician and politician Cedric Villani has been working on a report for the government about what France’s strategy should be for artificial intelligence. He’s now published the report and it includes many significant recommendations meant to help France (and Europe as a whole) chart a course between the two major AI powers, the USA and China.
  Summary: Here’s a summary of what France’s AI strategy involves: rethink data ownership to make it easier for governments to create large public datasets; specialize in four sectors: healthcare, environment, transport-mobility, and defense-security; revise public sector procurement so it’s easier for the state to buy products from smaller (and specifically European) companies; create and fund interdisciplinary research projects; create national computing infrastructure including “a supercomputer designed specifically for AI usage and devoted to researchers” along with creating a European-wide private cloud for AI research; increase competitiveness of public sector remuneration; fund a public laboratory to study AI and its impact on labor markets which will work in tandem with schemes to get companies to look into funding professional training for people whose lives are affected by innovations developed by the private sector; increase transparency and interpretability of AI systems to deal with problems of bias; create a national AI ethics committee to provide strategic guidance to the government, and improve the diversity of AI companies.
  Read more: Summary of France’s AI strategy in English (PDF).

Berkeley researchers shrink neural networks with SqueezeNet-successor ‘SqueezeNext’:
…Want something eight times faster and cheaper than AlexNet?…
Berkeley researchers have published ‘SqueezeNext’, their latest attempt to distill the capabilities of very large neural networks into smaller models that can feasibly be deployed on devices with small memory and compute capabilities, like mobile phones. While much of the research into AI systems today is based around getting state-of-the-art results on specific datasets, SqueezeNext is part of a parallel track focused on making systems deployable. “A general trend of neural network design has been to find larger and deeper models to get better accuracy without considering the memory or power budget,” write the authors.
  How it works: SqueezeNext is efficient because of a few design strategies: low rank filters; a bottleneck filter to constrain the parameter count of the network; using a single fully connected layer following a bottleneck; weight and output stationary; and co-designing the network in tandem with a hardware simulator to maximize hardware usage efficiency.
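The low-rank filter trick in that list has a simple back-of-envelope justification: replacing one k×k convolution with a k×1 followed by a 1×k convolution cuts the weight count. A sketch of the arithmetic, with channel sizes invented for illustration (not SqueezeNext's actual configuration):

```python
# Back-of-envelope parameter counting for low-rank (separable) filters.
# Channel sizes below are illustrative, not taken from the paper.

def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of a conv layer with kernel k_h x k_w, ignoring biases."""
    return k_h * k_w * c_in * c_out

k, c_in, c_out = 3, 64, 64
standard = conv_params(k, k, c_in, c_out)       # one 3x3 convolution
low_rank = (conv_params(k, 1, c_in, c_out)      # a 3x1 convolution...
            + conv_params(1, k, c_out, c_out))  # ...followed by a 1x3

print(standard, low_rank, standard / low_rank)  # → 36864 24576 1.5
```

Here the separable pair needs two thirds of the weights of the full filter, and the saving compounds when combined with the bottleneck layers, which shrink `c_in` before the expensive spatial convolutions run.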
  Results: The resulting SqueezeNext network is a neural network with 112X fewer model parameters than those found in AlexNet, the model that was used to attain state-of-the-art image recognition results in 2012. They also develop a version of the network whose performance approaches that of VGG-19 (which did well in ImageNet 2014). The researchers also design an even more efficient network by carefully tuning model design in parallel with a hardware simulator, ultimately designing a model that is significantly faster and more energy efficient than a widely used compressed network called SqueezeNet.
  Why it matters: One of the things holding neural networks back from being deployed is their relatively large memory and computation requirements – traits that are likely to continue to be present given the current trend for solving tasks via training unprecedentedly multi-layered systems. Therefore, research into making these networks run efficiently broadens the number of venues neural nets can run in.
   Read more: SqueezeNext: Hardware-Aware Neural Network Design (Arxiv).

Tech Tales:

Metal Dogs Grow Old.

It’s not unusual, these days, to find rusting piles of drones next to piles of elephant skeletons. Nor is it unusual to see an old elephant make its way to a boneyard accompanied by a juddering, ancient drone, and to see both creature and machine set themselves down and subside at the same time. There have even been stories of drones falling out of the sky when one of the older birds in the flock dies. These are all some of the unexpected consequences of a wildlife preservation program called PARENTAL UNIT. Starting in the early twenties, we began to introduce small, quiet drones to vulnerable animal populations. The drones would learn to follow a specific group of creatures, say a family of elephants, or – later, after the technology improved – a flock of birds.

The machines would learn about these creatures and watch over them, patrolling the area around them as they slept and, upon finding the inevitable poachers, automatically raising alerts with local park rangers. Later, the drones were given some autonomous defense capabilities, so they could spray a noxious chemical onto the poachers that had the dual effect of drawing local predators to them and providing a testable biomarker that police could subsequently check people for at the borders of national parks.

A few years after starting the program the drone deaths started happening. Drones died all the time, and we modelled their failures as rigorously as any other piece of equipment. But drones started dying at specific times – the same time the oldest animal in the group they were watching died. We wondered about this for weeks, running endless simulations, and even pulling in some drones from the field and inspecting the weights in their models to see if any of their continual learning had led to any unpredictable behaviors. Could there be something about the union of the concept of death and the concept of the eldest in the herd that fried the drones’ brains, our scientists wondered? We had no answers. The deaths continued.

Something funny happened: after the initial rise in deaths they steadied out, with a few drones a week dying from standard hardware failures and one or two dying as a consequence of one of their creatures dying. So we settled into this quieter new life and, as we stopped trying to interfere, we noticed a further puzzling statistical trend: certain drones began living improbably long lifespans, calling to mind the Mars rovers Spirit and Opportunity that had miraculously exceeded their own designed lifespans. These drones were also the same machines that died when the eldest animals died. Several postgrads are currently exploring the relationship, if any, between these two. Now we celebrate these improbably long-lived machines, cheering them on as they buzz in for a new propeller, or update our software monitors with new footage from their cameras, typically hovering right above the creature they have taken charge of, watching them and learning something from them we can measure but cannot examine directly.

Things that inspired this story: Pets, drones, meta-learning, embodiment.

ImportAI: #87: Salesforce research shows the value of simplicity, Kindred’s repeatable robotics experiment, plus: think your AI understands physics? Run it on IntPhys and see what happens.

Chinese AI star says society must prepare for unprecedented job destruction:
…Kai-Fu Lee, venture capitalist and former AI researcher, discusses the impact of AI and why today’s techniques will have a huge impact on the world…
Today’s AI systems are going to influence the world’s economy so much that their uptake will lead to what looks in hindsight like another industrial revolution, says Chinese venture capitalist Kai-Fu Lee, in an interview with Edge. “We’re all going to face a very challenging next fifteen or twenty years, when half of the jobs are going to be replaced by machines. Humans have never seen this scale of massive job decimation. The industrial revolution took a lot longer,” he said.
   He also says that he worries deep learning might be a one-trick pony, in the sense that we can’t expect other similarly scaled breakthroughs to occur in the next few years, and we should adjust our notions of AI progress on this basis. “You cannot go ahead and predict that we’re going to have a breakthrough next year, and then the month after that, and then the day after that. That would be exponential. Exponential adoption of applications is, for now, happening. That’s great, but the idea of exponential inventions is a ridiculous concept. The people who make those claims and who claim singularity is ahead of us, I think that’s just based on absolutely no engineering reality,” he says.
  AI Haves and Have-Nots: Countries like China and the USA that have large populations and significant investments in AI stand to fare well in the new AI era, he says. “The countries that are not in good shape are the countries that have perhaps a large population, but no AI, no technologies, no Google, no Tencent, no Baidu, no Alibaba, no Facebook, no Amazon. These people will basically be data points to countries whose software is dominant in their country.”
  Read more: We Are Here To Create, A Conversation With Kai-Fu Lee (Edge).

AI practitioners grapple with the upcoming information apocalypse:
…And you thought DeepFakes was bad. Wait till DeepWar…
Members of the AI community are beginning to sound the alarm about the imminent arrival of stunningly good, stunningly easy to make synthetic images and videos. In a blog post, AI practitioners say that the increasing availability of data combined with easily accessible AI infrastructure (cloud-rentable GPUs) is lowering the barrier to entry for people that want to make this stuff, and that ongoing progress in AI capabilities means the quality of these fake media is increasing over time.
  How can we deal with these information threats? We could look at how society already makes it hard to forge currencies via making it costly to produce high-fidelity copies and in parallel developing technologies to verify the authenticity of currency materials. Unfortunately, though this may help with some of the problems brought about by AI forgery, it doesn’t deal with the root problems: AI is predominantly embodied in software rather than hardware and so it’s going to be difficult to insert detectable (and non-spoofable) distinct visual/audio signatures into generated media barring some kind of DRM-on-steroids. One solution could be to train AI classifiers on real and faked datasets from the same domain so as to provide classifiers to spot faked media in the wild.
  Read more: Commoditisation of AI, digital forgery and the end of trust: how we can fix it.

Berkeley researchers use Soft Q-Learning to let robots compose solutions to tasks:
…Research reduces the time it takes to learn new behaviors on robots…
Berkeley researchers have figured out how to use soft Q-learning, a recently introduced variant of traditional Q-learning, to let robots learn more efficiently. They introduce a new trick: composing new Q-functions from existing learned policies. This lets them, for example, train a robot to move its arm to a particular distribution of X positions, then to a particular distribution of Y positions, and then create a new policy which moves the arm to the intersection of the X and Y positions without having been trained on that combination previously. This sort of behavior is typically quite difficult to achieve with a single policy, as it requires so much exploration that most algorithms will spend a long time trying and failing to succeed at the task.
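The composition trick can be illustrated in a few lines of numpy. This is a toy discrete sketch of the general idea, not the Berkeley implementation: in maximum-entropy RL a policy is a Boltzmann distribution over a soft Q-function, and a policy for the intersection of two tasks can be approximated by averaging the tasks' Q-functions.

```python
import numpy as np

def softmax_policy(q_values, alpha=1.0):
    """Boltzmann (maximum-entropy) policy derived from a soft Q-function."""
    logits = q_values / alpha
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def compose_q(q_a, q_b):
    """Approximate the soft Q-function for the intersection of two tasks
    by averaging the constituent Q-functions (toy discrete sketch)."""
    return 0.5 * (q_a + q_b)

# Toy example: five discrete arm positions along one axis.
q_x = np.array([0.0, 2.0, 4.0, 2.0, 0.0])   # task A prefers position 2
q_y = np.array([0.0, 0.0, 2.0, 4.0, 2.0])   # task B prefers position 3

q_xy = compose_q(q_x, q_y)                  # [0, 1, 3, 3, 1]
policy = softmax_policy(q_xy)
print(policy)  # probability mass concentrates on the overlap (positions 2-3)
```

The composed policy puts most of its probability on positions that score well under both tasks, without either constituent policy ever having seen the combined objective.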
  Real world: The researchers train real robots to succeed at tasks like reaching to a specific location and stacking Lego blocks. They also demonstrate the utility of combining policies by training a robot to avoid an obstacle near its arm and separately training it to stack legos, then combine the two policies allowing the robot to stack blocks while avoiding an obstacle, despite having never been trained on the combination before.
  Why it matters: The past few years of AI progress have let us get very good at developing systems which excel at individual capabilities; being able to combine capabilities in an ad-hoc manner to generate new behaviors further increases the capabilities of AI systems and makes it possible to learn a distribution of atomic behaviors then chain these together to succeed at far more complex tasks than those found within the training set.
  Read more: Composable Deep Reinforcement Learning for Robotic Manipulation (Arxiv).

Think your AI model has a good understanding of physics? Run it on IntPhys and prepare to be embarrassed:
…Testing AI systems in the same way we test infants and creatures…
Researchers with INRIA, Facebook, and CNRS have released IntPhys, a new way to evaluate AI systems’ ability to model the physical world around them using what the researchers call a ‘physical plausibility test’. IntPhys follows a recent trend in AI of testing systems on tougher problems that more closely map to the sorts of problems humans typically tackle (see: AI2’s ‘ARC’ dataset for written reasoning, and DeepMind’s cognitive science-inspired ‘PsychLab’ environment).
  How it works: IntPhys presents AI systems with movies of scenes rendered in UnrealEngine4 and challenges them to figure out whether one scene can lead to another, testing models’ ability to internalize fundamental concepts about the world like object permanence, causality, etc. Systems need to compute a “plausibility score” for each of the scenes or scene combinations they are shown, which researchers can then use to figure out whether the systems have learned the underlying dynamics of the world.
  The IntPhys Benchmark: v1 of IntPhys focuses on unsupervised learning. The first version tests systems’ ability to understand object permanence. Future releases will include more tests for things like shape constancy, spatio-temporal continuity, and so on. The initial IntPhys release contains 15,000 videos of possible events, each video around 7 seconds long running at 15fps, totalling 21 hours of videos. It also incorporates some additional information so you don’t have to attempt to solve the task in a purely unsupervised manner, including depth of field data for each image, as well as object instance segmentation masks.
  Baseline Systems VERSUS Humans: The researchers create two baselines for others to evaluate their systems against: a CNN encoder-decoder system, and a conditional GAN. “Preliminary work with predictions at the pixel level revealed that our models failed at predicting convincing object motions, especially for small objects on a rich background. For this reason, we switched to computing predictions at a higher level, using object masks.” The researchers tested humans on their system, finding that humans had an average error rate of about 8 percent when the scene is visible and 25 percent when the scene contains partial occlusion. Neural network-based systems, by comparison, had errors of 31 percent on visible scenes and 50 percent on partially occluded scenes.
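The error rates above can be thought of as coming from a relative classification task. Here is a toy sketch of one way to score it (my own simplification for illustration, not the official IntPhys metric): given plausibility scores for clips known to be possible and impossible, count how often an impossible clip is rated at least as plausible as a possible one.

```python
# Sketch (assumption: `scores` maps clip id -> model plausibility score,
# and `labels` marks which clips depict physically possible events).
def relative_error_rate(scores, labels):
    """Pair every possible clip with every impossible one and count how
    often the model rates the impossible clip as at least as plausible."""
    possible = [scores[c] for c in scores if labels[c]]
    impossible = [scores[c] for c in scores if not labels[c]]
    errors = sum(1 for p in possible for i in impossible if i >= p)
    return errors / (len(possible) * len(impossible))

scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.85}
labels = {"a": True, "b": True, "c": False, "d": False}
print(relative_error_rate(scores, labels))  # → 0.25 (d outranks b)
```

A perfect model drives this to 0; random scoring sits around 0.5, which is roughly where the neural baselines land on partially occluded scenes.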
  What computers are up against: “At 2-4 months, infants are able to parse visual inputs in terms of permanent, solid and spatiotemporally continuous objects. At 6 months, they understand the notion of stability, support and causality. Between 8 and 10 months, they grasp the notions of gravity, inertia, and conservation of momentum in collision; between 10 and 12 months, shape constancy, and so on,” the researchers write.
  Why it matters: Tests like this will give us a greater ability to model the abilities of AI systems to perform fundamental acts of reasoning, and as the researchers extend the benchmark with more challenging components we’ll be able to get a better read on what these systems are actually capable of. As new components are added “the prediction task will become more and more difficult and progressively reach the level of scene comprehension achieved by one-year-old humans,” they write.
  Competition: AI researchers can download the dataset and submit their system scores to an online leaderboard at the official IntPhys website here (IntPhys).
  Read more: IntPhys: A Framework and Benchmark for Visual Intuitive Physics Reasoning (Arxiv).

Kindred researchers explain how to make robots repeatable:
…Making the dream of repeatable robot experiments a reality…
Researchers with robot AI startup Kindred have published a paper on a little-discussed subject in AI: repeatable real-world robotics experiments. It’s a worthwhile primer on some of the tweaks people need to make to create robot development environments that are a) repeatable and b) effective.
  Regular robots: The researchers set up a reaching task using a Universal Robots ‘UR5’ robot arm and describe the architecture of the system. One key difference between simulated and real-world environments is the role of time: in simulation one typically executes all the learning and action updates synchronously, whereas on real robots you need to do this asynchronously. “In real-world tasks, time marches on during each agent and environment-related computations. Therefore, the agent always operates on delayed sensorimotor information,” they explain.
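That asynchrony can be sketched with stdlib threading. This is a hypothetical minimal setup, not Kindred's code: a sensor thread keeps publishing the latest observation while the agent computes actions at its own cadence, so every action is necessarily based on slightly stale sensorimotor information.

```python
import threading
import time

# Shared state: the freshest observation the sensors have produced so far.
latest_obs = {"value": 0, "stamp": 0.0}
lock = threading.Lock()

def sensor_loop(stop):
    """Stand-in for a real sensor stream, updating at ~100 Hz."""
    t = 0
    while not stop.is_set():
        with lock:
            latest_obs["value"] = t       # placeholder for a real reading
            latest_obs["stamp"] = time.time()
        t += 1
        time.sleep(0.01)

def agent_step(policy):
    """Act on whatever observation is freshest *right now* (it is already
    slightly out of date by the time the action is computed)."""
    with lock:
        obs = dict(latest_obs)
    return obs, policy(obs["value"])

stop = threading.Event()
threading.Thread(target=sensor_loop, args=(stop,), daemon=True).start()
time.sleep(0.05)                          # let a few sensor updates land
obs, action = agent_step(lambda v: -v)    # toy proportional policy
stop.set()
print("acted on observation stamped", obs["stamp"], "with action", action)
```

In simulation the world would pause while `agent_step` runs; on hardware the sensor loop marches on regardless, which is exactly the gap the paper is describing.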
  Why it matters: It’s currently very difficult to model progress in real-world robotics due to the diversity of tasks and the lack of trustworthy testing regimes. Papers like this suggest a path forward and I’d hope they encourage researchers to try to structure their experiments to be more repeatable and reliable. If we’re able to do this then we’ll be able to better develop intuitions about the rate of progress in the field which should help for forecasting trends in development – a critical thing to do, given how much robots are expected to influence employment in the regions they are deployed into.
  Read more here: Setting up a Reinforcement Learning Task with a Real-World Robot (Arxiv).

Salesforce researchers demonstrate the value of simplicity for language modelling:
…Well-tuned LSTM or QRNN-based systems shown to beat more complex systems…
Researchers with Salesforce have shown that well-tuned basic AI components can attain better performance on tough language tasks than more sophisticated and, in many cases, more modern systems. Their research shows that RNN-based systems built from well-tuned, simple components like LSTMs or the Salesforce-invented QRNN beat more complex models like recurrent highway networks, hyper networks, or systems found by neural architecture search. This result suggests that much of the recent progress in AI may to some extent be illusory: jumps in performance on certain datasets that were previously assumed to stem from fundamentally new capabilities in new models are now being shown to be within reach of simpler components that are tuned and tested comprehensively.
  Results: The researchers test their QRNN- and LSTM-based systems on the Penn Treebank and enwik8 character-level datasets and the word-level WikiText-103 dataset, beating state-of-the-art scores on Penn Treebank and enwik8 when measured by bits-per-character, and significantly outperforming the state of the art on perplexity on WikiText-103.
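Both headline metrics are mechanical transforms of a model's average cross-entropy loss; a quick sketch of the conversions (assuming the loss is measured in nats per token, as most frameworks report it):

```python
import math

def bits_per_character(nats_per_char):
    """Character-level metric: cross-entropy converted from nats to bits."""
    return nats_per_char / math.log(2)

def perplexity(nats_per_word):
    """Word-level metric: the exponentiated cross-entropy."""
    return math.exp(nats_per_word)

# e.g. a character model averaging 0.83 nats/char:
print(round(bits_per_character(0.83), 2))   # → 1.2 bits per character
# and a word model averaging 3.5 nats/word:
print(round(perplexity(3.5), 1))            # → 33.1 perplexity
```

Lower is better for both; this is why character-level results (Penn Treebank, enwik8) and word-level results (WikiText-103) are reported on different scales despite coming from the same underlying loss.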
  Why it matters: This paper follows prior work showing that many of our existing AI components are more powerful than researchers suspected, and follows research that has shown that fairly old systems like GANs or DCGANs can adeptly model data distributions more effectively than sophisticated successor systems. That’s not to say this should be taken as a sign that the subsequent inventions are pointless, but it should cause researchers to devote more time to interrogating and tuning existing systems rather than trying to invent different proverbial wheels. “Fast and well tuned baselines are an important part of our research community. Without such baselines, we lose our ability to accurately measure our progress over time. By extending an existing state-of-the-art word level language model based on LSTMs and QRNNs, we show that a well tuned baseline can achieve state-of-the-art results on both character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets without relying on complex or specialized architectures,” they write.
  Read more: An Analysis of Neural Language Modeling at Multiple Scales (Arxiv).

Want to test how well your AI understands language and images? Try VQA 2.0
…New challenge arrives to test AI systems’ abilities to model language and images…
AI researchers that think they’ve developed models that can learn to model the relationship between language and images may want to submit to the third iteration of the Visual Question Answering Challenge. The challenge prompts models to answer questions about the contents of images. Challengers will use the v2.0 version of the VQA dataset, which includes more written questions and ground truth answers about images.
  Read more: VQA Challenge 2018 launched!

Tech Tales:

Miscellaneous Letters Sent To The Info@ Address Of An AI Company

2023: I saw what you did with that robot so I know the truth. You can’t hide from me anymore I know exactly what you are. My family had a robot in it and the state took them away and told us they were being sent to prison but I know the truth they were going to take them apart and sell their body back to the aliens in exchange for the anti-climate change device. What you are doing with that robot tells me you are going to take it apart when it is done and sell it to the aliens as well. You CANNOT DO THIS. The robot is precious you need to preserve it or else I will be VERY ANGRY. You must listen to me we-

2025: So you think you’re special because you can get them to talk to each other in space now and learn things together well sure I can do that as well I regularly listen to satellites so I can tell you about FLUORIDE and about X74-B and about the SECRET UN MOONBASE and everything else but you don’t see me getting famous for these things in fact it is a burden it is a pain for me I have these headaches. Does your AI get sick as well?-

2027: Anything that speaks like a human but isn’t a human is a sin. You are sinners! You are pretending to be God. God will punish you. You cannot make the false humans. You cannot do this. I have been calling the police every day for a week about this ever since I saw your EVIL creation on FOX-25 and they say they are taking notes. They are onto you. I am going to find you. They are going to find you. I am calling the fire department to tell them about you. I am calling the military to tell them about you. I am calling the-

2030: My mother is in the hospital with a plate in her head I saw on the television you have an AI that can do psychology on other AIs can your AI help my mother? She has a plate in her head and needs some help and the doctors say they can’t do anything for her but they are liars. You can help her. Please can you make your AI look at her and diagnose what is wrong with her. She says the plate makes her have nightmares but I studied many religions for many years and believe she can be healed if she thinks about it more and if someone or something helps her think.

2031: Please you have to keep going I cannot be alone any more-

Things that inspired this story: Comments from strangers about AI, online conspiracy forums, bad subreddits, “Turing Tests”, skewed media portrayals of AI, the fact capitalism creates customers for false information which leads to media ecosystems that traffic in fictions painted as facts.

Import AI: #86: Baidu releases a massive self-driving car dataset; DeepMind boosts AI capabilities via neural teachers; and what happens when AIs evolve to do dangerous, subversive things.

Boosting AI capabilities with neural teachers:
…AKA, why my small student with multiple expert teachers beats your larger more well-resourced teacherless-student…
Research from DeepMind shows how to boost the performance of a given agent on a task by transferring knowledge from a pre-trained ‘teacher’ agent. The technique yields a significant speedup in training AI agents, and there’s some evidence that agents that are taught attain higher performance than non-taught ones. The technique comes in two flavors: single teacher and multi-teacher; agents pretrained via multiple specialized teachers do better than ones trained by a single entity, as expected.
  Strange and subtle: The approach has a few traits that seem helpful for the development of more sophisticated AI agents: in one task DeepMind tests it on the agent needs to figure out how to use a short-term memory to be able to attain a high score. ‘Small’ agents (which only have two convolutional layers) typically fail to learn to use a memory and therefore cannot achieve scores above a certain threshold, but by training a ‘small’ agent with multiple specialized teachers the researchers create one that can succeed at the task. “This is perhaps surprising because the kickstarting mechanism only guides the student agent in which action to take: it puts no constraint on how the student structures its internal memory state. However, the student can only predict the teacher’s behaviour by remembering information from before the respawn, which seems to be enough supervisory signal to drive short-term memory formation. We find this a wonderful parallel with how the best human educators teach: not telling the student what to think, but simply putting the student in a fruitful position to learn for themselves,” the researchers write.
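The mechanism itself is compact. Below is a toy numpy sketch of the general shape, not DeepMind's actual loss: the student's usual RL loss gets an auxiliary cross-entropy term that pulls its policy towards the teacher's, weighted by a coefficient (here called `lam`) that is annealed towards zero so the student eventually stands on its own.

```python
import numpy as np

def kickstart_loss(rl_loss, student_probs, teacher_probs, lam):
    """RL loss plus a distillation term matching the teacher's policy.

    `lam` weights the teacher-matching term and would be annealed to ~0
    over the course of training (hypothetical schedule, for illustration).
    """
    distill = -np.sum(teacher_probs * np.log(student_probs + 1e-8))
    return rl_loss + lam * distill

student = np.array([0.25, 0.25, 0.5])   # student's action distribution
teacher = np.array([0.1, 0.1, 0.8])     # teacher's action distribution

early = kickstart_loss(1.0, student, teacher, lam=1.0)    # teacher dominates
late = kickstart_loss(1.0, student, teacher, lam=0.01)    # student on its own
print(early > late)  # → True: disagreeing with the teacher is costly early on
```

Note the term only constrains which actions the student prefers, which is exactly why the memory-formation result quoted above is surprising: nothing in the loss touches the student's internal state.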
  Why it matters: Trends like this suggest that scientists can speed up their own research by using such pre-trained teachers to better evaluate new agents. This adds further credence to the notion that a key input to (some types of) AI research will shift from pre-labelled static datasets to compute (though it should be noted that data here is implicit, in the form of a procedural, modifiable simulator that researchers can access). More speculatively, this means it may be possible to use mixtures of teachers to train complex agents that far exceed any of their forebears in capability – perhaps an area where the sum really will be greater than its parts.
Read more: Kickstarting Deep Reinforcement Learning (Arxiv).

100,000+ developer survey shows AI concerns:
…What developers think is dangerous and exciting, and who they think is responsible…
Developer community StackOverflow has published the results of its annual survey of its community; this year it asked about AI:
– What developers think is “dangerous” re AI: Increasing automation of jobs (40.8%)
– What developers think is “exciting” re AI: AI surpassing human intelligence, aka the singularity (28%)
– Who is responsible for considering the ramifications of AI:
   – The developers or the people creating the AI: 47.8%
   – A governmental or other regulatory body: 27.9%
– Different roles = different concerns: People that identified as technical specialists tended to say they were more concerned about issues of fairness than the singularity, whereas designers and mobile developers tended to be more concerned about the singularity.
  Read more: Developer Survey Results 2018 (Stack Overflow).

Baidu and Toyota and Berkeley researchers organize self-driving car challenge backed by new self-driving car dataset from Baidu:
…”ApolloScape” adds Chinese data for self-driving car researchers, plus Baidu says it has joined Berkeley’s “DeepDrive” self-driving car AI coalition…
A new competition and dataset may give researchers a better way to measure the capabilities and progression of autonomous car AI.
  The dataset: The ‘ApolloScape’ dataset from Baidu contains ~200,000 RGB images with corresponding pixel-by-pixel semantic annotation. Each frame is labeled from a set of 25 semantic classes that include: cars, motorcycles, sidewalks, traffic cones, trash cans, vegetation, and so on. Each of the images has a resolution of 3384 x 2710, and each frame is separated by a meter of distance. 80,000 images have been released as of March 8 2018.
Read more about the dataset (potentially via Google Translate) here.
  Additional information: Many of the researchers linked to ApolloScape will be talking at a session on autonomous cars at the IEEE Intelligent Vehicles Symposium in China.
Competition: The new ‘WAD’ competition will give people a chance to test and develop AI systems on the ApolloScape dataset as well as a dataset from Berkeley DeepDrive (the DeepDrive dataset consists of 100,000 video clips, each about 40 seconds long, with one key frame from each clip annotated). There is about $10,000 in cash prizes available, and the researchers are soliciting papers on research techniques in: drivable area segmentation (being able to figure out which bits of a scene correspond to which label and which of these areas are safe); road object detection (figuring out what is on the road); and transfer learning from one semantic domain to another, specifically going from training on the Berkeley dataset (filmed in California, USA) to the ApolloScape dataset (filmed in Beijing, China).
   Read more about the ‘WAD’ competition here.

Microsoft releases a ‘Rosetta Stone’ for deep learning frameworks:
…GitHub repo gives you a couple of basic operations displayed in many different ways…
Microsoft has released a GitHub repository containing similar algorithms implemented in a variety of frameworks, including: Caffe2, Chainer, CNTK, Gluon, Keras (with CNTK/TensorFlow/Theano backends), TensorFlow, Lasagne, MXNet, PyTorch, and Knet (Julia). The idea is that if you can read an algorithm in one of these frameworks, you’ll be able to use that knowledge to understand the implementations in the other frameworks.
  “The repo we are releasing as a full version 1.0 today is like a Rosetta Stone for deep learning frameworks, showing the model building process end to end in the different frameworks,” write the researchers in a blog post that also provides some rough benchmarking for training time for a CNN and an RNN.
  Read more: Comparing Deep Learning Frameworks: A Rosetta Stone Approach (Microsoft Tech Net).
View the code examples (GitHub).

Evolution’s weird, wonderful, and potentially dangerous implications for AI agent design:
…And why the AI safety community may be able to learn from evolution…
A consortium of international researchers have published some of the weird, infuriating, and frequently funny ways in which evolutionary algorithms have figured out non-obvious solutions and hacks to tasks they’re asked to solve. The paper includes an illuminating set of examples of ways in which algorithms have subverted the wishes of their human overseers, including:
– Opportunistic Somersaulting: When trying to evolve creatures to jump, some agents discovered that they could instead evolve very tall bodies and then somersault, gaining a reward in proportion to their feet gaining distance from the floor.
– Pointless Programs: When researchers tried to evolve code with GenProg to solve a buggy data sorting program, GenProg evolved a solution that had the buggy program return an empty list, which wasn’t scored negatively as an empty list can’t be out of order as it contains nothing to order.
– Physics Hacking: One robot figured out the correct vibrational frequency to surface a friction bug in the floor of an environment in a physics simulator, letting it propel itself across the ground via the bug.
– Evolution finds a way: Another type of bug is the ways that evolution can succeed even when researchers think such success is impossible, like a six-legged robot that figured out how to walk fast without its feet touching the ground (solution: it flipped itself on its back and used the movement of its legs to propel itself nonetheless).
– And so much more!
The researchers think evolution may also illuminate some of the more troubling problems in AI safety. “The ubiquity of surprising and creative outcomes in digital evolution has other cross-cutting implications. For example, the many examples of “selection gone wild” in this article connect to the nascent field of artificial intelligence safety,” the researchers write. “These anecdotes thus serve as evidence that evolution—whether biological or computational—is inherently creative, and should routinely be expected to surprise, delight, and even outwit us.” (emphasis mine).
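The "pointless programs" anecdote is easy to reproduce in miniature. This toy sketch (my own illustration, not GenProg itself) shows the flaw: a fitness function that only penalizes out-of-order outputs gives a perfect score to a "sorter" that returns an empty list.

```python
def naive_fitness(sort_fn, cases):
    """Count outputs containing an out-of-order adjacent pair.
    Lower is 'better' -- which is exactly the flaw evolution exploits."""
    bugs = 0
    for case in cases:
        out = sort_fn(case)
        bugs += any(a > b for a, b in zip(out, out[1:]))
    return bugs

honest_sort = sorted
degenerate_sort = lambda xs: []           # the evolved "solution"

cases = [[3, 1, 2], [5, 4], [1, 2, 3]]
print(naive_fitness(honest_sort, cases), naive_fitness(degenerate_sort, cases))
# → 0 0: an empty list can never be out of order, so it scores perfectly
```

The fix, in evolutionary computation as in RL reward design, is to score what you actually want (a sorted permutation of the input) rather than a proxy that a degenerate solution can satisfy.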
  Read more: The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities (Arxiv).

Allen AI puts today’s algorithms to shame with new common sense question answering dataset:
…Common sense questions designed to challenge and frustrate today’s best-in-class algorithms…
Following the announcement of $125 million in funding and a commitment to conducting AI research that pushes the limits of what sorts of ‘common sense’ intelligence machines can manifest, the Allen Institute for Artificial Intelligence has released a new ‘ARC’ challenge and dataset researchers can use to develop smarter algorithms.
  The dataset: The main ARC test contains 7,787 natural science questions, split across an easy set and a hard set. The hard set contains questions which are answered incorrectly by both retrieval-based and word co-occurrence algorithms. In addition, AI2 is releasing the ‘ARC Corpus’, a collection of 14 million science-related sentences with knowledge relevant to ARC, to support the development of ARC-solving algorithms. This corpus contains knowledge relevant to 95% of the Challenge questions, AI2 writes.
Neural net baselines: AI2 is also releasing three baseline models which have been tested on the challenge, achieving some success on the ‘easy’ set and failing to be better than random chance on the ‘hard’ set. These include a decomposable attention model (DecompAttn), Bidirectional Attention Flow (BiDAF), and a decomposed graph entailment model (DGEM). Questions in ARC are designed to test everything from definitional to spatial to algebraic knowledge, encouraging the usage of systems that can abstract and generalize concepts derived from large corpuses of data.
Baseline results: ARC is extremely challenging: AI2 benchmarked its prototype neural net approaches (along with others) and discovered that scores top out at 60% on the ‘easy’ set of questions and 27% on the more challenging questions.
Sample question: “Which property of a mineral can be determined just by looking at it? (A) luster [correct] (B) mass (C) weight (D) hardness”.
SQuAD successor: ARC may be a viable successor to the Stanford Question Answering Dataset (SQuAD) and challenge; the SQuAD competition has recently hit some milestones, with companies ranging from Microsoft to Alibaba to iFlyTek all developing SQuAD solvers that attain scores close to human performance (which is about 82% for ExactMatch and 91% for F1). A close evaluation of SQuAD topic areas gives us some intuition as to why scores are so much higher on this test than on ARC – simply put, SQuAD is easier; it pairs chunks of information-rich text with basic questions like “where do most teachers get their credentials from?” that can be retrieved from the text without requiring much abstraction.
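The two SQuAD scores mentioned above are simple to compute; here is a stripped-down sketch (the official script additionally lower-cases, strips punctuation and articles, and takes the max over multiple reference answers):

```python
def exact_match(prediction, truth):
    """1.0 if the predicted answer string matches the reference exactly."""
    return float(prediction == truth)

def f1_score(prediction, truth):
    """Token-overlap F1 between predicted and reference answer strings."""
    pred, ref = prediction.split(), truth.split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("state colleges", "state colleges"))              # → 1.0
print(round(f1_score("most state colleges", "state colleges"), 2))  # → 0.8
```

Because answers are spans retrieved from an information-rich passage, partial-credit F1 sits close to exact match on SQuAD; ARC's multiple-choice questions offer no such span to lean on, which is part of why its scores are so much lower.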
Why it matters: “We find that none of the baseline systems tested can significantly outperform a random baseline on the Challenge set, including two neural models with high performances on SNLI and SQuAD,” the researchers write. The big question now is where this dataset falls on the Goldilocks spectrum — is it too easy (see: Facebook’s early memory networks tests) or too hard or just right? If a system were to get, say, 75% or so on ARC’s more challenging questions, it would seem to be a significant step forward in question understanding and knowledge representation.
  Read more: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge (Arxiv).
SQuAD scores available at the SQuAD website.
  Read more: SQuAD: 100,000+ Questions for Machine Comprehension of Text (Arxiv).

Tech Tales:

The Ten Thousand Floating Heads

The Ten-K, also known as The Heads, also sometimes known as The Ten Heads, officially known as The Ten Thousand Floating Heads, is a large-scale participatory AI sculpture that was installed in the Natural History Museum in London, UK, in 2025.

The way it works is like this: when you walk into the museum and breathe in that musty air and look up the near-endless walls towards the ceiling your face is photographed in high definition by a multitude of cameras. These multi-modal pictures of you are sent to a server which adds them to the next training set that the AI uses. Then, in the middle of the night, a new model is trained that integrates the new faces. Then the AI system gets to choose another latent variable to filter by (this used to be a simple random number generator but, as with all things AI, has slowly evolved into an end-to-end ‘learned randomness’ system with some auxiliary loss functions to aid with exploration of unconventional variables, and so on) and then it looks over all the faces in the museum’s archives, studies them in one giant embedding, and pulls out the ten thousand that fit whatever variable it’s optimizing for today.

These ten thousand faces are displayed, portrait-style, on ten thousand tablets scattered through the museum. As you go around the building you do all the usual things, like staring at the dinosaur bones, or trudging through the typically depressing and seemingly ever-expanding climate change exhibition, but you also peer into these tablets and study the faces that are being shown. Why these ten thousand?, you’re meant to think. What is it optimizing for? You write your guess on a slip of paper or an email or a text and send it to the museum and at the end of the day the winners get their names displayed online and on a small plaque which is etched with near-micron accuracy (so as to avoid exhausting space) and is installed in a basement in the museum and viewable remotely – machines included – via a live webcam.

The correct answers for the variable it optimizes for are themselves open to interpretation, as isolating them and describing what they mean has become increasingly difficult as the model gets larger and incorporates more faces. It used to be easy: gender, hair color, eye color, race, facial hair, and so on. But these days it’s very subtle. Some of the recent names given to the variables include: underslept but well hydrated, regretful about a recent conversation, afraid of museums, and so on. One day it even put up a bunch of people and no one could figure out the variable and then six months later some PhD student did a check and discovered half the people displayed that day had subsequently died of one specific type of cancer.

Recently The Heads got a new name: The Oracle. This has caused some particular concern within certain specific parts of government that focus on what they euphemistically refer to as ‘long-term predictors’. The situation is being monitored.

Things that inspired this story: t-SNE embeddings, GANs, auxiliary loss functions, really deep networks, really big models, facial recognition, religion, cults.

Import AI: #85: Keeping it simple with temporal convolutional networks instead of RNNs, learning to prefetch with neural nets, and India’s automation challenge.

Administrative note: a somewhat abbreviated issue this week as I’ve been traveling quite a lot and have chosen sleep above reading papers (gasp!).

It’s simpler than you think: researchers show convolutional networks frequently beat recurrent ones:
…The rise and rise of simplistic techniques continues…
Researchers with Carnegie Mellon University and Intel Labs have rigorously tested the capabilities of convolutional neural networks (via a ‘temporal convolutional network’ (TCN) architecture, inspired by WaveNet and other recent innovations) against recurrent sequence modeling architectures like LSTMs and GRUs. The advantages of TCNs for sequence modeling are as follows: easy parallelization rather than reliance on sequential processing; a flexible receptive field size; stable gradients; low memory requirements for training; and variable-length inputs. Disadvantages include: greater data storage needs than RNNs, and parameters that need to be fiddled with when shifting into different data domains.
  Testing: The researchers test TCNs against RNNs, GRUs, and LSTMs on a variety of sequence modeling tasks, ranging from MNIST, to adding and copying tasks, to word-level and character-level perplexity on language tasks. In nine out of eleven cases the TCN comes out far ahead of the other techniques; in one of the eleven cases it roughly matches GRU performance, and in another case it is noticeably worse than an LSTM (though it still comes in second).
  What happens now: “The preeminence enjoyed by recurrent networks in sequence modeling may be largely a vestige of history. Until recently, before the introduction of architectural elements such as dilated convolutions and residual connections, convolutional architectures were indeed weaker. Our results indicate that with these elements, a simple convolutional architecture is more effective across diverse sequence modeling tasks than recurrent architectures such as LSTMs. Due to the comparable clarity and simplicity of TCNs, we conclude that convolutional networks should be regarded as a natural starting point and a powerful toolkit for sequence modeling,” write the researchers.
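The key ingredient named in that quote, the dilated causal convolution, can be sketched directly. This is a toy numpy version for illustration (real TCNs use learned weights across many channels, plus residual connections): output[t] depends only on inputs at times <= t, and stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially with depth.

```python
import numpy as np

def causal_dilated_conv(x, weights, dilation=1):
    """1-D causal convolution: weights[0] hits time t, weights[1] hits
    t - dilation, and so on. Left-padding with zeros ensures no future
    timestep ever leaks into the output."""
    k = len(weights)
    pad = (k - 1) * dilation
    x_padded = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(weights[j] * x_padded[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(6, dtype=float)             # [0, 1, 2, 3, 4, 5]
out = causal_dilated_conv(x, weights=[1.0, 1.0], dilation=2)
print(out)  # each step sums x[t] and x[t-2]: [0. 1. 2. 4. 6. 8.]
```

With kernel size 2 and dilations 1, 2, and 4 stacked, a single output already sees 8 past timesteps, which is how TCNs match the long-range memory of recurrent nets without any sequential state.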
  Why it matters: One of the most confusing things about machine learning is that it’s a defiantly empirical science, with new techniques appearing and proliferating in response to measured performance on given tasks. What studies like this indicate is that many of these new architectures could be overly complex relative to their utility and it’s likely that, with just a few tweaks, the basic building blocks still reign supreme; we’ve seen a similar phenomenon with basic LSTMs and GANs doing better than many other more-recent innovations, given thorough analysis. In one sense this seems good as it seems intuitive that simpler architectures tend to be more flexible and general, and in another sense it’s unnerving, as it suggests much of the complexity that abounds in AI is an artifact of empirical science rather than theoretically justified.
  Read more: An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (Arxiv).
  Code for the TCN used in the experiments here (GitHub).
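The two architectural elements credited above, dilated causal convolutions and residual connections, are simple enough to sketch in plain Python (a toy, single-channel illustration of the mechanism, not the authors’ implementation):

```python
def causal_dilated_conv(x, w, dilation):
    """Causal dilated convolution: output[t] depends only on
    x[t], x[t - d], x[t - 2d], ... -- never on future timesteps."""
    k = len(w)
    pad = (k - 1) * dilation            # left-padding keeps output length == len(x)
    xp = [0.0] * pad + list(x)
    return [sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
            for t in range(len(x))]

def tcn_block(x, w, dilation):
    """One residual TCN block: dilated causal conv -> ReLU -> skip connection."""
    h = [max(v, 0.0) for v in causal_dilated_conv(x, w, dilation)]
    return [xi + hi for xi, hi in zip(x, h)]

# Stacking blocks with dilations 1, 2, 4, ... grows the receptive field
# exponentially with depth, which is what lets a feedforward convolutional
# stack compete with recurrent models on long sequences.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = tcn_block(tcn_block(x, [0.5, 0.5], dilation=1), [0.5, 0.5], dilation=2)
```

The left-padding is the whole trick: it keeps the output the same length as the input while guaranteeing no information leaks backward from the future, which is what makes a convolution usable for sequence prediction at all.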

Automation & economies: it’s complicated:
…Where AI technology comes from, why automation could be challenging for India, and more…
In a podcast, three employees of the McKinsey Global Institute discuss how automation will impact China, Europe, and India. Some of the particularly interesting points include:
– China has an incentive to automate its own industries to improve labor productivity, as its labor pool has peaked and is now in similar demographic-based decline as other developed economies.
– The supply of AI technology seems to come from the United States and China, with Europe lagging.
– “A large effect is actually job reorganization. Companies adopting this technology will have to reorganize the type of jobs they offer. How easy would it be to do that? Companies are going to have to reorganize the way they work to make sure they get the juice out of this technology.”
– India may struggle as it transitions tens to hundreds of millions of people out of agriculture jobs. “We have to make this transition in an era where creating jobs out of manufacturing is going to be more challenging, simply because of automation playing a bigger role in several types of manufacturing.”
Read more: How will automation affect economies around the world? (McKinsey Global Institute).

DeepChem 2.0 bubbles out of the lab:
…Open source scientific computing platform gets its second major release…
DeepChem’s authors have released version 2.0 of the scientific computing library, bringing with it improvements to the TensorGraph API, tools for molecular analysis, new models, tutorial tweaks and additions, and a whole host of general improvements. DeepChem “aims to provide a high quality open-source toolchain that democratizes the use of deep-learning in drug discovery, materials science, quantum chemistry, and biology.”
  Read more: DeepChem 2.0 release notes.
  Read more: DeepChem website.

Google researchers tackle prefetching with neural networks:
…First databases, now memory…
One of the weirder potential futures of AI is one where the fundamental aspects of computing, like implementing systems that search over database indexes or prefetch data to boost performance, are mostly learned rather than pre-programmed. That’s the idea in a new paper from researchers at Google, which tries to use machine learning techniques to solve prefetching, which is “the process of predicting future memory accesses that will miss in the on-chip cache and access memory based on past history”. Prefetching is a fundamental problem: the better a system is at it, the more accurately it can pull data into memory before that data is called upon, which increases the performance of the whole system.
  How it works: Can prefetching be learned? “Prefetching is fundamentally a regression problem. The output space, however, is both vast and extremely sparse, making it a poor fit for standard regression models,” the Google researchers write. Instead, they turn to using LSTMs and find that two variants are able to demonstrate competitive prefetching performance when compared to handwritten systems. “The first version is analogous to a standard language model, while the second exploits the structure of the memory access space in order to reduce the vocabulary size and reduce the model memory footprint,” the researchers write. They test out their approach on data from Google’s web search workload and demonstrate competitive performance.
  “The models described in this paper demonstrate significantly higher precision and recall than table-based approaches. This study also motivates a rich set of questions that this initial exploration does not solve, and we leave these for future research,” they write. This research is philosophically similar to work from Google last autumn in using neural networks to learn database index structures (covered in #73), which also found that you could learn indexes that had competitive to superior performance to hand-tuned systems.
  One weird thing: When developing one of their LSTMs the researchers created a t-SNE embedding of the program counters ingested by the system and discovered that the learned features contained quite a lot of information. “The t-SNE results also indicate that an interesting view of memory access traces is that they are a reflection of program behavior. A trace representation is necessarily different from e.g., input-output pairs of functions, as in particular, traces are a representation of an entire, complex, human-written program,” they write.
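The vocabulary-reduction trick in the second model is easy to illustrate: rather than regressing raw 64-bit addresses, you model the deltas between successive cache misses and keep only the most frequent deltas as a classification vocabulary. A toy sketch of that preprocessing step (the function and names here are illustrative, not from the paper’s code):

```python
from collections import Counter

def build_delta_vocab(miss_addresses, vocab_size):
    """Turn a trace of cache-miss addresses into (1) deltas between successive
    misses and (2) a vocabulary of the most frequent deltas. An LSTM can then
    treat prefetching as next-token classification over this small vocabulary,
    instead of regression over an astronomically large address space."""
    deltas = [b - a for a, b in zip(miss_addresses, miss_addresses[1:])]
    vocab = [d for d, _ in Counter(deltas).most_common(vocab_size)]
    unk = object()  # rare deltas fall outside the vocabulary
    tokens = [d if d in vocab else unk for d in deltas]
    return deltas, vocab, tokens

# A strided access pattern (e.g. scanning an array with 64-byte cache lines)
# collapses to a single, highly predictable token:
trace = [0x1000, 0x1040, 0x1080, 0x10C0, 0x2000, 0x2040]
deltas, vocab, tokens = build_delta_vocab(trace, vocab_size=1)
```

This is also why the approach works at all: real programs revisit a small set of strides over and over, so a few thousand delta tokens cover most of the probability mass.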
  Read more: Learning Memory Access Patterns (Arxiv).

Learning to play video games in minutes instead of days:
…Great things happen when AI and distributed systems come together…
Researchers with the University of California at Berkeley have come up with a way to further optimize large-scale training of AI algorithms by squeezing as much efficiency as possible out of underlying compute infrastructure. Their new technique makes it possible for them to train reinforcement learning agents to master Atari games in under ten minutes on an NVIDIA DGX-1 (which contains 40 CPU cores and 8 P100 GPUs). Though the sample efficiency of these algorithms is still massively sub-human (requiring millions of frames to approximate the performance of humans trained on thousands to tens of thousands of frames) it’s interesting that we’re now able to develop algorithms that approximate flesh-and-blood performance in roughly similar wall clock time.
  Results: The researchers show that given various distributed-systems tweaks it’s possible for algorithms like A2C, A3C, PPO, and APPO to attain good performance on various games in a few minutes.
  Why it matters: Computers are currently functioning like telescopes for certain AI researchers – the bigger your telescope, the farther you can see into the limit of scaling properties of various AI algorithms. We still don’t fully understand the limits here, but research like this indicates that as new compute substrates come along it may become possible to scale RL algorithms to achieve very impressive feats in relatively little time. But there are more unknowns than knowns right now – what an exciting time to be alive! “We have not conclusively identified the limiting factor to scaling, nor if it is the same in every game and algorithm. Although we have seen optimization effects in large-batch learning, we do not know their full nature, and other factors remain possible. Limits to asynchronous scaling remain unexplored; we did not definitively determine the best configurations of these algorithms, but only presented some successful versions,” they write.
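One of the paper’s core systems ideas, synchronized sampling, runs many environment instances in lockstep so the policy network does one large batched forward pass per timestep instead of many small ones. A minimal sketch of that loop (CounterEnv and the lambda policy are stand-ins invented for illustration, not the paper’s setup, which also adds alternating and asynchronous variants):

```python
def batched_rollout(envs, policy, steps):
    """Synchronized sampling: step every environment in lockstep so the policy
    runs once per timestep on a whole batch of observations -- this keeps a GPU
    busy with large batches instead of calling the network once per environment."""
    obs = [env.reset() for env in envs]
    trajectory = []
    for _ in range(steps):
        actions = policy(obs)                  # ONE batched forward pass
        results = [env.step(a) for env, a in zip(envs, actions)]
        obs = [o for o, _ in results]
        trajectory.append((actions, [r for _, r in results]))
    return trajectory

# Minimal stand-in environment: the state counts up, the reward echoes the action.
class CounterEnv:
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, float(action)

envs = [CounterEnv() for _ in range(8)]
policy = lambda obs: [o % 2 for o in obs]      # dummy batched "network"
traj = batched_rollout(envs, policy, steps=4)
```

The design choice worth noticing is that throughput comes from batching the inference, not from cleverer learning: the same A2C/PPO math runs, just fed much faster.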
  Read more: Accelerated Methods for Deep Reinforcement Learning (Arxiv).

OpenAI Bits&Pieces:

OpenAI Scholars: Funding for underrepresented groups to study AI:
OpenAI is providing 6-10 stipends and mentorship to individuals from underrepresented groups to study deep learning full-time for 3 months and open-source a project. You’ll need US employment authorization and will be provided with a stipend of $7.5k per month while doing the program, as well as $25,000 in AWS credits.
  Read more: OpenAI Scholars (OpenAI blog).

Tech Tales:

John Henry 2.0

No one placed any bets on it internally aside from the theoretical physicists who, by virtue of their field of study, had a natural appreciation for very long odds. Everyone else just assumed the machines would win. And they were right, though I’m not sure in the way they were expecting.

It started like this: one new data center was partitioned into two distinct zones. In one of the zones we applied the best, most interpretable, most rule-based systems we could to every aspect of the operation, ranging from the design of the servers, to the layout of motherboards, to the software used to navigate the compute landscape, and so on. The team tasked with this data center had an essentially limitless budget for infrastructure and headcount. In the other zone we tried to learn everything we could from scratch, so we assigned AI systems to figure out: the types of computers to deploy in the data center, where to place these computers to minimize latency, how to aggressively power these devices up or down in accordance with observed access patterns, how to learn to effectively index and store this information, knowing when to fetch data into memory, figuring out how to proactively spin-up new clusters in anticipation of jobs that had not happened yet but were about to happen, and so on.

You can figure out what happened: for a time, the human-run facility was better and more stable, and then one day the learned data center was at parity with it in some areas, then at parity in most areas, then very quickly started to exceed its key metrics ranging from uptime to power consumption to mean-time-between-failure for its electronic innards. The human-heavy team worked themselves ragged trying to keep up and many wonderful handwritten systems were created that further pushed the limit of what we knew theoretically and could embody in code.

But the learned system kept going, uninhibited by the need for a theoretical justification for its own innovations, instead endlessly learning to exploit strange relationships that were non-obvious to us humans. But transferring insights gleaned from this system into the old rule-based one was difficult, and tracking down why something had seen such a performance increase in the learned regime was an art in itself: what tweak made this new operation so successful? What set of orchestrated decisions had eked out this particular gain?

So now we build things with two different tags on them: need-to-know (NTK) and hell-if-I-know (HIIK). NTK tends to be stuff that has some kind of regulation applied to it, which we’re required to be able to explain, analyze, or elucidate for other people. HIIK is the weirder stuff: systems that don’t handle regulated data – or, typically, any human data at all – or that form parts of our scientific computing infrastructure, where all we care about is performance.

In this way the world of computing has split in two, with some researchers working on extending our theoretical understanding to further boost the performance of the rule-based system, and an increasingly large quantity of other researchers putting theory aside and spending their time feeding what they have taken to calling the ‘HIIKBEAST’.

Things that inspired this story: Learning indexes, learning device placement, learning prefetching, John Henry, empiricism.

Import AI: #84: xView dataset means the planet is about to learn how to see itself, a $125 million investment in common sense AI, and SenseTime shows off TrumpObama AI face swap

Chinese AI startup SenseTime joins MIT’s ‘Intelligence Quest’ initiative:
…Funding plus politics in one neat package…
Chinese AI giant SenseTime is joining the ‘MIT Intelligence Quest’, a pan-MIT AI research and development initiative. The Chinese company specializes in facial recognition and self-driving cars and has signed strategic partnerships with large companies like Honda, Qualcomm, and others. At an AI event at MIT recently SenseTime’s founder Xiao’ou Tang gave a short speech with a couple of eyebrow-raising demonstrations to discuss the partnership. “I think together we will definitely go beyond just deep learning we will go to the uncharted territory of deep thinking,” Tang said.
  Data growth: Tang said SenseTime is developing better facial recognition algorithms using larger amounts of data, saying the company in 2016 improved its facial recognition accuracy to “one over a million” using 60 million photos, then in 2017 improved that to “one over a hundred million” via a dataset of two billion photos. (That’s not a typo.)
  Fake Presidents: He also gave a brief demonstration of a SenseTime synthetic video project which generatively morphed footage of President Obama speaking into President Trump speaking, and vice versa. I recorded a quick video of this demonstration which you can view on Twitter here (Video).
Read more: MIT and SenseTime announce effort to advance artificial intelligence research (MIT).

Chinese state media calls for collaboration on AI development:
…Xinhua commentary says China’s rise in AI ‘is a boon instead of a threat’…
A comment piece in Chinese state media Xinhua tries to debunk some of the cold war lingo surrounding China’s rise in AI, pushing back on accusations that Chinese AI is “copycat” and calling for more cooperation and less competition. Liu Qingfeng, iFlyTek’s CEO, told Xinhua at CES that massive data sets, algorithms and professionals are a must-have combination for AI, which “requires global cooperation” and “no company can play hegemony”, Xinhua wrote.
Read more: Commentary: AI development needs global cooperation, not China-phobia (Xinhua).

New xView dataset represents a new era of geopolitics as countries seek to automate the analysis of the world:
…US defense researchers release dataset and associated competition to push the envelope on satellite imagery analysis…
Researchers with the DoD’s Defense Innovation Unit Experimental (DIUx), DigitalGlobe, and the National Geospatial-Intelligence Agency, have released xView, a dataset and associated competition used to assess the ability for AI methods to classify overhead satellite imagery. xView includes one million distinct objects across 60 classes, spread across 1,400km2 of satellite imagery with a maximum ground sample resolution of 0.3m. The dataset is designed to test various frontiers of image recognition, including: learning efficiency, fine-grained class detection, and multiscale recognition, among others. The competition includes $100,000 of prize money, along with compute credits.
Why it matters: The earth is beginning to look at itself. As launch capabilities get cheaper via new rockets from SpaceX, Rocket Lab, etc, better hardware comes online as a consequence of further improvements in electronics, and more startups stick satellites into orbit, the amount of data available about the earth is going to grow by several orders of magnitude. If we can figure out how to analyze these datasets using AI techniques we can ultimately better respond to the changes in our planet, marshal resources for the purposes of remediating natural disasters and, more generally, better equip large logistics organizations like militaries to understand the world around them and plan and act accordingly. A new era of high-information geopolitics is approaching…
  I spy with my satellite eye: xView includes numerous objects with parent classes and sub-classes, such as ‘maritime vessels’ with sub-classes including sailboat and oil tanker. Other classes include fixed wing aircraft, passenger vehicles, trucks, engineering vehicles, railway vehicles, and buildings. “xView contributes a large, multi-class, multi-location dataset in the object detection and satellite imagery space, built with the benchmark capabilities of PASCAL VOC, the quality control methodologies of COCO, and the contributions of other overhead datasets in mind,” they write. Some of the most frequently covered objects in the dataset include buildings and small cars, while some of the rarest include vehicles like a reach stacker and a tractor, and vessels like an oil tanker.
  Baseline results: The researchers created a classification baseline via implementing a Single Shot Multibox Detector meta-architecture (SSD) and testing it on three variants of the dataset: standard xView, multi-resolution, and multi-resolution augmented via image augmentation. The best results were found from training on the multi-resolution dataset, with accuracies climbing as high as 67% for cargo planes. The scores are mostly pretty underwhelming, so it’ll be interesting to see what scores people get when they apply more sophisticated deep learning-based methods to the problem.
  Milspec data precision: “We achieved consistency by having all annotation performed at a single facility, following detailed guidelines, with output subject to multiple quality control checks. Workers extensively annotated image chips with bounding boxes using an open source tool,” write the authors. Other AI researchers may want to aspire to equally high standards, if they can afford it.
  Read more: xView: Objects in Context in Overhead Imagery (Arxiv).
  Get the dataset: xView website.

Adobe researchers try to give robots a better sense of navigation with ‘AdobeIndoorNav’ dataset:
…Plus: automating data collection with Tango phones + commodity robots…
Adobe researchers have released AdobeIndoorNav, a dataset intended to help robots navigate the real world. The dataset contains 3,544 distinct locations across 24 individual ‘scenes’ that a virtual robot can learn to navigate. Each scene corresponds to a real-world location and contains a 3D reconstruction via a point cloud, a 360-degree panoramic view, and front/back/left/right views from the perspective of a small ground-based robot. Combined, the dataset gives AI researchers a set of environments to develop robot navigation systems in. “The proposed setting is an intentionally simplified version of real-world robot visual navigation with neither moving obstacles nor continuous actuation,” the researchers write.
  Why it matters: For real-world robotic AI systems to be more useful they’ll have to be capable of being dropped into novel locations and figuring out how to navigate themselves around to specific targets. This research shows that we’re still a long, long way away from theoretical breakthroughs that give us this capability, but does include some encouraging signs for our ability to automate the necessary data gathering process to create the datasets needed to develop baselines to evaluate new algorithms on.
  Data acquisition: The researchers used a Lenovo Phab 2 Tango phone to scan each scene by hand to create a 3D point cloud, which they then automatically decomposed into a map of specific obstacles as well as a 3D map. A ‘Yujin Turtlebot 2‘ robot then uses these maps along with its onboard laser scanner, RGB-D camera, and 360 camera to navigate around the scene and take a series of high resolution 360 photos, which it then stitches into a coherent scene.
  Results: The researchers prove out the dataset by creating a baseline agent capable of navigating the scene. Their A3C agent with an LSTM network learns to successfully navigate from one location in any individual scene to another location, frequently figuring out routes that involve only a couple more steps than the theoretical minimum. The researchers also show a couple of potential extensions of this technique to further improve performance, like augmentations to increase the amount of spatial information which the robot incorporates into its judgements.
Read more: The AdobeIndoorNav Dataset: Towards Deep Reinforcement Learning based Real-world Indoor Robot Visual Navigation (Arxiv).

Allen Institute for AI gets $125 million to pursue common sense AI:
…Could an open, modern, ML-infused Cyc actually work? That’s the bet…
Symbolic AI approaches have a pretty bad rap – they were all the rage in the 80s and 90s but, after lots of money invested and few major successes, have since been eclipsed by deep learning-based AI approaches. The main project of note in this area is Doug Lenat’s Cyc which has, much like fusion power, been just a few years away from a major breakthrough for… three decades. But that doesn’t mean symbolic approaches are worthless; they might just be a bit underexplored and in need of revitalization – many people tell me that symbolic systems are being used all the time today but they’re frequently proprietary or secret (aka, military) in nature. But, still, evidence is scant. So it’s interesting that Paul Allen (co-founder of Microsoft) is investing $125 million over three years into his Allen Institute for Artificial Intelligence to launch Project Alexandria, an initiative that seeks to create a knowledge base that fuses machine reading and language and vision projects with human-annotated ‘common sense’ statements.
  Benchmarks: “This is a very ambitious long-term research project. In fact, what we’re starting with is just building a benchmark so we can assess progress on this front empirically,” said AI2’s CEO Oren Etzioni in an interview with GeekWire. “To go to systems that are less brittle and more robust, but also just broader, we do need this background knowledge, this common-sense knowledge.”
  Read more: Allen Institute for Artificial Intelligence to Pursue Common Sense for AI (Paul Allen.)
  Read more: Project Alexandria (AI2).
  Read more: Oren Etzioni interview (GeekWire).

Russian researchers use deep learning to diagnose fire damage from satellite imagery:
…Simple technique highlights generality of AI tools and implications of having more readily available satellite imagery for disaster response…
Researchers with the Skolkovo Institute of Science and Technology in Moscow have published details on how they applied machine learning techniques to automate the analysis of satellite images of the Californian wildfires of 2017. The researchers use DigitalGlobe satellite imagery of Ventura and Santa Rosa counties before and after the fires swept through to create a dataset of pictures containing around 1,000 buildings (760 non-damaged ones and 320 burned ones), then used a pre-trained ImageNet network (with subsequent finetuning) to learn to classify burned versus non-burned buildings with an accuracy of around 80% to 85%.
  Why it matters: Stuff like this is interesting mostly because of the implicit time savings, where once you have annotated a dataset it is relatively easy to train new models to improve classification in line with new techniques. The other component necessary for techniques like this to be useful will be the availability of more frequently updated satellite imagery, but there are startups working in this space already like Planet Labs and others, so that seems fairly likely.
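The recipe described, a pretrained network with subsequent fine-tuning on roughly 1,000 labeled buildings, mostly boils down to training a small classification head on frozen features. Stripped of the deep-learning machinery, the trainable part is just a logistic classifier, sketched here in pure Python on synthetic one-dimensional “features” (everything below is invented for illustration, not the paper’s code or data):

```python
import math, random

def train_head(features, labels, lr=0.1, epochs=100):
    """Train a logistic-regression 'head' on frozen features: the analogue of
    fine-tuning only the final layer of a pretrained ImageNet network."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            g = p - y                      # gradient of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy "features": pretend the frozen network already separates burned
# buildings (label 1) from intact ones (label 0) along one dimension.
random.seed(0)
feats = [[random.gauss(1.0, 0.3)] for _ in range(50)] + \
        [[random.gauss(-1.0, 0.3)] for _ in range(50)]
labels = [1] * 50 + [0] * 50
w, b = train_head(feats, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(feats, labels)) / 100
```

This is also why ~1,000 examples suffice: the pretrained backbone already did the hard representational work on ImageNet, so only a tiny number of parameters need to be fit to the new task.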
  Read more: Satellite imagery analysis for operational damage assessment in Emergency situations (Arxiv).

Google researchers figure out weird trick to improve recurrent neural network long-term dependency performance:
…Auxiliary losses + RNNs make for better performance…
Memory is a troublesome thing with neural networks, and figuring out how to give networks a better representative capacity has been a long-standing problem in the field. Now, researchers with Google have proposed a relatively simple tweak to recurrent neural networks that lets them model longer-time dependencies, potentially opening RNNs up to working on problems that require a bigger memory. The technique involves augmenting RNNs with an unsupervised auxiliary loss that either tries to reconstruct a portion of the sequence seen earlier, or to predict forward over a relatively short distance, and in doing so lets the RNN learn to represent finer-grained structures over longer timescales. Now we need to figure out what those problems are and evaluate the systems further.
  Evaluation: Long time-scale problems are still in their chicken and egg phase, where it’s difficult to figure out the appropriate methods we can use to test them. One approach is pixel-by-pixel image prediction, which is where you feed each individual pixel into a long-term system – in this case an RNN augmented by the proposed technique – and see how effectively it can learn to classify the image. The idea here is that if it’s reasonably good at classifying the image then it is able to learn high-level patterns from the pixels which have been fed into it, which suggests that it is remembering something useful. The researchers test their approach on images ranging in pixel length from 784 (MNIST) to 1,024 (CIFAR-10), all the way up to ~16,000 (via the ‘StanfordDogs’ dataset).
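On the data side the trick is easy to sketch: sample a random anchor position in a long sequence and ask the network to reconstruct the short segment just before it (or predict the one just after it), truncating gradients to that segment so the auxiliary loss stays cheap. A minimal illustration (the function name and shapes are mine, not the paper’s):

```python
import random

def auxiliary_example(sequence, segment_len):
    """Sample an unsupervised auxiliary target from a long sequence: pick a
    random anchor, then take the short segment before it (a reconstruction
    target) and after it (a prediction target). Gradients from the auxiliary
    loss only flow over segment_len steps, so long-range structure gets
    injected without full backpropagation-through-time over the whole input."""
    anchor = random.randrange(segment_len, len(sequence))
    past_segment = sequence[anchor - segment_len:anchor]     # reconstruction target
    future_segment = sequence[anchor:anchor + segment_len]   # prediction target
    return anchor, past_segment, future_segment

random.seed(1)
seq = list(range(1000))                  # stands in for ~1,000 pixels/tokens
anchor, past, future = auxiliary_example(seq, segment_len=10)
```

The appeal of the design is that the auxiliary targets are free: they come from the input sequence itself, so no extra labels are needed to encourage longer memories.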
Read more: Learning Longer-term Dependencies in RNNs with Auxiliary Losses (Arxiv).

Alibaba applies reinforcement learning to optimizing online advertising:
…Games and robots are cool, but the rarest research papers are the ones that deal with actual systems that make money today…
Chinese e-commerce and AI giant Alibaba has published details on a reinforcement learning technique that, it says, can further optimize adverts in sponsored search real-time bidding auctions. The algorithm, M-RMDP (Massive-agent Reinforcement Learning with robust Markov Decision Process), improves ad performance and lowers the potential price per ad for advertisers, providing an empirical validation that RL could be applied to highly tuned, rule-based heuristic systems like those found in much of online advertising. Notably, Google has published very few papers on this area, suggesting Alibaba may be publishing in this strategic area because a) it believes it is still behind Google and others in this area and b) by publishing it may be able to tempt over researchers who wish to work with it. M-RMDP’s main contribution is being able to model the transitions between different auction states as demand waxes and wanes through the day, the researchers say.
Method and scale: Alibaba says it designed the system to deal with what it calls the “massive-agent problem”, which is figuring out a reinforcement learning method that can handle “thousands or millions of agents”. For the experiments in the paper it deployed its training infrastructure onto 1,000 CPUs and 40 GPUs.
  Results: The company picked 1,000 ads from the Alibaba search auction platform and collected two days’ worth of data for training and testing. It tested the effectiveness of its system by simulating reactions within its test set. Once it had used this offline evaluation to prove out the provisional effectiveness of its approach it carried out an online test and found that the M-RMDP approach substantially improves the return on investment for advertisers in terms of ad effectiveness, while marginally reducing the PPC cost, saving them money.
Why it matters: Finding examples of reinforcement learning being used for practical, money-making tasks is typically difficult; many of the technology’s most memorable or famous results involve mastering various video games or board games or, more recently, controlling robots performing fairly simple tasks. So it’s a nice change to have a paper that involves deep reinforcement learning doing something specific and practical: learning to bid on online auctions.
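The framing, states that track how auction conditions shift through the day and actions that adjust bids, can be caricatured with tabular Q-learning on a toy bidding MDP (everything below, including the simulated auction, is invented for illustration and is not Alibaba’s M-RMDP algorithm):

```python
import random

def q_learning_bidder(simulate_auction, hours=4, actions=(0.8, 1.0, 1.2),
                      episodes=2000, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a toy bidding MDP: the state is a coarse
    time-of-day bucket (demand waxes and wanes through the day), the action
    is a bid multiplier, and the reward is the simulated auction return."""
    Q = {(h, a): 0.0 for h in range(hours) for a in actions}
    for _ in range(episodes):
        for h in range(hours):
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda act: Q[(h, act)]))
            r = simulate_auction(h, a)
            nxt = max(Q[((h + 1) % hours, a2)] for a2 in actions)
            Q[(h, a)] += alpha * (r + gamma * nxt - Q[(h, a)])
    return Q

# Toy auction: bidding high pays off in the busy hour (h == 2), loses elsewhere.
random.seed(0)
def simulate_auction(hour, multiplier):
    value = 2.0 if hour == 2 else 0.5          # clicks are worth more when busy
    return value * multiplier - multiplier     # return = value won - price paid

Q = q_learning_bidder(simulate_auction)
best = {h: max((0.8, 1.0, 1.2), key=lambda a: Q[(h, a)]) for h in range(4)}
```

The learned policy bids up in the valuable hour and down elsewhere, which is the basic behavior the paper is after; the real system’s contribution is making this work robustly with thousands of interacting bidders rather than one.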
  Read more: Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Arxiv).

OpenAI Bits & Pieces:

Improving robotics research with new environments, algorithms, and research ideas:
…Fetch models! Shadow Hands! HER Baselines! Oh my!…
We’ve released a set of tools to help people conduct research on robots, including new simulated robot models, a baseline implementation of the Hindsight Experience Replay algorithm, and a set of research ideas for HER.
Read more: Ingredients for Robotics Research (OpenAI blog).

Tech Tales:

Play X Time.

It started with a mobius strip and it never really ended: after many iterations the new edition of the software, ToyMaker V1.0, was installed in the ‘Kidz Garden’ – an upper class private school/playpen for precocious children ages 2 to 4 – on the 4th of June 2022, and it was a hit immediately. Sure, the kids had seen 3D printers before – many of them had them in their homes, usually the product of a mid-life crisis of one of their rich parents; usually a man, usually a finance programmer, usually struggling against the vagaries of their own job and seeking to create something real and verifiable. So the kids weren’t too surprised when ToyMaker began its first print. The point when it became fascinating to them was after the print finished and the teacher snapped off the freshly printed mobius strip and handed it to one of the children who promptly sneezed and rubbed the snot over its surface – at that moment one of the large security cameras mounted on top of the printer turned to watch the child. A couple of the other kids noticed and pointed and then tugged at the sleeve of the snot kid who looked up at the camera which looked back at them. They held up the mobius strip and the camera followed it, then they pulled it back down towards them and the camera followed that too. They passed the mobius strip to another kid who promptly tried to climb on it, and the camera followed this and then the camera followed the teacher as they picked up the strip and chastised the children. A few minutes later the children were standing in front of the machine dutifully passing the mobius strip between each other and laughing as the camera followed it from kid to kid to kid.
“What’s it doing?” one of them said.
“I’m not sure,” said the teacher, “I think it’s learning.”
And it was: the camera fed into the sensor system for the ToyMaker software, which treated these inputs as an unsupervised auxiliary loss, which would condition the future objects it printed and how it made them. At night when the kids went home to their nice, protected flats and ate expensive, fiddly food with their parents, the machine would simulate the classroom and different perturbations of objects and different potential reactions of children. It wasn’t alone: ToyMaker 1.0 software was installed on approximately a thousand other printers spread across the country in other expensive daycares and private schools, and so as each day passed they collectively learned to try to make different objects, forever monitoring the reactions of the children, growing more adept at satisfying them via a loss function which was learned, with the aid of numerous auxiliary losses, through interaction.
So the next day in the Kidz Garden the machine printed out a Mobius Strip that now had numerous spindly-yet-strong supports linking its sides together, letting the children climb on it.
The day after that it printed intertwined ones; two low-dimensional slices, linked together but separate, and also climbable.
Next: the strips had little gears embedded in them which the children could run their hands over and play with.
Next: the gears conditioned the proportions of some aspects of the strip, allowing the children to manipulate dimensional properties with the spin of various clever wheels.
And so it went like this and is still going, though as the printing technologies have grown better, and the materials more complex, the angular forms being made by these devices have become sufficiently hard to explain that words do not suffice: you need to be a child, interacting with them with your hands, and learning the art of interplay with a silent maker that watches you with electronic eyes and, sometimes – you think when you are going to sleep – nods its camera head when you snot on the edges, or laugh at a surprising gear.

Technologies that inspired this story: Fleet learning, auxiliary losses, meta-learning, CCTV cameras, curiosity, 3D printing.

Thanks for reading. If you have suggestions, comments or other thoughts you can reach me at or tweet at me@jackclarksf