Import AI: Issue 50: NVIDIA gets some GPU competition, learning from failure to use sparse rewards, and why game environments could be the jet fuel of AI

by Jack Clark

Deep Learning in Africa:
…More on Indaba 2017, a gathering of AI researchers and students across Africa. African attendance and participation at AI conferences are very, very low. The goal of Indaba is to change this by bringing together people from 23 African countries as well as other countries across the world to help them work and learn together. A commendable initiative!
…”African machine learning is strong and varied. To support the food security of our nations, computer vision is used to detect cassava root disease in images captured using low-cost mobile phones [1]. Where health services and advice is limited, especially for HIV and AIDS, machine learning is used to shorten response times in mobile question-answering services, allowing these services to reach more people [2]. And the African contribution to Big Science, in particular in radio astronomy through the square kilometer array telescope, will advance the state of machine learning to provide new insights into the workings of the universe [3],” write the organizers.
…Find out more information in the note from the organizers here.

At Microsoft, AI becomes its own special category:
Microsoft is laying off several thousand of its sales staff as the company re-orients itself to focus on selling software in four specific categories: modern workplace, business applications, apps and infrastructure, and data and AI.

Finally, NVIDIA gets some competition in deep learning hardware!
…AMD has released version 1.0 of MIOpen, its rubbishly named library for making machine learning work well on ROCm-supporting (aka AMD) cards. Consider it a competitor to NVIDIA’s ‘cuDNN’.
…What it means: It’s getting substantially easier for developers to write AI code that works on AMD cards thanks to MIOpen adding support for deep learning primitives like forward and backpropagation, support for pooling algorithms, batch normalization, binary package support for Ubuntu 16.04 and Fedora 24, and much more into the AMD GPU software stack.
...AMD has been in damage-control mode for several years, working on building its main businesses in CPUs along with gaming cards and chips for game consoles. Things are starting to look up for the company, so it’s now devoting resources to peripheral (but I’d argue, strategic) areas like AI software support for its GPUs. If AMD continues to invest in this technology then it could take some share and probably lead to better price competition in the market – a boon for researchers and cloud providers around the world.
…Bonus factoid: There are some indications, based on comments on reddit/r/machinelearning, that the people behind Facebook’s PyTorch framework are already working to port it to run on AMD cards.
Find out more about the software by visiting its GitHub repository.

Small, cute, and out to BEHEAD PLANTS:
…The inventor of the iconic robot vacuum cleaner the Roomba is back with a new invention – a weed-killing robot called Tertill.
…The cute, green hockey-puck shaped device will patrol a garden, exploring the ground beneath it and rapidly spinning a small nylon string to behead plants that pass beneath it.
…One of the most illustrative things about this little bot is just how little artificial intelligence actually goes into it. It has no eyes, no major communication ability, no terrifically smart planning and mapping, or anything else – in the real world, you want to keep autonomy very tightly controlled to avoid nasty surprises for the human buyers of the bots.
Find out more and check out a video of the machine in action here. They’re funding manufacturing and further development via Kickstarter, and have already spent two years in development, going through over 6 designs in the process – watch as the bot morphs from a rectangular tank into a cheerful green beveled puck.

Improvising SLAM functions with deep neural networks:
…Researchers from the University of Freiburg, University of Hong Kong, and HKUST propose Neural SLAM, a system to encourage AI agents to develop a map of their environment as they explore it.
…The research proposes a way to get around the short-term memory capabilities of traditional AI components like RNNs and LSTMs, by running all inputs to the RL agent into an LSTM, then writing from there into an external memory. The agent then uses this memory to store a record of where it has been, update its current location, and to aid it in planning complex multi-step tasks, such as exploring a big complex maze.
…Results: The researchers compare their system against two baselines, an A3C agent with access to 128 LSTM units, and another Neural SLAM agent with access to a 2D external memory. The full Neural SLAM agent attains reliably higher reward than the others and displays better performance characteristics on the largest, most complex environments.
…Environments used: a 2D gridworld and a 3D environment made in Gazebo.
You can read more in the research paper here.
…This sits alongside other recent work focused on embedding a memory into a network specifically for mapping and movement, such as ‘Neural Map: Structured Memory for Deep Reinforcement Learning‘ and ‘Cognitive Mapping and Planning for Visual Navigation‘.
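…To make the external-memory idea concrete, here is a minimal sketch. It is not the paper's method: the real system learns to read and write its memory end-to-end through an LSTM controller, whereas this toy uses a hand-coded grid of visit counts and a greedy rule. The names `GridMemory` and `explore` are hypothetical and exist only for illustration.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, column) on a small grid map


class GridMemory:
    """Hand-coded stand-in for an external memory: one visit counter per cell."""

    def __init__(self, height: int, width: int) -> None:
        self.height = height
        self.width = width
        self.visits = [[0] * width for _ in range(height)]

    def write(self, cell: Cell) -> None:
        """Record that the agent has been at `cell`."""
        row, col = cell
        self.visits[row][col] += 1

    def read(self, cell: Cell) -> int:
        """Return how many times `cell` has been visited."""
        row, col = cell
        return self.visits[row][col]

    def neighbors(self, cell: Cell) -> List[Cell]:
        """Return the up/down/left/right cells that lie inside the grid."""
        row, col = cell
        candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
        return [
            (r, c)
            for r, c in candidates
            if 0 <= r < self.height and 0 <= c < self.width
        ]


def explore(memory: GridMemory, start: Cell, steps: int) -> List[Cell]:
    """Greedy exploration: always step to the least-visited neighbouring cell."""
    position = start
    memory.write(position)
    path = [position]
    for _ in range(steps):
        # Reading the memory is what steers the agent away from places it knows.
        position = min(memory.neighbors(position), key=memory.read)
        memory.write(position)
        path.append(position)
    return path
```

Calling `explore(GridMemory(3, 3), (0, 0), 8)` walks the agent over a 3x3 grid while the memory keeps track of where it has already been, which is the kind of record a short-term recurrent state alone tends to lose.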

Crypto-cyberpunk AI:
…AI safety organization MIRI recently got a surprise $1.01 million donation from an anonymous philanthropist who had made gains with the ‘Ethereum’ cryptocurrency. More of this weird future please!
Read more about the grant and get a general update of MIRI’s work here.

DeepMind goes North, opens office to host the godfather of reinforcement learning and his acolytes:
…DeepMind is opening up its first research office located outside of the UK and it has gone for the not-exactly-famous town of Edmonton, Alberta.
...The reason: Alberta is home to the University of Alberta, which is one of the nexuses of research into deep reinforcement learning. The university is home to Richard Sutton – a gregarious AI specialist with a superb beard who literally wrote the book on reinforcement learning and is currently enamored with the idea of using techniques like meta- and hierarchical-based RL to create agents that can deal with large, complex, ambiguous situations. Sutton along with other academics linked to the UofA – like Michael Bowling and Patrick Pilarski – will work part-time at DeepMind’s new office while retaining their links to the university. Seven others are joining as well.
…”As well as continuing to contribute to the academic community through teaching and research, we intend to provide additional funding to support long-term AI programs at UAlberta. Our hope is that this collaboration will help turbo-charge Edmonton’s growth as a technology and research hub, attracting even more world-class AI researchers to the region and helping to keep them there too,” DeepMind CEO Demis Hassabis writes in a blog post.
Find out more about the move here.

Learning to collaborate with deep learning, reinforcement learning, and Facebook AI research:
Facebook, along with OpenAI, DeepMind, Microsoft, and others, is working on using modern AI techniques like deep reinforcement learning to figure out how to model and train agents that can work with one another. In the latest experiment, the company is looking at “how to construct agents that can maintain cooperation… while avoiding being exploited by cheaters.”
…The researchers take insights from the Prisoner’s Dilemma – a canonical game theory example first framed by Merrill Flood and Melvin Dresher at RAND and later formalized by Albert Tucker – and use these in combination with deep RL to create agents “that are capable of solving arbitrary bilateral social dilemmas via the shadow of the future”. They run their experiments on a variety of scenarios with tit-for-tat dynamics (as in, the agents learn to mimic each other’s behavior when fruitful) and developed a technique called amTFT (approximate Markov tit-for-tat) which they can use to train systems to develop these values.
…They’re able to use their system to create agents that are far more efficient than others trained with traditional competitive policies, though sometimes at the trade-off of overall score.
…Remember: Though many of the examples being experimented with in the fields of language and social collaboration research seem relatively crude and/or simplistic and/or unrealistic, it’s worth remembering that only 5 years ago the idea of solving something like ImageNet was somewhat fanciful, yet after a few turns of the crank of Moore’s Law crossed with GPUs crossed with algorithmic invention, we’ve actually superseded it.
…Find out more in Facebook’s research paper here: ‘Maintaining Cooperation in complex social dilemmas using deep reinforcement learning’.
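…For readers who have not met tit-for-tat before, here is a minimal sketch of the classic hand-written strategy in a repeated Prisoner's Dilemma. This is not amTFT, which learns its policies with deep RL; the payoff numbers are the usual textbook values and are an assumption, as are the function names.

```python
from typing import Callable, List, Tuple

COOPERATE, DEFECT = "C", "D"

# Assumed textbook payoffs: (my move, their move) -> my reward.
PAYOFF = {
    (COOPERATE, COOPERATE): 3,
    (COOPERATE, DEFECT): 0,
    (DEFECT, COOPERATE): 5,
    (DEFECT, DEFECT): 1,
}

# A strategy maps the opponent's past moves to this player's next move.
Strategy = Callable[[List[str]], str]


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy whatever the opponent did last round."""
    return COOPERATE if not opponent_history else opponent_history[-1]


def always_defect(opponent_history: List[str]) -> str:
    """A 'cheater' that never cooperates."""
    return DEFECT


def play(player_a: Strategy, player_b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play a repeated game and return the total score of each player."""
    moves_a: List[str] = []
    moves_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each player only sees the other's moves from earlier rounds.
        move_a = player_a(moves_b)
        move_b = player_b(moves_a)
        moves_a.append(move_a)
        moves_b.append(move_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 10))      # (30, 30)
print(play(tit_for_tat, always_defect, 10))    # (9, 14)
print(play(always_defect, always_defect, 10))  # (10, 10)
```

Two tit-for-tat players sustain cooperation and score far better than two defectors, and against a cheater tit-for-tat loses only the first round before retaliating. That is the "maintain cooperation while avoiding being exploited" property the paper aims to recover in richer environments.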

Never bring a regular GAN to a DCGANFIGHT:
…Here’s a fun experiment by Alexia Jolicoeur-Martineau in which they use a movable feast of modern GAN techniques to try and get computers to dream up synthetic cat faces. The result of the so-called Meow Generator? The (practically vintage) two-year-old DCGAN still seems like it generates the best samples, though some newer techniques have somewhat better prospects for avoiding mode collapse and other hazards found in training these systems.
…Check out the wall of synthetic cats here.

AI versus AI…NIPS gets an adversarial example competition…
…Adversarial examples, a class of deep learning inputs that have been manipulated to cause a problem with the end classifiers, are a problem for deep learning. If we ever want AI to be seriously deployed at scale then we almost certainly don’t want it to be vulnerable to data poisoning.
…Kudos then to Google (and in particular Ian Goodfellow) for organizing an adversarial example competition at NIPS 2017. People will have the opportunity to design adversarial images to attempt to poison known and unknown classifiers, as well as to try and come up with ways to defend against adversarial examples.
…Details of the competition are available on recent Google acquisition Kaggle.
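…What does "manipulating an input" look like in practice? One well-known recipe is the fast gradient sign method (FGSM), sketched below on a tiny logistic-regression classifier. This is an illustration only: the competition targets image classifiers and entrants are free to use other attacks, and the weights, input, and function names here are made up for the example.

```python
import math
from typing import List


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic-regression classifier."""
    logit = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def sign(value: float) -> int:
    """Return -1, 0, or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


def fgsm(
    weights: List[float], bias: float, x: List[float], label: int, epsilon: float
) -> List[float]:
    """Nudge every feature by +/- epsilon in the direction that raises the loss.

    For logistic regression with cross-entropy loss, the gradient of the loss
    with respect to the input is (p - label) * weights.
    """
    p = predict(weights, bias, x)
    gradient = [(p - label) * w for w in weights]
    return [v + epsilon * sign(g) for v, g in zip(x, gradient)]


# Made-up classifier and input, chosen so the effect is easy to see.
weights, bias = [2.0, -1.0, 0.5], 0.0
x = [0.3, 0.2, 0.4]
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.3)

print(predict(weights, bias, x) > 0.5)      # True: classified as class 1
print(predict(weights, bias, x_adv) > 0.5)  # False: the prediction flips
```

No feature moves by more than `epsilon`, yet the predicted class changes. On images the same trick can flip a classifier's output with changes too small for a person to notice, which is why defenses are worth a competition of their own.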

If you think data is the new oil, then game environments are your jet fuel.
…Facebook has released ELF, a very fast interface between games written in C/C++ and agents written in Python.
…The software also includes a special game written by Facebook AI Research to let it rapidly test out new AI algorithms on an environment with the characteristics of competitive environments, such as StarCraft (a game that DeepMind, Facebook, Tencent, and many others are conducting research on).
…”MiniRTS has all the key dynamics of a real-time strategy game, including gathering resources, building facilities and troops, scouting the unknown territories outside the perceivable regions, and defend/attack the enemy,” Facebook writes. The best part? The game can run at 40,000 frames per second on a MacBook Pro core. The faster you can run an environment the faster you can learn from it.
…The platform provides a hierarchical interface so you can train agents to do increasingly abstract behaviors, potentially letting researchers, say, train a low-level movement policy and leave the high-level one to the inbuilt-AI, or vice versa. I’m curious if eventually we’ll see researchers train multiple policies in parallel at different levels of abstraction, breaking a game down into different layers of information which each get their own AI modules to compute.
…ELF can host any existing game written in C/C++.
…”With this lightweight and flexible platform, RL methods on RTS games can be explored in an efficient way, including forward modeling, hierarchical RL, planning under uncertainty, RL with complicated action space, and so on,” they write.
….You can find out more information in the research paper, Introducing ELF: An Extensive, Lightweight, and Flexible Research Platform for RTS Games.
Access the code on the GitHub repo here.

Baidu teams up with… everyone? for Apollo Self-Driving Car software:
…Baidu and tens of other companies team up for Apollo, an open source self-driving car platform:
…Chinese search engine Baidu is teaming up with tens of other companies, including major automotive players like Bosch and Chevrolet(?), and others, to build Apollo, an open source platform for self-driving cars.
…Key ingredients: Not much – yet. So far it includes modules for localization, control, TK, and TK. I’ll wait to see how they extend it.
Find out more here.

The price of AI:
…Pen? $1
…Dictaphone? $50
…Cheap laptop? $300
…Automating local news? $1 million
Google has given a (roughly) $1 million grant to the UK’s Press Association, which will use the money in partnership with an automation startup called ‘Urbs Media’ to trawl through public datasets and then use natural language processing technologies to extract pertinent information and write news stories about it.
Find out more information from the PA here.
…What could possibly go wrong, I infer you’re thinking? Here’s a good example: the LA Times newspaper has its own ‘earthquake bot’ which watches the data feeds and dutifully produces stories when new tremors shake the eponymous city. Recently, the bot produced a story about quite a severe earthquake. The catch? The earthquake had happened 90 years ago and the bot wasn’t smart enough to scan for dates in the input feed.

Berkeley artificial intelligence research (BAIR) blog posts:
…Background: Berkeley recently set up an AI blog to help its students and faculty better communicate their research to the general public. This is a great initiative!
…Here’s the latest post from Joshua Achiam on Constrained Policy Optimization.

OpenAI Bits and Pieces:

Karate Kid Neural Networks (aka, curriculum learning):
…New paper from an all-star OpenAI intern team of Tambet Matiisen, Avital Oliver, Taco Cohen, and research scientist John Schulman.
…Components used: Minecraft (via MSR’s Project Malmo).
…Research paper here: Teacher-Student Curriculum Learning

Learning from failure (Hindsight Experience Replay):
Work from our researchers demonstrates unprecedented performance at a variety of robotics manipulation tasks, when paired with algorithms like DDPG.
Research paper here: Hindsight Experience Replay.
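…The "learning from failure" trick can be sketched in a few lines: when an episode misses its goal, store it a second time while pretending the state it actually reached was the goal all along, so a sparse reward still produces a success signal. The sketch below shows only this relabeling step with the final achieved state as the substitute goal; the reward convention (0 on success, -1 otherwise), the integer states, and all names are assumptions for illustration, and the off-policy learner (e.g. DDPG) that would consume the buffer is left out.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def sparse_reward(next_state: int, goal: int) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if next_state == goal else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Rewrite an episode as if its final achieved state had been the goal."""
    achieved = episode[-1].next_state
    return [
        t._replace(goal=achieved, reward=sparse_reward(t.next_state, achieved))
        for t in episode
    ]


# A failed episode: the agent walks 0 -> 1 -> 2 -> 3 but was asked to reach 10.
goal = 10
states = [0, 1, 2, 3]
episode = [
    Transition(s, +1, s_next, goal, sparse_reward(s_next, goal))
    for s, s_next in zip(states, states[1:])
]

# The replay buffer stores both the original and the relabeled transitions.
replay_buffer = episode + relabel_with_final_state(episode)

print([t.reward for t in episode])            # [-1.0, -1.0, -1.0]
print([t.reward for t in replay_buffer[3:]])  # [-1.0, -1.0, 0.0]
```

The original episode contains no reward signal to learn from, while the relabeled copy ends in a success for the goal "reach state 3", giving the learner something to generalize from.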

Tech Tales:

[ 2035: A soundstage in New York, containing three large eight foot by eight foot by eight foot cubes. A man in a gold-flannel jacket enters the stage from the right, walks to the center in front of the boxes, and faces the crowd. ]
“Ladies and Gentlemen are you here and are you ready for-”
“-ESCAPE THE ROOM!” the crowd chants.
“That’s right folks it’s another week so it’s time for another eeee-may-zing competition between three of our favorite and favored robot companions! In Box1 we have GarbageBot2.5, in Box2 we have Chef-O-Matic, and in Box3 we have the FloorSweeper9000. Now for those viewers joining us for the first time let’s go over the rules. When I press this button on my wrist the boxes will start to heat up. How hot will they get? They’ll get-”
“HOT HOT HOT!” the crowd chants.
“So if these clever fellows can’t figure out how to escape the room before the box gets too hot it’s game-over shutdown-time for our friends here. So are you ready?”
The crowd erupts from its seats, waving hands in the air, forming impromptu conga lines, gangs of kids shouting in unison HOT HOT HOT or ESCAPE THE ROOM or BOTS GET HOT.
“Ok,” the announcer says. “Let’s begin!”

He presses the button on his wristwatch and instantly giant LED panels to either side of the stage light up with fat, tall digital thermometers, showing the temperature as it starts to rise inside the boxes.

In Box1, GarbageBot2.5 starts compacting some trash left in its box. It shapes the compacted garbage into a wedge, extrudes the wedge onto the ground, then uses its tire treads to climb up onto it, buying it more time to plan while the floor below it heats up. The Chef-o-Matic bot in Box2 unfolds a set of knives and starts to test one of the corners of its box, tapping away at edges with the tips of the blades, successively applying more pressure.

The temperatures climb. Things are over very quickly in Box3, as the floor-sweeper – a low, wide-mouthed robot – absorbs heat from the floor so rapidly that one of its processing cores burns out. It sits, a partial digital lobotomy, and begins to helplessly melt into the floor.

Over in Box1, GarbageBot has resorted to using a kind of lever on its back to hurl sharp, heavy bits of garbage at one part of the wall. The box is very hot now. It runs out of nearby garbage and starts to grab chunks of compacted trash from the wedge it is standing on, racing to make a dent in the wall, while bringing its rubber treads closer to the – now glowing – hot floor. In Box2, the Chef-o-Matic makes its first penetration in its wall and the temperature in its box starts to climb much more slowly. The crowd cheers for it. It uses a can opener to start to open up an entire seam of its box, then cuts laterally, creating a metal flap. It starts to accelerate into the flap from the other side of the box, trying to use its weight to push the metal open. As it starts to succeed, things take a turn for the worse in Box1 as one of GarbageBot’s treads gets stuck to the floor. The Chef-o-Matic wins by default, but the host doesn’t stop the competition till it’s made its way fully out of the box, allowing it the adulation of the crowd.

“HOT CHEF,” shouts one of the kids. “HOT CHEF HOT CHEF HOT CHEF”

“So,” the announcer says, “How do you feel, ChefBot?” The announcer leans down and plugs a small cable into the back of the robot, linking it to the big LED panels. It sits for a couple of seconds, the screens blank, then they light up with the image of a lobster in a pot, slowly being boiled.

The crowd starts laughing. “Quite an imagination you’ve got there, ChefBot!” says the announcer, then turns his head directly to the crowd and the camera. “That’s all for this week folks. See you again soon!”