Import AI Issue 57: Robots with manners, why computers are like telescopes to the future, and Microsoft bets big on FPGAs over ASICs

by Jack Clark

AI Safety, where philosophy meets engineering:
…AI safety is a nebulous, growing, important topic that’s somewhat poorly understood, even by those working within it. One question the community lacks a satisfying answer to is: what is the correct layer of abstraction at which to ensure safety? Do we do it by encoding a bunch of philosophical and logical precepts into a machine, then feeding it successively higher-fidelity realities? Or do we train systems to model their behavior on humans’ own actions and behaviors, potentially trading off some interpretability for the notion that humans ‘know what looks right’ and (mostly) act in ways that other humans approve of?
…This writeup by Open Phil’s Daniel Dewey sheds some light on one half of this question: MIRI’s work on ‘highly reliable agent design’ and its attempts to tackle some of the thornier problems inherent to the precept side of AI safety (e.g., how can we guarantee a self-improving system doesn’t develop wildly divergent views to ours about what constitutes good behavior? What sorts of reasoning systems can we expect the agent to adopt when participating in our environments? How does the agent model the actions of others to itself?).
…Read more here: “My current thoughts on MIRI’s ‘highly reliable agent design’ work”.

Why compute is strategic for AI:
…Though data is crucial to AI algorithms, I think compute is much more strategic for AI development, especially when carrying out research on problems that demand complex environments (like enhancing reinforcement learning algorithms, or work on multi-agent simulations, and so on).
…”Having a really, really big computer is kind of like a time warp, in that you can do things that aren’t economical now but will be economically [feasible] maybe a decade from now,” says investor Bill Joy.
…Read more in this Q&A with Joy about technology and a (potentially) better battery.

That’s Numberwang – MNIST for Fashion arrives:
…German e-commerce company Zalando has published ‘Fashion-MNIST’, a training dataset containing 60,000 28x28-pixel images of different types of garment, like trousers, t-shirts, or shoes. This is quite a big deal: everyone tends to reach for the tried-and-tested MNIST when testing out new AI classification systems, but as that dataset just consists of the digits 0-9 in a range of different styles, it has become terribly boring. (And there’s some concern that the community could be overfitting to it.)
…”Fashion-MNIST is intended to serve as a direct drop-in replacement of the original MNIST dataset for benchmarking machine learning algorithms,” they write. Because it reuses MNIST’s file format, swapping it in really is just a matter of changing file paths (see the loading sketch below). Let’s hope that people who test on MNIST now also test on Fashion-MNIST (or, better yet, move on to CIFAR or ImageNet as a new standard ‘testing baseline’).
…Read more about the dataset here.
…Check out benchmarks on the dataset published by Zalando here.
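…For the curious, here’s a minimal loading sketch (my own illustration, not Zalando’s code): because Fashion-MNIST deliberately reuses MNIST’s gzipped IDX binary format and file names, an ordinary MNIST loader works unchanged.

```python
# Minimal sketch, assuming the standard gzipped IDX files from the
# Fashion-MNIST repository (same binary layout and file names as MNIST).
import gzip
import numpy as np

def load_idx_images(path):
    """Parse an IDX image file: 16-byte header, then raw uint8 pixels."""
    with gzip.open(path, "rb") as f:
        data = f.read()
    # Header after the magic number: image count, rows, cols (big-endian int32).
    n, rows, cols = np.frombuffer(data[4:16], dtype=">i4")
    return np.frombuffer(data[16:], dtype=np.uint8).reshape(n, rows, cols)

def load_idx_labels(path):
    """Parse an IDX label file: 8-byte header, then one uint8 per label."""
    with gzip.open(path, "rb") as f:
        data = f.read()
    return np.frombuffer(data[8:], dtype=np.uint8)

# Point these at the classic MNIST files instead and nothing else changes.
train_images = load_idx_images("train-images-idx3-ubyte.gz")
train_labels = load_idx_labels("train-labels-idx1-ubyte.gz")
print(train_images.shape)  # (60000, 28, 28)
```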

Reach out and touch shapes #2: New Grasping research from Google X:
…When you pick up a coffee cup you’ve never seen before, what do you do? Personally, I eyeball it, then in my head I figure out roughly where I should grab it based on its appearance and my previous (voluminous) experience at picking up coffee cups, then I grab it. If I smash it I (hopefully) learn about how my grip was wrong and adjust for next time.
…Now, researchers from Google have tried to mimic some of this broad mental process by creating what they call a ‘Geometry-aware’ learning agent that lets them teach their own robots to pick up any of 101 everyday objects with a success rate of between 70% and 80% (and around 60% on totally never-before-seen objects).
…The system represents the new sort of architecture being built: highly specialized and highly modular. Here, an agent studies an object in front of it through around three to four distinct camera views, then uses this spread of 2D images to infer a 3D representation of the object, which it projects into an OpenGL layer that it uses to manipulate views and potential grasps of the object. It figures out appropriate grasps by drawing on an internal representation of around 150,000 valid demonstration grasps over these objects, then adjusting its behavior to have characteristics similar to those successful grasps. The system works and demonstrates significantly better performance than other systems, though until it gets to accuracies in the 99%+ range it is unlikely to be of major use to industry. (Though given how rapidly deep learning can progress, it seems likely progress could be swift here.)
…Notable: Google only needed around 1,500 human demonstrations (given via HTC Vive in virtual reality in Google’s open source ‘PyBullet’ 3D world environment) to create the dataset of 150,000 distinct grasps. It was able to augment the human demonstrations with a series of orientation randomization systems to help it generate other, synthetic, successful grips (a toy sketch of that augmentation idea follows below).
…Read more here: Learning Grasping Interaction with Geometry-Aware 3D Representations.
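…To make the augmentation idea concrete, here’s a toy, hypothetical sketch (my own, not the paper’s code; the grasp representation, a 3D position rotated about the vertical axis, is invented for illustration): take one demonstrated grasp and spin out many synthetic variants by randomizing orientation.

```python
# Toy sketch of orientation-randomization augmentation: one human
# demonstration becomes many synthetic grasps. The representation here
# is invented for illustration, not the paper's actual format.
import numpy as np

def random_z_rotation():
    """Unit quaternion (w, x, y, z) for a random rotation about the z axis."""
    theta = np.random.uniform(0, 2 * np.pi)
    return np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])

def quat_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q: v' = v + w*t + u x t, t = 2(u x v)."""
    w, u = q[0], q[1:]
    t = 2 * np.cross(u, v)
    return v + w * t + np.cross(u, t)

def augment(demo_grasp_position, n=100):
    """Generate n synthetic grasp positions from a single demonstration."""
    return [quat_rotate(random_z_rotation(), demo_grasp_position) for _ in range(n)]

synthetic = augment(np.array([0.10, 0.00, 0.05]), n=100)
print(len(synthetic))  # 100 synthetic grasps from one human demo
```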

Skinning the magnetic cat with traditional physics techniques, as well as machine learning techniques:
…What connections exist between machine learning and physics? In this illuminating post we learn how traditional physics techniques, as well as ML ones, can be used to make meaningful statements about interactions in a (simple) Ising model (for a taste of the physics side, see the sampler sketch below).
…Read more here in: How does physics connect to machine learning?
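…If you’ve never met the Ising model, here’s a minimal sketch (mine, not the post’s code) of the standard Metropolis Monte Carlo sampler for a 2D lattice of spins — the simple physical system the post uses to bridge the two fields:

```python
# Minimal sketch: Metropolis sampling of a 2D Ising model (nearest-neighbour
# coupling J = 1, periodic boundaries). At beta = 0.5, above the critical
# value of ~0.44, the lattice should begin to order.
import numpy as np

def metropolis_step(spins, beta):
    """Flip one random spin, accepting with the Boltzmann probability."""
    n = spins.shape[0]
    i, j = np.random.randint(0, n, size=2)
    neighbours = (spins[(i + 1) % n, j] + spins[(i - 1) % n, j]
                  + spins[i, (j + 1) % n] + spins[i, (j - 1) % n])
    dE = 2 * spins[i, j] * neighbours  # energy change from flipping (i, j)
    if dE <= 0 or np.random.rand() < np.exp(-beta * dE):
        spins[i, j] *= -1

spins = np.random.choice([-1, 1], size=(32, 32))
for _ in range(100_000):
    metropolis_step(spins, beta=0.5)
print("mean magnetisation:", spins.mean())
```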

A selection of points at the intersection of healthcare and machine learning:
…Deep learning-based pose estimation techniques can be used to better spot and diagnose afflictions like Parkinson’s; embeddings derived from people’s social media timelines can help provide ongoing mental health diagnosis; and the FDA requires a full approval process for a new deep learning model but will accept tweaks to existing models without a tremendous amount of paperwork. Read about these points and more in this post: 30 Things I Learned at MLHC 2017.

Microsoft matches IBM’s speech recognition breakthrough with a significantly simpler system:
…A team from Microsoft Research have revealed their latest speech recognition system, with an error rate of around 5.1% on the Switchboard corpus.
…Read more about the system here (PDF).
…Progress on speech recognition has been quite rapid, with IBM and Microsoft fiercely competing with each other to set new standards, presumably because they want to sell speech recognition systems to large-scale customers, while (and this is pure supposition on my part) Amazon and Google plan to sell theirs via API and are less concerned about the PR battle.
…A quick refresher on word error rates on Switchboard (the metric itself is sketched in code after this list):
…August 2017: Microsoft: 5.1%*
…March 2017: IBM: 5.5%
…October 2016: Microsoft: 5.9%**
…September 2016: Microsoft: 6.3%
…April 2016: IBM: 6.9%
…*Microsoft claims parity with human transcribers, though wait for external validation of this.
…**Microsoft claimed parity with human transcribers, though turned out to be an inaccurate measure.
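…For reference, the metric behind all these numbers is word error rate: substitutions, insertions, and deletions, divided by the number of words in the reference transcript. A minimal sketch (mine, computed via word-level edit distance):

```python
# Minimal sketch of word error rate (WER), the metric behind the
# Switchboard numbers above, via a word-level Levenshtein distance.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic programme: d[i][j] = edits to turn ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat",
                      "the cat sat on a mat"))  # 1 substitution / 6 words ~= 0.167
```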

Ultimate surveillance: AI to recognize you simply by the way you walk:
…We’re slowly acclimatizing to the idea that governments will use facial recognition technologies widely across their societies. In recent years the technology has expanded from police and surveillance systems into border control checkpoints and now, in places like China, into public spaces like street crossings, where AI spots repeat offenders at relatively minor crimes like jaywalking or crossing against a light.
…Activists already wear masks or bandage their faces to try to stymie these efforts. Some artists have even proposed daubing on certain kinds of makeup that defeat facial recognition systems (a fun, real-world demonstration of the power of adversarial examples).
…Now, researchers with Masaryk University in the Czech Republic propose using video surveillance systems to identify a person, infer their specific gait, then search for that gait across other security cameras.
…”You are how you walk. Your identity is your gait pattern itself. Instead of classifying walker identities as names or numbers that are not available in any case, a forensic investigator rather asks for information about their appearances captured by surveillance system – their location trace that includes timestamp and geolocation of each appearance. In the suggested application, walkers are clustered rather than classified. Identification is carried out as a query-by-example,” the researchers write.
…How it works: The system takes input from a standard RGB-D camera (the same as those found in the Kinect, now quite widely available), then uses motion capture technology to derive the underlying structure of the person’s movements. Individual models of different people’s gaits are learned through a combination of Fisher’s Linear Discriminant Analysis and the Maximum Margin Criterion (MMC).
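…A toy sketch of the discriminant step (my own illustration using scikit-learn’s LDA on synthetic feature vectors; the MMC variant and the actual MoCap feature extraction are elided):

```python
# Hypothetical sketch: learn a discriminative projection of gait feature
# vectors with Fisher's LDA, then match a query walker to the nearest
# known cluster ("query-by-example"). Features here are synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_walkers, samples_each, dim = 5, 20, 30
# Fake "gait features": one Gaussian cluster per walker in feature space.
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(samples_each, dim))
               for i in range(n_walkers)])
y = np.repeat(np.arange(n_walkers), samples_each)

lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)

# Query-by-example: project a new walk, find the closest walker centroid.
query = lda.transform(rng.normal(loc=2, scale=0.5, size=(1, dim)))
centroids = np.array([Z[y == k].mean(axis=0) for k in range(n_walkers)])
print("matched walker:", np.argmin(np.linalg.norm(centroids - query, axis=1)))
```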
…How well does it work: Not hugely well, so put the tinfoil hats down for now. But as many research groups are doing work on gait analysis and identification as part of large-scale video understanding projects, I’d expect the basic components that go into this sort of project to improve over time.
…Read more: You Are How You Walk: Uncooperative MoCap Gait Identification for Video Surveillance with Incomplete and Noisy Data.
…Bonus question: Could techniques such as this spot Keyser Soze?

Review article: just what the heck has been happening in deep reinforcement learning?
…Several researchers have put together a review paper, analyzing progress in deep RL. Deep RL is a set of techniques that have underpinned recent advances in getting AI systems to control and master computer games purely from pixel inputs, and to learn useful behaviors on robots (real and simulated), along with other applications.
…If some of what I just wrote was puzzling to you, then you might benefit from reading the paper here: A Brief Survey of Deep Reinforcement Learning. (The toy sketch at the end of this item gives a flavor of the core update rule.)
…Everyone should read the conclusion of the piece: “Whilst there are many challenges in seeking to understand our complex and everchanging world, RL allows us to choose how we explore it. In effect, RL endows agents with the ability to perform experiments to better understand their surroundings, enabling them to learn even high-level causal relationships. The availability of high-quality visual renderers and physics engines now enables us to take steps in this direction, with works that try to learn intuitive models of physics in visual environments. Challenges remain before this will be possible in the real world, but steady progress is being made in agents that learn the fundamental principles of the world through observation and action.”
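…And for readers who want the smallest possible taste of the machinery, here’s a toy tabular Q-learning sketch of mine: an agent learns to walk right along a corridor to reach a reward. Deep RL swaps the table for a neural network and raw pixels for the state, but the reward-driven update is the same in spirit.

```python
# Toy tabular Q-learning (not deep RL, but the same core update):
# a 5-state corridor where only the rightmost state gives reward.
import random

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def choose_action(q_values):
    # Epsilon-greedy, breaking ties randomly so early episodes explore.
    if random.random() < epsilon or q_values[0] == q_values[1]:
        return random.randrange(n_actions)
    return q_values.index(max(q_values))

for episode in range(500):
    s = 0
    while s != n_states - 1:          # rightmost state is terminal
        a = choose_action(Q[s])
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        reward = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: nudge Q(s, a) toward reward + discounted future value.
        Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print([q.index(max(q)) for q in Q[:-1]])  # learned policy: [1, 1, 1, 1] = "go right"
```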

Robots with manners (and ‘caresses’):
…In some parts of Northern England it’s pretty typical that you greet someone, even a stranger, by wandering up to them, slapping them on the arm, and saying ‘way-eye’. In London, if you do that people tend to stare at you with a look of frightfully English panic, or call the police.
…How do we make sure our robots don’t make these sorts of social faux pas? An EU-Japan project called ‘CARESSES’ is trying to solve this by creating robots that pay attention to the cultural norms of the place they’re deployed in.
…The project so far consists of a set of observations about how robots can integrate behaviors that account for cultural differences, and includes three different motivating scenarios, created through consultation with a transcultural nurse. These include having the robot minimize uncertainty when talking to someone from Japan, or checking how deferential it should be with a Greek Cypriot.
…Components used: the system runs on the ‘universAAL’ platform, an EU AI framework project, and integrates with ‘ECHONET’, a Japanese standard for home automation.
…Read more for what is (at this stage) mostly a list of possible approaches; in a few years it’s likely that various current research avenues in deep reinforcement learning could be integrated into robot systems like the ones described within: The CARESSES EU-Japan project: making assistive robots culturally competent.

Microsoft has a Brainwave with FPGAs specialized for AI:
…Moore’s Law is over: pesky facts of reality, like the breakdown of Dennard scaling for transistors, or the materials science properties of silicon, are putting a brake on progress in traditional chip architectures. So what’s an ambitious company with plans for AI domination to do? The answer, if you’re Google, is to create an application-specific integrated circuit (ASIC) with certain AI capabilities baked directly into the logic of the chip; that’s what the company’s Tensor Processing Units (TPUs) are for.
…Microsoft is taking a different tack with ‘Project Brainwave’, an initiative to use field programmable gate arrays (FPGAs) for AI processing, with a small ASIC-esque component baked onto each FPGA. The bet here is that though FPGAs tend to be less efficient than ASICs, their innate flexibility (field programmable means you can modify the logic of the chip after it has been fabbed and deployed in a data center) means Microsoft will be able to adapt them to new workloads as rapidly as new AI components get invented.
…The details: Microsoft’s chips contain a small hardware accelerator element (similar to a TPU though likely broader in scope and with less specific performance accelerations), and a big block of undifferentiated FPGA infrastructure.
…The bet: Google is betting that it’s worthwhile to optimize chips for current basic AI operations, trading off flexibility for performance, while Microsoft is betting the reverse, trading peak performance for flexibility. Developments in AI research, and their relative rate of occurrence, will make one of these strategies succeed and the other struggle.
…Read more about the chips here, and check out the technical slide presentation.

All hail the software hegemony:
…Saku P, a VR programmer with idiosyncratic views on pretty much everything, has a theory that Amazon represents the shape of most future companies: a large software entity that scales itself by employing contractors for its edge business functions (aka, dealing with actual humans in the form of delivering goods), while using its core business to build infrastructure that enables secondary and tertiary businesses.
…Play this tape forward and what you get is an economy dominated by a few colossal technology companies, likely spending vast sums on building technical vanity projects that double as strategic business investments (see: Facebook’s various drone schemes, Google’s ‘net-infrastructure-everywhere push, Jeff Bezos pouring his Amazon-derived wealth into space company Blue Origin, and so on).
…Read more here: How big will companies be in the 21st Century?

Tech Tales:

[2027: Kexingham Green, a council estate in the outer-outer exurban sprawl of London, UK. Beyond the green belt, where new grey social housing towns rose following the greater foreign property speculation carnival of the late twenty-teens. A slab of housing created by the government’s ‘renew and rehouse from the edge’ campaign, housing tens of thousands of souls, numerous chain supermarkets, and many now derelict parking lots.]

Durk Ciaran, baseball cap on backwards and scuffed Yeezys on his feet paired with a pristine – starched? – Arsenal FC sponsored by Alphabet (™) shirt, regards the crowd in front of him. “Ladies and gentlemen and drones, let me introduce to you the rawest, most blinged-out, most advanced circus in all of Kex-G – Durk’s Defiant Circus!”
…”DDC! DDC! DDC!” yells the crowd.
…”So let’s begin,” Durk says, sticking two fingers in his mouth and letting out a long whistle. A low, hockey-puck-shaped ex-warehouse drone hisses out of a pizza box at the edge of the crowd and moves towards Durk, who without looking raises one foot as the machine slides under it, then the other, suddenly standing on the robot. Durk begins to move in a long circle, spinning slightly on the ‘bot. “Alright,” he says, “who’s hungry?”
…”The drones!” yells the crowd.
…”Please,” Durk says, “Pigeons!” A ripple of laughter. He takes a loaf of bread out of his pocket and holds it against the right side of his torso with his elbow, using his left hand to pull off doughy chunks, placing them in his right hand. Once he’s got a fistful of bread chunks he puts the loaf back in his pocket. “Are we ready?” he says.
…”Yeahh!!!!!” yells the crowd.
…”Alright, bless up!” he says, tossing the chunks of bread in the air. And out of a couple of ragged tents at the edge of the parking lot come the drones, fizzing out: grey, re-purposed Amazon Royal Mail (™) delivery drones, now homing in on the little trackers Durk baked into the bread the previous evening. The drones home in on the bread and their little fake pigeon mouths snap open, gulping down the chunks, then slam shut again. A small hail of crumbs falls on the crowd, who go wild.
…But there’s a problem with one of the drones – one of its four propellers starts to emit a strange, low-pitched juddering hum. Its flight angle changes. The crowd start to worry, audible groans and ‘whoas’ flooding out of them.
…”Now what’s gonna happen to this Pigeon?” Durk says, looking up at the drone. “What’s it gonna do?” But he knows. He thumbs a button on what looks superficially like a bike key on his pocket key fob. Visualizes in his head what will soon become apparent to the crowd. Listens to the drone judder. He closes his eyes, spinning on the re-purposed warehouse bot, listening to the crowd as they chatter to themselves, some audibly commenting, others craning their heads. Then he hears the sighs. Then the “look, look!”. Then the sound of a kid crying slightly. “What’s going on, Mummy, what is THAT?”
…It comes in fast, from a great distance. Launches off of a distant towerblock. Dark, military-seeming green. A carrier drone. Re-purposed Chinese tech, originally used by the PLA to drop supplies across Africa as part of a soft geopolitical outreach program, now sold in black electronics markets around the world. Cheap transport, no questions asked. Durk looks at it now. Sees the great, Eagle-like eyes spraypainted on the side of its front. The carrier door fitted with 3D-printed plastic to form a great yellow beak. Green Eagle DDC stenciled on one of its wings, facing up so the crowd can’t see it but he can. It opens its mouth. The small, grey Pigeon drone tries to fly away but can’t, its rotor damaged. Green Eagle comes in and with a metal gulp eats the drone whole, its yellow mouth snapping shut, before arcing up and away.
…”The early bird gets the worm,” Durk says. “But you need to think about the thing that likes to eat the early birds. Now thank you, ladies and gentlemen, and please – make a donation to the DDC, crypto details in the stream, or here.” He snaps his fingers and a lengthy set of numbers and letters appears in LEDs on the sidewalk. “Now, goodbye!” he says, thumbing another button in his pocket, letting his repurposed warehouse drone carry him towards one of the towerblocks, hiding him back in the rarely surveilled Kexingham estate, just before the police arrive.

Ideas that inspired this story:
Drones, DJI, deep reinforcement learning, Amazon Go, Kiva Systems, AI as geopolitical power, Drones as geopolitical power, Technology as the ultimate lever in soft geopolitical power, land speculators.

…Tech Tales Coda:

Last week I wrote about a Bitcoin mine putting its chip-design skills to use to create AI processors and spin up large-scale AI processing mines.
…Imagine my surprise when I stumbled on this Quartz story a few hours after sending the newsletter: Chinese company Bitmain has used its chip-design skills to create a new set of AI processors and to spin up an AI processing wing of its business. Spooky!