Import AI: #80: Facebook accidentally releases a surveillance-AI tool; why emojis are a good candidate for a universal deep learning language; and using deceptive games to explore the stupidity of AI algorithms

by Jack Clark

Researchers try to capture the web’s now-fading Flash bounty for RL research:
…FlashRL represents another attempt to make the world’s vast archive of Flash games accessible to researchers, but the initial platform has drawbacks…
Researchers with the University of Agder in Norway have released FlashRL, a research platform to help AI researchers mess around with software written in Flash, an outmoded interactive media format that defined many of the most popular games of the early web. The platform shares a philosophy with OpenAI Universe: give researchers a vast suite of new environments on which to test and develop algorithms.
  The dataset: FlashRL ships with “several thousand game environments” taken from around the web.
  How it works: FlashRL uses the Linux utility Xvfb to create a virtual framebuffer for graphics rendering, within which it executes Flash files in players such as Gnash. FlashRL accesses this framebuffer through a purpose-built VNC client called pyVLC, which in turn exposes an API to the developer.
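  A hypothetical sketch of what driving a FlashRL-style environment might look like is below; the package name, environment id, and method signatures here are assumptions for illustration, not FlashRL’s documented API:

```python
# Hypothetical sketch of a FlashRL-style interaction loop.
# The package name, environment id, and method signatures are
# assumptions for illustration, not FlashRL's documented API.
import random

import flashrl  # assumed package name

env = flashrl.Environment("multitask")  # game rendered via Xvfb + Gnash
state = env.reset()                     # frames arrive over the pyVLC VNC client

for _ in range(1000):
    action = random.choice(env.actions)     # assumed list of legal key presses
    state, reward, done = env.step(action)  # advance one rendered frame
    if done:
        state = env.reset()
```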
  Testing: The researchers test FlashRL by training a neural network to play the game ‘Multitask’ on it. But in the absence of comparable baselines or benchmarks it’s difficult to work out whether FlashRL has any drawbacks for training relative to other systems – a nice thing to do might be to mount a well-known suite of games like the Arcade Learning Environment within the system, then provide benchmarks for those games as well.
  Why it might matter: Given the current Cambrian explosion in testing systems, it’s likely that FlashRL’s utility will ultimately be derived from how much interest it receives from the community. To gain that interest the researchers will likely need to tweak the system so that it can run environments faster than 30 frames per second (many other RL frameworks allow 1,000+ FPS), because the speed at which you can run an environment is directly correlated with the speed at which you can conduct research on the platform.
– Read more: FlashRL: A Reinforcement Learning Platform for Flash Games (Arxiv).
– Check out the GitHub repository (GitHub).

Cool job alert! Harvard/MIT Assembly Project Manager:
…Want to work on difficult problems in the public interest? Like helping smart and ethical people build things that matter?…
Harvard University’s Berkman Klein Center (BKC) is looking for a project manager to help coordinate its Assembly Program, a joint initiative with the MIT Media Lab that brings together senior developers and other technologists for a semester to build things that grapple with topics in the public interest. Last year’s Assembly program was on cybersecurity and this year’s is on issues relating to the ethics and governance of AI (and your humble author is currently enrolled in this very program!). Beyond the Assembly program, the project manager will work on other projects with Professor Jonathan Zittrain and his team.
  For a full description of the responsibilities, qualifications, and application instructions, please visit the Harvard Human Resources Project Manager Listing.

Mongolian researchers tackle a deep learning meme problem:
…Weird things happen when internet culture inspires AI research papers…
Researchers with the National University of Mongolia have published a research paper in which they apply standard techniques (transfer learning via fine-tuning) to an existing machine learning problem. The novelty is that they base their research on trying to tell the difference between pictures of chihuahuas and muffins – a joke that circulated on Twitter a few years ago and has since become a kind of deep learning meme.
  Why it matters: The paper is mostly interesting because it signifies that a) the border between traditional academic problems and internet-spawned semi-ironic problems is growing more porous and, b) academics are tapping into internet meme culture to draw interest to their work.
–  Read more: Deep Learning Approach for Very Similar Object Recognition Application on Chihuahua and Muffin Problem (Arxiv).

Mapping the emoji landscape with deep learning:
…Learning to understand a new domain of discourse with lots & lots of data…
Emojis have become a kind of shadow language used by people across the world to indicate sentiment. Emojis are also a good candidate for deep learning-based analysis because they consist of a relatively small number of distinct ‘words’ – roughly 1,000 emojis in popular use, compared to the working vocabulary of around 100,000 words displayed in most English documents. This means researchers can map emojis to specific meanings in language and images with less data than datasets of traditional languages require.
   Now, researchers are experimenting with one of the internet’s best emoji<>language<>image sources: the endless blathering mountain of content on Twitter. “Emoji have some unique advantages for retrieval tasks. The limited nature of emoji (1000+ ideograms as opposed to 100,000+ words) allows for a greater level of certainty regarding the possible query space. Furthermore, emoji are not tied to any particular natural language, and most emoji are pan-cultural,” write the researchers.
  The ‘Twemoji’ dataset: To analyze emojis, the researchers scraped roughly 15 million emoji-containing tweets during the summer of 2016, then analyzed this ‘Twemoji’ dataset as well as two derivatives: Twemoji-Balanced (a smaller dataset selected so that no emoji appears in more than 10 examples, chopping out some of the edge-of-the-bell-curve emojis; the crying-smiling-face emoji appears in ~1.5 million of the tweets in the corpus, while 116 other emojis are used only a single time) and Twemoji-Images (roughly one million tweets that contain an image as well as emoji). They then apply deep learning techniques to these datasets to see whether they can complete prediction and retrieval tasks using the emojis.
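  The balancing step is conceptually simple; below is a minimal sketch of the kind of per-emoji cap described above (the `(text, emoji)` tuple structure and the cap constant are illustrative assumptions, not the paper’s code):

```python
from collections import defaultdict

MAX_PER_EMOJI = 10  # cap per emoji, per the balanced-dataset description above

def balance(tweets):
    """Keep at most MAX_PER_EMOJI examples of each emoji.

    tweets: iterable of (text, emoji) pairs -- an assumed structure.
    """
    counts = defaultdict(int)
    kept = []
    for text, emoji in tweets:
        if counts[emoji] < MAX_PER_EMOJI:
            counts[emoji] += 1
            kept.append((text, emoji))
    return kept
```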
  Results: The researchers use a bidirectional LSTM to perform mappings between emojis and language; a GoogLeNet image-classification system to map the relationship between emojis and images; and a combination of the two to understand the relationship between all three. They also learn to suggest emojis according to the text or visual content of a given tweet. Most of the results should be treated as early baselines rather than landmark results in themselves, with top-5 accuracies of ~48.3% for emoji-from-text prediction and lower top-5 accuracies of ~40.3% for the image-text-emoji task.
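  To make the emoji-from-text task concrete, here is a minimal sketch of a bidirectional LSTM classifier in PyTorch; the vocabulary size, dimensions, and pooling choice are assumptions for illustration, not the paper’s exact model:

```python
import torch
import torch.nn as nn

class EmojiPredictor(nn.Module):
    """Minimal BiLSTM mapping a tweet (word ids) to emoji logits.
    All sizes are illustrative assumptions, not the paper's values."""
    def __init__(self, vocab_size=100_000, n_emojis=1_000,
                 embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, n_emojis)

    def forward(self, tokens):             # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h.mean(dim=1))     # pool over time -> emoji logits

# Top-5 emoji prediction for a batch of (random) token sequences.
logits = EmojiPredictor()(torch.randint(0, 100_000, (8, 20)))
top5 = logits.topk(5, dim=-1).indices
```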
  Why it matters: This paper is another good example of a new trend in deep learning: the technologies have become simple enough that researchers from outside the core AI research field are starting to pick up basic components like LSTMs and pre-trained image classifiers and are using them to re-contextualize existing domains, like understanding linguistics and retrieval tasks via emojis.
–  Read more: The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval (Arxiv).

Facebook researchers train models to perform unprecedentedly-detailed analysis of the human body:
…Research has significant military, surveillance implications (though not discussed in paper)…
Facebook researchers have trained a state-of-the-art system named ‘DensePose’ which can look at 2D photos or videos of people and automatically create high-definition 3D mesh models of the people depicted – an output with broad utility and impact across a number of domains. Their motivation: techniques like this have valuable applications in “graphics, augmented reality, or human-computer interaction, and could also be a stepping stone towards general 3D-based object understanding,” they write. But the published research and soon-to-be-published dataset have significant implications for digital surveillance – a subject the researchers do not discuss in the paper.
  Performance: ‘DensePose’ “can recover highly-accurate correspondence fields for complex scenes involving tens of persons with real-time speed: on a GTX 1080 GPU our system operates at 20-26 frames per second for a 240 × 320 image or 4-5 frames per second for a 800 × 1100 image,” they write. Its performance substantially surpasses previous state-of-the-art systems, though it still falls short of human performance.
  Free dataset: To conduct this research Facebook created a dataset based on the ‘COCO’ dataset, annotating 50,000 of its people-containing images with 5 million distinct coordinates to help generate 3D maps of the depicted people.
  Technique: The researchers adopt a multi-stage deep learning approach which involves first identifying regions of interest within an image, then handing each of those regions off to its own deep learning pipeline for further object segmentation and 3D point prediction and mapping. For any given image, each human is relatively sparsely labelled, with around 100-150 annotations per person. To increase the amount of data available to the network, the researchers use the partially trained model as a supervisory system to automatically fill in the remaining points during training, artificially augmenting the data.
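  A highly simplified sketch of that cascaded structure is below – a detector proposes person regions, and a per-region head predicts part labels plus surface coordinates. The class, layer sizes, and data flow are assumptions meant only to illustrate the region-to-head handoff, not DensePose’s actual implementation:

```python
import torch
import torch.nn as nn

class RegionHead(nn.Module):
    """Per-region head: predicts a body-part label map and (u, v)
    surface coordinates for a fixed-size region crop. Layer sizes
    and part count are illustrative assumptions."""
    def __init__(self, in_ch=256, n_parts=24):
        super().__init__()
        self.trunk = nn.Conv2d(in_ch, 256, 3, padding=1)
        self.part_logits = nn.Conv2d(256, n_parts + 1, 1)  # +1 for background
        self.uv = nn.Conv2d(256, 2 * n_parts, 1)           # (u, v) per part

    def forward(self, roi_feats):
        h = torch.relu(self.trunk(roi_feats))
        return self.part_logits(h), self.uv(h)

# Flow: a detector (e.g. Mask R-CNN + FPN) proposes person regions,
# each region's features are cropped to a fixed size, and the head
# maps them to part labels and surface coordinates.
head = RegionHead()
roi_feats = torch.randn(4, 256, 14, 14)   # 4 proposed regions (illustrative)
parts, uv = head(roi_feats)
```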
  Components used: Mask R-CNN with Feature Pyramid Networks; both available in Facebook’s just-released ‘Detectron’ system.
  Why it matters: enabling real-time surveillance: There’s a troubling implication of this research: the same system has wide utility within surveillance architectures, potentially letting operators analyze large groups of people to work out whether their movements are problematic or not – for instance, such a system could be used to signal to another system if a certain combination of movements is automatically labelled as portending a protest or a riot. I’d hope that Facebook’s researchers felt the utility of releasing such a system outweighed its potential to be abused by malicious actors, but the lack of any mention of these issues anywhere in the paper is worrying: did Facebook even consider this? Did they discuss this use case internally? Do they have an ‘information hazard’ handbook they go through when releasing such systems? We don’t know. As a community we – including organizations like OpenAI – need to be better about dealing publicly with the information hazards of releasing increasingly capable systems, lest we enable things in the world that we’d rather not be responsible for.
–  Read more: DensePose: Dense Human Pose Estimation In The Wild (Arxiv).
–  Watch more: Video of DensePose in action.

It’s about time: tips and tricks for better self-driving cars:
…Rare self-driving car paper emerges from Chinese robotics company…
Researchers with Horizon Robotics, one of a new crop of Chinese AI companies that builds everything from self-driving car software to chips to the brains for smart cities, have published a research paper outlining some tips and tricks for designing better simulated self-driving car systems with the aid of deep learning. In the paper they focus on the ‘tactical decision-making’ part of driving, which involves performing actions like changing lanes and reacting to near-term threats. (The rest of the paper implies that features like routing, planning, and control are hard-coded.)
  Action skipping: Unlike traditional reinforcement learning approaches, the researchers here avoid using action repetition and replay to learn high-level policies and instead use a technique called action skipping. That’s to avoid situations where a car might, for example, learn through action replays to navigate across multiple car lanes at once, leading to unsafe behavior. With action skipping, the car gets a reward for making a single specific decision (skipping from one lane to another), then gets a modified version of that reward which incorporates the average of the rewards collected during a few periods of time following the initial decision, as in the sketch below. “One drawback of action skipping is the decrease in decision frequency which will delay or prevent the agent’s reaction to critical events. To improve the situation, the actions can take on different skipping factors during inference. For instance in lane changing tasks, the skipping factor for lane keeping can be kept short to allow for swift maneuvers while the skipping factor for lane switching can be larger so that the agent can complete lane changing actions,” they write.
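  A minimal sketch of the reward-averaging flavor of action skipping described above; the Gym-like environment interface and the per-action skip factors are assumptions for illustration, not the paper’s implementation:

```python
def step_with_skip(env, action, skip):
    """Execute `action`, hold it for up to `skip` steps, and return the
    reward averaged over the window (env is assumed to expose a
    Gym-like step() returning (obs, reward, done, info))."""
    total, steps, done = 0.0, 0, False
    for _ in range(skip):
        obs, reward, done, _ = env.step(action)
        total += reward
        steps += 1
        if done:
            break
    return obs, total / steps, done

# Illustrative per-action skip factors: short for lane keeping (swift
# maneuvers stay possible), longer for lane switching (the agent gets
# time to complete the lane change before its next decision).
SKIP = {"keep_lane": 2, "switch_left": 8, "switch_right": 8}
```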
  Tactical rewards: Reward functions for tactical decision-making blend several competing rewards. Here, the researchers use constant reward terms relating to the speed of the car, rewards for lane switching, and a step cost that encourages the car to complete tasks within a relatively small number of steps to aid learning, along with contextual rewards covering the risk of colliding with another vehicle, the presence of traffic lights, and environment-specific risks such as the presence of bicyclists or the increasing risk of staying in an opposing lane during common actions like overtaking.
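  A sketch of how such a blended reward might be composed is below; the weights, attribute names, and risk terms are invented for illustration, not the paper’s values:

```python
def tactical_reward(state, action):
    """Blend constant and contextual reward terms (all weights and
    state/action attributes are illustrative assumptions)."""
    w = {"speed": 1.0, "lane": 0.5, "step": -0.1,
         "collision": -10.0, "opposite_lane": -0.2}
    r = (w["speed"] * state.speed_ratio            # reward for holding target speed
         + w["lane"] * float(action.is_lane_change)
         + w["step"])                              # constant per-step cost
    r += w["collision"] * state.collision_risk     # contextual: crash risk
    r += w["opposite_lane"] * state.time_in_opposite_lane  # e.g. overtaking
    return r
```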
  Testing: The researchers test their approach by placing simulated self-driving cars inside a road simulator, training them via ten simulation runs of 250,000 discrete action steps or more, then testing them against 100 pre-generated test episodes in which they are evaluated on whether they reach their goal while complying with relevant speed limits and avoiding speed changes rapid enough to interfere with passenger comfort.
  Results: The researchers find that implementing their proposed action-skipping and varied reward schemes significantly improves performance over a somewhat unfair random baseline, as well as over a more reasonable rule-based baseline system.
–  Read more: Elements of Effective Deep Reinforcement Learning towards Tactical Driving Decision Making (Arxiv).

Better agents through deception:
…Wicked humans compose tricky games to subvert traditional AI systems…
One of the huge existential questions about the current AI boom relates to the myopic way that AI agents view objectives; most agents will tend to mindlessly pursue objectives even though the application of a little bit of what humans call common sense could net them better outcomes. This problem is one of the chief motivations behind a lot of research in AI safety, as figuring out how to get agents to pursue more abstract objectives, or to incorporate more human-like reasoning in their methods of completing tasks, would seem to deal with some safety problems.
  Testing: One way to explore these issues is by testing existing algorithms against scenarios that seek to highlight their current nonsensical reasoning methods. DeepMind has already espoused such an approach with its AI safety gridworlds (Import AI #71), which give developers a suite of environments that exploit the current way of developing AI agents to optimize specific reward functions. Now, researchers with the University of Strathclyde, Australian National University, and New York University have proposed their own set of tricky environments, which they call Deceptive Games. The games are implemented in the standardized Video Game Description Language (VGDL) and are used to test AIs that have been submitted to the General Video Game Artificial Intelligence (GVGAI) competition.
  Deceptive Games: The researchers come up with a few different categories of deceptive games:
     Greedy Traps: Exploit the fact that an agent can be side-tracked by an action that generates an immediate reward but makes it impossible to attain a larger reward down the line (a toy example appears after this list).
     Smoothness Traps: Most AI algorithms will optimize for the path through a task where difficulty increases smoothly, rather than one where you have to try harder and take more risks but ultimately reap larger rewards.
     Generality Traps: Get AIs to learn general rules about the objects in an environment – like ‘eating mints guarantees a good reward’ – then subvert those rules, for instance by making repeated interactions with those objects flip from giving a positive reward to a negative one once some threshold has been crossed.
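  A greedy trap is easy to express as a toy environment. The sketch below is an invented illustration of the idea rather than one of the paper’s VGDL games: grabbing a small immediate reward permanently closes off a larger one, so a myopically greedy agent ends the episode with 1.0 instead of 10.0:

```python
class GreedyTrap:
    """Toy greedy trap (invented illustration, not a paper environment):
    taking the nearby small reward locks away the large reward."""
    def __init__(self):
        self.small_taken = False

    def step(self, action):
        if action == "take_small":
            self.small_taken = True          # door to the big reward locks
            return 1.0, False                # (reward, episode_done)
        if action == "take_big":
            return (0.0 if self.small_taken else 10.0), True
        return 0.0, False

env = GreedyTrap()
r1, _ = env.step("take_small")   # greedy agent grabs the quick +1.0...
r2, done = env.step("take_big")  # ...and the +10.0 is gone
assert (r1 + r2, done) == (1.0, True)
```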
  Results: AIs submitted to the GVGAI competition employ a variety of different techniques, and the results show that some very highly-ranked agents perform very poorly on these new environments, while some low-ranked ones perform adequately. Most agents fail to solve most of the environments. The paper provides a set of environments against which AI researchers can test and evaluate the performance of their own algorithms, potentially creating another ‘AI safety baseline’ to test AIs against. It could also motivate further extension of the GVGAI competition to become significantly harder for AI agents: “Limiting access to the game state, or even requiring AIs to actually learn how the game mechanics work open up a whole new range of deception possibilities. This would also allow us to extend this approach to other games, which might not provide the AI with a forward model, or might require the AI to deal with incomplete or noisy sensor information about the world,” they write.
–  Read more: Deceptive Games (Arxiv).
–  Read more about DeepMind’s earlier ‘AI Safety Gridworlds’ (Arxiv).

Tech Tales:

[2032: A VA hospital in the Midwest]

Me and my exo go way back. The first bit of it glommed onto me after I did my back in during a tour of duty somewhere hot and resource-laden. I guess you could say our relationship literally grew from there.

Let’s set the scene: it’s 2025 and I’m struggling through some physio with my arms on these elevated side bars and my legs moving underneath me. I’m huffing breath and a vein in my neck is pounding and I’m swearing. Vigorously. Nurse Alice says to me “John I really think you should consider the procedure we talked about”. I swivel my eyes up to meet hers and I say for the hundredth time or so – with spittle – “Fuck. No. I-”
  I don’t get to finish the sentence because I fall over. Again. For the hundredth time. Nurse Alice is silent. I stare into the spongy crash mat, then tense my arms and try to pick myself up but can’t. So I try to turn on my side and this sets off a twinge in my back which grows in intensity until after a second it feels like someone is pulling and twisting the bundle of muscles at the base of my spine. I scream and moan and my right leg kicks mindlessly. Each time it kicks it sets off more tremors in my back which create more kicks. I can’t stop myself from screaming. I try to go as still and as small as possible. I guess this is how trapped animals feel. Eventually the tremors subside and I feel wet cardboard prodding my gut and realize I’ve crushed a little sippy cup and the water has soaked into my undershirt and my boxers as though I’ve wet myself.
“John,” Alice says. “I think you should try it. It really helps. We’ve had amazing success rates.”
“It looks like a fucking landmine with spiderlegs” I mumble into the mat.
“I’m sorry John I couldn’t hear that, could you speak up?”
Alice says this sort of thing a lot and I think we both know she can hear me. But we pretend. I give up and turn my head so I’m speaking half into the floor and half into open space. “OK,” I say. “Let’s try it.”
“Wonderful!” she says, then, softly, “Commence exo protocol”.
  The fucking thing really does scuttle into the room and when it lands on my back I feel some cold metal around the base of my spine and then some needles of pain as its legs burrow into me, then another spasm starts and according to the CCTV footage I start screaming “you liar! I’ll kill you!” and worse things. But I don’t remember any of this. I pass out a minute or so later, after my screams stop being words. When you review the footage you can see that my screams correspond to its initial leg movements and after I pass out it sort of shimmies itself from side to side, pressing itself closer into my lower back with each swinging lunge until it is pressed into me, very still, a black clasp around the base of my spine. Then Alice and another Nurse load me onto a gurney and take me to a room to recover.

When I woke up a day later or so in the hospital bed I immediately jumped out of it and ran over to the hospital room doorway thinking you lying fuckers I’ll show you. I yanked the door open and ran half into the hall then paused, like Wile E. Coyote realizing he has just run off a cliff edge. I looked behind me into the room and back at my just-vacated bed. It dawned on me that I’d covered the distance between bed and door in a second or so, something that would have taken me two crutches and ten minutes the previous day. I pressed one hand to my back and recoiled as I felt the smoothness of the exo. Then I tried lifting a leg in front of me and was able to raise my right one to almost hip height. The same thing worked with the left leg. I patted the exo again and I thought I could feel it tense one of its legs embedded in my spine as though it was saying that’s right, buddy. You can thank me later.
  “John!” Alice said, appearing round a hospital corridor in response to the alarm from the door opening. “Are you okay?”
“Yes,” I said. “I’m fine.”
“That’s great,” she said, cheerfully. “Now, would you consider putting some clothes on?”
I’d been naked the whole time, so fast did I jump out of bed.

So now it’s three years later and I guess I’m considered a model citizen – pun intended. I’ve got exos on my elbows and knees as well as the one on my back, and they’re all linked together into one singular thing which helps me through life. Next might be one for the twitch in my neck. And it’s getting better all the time: fleet learning combined with machine learning protocols means the exo gives me what the top brass call strategic movement optimization; said plainly, I’m now stronger and faster and more precise than regular people. And my exo gets better in proportion to the total number deployed worldwide, which now numbers in the millions.

Of course I do worry about what happens if there’s an EMP and suddenly it all goes wrong and I’m back to where I was. I have a nightmare where the pain returns and the exo rips the muscles in my back out as it jumps away to curl up on itself like a beetle, dying in response to some unseen atmospheric detonation. But I figure the sub-one-percent chance of that is more than worth the tradeoff. I think my networked exo is happy as well, or at least, I hope it is, because in the middle of the night sometimes I wake up to find my flesh being rocked slightly from side to side by the smart metal embedded within me, as though it is a mother rocking some child to sleep.

Things that inspired this story: Exoskeletons, fleet learning, continuous adaptation, reinforcement learning, intermittent back trouble, physiotherapy, walking sticks.