Import AI 132: Can your algorithm outsmart ‘The Obstacle Tower’?; cross-domain NLP with bioBERT; and training on FaceForensics to spot deepfakes

by Jack Clark

Think your algorithm is good at exploration? Enter ‘The Obstacle Tower’:
…Now that Montezuma has been solved, we need to move on. Could ‘The Obstacle Tower’ be the next challenge for people to grind their teeth over?…
The Atari game Montezuma’s Revenge loomed large in AI research for many years, challenging developers to come up with systems capable of unparalleled autonomous exploration and exploitation of simulated environments. But in 2018 multiple groups produced algorithms able to reach human-level performance on the game (for instance: OpenAI via Random Network Distillation, and Uber via Go-Explore). Now, Unity Technologies has released a successor to Montezuma’s Revenge called The Obstacle Tower, which is designed to be “a broad and deep challenge, the solving of which would imply a major advancement in reinforcement learning”, according to Unity.
  Enter…The Obstacle Tower! The game’s features include: physics-driven interactions, high-quality graphics, procedural generation of levels, and variable textures. These traits create an environment that will probably demand agents develop sophisticated visuo-control policies combined with planning.
  Baseline results: Humans are able to – on average – reach the 15th floor in two variants of the game, and the 9th floor in a hard variant called “strong generalization” (where training occurs on separate environment seeds with separate visual themes). PPO and Rainbow – two powerful contemporary RL algorithms – do very badly on the game: they make it only as far as floors 0.6 and 1.6, respectively, in the “strong generalization” regime. In the easier regime, both algorithms only get as far as the fifth floor on average. (A minimal environment-loading sketch appears at the end of this item.)
  Challenge: Unity and Google are challenging developers to program systems capable of climbing Obstacle Tower. The challenge commences on Monday February 11, 2019. “The first-place entry will be awarded $10,000 in cash, up to $2,500 in credits towards travel to an AI/ML-focused conference, and credits redeemable at the Google Cloud Platform,” according to the competition website.
  Why it matters: In AI research, benchmarks have typically motivated research progress. The Obstacle Tower looks to be hard enough to motivate the development of more capable algorithms, but is tractable enough that developers can get some signs of life by using today’s systems.
  Read more about the challenge: Do you dare to challenge the Obstacle Tower? (Unity).
   Get the code for Obstacle Tower here (GitHub).
   Read the paper: The Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning (research paper PDF hosted on Google Cloud Storage).
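  For readers who want to poke at the environment, here is a minimal sketch of driving Obstacle Tower through its Gym-style wrapper with a random agent. The package and class names (obstacle_tower_env, ObstacleTowerEnv) and the binary path are assumptions based on the GitHub release – check the repo for the current API before running anything:

```python
# Minimal sketch: run a random agent in Obstacle Tower via its Gym-style
# wrapper. Import path, constructor arguments, and binary path are assumptions
# taken from the GitHub release; verify against the repo before use.
from obstacle_tower_env import ObstacleTowerEnv  # assumed import path

env = ObstacleTowerEnv("./ObstacleTower/obstacletower", retro=True)

obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()          # random baseline policy
    obs, reward, done, info = env.step(action)
    total_reward += reward

print("episode reward:", total_reward)
env.close()
```

  A loop like this is the scaffolding any submitted agent sits inside; the hard part is replacing env.action_space.sample() with a policy that generalizes across procedurally generated floors.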

What big language models like BERT have to do with the future of AI:
…BERT + specific subject (in this case, biomedical data) = high-performance, domain-specific language-driven AI capabilities…
Researchers with Korea University and startup Clova AI Research have taken BERT, a general-purpose Transformer-based language model developed by Google, and trained it further on specific datasets from the biomedical field. The result is an NLP model customized for biomedical tasks, which the researchers finetune for Named Entity Recognition, Relation Extraction, and Question Answering.
  Large-scale pre-training: The original BERT system was pre-trained on Wikipedia (2.5 billion words) and BooksCorpus (0.8 billion words); BioBERT is pre-trained on these along with the PubMed and PMC corpora (4.5 billion words and 13.5 billion words, respectively).
  Results: BioBERT gets state-of-the-art scores in entity recognition against major datasets dealing with diseases, chemicals, genes and proteins. It also obtains state-of-the-art scores on three question answering tasks. Performance isn’t universally good – BioBERT does significantly worse on a relation extraction task, among other tasks.
  Expensive: Training models at this scale isn’t cheap: BioBERT “trained for over 20 days with 8 V100 GPUs”. And the researchers also lacked the compute resources to use the largest version of BERT for pre-training, they wrote.
  …But finetuning can be cheap: The researchers report that finetuning can take as little as an hour using a single NVIDIA Titan X card – this is due to the small size of the finetuning datasets, and to the significant representational capacity BioBERT gains from large-scale pre-training. (A minimal finetuning sketch appears at the end of this item.)
  Why this matters: BioBERT represents a trend in research we’re going to see repeated in 2019 and beyond: big company releases a computationally intensive model, other researchers customize this model against a specific context (typically via data augmentation and/or fine-tuning), then apply that model and obtain state-of-the-art scores in their domain. If you step back and consider the implicit power structure baked into this it can get a bit disturbing: this trend means an increasing chunk of research is dependent on the computational dividends of private AI developers.
  Read more: BioBERT: pre-trained biomedical language representation model for biomedical text mining (Arxiv).
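  To make the finetuning point concrete, here is a minimal sketch of adapting a BioBERT-style checkpoint for biomedical Named Entity Recognition with the Hugging Face transformers library. This is not the authors’ code: the checkpoint name, the toy label set, and the single hand-made example are assumptions for illustration only:

```python
# Sketch: finetune a BioBERT-style checkpoint for biomedical NER.
# The checkpoint name and label set below are illustrative assumptions,
# not the authors' setup.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "dmis-lab/biobert-base-cased-v1.1"    # assumed hub checkpoint name
LABELS = ["O", "B-Disease", "I-Disease"]      # toy BIO tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=len(LABELS))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# One hand-made training example: words with per-word BIO labels.
words = ["Aspirin", "treats", "headache", "disorders", "."]
labels = [0, 0, 1, 2, 0]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Map each subword piece to its word's label; special tokens get -100 (ignored).
aligned = [-100 if w is None else labels[w] for w in enc.word_ids()]
enc["labels"] = torch.tensor([aligned])

model.train()
out = model(**enc)        # loss is computed internally from the labels
out.loss.backward()
optimizer.step()
print("training loss:", out.loss.item())
```

  In practice you would loop this over the full NER corpus for a few epochs – roughly the one-hour, single-GPU job the authors describe.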

FaceForensics: A dataset to distinguish between real and synthetic faces:
…When is a human face not a human face? When it has been synthetically generated by an AI system…
We’re soon going to lose all trust in digital images and video as people use AI techniques to create synthetic people, or to fake existing people doing unusual things. Now, researchers with the Technical University of Munich, the University Federico II of Naples, and the University of Erlangen-Nuremberg have sought to save us from this info-apocalypse by releasing FaceForensics, “a database of facial forgeries that enables researchers to train deep-learning-based approaches in a supervised fashion”.
  FaceForensics dataset: The dataset contains 1,000 video sequences taken from YouTube videos of news, interview, or video-blog content. Each of these videos has three contemporary manipulation methods applied to it – Face2Face, FaceSwap, and Deepfakes. This quadruples the size of the dataset, creating three sets of 1,000 doctored sequences alongside the 1,000 raw ones. The sequences can be further split into single images, yielding approximately 500,000 unmodified and 500,000 modified images.
  How good are humans at spotting doctored videos? In tests of 143 people, the researchers found that a human can tell real from fake 71% of the time when looking at high-quality videos and 61% of the time when studying low-quality videos.
  Can AI detect fake AI? FaceForensics can be used to train systems to detect forged and non-forged images. “Domain-specific information in combination with a XceptionNet classifier shows the best performance in each test,” they write, after evaluating five potential fake-spotting techniques. (A toy classifier sketch in this spirit appears at the end of this item.)
  Why this matters: It remains an open question as to whether fake imagery will be ‘defense dominant’ or ‘offense dominant’ in terms of who has the advantage (people creating these images, or those trying to spot them); research like this will help scientists better understand this dynamic, which can let them recommend more effective policies to governments to potentially regulate the malicious uses of this technology.
  Read more: FaceForensics++: Learning to Detect Manipulated Facial Images (Arxiv).
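  The underlying recipe is plain supervised learning over frames labeled real or fake. The sketch below illustrates the idea rather than reproducing the paper: the authors’ best model is an XceptionNet, which torchvision doesn’t ship, so this swaps in a ResNet-18 and assumes a hypothetical frames/real and frames/fake directory of extracted frames:

```python
# Sketch: binary real-vs-fake frame classifier in the spirit of the
# FaceForensics benchmark. ResNet-18 stands in for the paper's XceptionNet;
# the frames/real and frames/fake directory layout is a hypothetical example.
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# ImageFolder maps each subdirectory ("fake", "real") to a class index.
data = datasets.ImageFolder("frames", transform=tfm)
loader = DataLoader(data, batch_size=32, shuffle=True)

model = models.resnet18()                        # in practice, start from pretrained weights
model.fc = nn.Linear(model.fc.in_features, 2)    # binary head: real vs. fake
opt = optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                    # one pass over the extracted frames
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```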

Google researchers evolve the next version of the Transformer:
…Using vast amounts of compute to create fundamental deep learning components provides further evidence AI research is splitting into small-compute and big-compute domains…
How do you create a better deep learning component? Traditionally you buy a coffee maker, stick several researchers in a room, and wait until someone walks out with some code and an Arxiv pre-print. Recently, it has become possible to do something different: use computers to automate the design of AI systems. This started a few years ago with Google’s work on ‘neural architecture search’ – in which you use vast amounts of compute to search through permutations of neural network architectures to find high-performing ones not discovered by humans. Now, Google researchers are using similar techniques to try to improve the building blocks that these architectures are composed of. Case in point: new work from Google that uses evolutionary search to create the next version of the Transformer (a toy sketch of this kind of search loop appears at the end of this item).
   What is a Transformer and why should we care? A Transformer is a feed-forward network-based component that is “faster, and easier to train than RNNs”. Transformers have recently turned up in a bunch of neat places, from the hearts of the agents trained by DeepMind to beat human professionals at StarCraft 2, to state-of-the-art language systems, to systems for image generation.
  Enter the ‘Evolved Transformer’: The next-gen Transformer cooked up by the evolutionary search process is called the “Evolved Transformer” and “demonstrates consistent improvement over the original Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French (En-Fr), WMT 2014 English-Czech (En-Cs) and the 1 Billion Word Language Model Benchmark (LM1B)”, they write.
   Training these things is becoming increasingly expensive: A single training run to peak performance on the WMT’14 En-De set “requires ~300k training steps, or 10 hours, in the base size when using a single Google TPU V.2 chip,” the researchers explain (by contrast, you can train similar systems for image classification on the small-scale CIFAR-10 dataset in about two hours). “In our preliminary experimentation we could not find a proxy task that gave adequate signal for how well each child model would perform on the full WMT’14 En-De task”, they write. This highlights that for some domains, search-based techniques may be even more expensive due to the lack of a cheap proxy (like CIFAR) to train against.
  Why this matters: small compute and big compute: AI research is bifurcating into two subtly different scientific fields: small compute and big compute. In the small compute domain (which predominantly occurs in academic labs, as well as in the investigations of independent researchers) we can expect people to work on fundamental techniques that can be evaluated and tested on small-scale datasets. This small compute domain likely leads to researchers concentrating more on breakthroughs which come along with significant theoretical guarantees that can be made a priori about the performance of systems.
  In the big compute domain, things are different: Organizations with access to large amounts of computers (typically, those in the private sector, predominantly technology companies) frequently take research ideas and scale them up to run on unprecedentedly large amounts of computers to evaluate them and, in the case of architecture search, push them further.
   Personally, I find this trend a bit worrying, as it suggests that some innovations will occur in one domain but not the other – academics and other small-compute researchers will struggle to put together the resources to allocate entire GPU/TPU clusters to farming algorithms, which means that big compute organizations may have an inbuilt advantage that can lead to them racing ahead in research relative to other actors.
  Read more: The Evolved Transformer (Arxiv).
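  The search idiom itself is simple, even if running it at Google scale is not. Below is a toy sketch of tournament-style evolutionary search over a made-up architecture space, with a placeholder fitness function standing in for the expensive step of actually training each child model on WMT’14 En-De – this illustrates the idiom, not Google’s implementation:

```python
# Toy evolutionary architecture search: mutate the winner of a random
# tournament, age out the oldest member. The search space and fitness
# function are placeholders; in the real system, fitness means training
# a full translation model, which is where the expense lives.
import random

SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "hidden_dim": [256, 512, 1024],
    "activation": ["relu", "swish", "gelu"],
}

def random_arch():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(arch):
    child = dict(arch)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def fitness(arch):
    # Placeholder score; the real version trains the candidate and
    # reports its validation performance.
    return -abs(arch["num_layers"] - 6) + arch["hidden_dim"] / 1024

population = [random_arch() for _ in range(20)]
for _ in range(200):
    tournament = random.sample(population, 5)
    parent = max(tournament, key=fitness)
    population.append(mutate(parent))   # add the mutated child
    population.pop(0)                   # remove the oldest individual

print("best architecture found:", max(population, key=fitness))
```

  Almost all of the cost discussed above lives inside fitness(); the loop around it is cheap, which is why access to compute is the deciding factor.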

IBM tries to make it easier to create more representative AI systems with ‘Diversity in Faces’ dataset:
…Diversity in Faces includes annotations of 1 million human faces to help people make more accurate facial recognition systems…
IBM has revealed Diversity in Faces, a dataset containing annotations of 1 million “human facial images” (in other words: faces) from the YFCC-100M Creative Commons dataset. Each face in the dataset is annotated using 10 “well-established and independent coding schemes from the scientific literature”, which include objective measures such as “craniofacial features” like head and nose length, annotations about the pose and resolution of the image, as well as subjective annotations like the age and gender of a subject. IBM is releasing the dataset (in a somewhat restricted form) to further research into creating less biased AI systems.
  The “DiF dataset provides a more balanced distribution and broader coverage of facial images compared to previous datasets,” IBM writes. “The insights obtained from the statistical analysis of the 10 initial coding schemes on the DiF dataset has furthered our own understanding of what is important for characterizing human faces and enabled us to continue important research into ways to improve facial recognition technology”.
  Restricted data access: To access the dataset, you need to fill out a questionnaire which has as a required question “University of Research Institution or Affiliated Organization”. Additionally, IBM wants people to explain the research purpose for accessing the dataset. It’s a little disappointing to not see an explanation anywhere for the somewhat restricted access to this data (as opposed to being able to download it straight from GitHub without filling out a survey, as with many datasets). My theory is that IBM is seeking to do two things: 1) protect against people using the dataset for abusive/malicious purposes and 2) satisfy IBM’s lawyers. It would be nice to be able to read some of IBM’s reasoning here, rather than having to make assumptions. (I emailed someone from IBM about this and pasted the prior section in, and they said that part of the motivation for releasing the dataset in this way was to ensure IBM can “be respectful” of the rights of the people in the images.)
  Why this matters: AI falls prey to the technological rule-of-thumb of “garbage in, garbage out” – so if you train a facial recognition system on a non-representative, non-diverse dataset, you’ll get terrible performance when deploying your system in the wild against a diverse population of people. Datasets like this can help researchers better evaluate facial recognition against diverse datasets, which may help reduce the mis-identification rate of these systems. (A sketch of this kind of disaggregated evaluation appears at the end of this item.)
  Read more: IBM Research Releases ‘Diversity in Faces’ Dataset to Advance Study of Fairness in Facial Recognition Systems (IBM Research blog).
  Read more: How to access the DiF dataset (IBM).
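  As a sketch of why attribute annotations matter, here is the kind of disaggregated evaluation a dataset like Diversity in Faces enables: breaking a recognition system’s error rate down by annotated subgroup instead of reporting one aggregate number. The column names below are hypothetical, not DiF’s actual schema:

```python
# Sketch: per-subgroup error rates for a face recognition system, using
# hypothetical annotation columns (not DiF's real schema).
import pandas as pd

results = pd.DataFrame({
    "age_group": ["18-30", "18-30", "30-50", "30-50", "50+", "50+"],
    "skin_tone": ["light", "dark",  "light", "dark",  "light", "dark"],
    "correct":   [1, 0, 1, 1, 0, 1],   # 1 = the system identified the face correctly
})

# Error rate per annotated subgroup; a well-behaved system should look roughly
# flat across rows rather than failing disproportionately on particular groups.
error_rates = 1 - results.groupby(["age_group", "skin_tone"])["correct"].mean()
print(error_rates)
```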

IMPORT AI GUEST POST: Skynet Today:
…Skynet Today is a site dedicated to providing accessible and informed coverage of the latest AI news and trends. In this guest post, they summarize a much longer article on AI and the economy just published on Skynet Today…

Job loss due to AI – How bad is it going to be?
The worry of Artificial Intelligence (AI) taking over everyone’s jobs is becoming increasingly prevalent, but just how warranted are these concerns? What do history and contemporary research tell us about how AI-based automation will impact our jobs and the future of society?
  A History of Fear: Despite the generally positive regard for the effects of past industrial revolutions, concerns about mass unemployment as a result of new technology still exist and trace their roots to long before such automation was even possible. For example, in his work Politics, Aristotle articulated his concerns about automation in Ancient Greece during the fourth century BC: “If every instrument could accomplish its own work, obeying or anticipating the will of others, like the statues of Daedalus, or the tripods of Hephaestus, which, says the poet, ‘of their own accord entered the assembly of the gods;’ if, in like manner, the shuttle would weave and the plectrum touch the lyre without a hand to guide them, chief workmen would not want servants, nor masters slaves.” Queen Elizabeth I, the Luddites, James Joyce, and many more serve as further examples of this trend.
 Creative Destruction: But thus far the fears have not been warranted. In fact, automation improves productivity and can grow the economy as a whole. The Industrial Revolution saw the introduction of new labor-saving devices and technology, which did result in many jobs becoming obsolete. However, this also led to new, safer, and better jobs being created, and resulted in the economy growing and living standards increasing. Joseph Schumpeter called this “creative destruction”: the process of technology disrupting industries and destroying jobs, but ultimately creating new, better ones and growing the economy.
 Is this time going to be different? Skynet Today thinks not: Automation will probably displace less than 15% of jobs in the near future. This is because many jobs will be augmented, not replaced, and widespread adoption of new technology is a slow process that incurs nontrivial costs. Historically, shifts this large or larger have already happened and ultimately led to growing prosperity for people on average in the long term. However, automation can exacerbate the problems of income and wealth inequality, and its uneven impact means some communities will be affected much more than others. Helping displaced workers quickly transition to and succeed in new jobs will be a tough and important challenge.
     Read more: Job loss due to AI – How bad is it going to be? (Skynet Today).
    Have feedback about this post? Email Skynet Today directly at: editorial@skynettoday.com

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Why US AI policy is one of the most high-impact career paths available:
Advanced AI is likely to have a transformative impact on the world, and managing this transition is one of the most important challenges we face. We can expect there to be a number of critical junctures, where key actors make decisions that have an unusually significant and lasting impact. Yet work aimed at ensuring good outcomes from advanced AI remains neglected on a global scale. A new report from 80,000 Hours makes the case that working on US AI policy might be among the most high-impact career paths available.
  The basic case: The US government is likely to be a key actor in AI. It is uniquely well-resourced, and has a track-record of involvement in the development of advanced technologies. Given the wide-ranging impacts of AI on society, trade, and defence, the US has a strong interest in playing a role in this transition. Nonetheless, transformative AI remains neglected in US government, with very few resources yet being directed at issues like AI safety, or long-term AI policy. It seems likely that this will change over time, and that the US will pay increasing attention to advanced AI. This creates an opportunity for individuals to have an unusually large impact, by positioning themselves to work on these problems in government, and increasing the likelihood that the right decisions are taken at critical junctures in the future.
  Who should do this: This career is a good fit for US citizens with a strong interest in AI policy. It is a highly competitive path, and suited to individuals with excellent academic track records, e.g. law degrees or relevant master’s from top universities. It also requires being comfortable with taking a risk on your impact over your career, as there is no guarantee you will be able to influence the most important policy decisions.
  What to do next: One of the best routes into these roles is working in policy at an AI lab (e.g. DeepMind, OpenAI). Other promising paths include prestigious policy fellowships, working on AI policy in an academic group, or working at a DC think tank. The 80,000 Hours website has a wealth of free resources for people considering working in AI policy, and offers free career coaching.
  Read more: The case for building expertise to work on US AI policy, and how to do it (80,000 Hours).
   (Note from Jack – OpenAI is currently hiring for Research Scientists and Research Assistants for its Policy team: This is a chance to do high-impact work & research into AI policy in a technical, supportive environment. Take a look at the jobs here: Research Scientist, Policy. Research Assistant, Policy.)

What patent data tells us about AI development:
A new report from WIPO uses patent data to shed light on the different dimensions of AI progress in recent years.
  Shift towards deployment: The ratio of scientific papers to patents has fallen from 8:1 in 2010 to 3:1 in 2016. This reflects the shift away from the ‘discovery’ phase of the current AI upcycle, when we saw a number of big breakthroughs in ML, and into the ‘deployment’ phase, where these breakthroughs are being implemented.
  Which applications: Computer vision is the most frequently cited application of AI, appearing in 49% of patents. The fastest-growing areas are robotics and control, which have both grown by 55% per year since 2013. Telecoms and transportation are the most frequently cited industry applications, each mentioned in 15% of patents.
  Private vs. academic players: Of the top 30 applicants, 26 are private companies, compared with only 4 academic or public organizations. The top companies are dominated by Japanese groups, followed by those from the US and China. The top academic players are overwhelmingly Chinese (17 of the top 20). IBM has the biggest patent portfolio of any individual company, by a substantial margin, followed by Microsoft.
  Conflict and cooperation: Of the top 20 patent applicants, none share ownership of more than 1% of their portfolio with other applicants. This suggests low levels of inter-company cooperation in invention. Conflict between companies is also low, with less than 1% of patents being involved in litigation.
  Read more: Technology Trends: Artificial Intelligence (WIPO).

OpenAI Bits & Pieces:

Want three hours of AI lectures? Check out the ‘Spinning Up in Deep RL’ recording:
This weekend, OpenAI hosted its first day-long lecture series and hackathon based around its ‘Spinning Up in Deep RL’ resources. This workshop (and Spinning Up in general) is part of a new initiative at OpenAI called, somewhat unsurprisingly, OpenAI Education.
  The lecture includes a mammoth overview of deep reinforcement learning, as well as deep dives on OpenAI’s work on robotics and AI safety.
  Check out the video here (OpenAI YouTube).
  Get the workshop materials, including slides, here (OpenAI GitHub).
  Read more about Spinning Up in Deep RL here (OpenAI Blog).

Tech Tales:

We named them lampfish, inspired by those fish you see in the ancient pre-acidification sea documentaries: a skeletal fish with its own fluorescent lantern, used to lure other fish in the ink-dark deep sea.

Lampfishes look like this: you have the ‘face’ of the AI, which is basically a bunch of computer equipment with some sensory inputs – sight, touch, auditory, etc – and then on top of the face is a stalk which has a view display sitting on top of it. In the viewing display you get to see what the AI is ‘thinking’ about: a tree melting into the ground and becoming bones which then become birds that fly down into the dirt and towards the earth. Or you might see the ‘face’ of the AI rendered as though by an impressionist oil painter, smearing and shape-shifting in response to whatever stimuli it is being provided with. And, very occasionally, you’ll see odd, non-Euclidean shapes, or other weird and, to some, profane geometries.

I guess you could say the humans and the machines co-evolved this practice – in the first models the view displays were placed on the ‘face’ of the AI alongside the sensory equipment, so people would have to put their faces close to reflective camera domes or microphone grills and then see the ‘thoughts’ of the AI on the viewport at the same time. This led to problems for both the humans and the AIs:

= Many of the AIs couldn’t help but focus on the human faces right in front of them, and their view displays would end up showing hallucinatory images that might include shreds of the face of the person interacting with the system. This, we eventually came to believe, disrupted some of the cognitive practices of the AIs, leading to them performing their obscure self-directed tasks less efficiently.

= The humans found it disturbing that the AIs so readily adapted their visual outputs to the traits of the human observer. Many ghost stories were told. Teenage kids would dare each other to see how long they could stare at the viewport and how close they could bring their faces to the sensory apparatus; as a consequence there are apocryphal reports of people driven mad by seeing many permutations of their own faces reflected back at them; there are even more apocryphal stories of people seeing their own deaths in the viewports.

So that’s how we got the lampfish design. And now we’re putting them on wheels so they can direct themselves as they try to map the world and generate their own imaginations out of it. Now we sometimes see two lampfish orbiting each other at skewed angles, ‘looking’ into each other’s viewing displays. Sometimes they stay together for a while then move away, and sometimes humans need to separate them; there are stories of lampfish navigating into cities to return to each other, finding some kind of novelty in each other’s viewing screens.

Things that inspired this story: BigGAN; lampfish; imagination as a conditional forward predictor of the world; recursion; relationships between entities capable of manifesting novel patterns of data.