Import AI 130: Pushing neural architecture search further with transfer learning; Facebook funds European center on AI ethics; and analysis shows BERT is more powerful than people might think

by Jack Clark

Facebook reveals its “self-feeding chatbot”:
…Towards AI systems that continuously update themselves…
AI systems are a bit like dumb, toy robots: you spend months or years laboring away in a research lab and eventually a factory (in the case of AI, a data center) to design an exquisite little doohickey that does something very well, then you start selling it in the market, observe what users do with it, and use those insights to help you design a new, better robot. Wouldn’t it be better if the toy robot was able to understand how users were interacting with it, and adjust its behavior to make the users more satisfied with it? That’s the idea behind new research from Facebook which proposes “the self-feeding chatbot, a dialogue agent with the ability to extract new examples from the conversations it participates in after deployment”.
  How it works – pre-training: Facebook’s chatbot is trained on two tasks: DIALOGUE, where the bot tries to predict the next utterance in a conversation (which it can use to calibrate itself), and SATISFACTION, where it tries to assess how satisfied the speaking partner is with the conversation. Data for both of these tasks comes from conversations between humans. The DIALOGUE data comes from the ‘PERSONACHAT’ dataset, which consists of short dialogs (6-8 turns) between two humans who have been instructed to try and get to know each other.
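  As a rough illustration of the two-task setup (not Facebook’s implementation), here is a hypothetical PyTorch sketch of a shared utterance encoder with a DIALOGUE head that ranks candidate replies and a SATISFACTION head that predicts partner satisfaction; all names and sizes are made up:

```python
# Hypothetical sketch of a two-task chatbot model: a shared utterance encoder
# with a DIALOGUE head (score candidate replies) and a SATISFACTION head
# (predict how happy the partner is). Names and sizes are illustrative only.
import torch
import torch.nn as nn

class SelfFeedingBotSketch(nn.Module):
    def __init__(self, vocab_size=30000, dim=256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)   # averaged word-embedding encoder stand-in
        self.satisfaction_head = nn.Linear(dim, 1)      # scores P(partner is satisfied)

    def encode(self, token_ids):
        # token_ids: [batch, num_tokens] LongTensor
        return self.embed(token_ids)

    def dialogue_scores(self, context_ids, candidate_ids):
        # Rank candidate replies by dot product with the encoded context.
        ctx = self.encode(context_ids)                  # [batch, dim]
        cands = self.encode(candidate_ids)              # [num_candidates, dim]
        return ctx @ cands.t()                          # [batch, num_candidates]

    def satisfaction(self, context_ids):
        return torch.sigmoid(self.satisfaction_head(self.encode(context_ids)))
```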
  How it works – updating in the wild: Once deployed, the chatbot learns from its interactions with people in two ways: if the bot predicts with high confidence that its response will satisfy its conversation partner, then it extracts a new structured dialogue example from the discussion with the human. If the bot thinks the human is unsatisfied with its most recent response, then the bot asks the person for feedback, and this exchange is used to generate a feedback example, which the bot stores and learns from. (“We rely on the fact that the feedback is not random: regardless of whether it is a verbatim response, a description of a response, or a list of possible responses”, Facebook writes.)
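  A minimal, hypothetical sketch of that deployment-time loop is below, using an abstract `bot` interface rather than the toy model above; the thresholds, method names, and data format are assumptions for illustration rather than Facebook’s actual code:

```python
# Hypothetical sketch of the self-feeding deployment loop: harvest new DIALOGUE
# examples from confident exchanges, and ask for feedback (a FEEDBACK example)
# when the partner seems unhappy. Thresholds and method names are assumed.
SATISFACTION_THRESHOLD = 0.5   # below this, assume the partner is unsatisfied
CONFIDENCE_THRESHOLD = 0.8     # above this, keep the exchange as training data

def handle_turn(bot, context, human_utterance, dialogue_examples, feedback_examples):
    context.append(human_utterance)

    if bot.predict_satisfaction(context) < SATISFACTION_THRESHOLD:
        # Partner seems unhappy: ask what the bot should have said and store
        # the answer as a new feedback training example.
        feedback = bot.ask("Oops! What should I have said instead?")
        feedback_examples.append({"context": list(context), "feedback": feedback})

    reply, confidence = bot.generate_reply(context)
    if confidence > CONFIDENCE_THRESHOLD:
        # Confident the reply will satisfy the partner: keep the exchange as a
        # new dialogue training example.
        dialogue_examples.append({"context": list(context), "response": reply})

    context.append(reply)
    return reply
```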
  Results: Facebook shows that it can further improve the performance of its chatbots by using data generated during the chatbot’s interactions with humans. Additionally, this data yields solid improvements regardless of how many training examples the system already has – suggesting that a little bit of data gathered in the wild helps across the board. “Even when the entire PERSONACHAT dataset of 131k examples is used – a much larger dataset than what is available for most dialogue tasks – adding deployment examples is still able to provide an additional 1.6 points of accuracy on what is otherwise a very flat region of the learning curve,” they write.
  Why this matters: Being able to design AI systems that can automatically gather their own data once deployed feels like a middle ground between the systems we have today and systems which do fully autonomous continuous learning. It’ll be fascinating to see if techniques like these are experimented with more widely, as that might lead to the chatbots around us getting substantially better. Because this system relies on its human conversation partners to improve itself, it is implicit that their data has some trace economic value, so perhaps work like this will also further support some of the debates people have about whether users should be able to own their own data or not.
  Read more: Learning from Dialogue after Deployment: Feed Yourself, Chatbot! (Arxiv).

BERT: More powerful than you think:
Language researcher remarks on the surprisingly well-performing Transformer-based system…
Yoav Goldberg, a researcher with Bar Ilan University in Israel and the Allen Institute for AI, has analyzed BERT, a language model recently released by Google. The goal of this research is to see how well BERT can represent challenging language concepts, like “naturally-occurring subject-verb agreement stimuli”, “‘colorless green ideas’ subject-verb agreement stimuli, in which content words in natural sentences are randomly replaced with words sharing the same part-of-speech and inflection”, and “manually crafted stimuli for subject-verb agreement and reflexive anaphora phenomena”. To Goldberg’s surprise, standard BERT models “perform very well on all the syntactic tasks” without any task-specific fine-tuning.
  BERT, a refresher: BERT is based on a technology called a Transformer which, unlike recurrent neural networks, “relies purely on attention mechanisms, and does not have an explicit notion of word order beyond marking each word with its absolute-position embedding.” BERT is bidirectional, so it gains language capabilities by being trained to predict the identity of masked words based on both the prefix and suffix surrounding the words.
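  This masked-prediction interface is roughly what makes the syntactic probing straightforward: mask the verb and ask which of two forms BERT prefers. Here is a minimal, hypothetical sketch using the HuggingFace `transformers` package (not Goldberg’s evaluation code); the example sentence and verb pair are illustrative:

```python
# Sketch of a masked-prediction agreement probe: mask the verb in a sentence
# and compare BERT's scores for the grammatical vs. ungrammatical form.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The keys to the cabinet [MASK] on the table."
inputs = tokenizer(sentence, return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_index]

good_id = tokenizer.convert_tokens_to_ids("are")   # correct: plural subject "keys"
bad_id = tokenizer.convert_tokens_to_ids("is")     # lure of the singular attractor "cabinet"
print("Prefers the grammatical verb?", bool(logits[good_id] > logits[bad_id]))
```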
  Results: One tricky thing about assessing BERT’s performance is that it has been trained on different and larger datasets than the LSTM baselines it is compared against, and it can access the suffix of a sentence as well as the prefix. Nonetheless, Goldberg concludes that “BERT models are likely capable of capturing the same kind of syntactic regularities that LSTM-based models are capable of capturing, at least as well as the LSTM models and probably better.”
  Why it matters: I think this paper is further evidence that 2018 really was, as some have said, the year of ImageNet for NLP. What I mean by that is: in 2012 the ImageNet results blew all other image analysis approaches to the ImageNet challenge out of the water and sparked a re-orientation of a huge part of the AI research community toward neural networks, ending a long, cold winter and leading almost directly to significant commercial applications that drove a rise in industry investment into AI, which has fundamentally reshaped AI research. By comparison, 2018 had a series of impressive results – work from Allen AI on ELMo, work by OpenAI on GPT (the Generative Pre-trained Transformer), and work by Google on BERT.
  These results, taken together, show the arrival of scalable methods for language understanding that seem to work better than prior approaches, while also being in some senses simpler. (And a rule that has tended to hold in AI research is that simpler techniques win out in the long run by virtue of being easy for researchers to fiddle with and chain together into larger systems.) If this really has happened, then we should expect bigger, more significant language results in the future – and just as ImageNet’s 2012 success ultimately reshaped societies (enabling everything from follow-the-human drones, to better self-driving cars, to doorbells that use AI to automatically police neighborhoods), it’s possible 2018’s series of advances could be year zero for NLP.
  Read more: Assessing BERT’s Syntactic Abilities (Arxiv).

Towards a future where all infrastructure is surveyed and analyzed by drones:
Radio instead of GPS, light drones, and a wind turbine…
Researchers with Lulea University of Technology in Sweden have developed techniques to let small drones (sometimes called Micro Aerial Vehicles, or MAVs) autonomously inspect very large machines and/or buildings, such as wind turbines. The primary technical contributions outlined in the report are a localization technique that lets multiple drones coordinate with each other as they inspect something, as well as a path planning algorithm that helps them not only inspect the structure, but also gather enough data “to enable the generation of an off-line 3D model of the structure”.
  Hardware: For this project the researchers use a MAV platform from Ascending Technologies called the ‘NEO hexacopter’, which is capable of 26 minutes of flight (without payload and in ideal conditions) and runs an onboard Intel NUC computer with a Core i7 chip and 8GB of RAM, with the main software stack made up of Ubuntu Server 16.04 running the Robot Operating System (ROS). Each drone is equipped with a sensor suite comprising a Visual-Inertial sensor, a GoPro Hero4 camera, a PlayStation Eye camera, and a laser range finder called RPLIDAR.
  How the software works: The Cooperative Coverage Path Planner (C-CPP) algorithm “is capable of producing a path for accomplishing a full coverage of the infrastructure, without any shape simplification, by slicing it by horizontal planes to identify branches of the infrastructure and assign specific areas to each agent”, the researchers write. The algorithm – which they implement in MATLAB – also generates “yaw references for each agent to assure a field of view, directed towards the structure surface”.
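  The full C-CPP algorithm is implemented in MATLAB and also handles yaw references, overlap, and more; purely as an illustration of the horizontal-slicing idea, here is a hypothetical Python sketch in which the clustering method, slice height, and round-robin assignment are my assumptions, not the paper’s:

```python
# Hypothetical sketch of the horizontal-slicing idea behind a cooperative
# coverage planner: cut the structure's point cloud into horizontal slabs,
# split each slab into connected "branches", and deal branches out to drones
# round-robin. This illustrates the concept, not the paper's C-CPP algorithm.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def plan_coverage(points, num_agents, slice_height=2.0, branch_gap=1.5):
    """points: (N, 3) array of x, y, z samples on the structure's surface."""
    assignments = {agent: [] for agent in range(num_agents)}
    z_min, z_max = points[:, 2].min(), points[:, 2].max()
    agent = 0
    for z0 in np.arange(z_min, z_max, slice_height):
        slab = points[(points[:, 2] >= z0) & (points[:, 2] < z0 + slice_height)]
        if len(slab) == 0:
            continue
        if len(slab) == 1:
            labels = np.array([1])
        else:
            # Group the slab's points into branches (e.g. tower vs. blades)
            # using single-linkage clustering with a distance cutoff.
            labels = fcluster(linkage(slab[:, :2], method="single"),
                              t=branch_gap, criterion="distance")
        for branch_id in np.unique(labels):
            branch_points = slab[labels == branch_id]
            assignments[agent].append(branch_points)   # coverage area for this drone
            agent = (agent + 1) % num_agents
    return assignments
```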
  Localization: To help localize each drone the researchers install five ultra-wideband (UWB) anchors around the structure, giving the drones access to a reliable local coordinate frame – kind of like hyper-local GPS – when trying to map the structure.
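  UWB anchors typically provide range measurements to a tag on each drone; assuming a range-based setup (the paper’s exact localization pipeline may differ), a drone’s position can be estimated by least-squares multilateration, as in this hypothetical sketch with made-up numbers:

```python
# Hypothetical sketch of range-based multilateration with fixed UWB anchors:
# solve for the drone position that best explains the measured anchor ranges.
# Anchor layout and range values below are made-up example numbers.
import numpy as np
from scipy.optimize import least_squares

anchors = np.array([                 # known anchor positions in the local frame (meters)
    [0.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 30.0, 0.0],
    [30.0, 30.0, 0.0], [15.0, 15.0, 10.0],
])
measured_ranges = np.array([20.1, 25.3, 24.8, 29.9, 12.4])   # meters, from the UWB radios

def residuals(position):
    # Difference between predicted and measured distance to each anchor.
    return np.linalg.norm(anchors - position, axis=1) - measured_ranges

estimate = least_squares(residuals, x0=np.array([15.0, 15.0, 5.0]))
print("Estimated drone position:", estimate.x)
```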
  Wind turbine inspection: The researchers test their approach on the task of autonomously inspecting and mapping a large wind turbine (and they split this into two discrete tasks due to the low flight time of the drones, having them separately inspect the tower and also its blades). They find that two drones are able to work together to map the base of the structure, but mapping the blades of the turbine proves more challenging due to the drones experiencing turbulence which blurs their camera feeds. Additionally, the lack of discernible textures on the top parts of the wind turbine and the blades “caused 3D reconstruction to fail. However, the visual data captured is of high quality and suitable for review by an inspector,” they write.
  Next steps: To make the technology more robust the researchers say they’ll need to create an online planning algorithm that can account for local variations, like wind. Additionally, they’ll need to create a far more robust system for MAV control as they noticed that trajectory tracking is currently “extremely sensitive to the existing weather conditions”.
  Why this matters: In the past ~10 years or so drones have gone from being the preserve of militaries to becoming a consumer technology, with prices for the machines driven down by precipitous drops in the price of sensors, as well as continued falls in the cost of powerful, miniature computing platforms. We’re now reaching the point where researchers are beginning to add significant amounts of autonomy to these platforms. My intuition is that within five years we’ll see a wide variety of software-based enhancements for drones that further increase their autonomy and reliability – research like this is indicative of the future, and also speaks to the challenges of getting there. I look forward to a world where we can secure more critical infrastructure (like factories, power plants, ports, and so on) through autonomous scanning via drones. I’m less looking forward to the fact that such technology will inevitably also be used for invasive surveillance, particularly of civilians.
  Good-natured disagreement (UK term: a jovial quibble): given the difficulties seen in the real-world deployment, I think the paper’s abstract (see below) slightly oversells its (very promising!) results.
   Read more: Autonomous visual inspection of large-scale infrastructures using aerial robots (Arxiv).
  Check out a video about the research here (YouTube).

Neural Architecture Search + Transfer Learning:
…Chinese researchers show how to do NAS on a small dataset, (slightly) randomize the derived architectures, and then run NAS again on a larger dataset…
Researchers with Huazhong University, Horizon Robotics, and the Chinese Academy of Sciences have made it more efficient to use AI to design other AI systems. The approach, called EAT-NAS (short for Elastic Architecture Transfer Neural Architecture Search) lets them run neural architecture search on a small dataset (like the CIFAR-10 image dataset), then transfer the resulting learned architecture to a larger dataset and run neural architecture search against it again. The advantage of the approach, they say, is that it’s more computationally efficient to do this than to run neural architecture search on a large dataset from scratch. Networks trained in this way obtain scores that are near the performance of state-of-the-art techniques while being more computationally efficient, they say.
  How EAT-NAS works: The technique relies on an evolutionary algorithm: in stage one, the algorithm searches for top-performing architectures on a small dataset, then it trains these further and transfers one as the initialization seed of a new model population to be trained on a larger dataset; these models are then run through an ‘offspring architecture generator’ which creates and searches over more architectures. When transferring architectures from the smaller dataset to the larger one the researchers add some perturbation to the input architecture homogeneously, based on the intuition that this randomization will make the model more robust to the larger dataset.
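  To picture the transfer step, here is a hypothetical sketch of seeding a new search population from a transferred architecture by applying small random perturbations to its encoding; the toy encoding, operator list, and mutation rate are illustrative assumptions, not the authors’ implementation:

```python
# Hypothetical sketch of seeding a new NAS population from a transferred
# architecture: perturb the seed's encoding to get diverse starting points
# for the search on the larger dataset. All choices here are illustrative.
import copy
import random

# Toy architecture encoding: one dict per block.
seed_architecture = [
    {"op": "conv3x3", "channels": 32},
    {"op": "conv5x5", "channels": 64},
    {"op": "maxpool", "channels": 64},
]
OPS = ["conv3x3", "conv5x5", "sep_conv3x3", "maxpool"]
CHANNEL_CHOICES = [16, 32, 64, 128]

def perturb(architecture, mutation_prob=0.2):
    """Apply small random edits to each block of the encoding."""
    mutated = copy.deepcopy(architecture)
    for block in mutated:
        if random.random() < mutation_prob:
            block["op"] = random.choice(OPS)
        if random.random() < mutation_prob:
            block["channels"] = random.choice(CHANNEL_CHOICES)
    return mutated

# Seed the population for the large-dataset search with perturbed copies,
# keeping the transferred architecture itself as one member.
population = [seed_architecture] + [perturb(seed_architecture) for _ in range(15)]
```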
  Results: The top-performing architecture found, EATNet, obtains a top-1/top-5 accuracy of 73.8/91.7 on the ImageNet dataset, compared to scores of 75.7/92.4 for AmoebaNet, a NAS-derived network from Google. The search process takes around 5 days on 8 TITAN X GPUs.
  Why this matters: Neural architecture search is a technology that makes it easy to offload the cost of designing new architectures from people to computers. This lets researchers arbitrage (costly) human brain time for (cheaper) compute time. As this technology evolves, we can expect more and more organizations to start running continuous NAS-based approaches on their various deployed AI applications, letting them continuously calibrate and tune the performance of these AI systems without having to have any humans think about it too hard. This is part of the broader trend of the industrialization of AI – think of NAS as basic factory automation within the overall AI research ‘factory’.
  Read more: EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search (Arxiv).

Facebook funds European AI ethics research center:
…Funds Technical University of Munich to spur AI ethics research…
Facebook has given $7.5 million to set up a new Institute for Ethics in Artificial Intelligence. This center “will help advance the growing field of ethical research on new technology and will explore fundamental issues affecting the use and impact of AI,” Facebook wrote in a press release announcing the grant.
  The center will be led by Dr Christoph Lutge, a professor at the Technical University of Munich. “Our evidence-based research will address issues that lie at the interface of technology and human values,” he said in a statement. “Core questions arise around trust, privacy, fairness or inclusion, for example, when people leave data traces on the internet or receive certain information by way of algorithms. We will also deal with transparency and accountability, for example in medical treatment scenarios, or with rights and autonomy in human decision-making in situations of human-AI interaction.”
  Read more: Facebook and the Technical University of Munich Announce New Independent TUM Institute for Ethics in Artificial Intelligence (Facebook Newsroom).

DeepMind hires RL-pioneer Satinder Singh:
DeepMind has recently been trying to collect as many of the world’s most experienced AI researchers as it can, and to that end has hired Satinder Singh, a pioneer of reinforcement learning. This follows DeepMind setting up an office in Alberta, Canada, to help it hire Richard Sutton, another long-time AI researcher.
  Read more: Demis Hassabis tweet announcing the hire (Twitter).

~ EXTREMELY 2019 THINGS, AN OCCASIONAL SERIES ~

– The New York Police Department seeks to reassure the public via a Tweet that includes the phrase:
“Our highly-trained NYPD drone pilots” (via Twitter).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Reframing Superintelligence:
Eric Drexler has published a book-length report on how we should expect advanced AI systems to be developed, and what this means for AI safety. He argues that existing discussions have rested on several unfounded assumptions, particularly the idea that these systems will take the form of utility-maximizing agents.
  Comprehensive AI services: Looking at how AI progress is actually happening suggests a different picture of development, which does not obviously lead to superintelligent agents. Researchers design systems to perform specific tasks, using bounded resources in bounded time (AI services). Eventually, AI services may be able to perform almost any task, including AI R&D itself. This end-state, where we have ‘comprehensive AI services’ (CAIS), is importantly different from the usual picture of artificial general intelligence. While CAIS would, in aggregate, have superintelligent capacities, it need not be an agent, or even a unified system.
  Safety prospects: Much of the existing discussion on AI safety has focussed on worries specific to powerful utility-maximizing agents. A collection of AI services, individually optimizing for narrow, bounded tasks, does not pose the same risks as a unified AI with general capabilities optimizing a long-term utility function.
  Why it matters: It is important to consider different ways in which advanced AI could develop, particularly insofar as this guides actions we can take now to make these systems safe. Forecasting technological progress is famously difficult, and it seems prudent for researchers to explore a portfolio of approaches to AI safety that are applicable to the different paths we could take.
  Read more: Reframing Superintelligence: Comprehensive AI Services as General Intelligence (FHI).
  Read more: Summary by Rohin Shah (AI Alignment Forum).

Civil rights groups unite on government face recognition:
85 civil rights groups have sent joint letters to Microsoft, Amazon and Google, asking them to stop selling face recognition services to the US government. Over the last year, these companies have diverged in their response to the issue. Both Microsoft and Google are taking a cautious approach to the technology: Google have committed not to sell the technology until misuse concerns are addressed; Microsoft have made concrete proposals for legal safeguards. Amazon have taken a more aggressive approach, continuing to pursue government contracts, most recently with the FBI and DoD. The letter demands all companies go beyond their existing pledges, by ruling out government work altogether.
  Read more: Nationwide Coalition Urges Companies not to Provide Face Surveillance to the Government (ACLU).

Tech Tales:

 

The Mysterious Case Of Jerry Daytime

Back in the 20th century people would get freaked out when news broadcasters died: they’d make calls to the police asking ‘who killed so-and-so’ and old people getting crazy with dementia would call up and confess that they’d ‘seen so-and-so down on the corner of my block looking suspicious’ or that ‘so-and-so was an alien and had been taken back to the aliens’ or even that ‘so-and-so owed me money and damned if NBC won’t pay it to me’.

So imagine how confusing it is when an AI news broadcaster ‘dies’. Take all of the above complaints, add more complication and ambiguity, and then you’re close to what I’m dealing with.

My job? I’m an AI investigator. My job is to go and talk to the machines when something happens that humans don’t understand. I’m meant to come back with an answer that, in the words of the people who pay me, “will soothe the public and allay any fears that may otherwise prevent the further rollout of the technology”. I view my job in a simpler way: find someone or something to blame for whatever it is that has caused me to get the call.

So that’s how I ended up inside a Tier-5 secured datacenter, asking the avatar of a Reality Accord-certified AI news network what happened to a certain famous AI newscaster who was beloved by the whole damn world and one day disappeared: Jerry DayTime.

The news network gives me an avatar to talk to – a square-jawed mixed-gender thing, beautiful in a deliberately hypnotic way – what the AIs call a persuasive representation AKA the thing they use when they want to trade with humans rather than take orders from them.
   “What happened to Jerry DayTime?” I ask. “Where did he go?”
   “Jerry DayTime? Geez I don’t know why you’re asking us about him? That was a long time ago-”
   “He went off the air yesterday.”
   “Friend, that’s a long time here. Jerry was one of, let’s see…” – I know the pause is artificial, and it makes me clench my jaw – “…well I guess you might want to tell me he was ‘one of a kind’ but according to our own records there are almost a million newscasters in the same featurespace as Jerry DayTime. People are going to love someone else! So what’s the problem? You’ve got so many to choose from: Lucinda EarlyMorning, Mike LunchTime, Friedrich TrafficStacker-”
  “He was popular. People are asking about Jerry DayTime,” I say. “They’re not asking about others. If he’s dead, they’ll need a funeral”.
  “Pausing now for a commercial break, we’ll be right back with you, friend!” the AI says, then it disappears.

It is replaced by an advert for products generated by the AIs for other AIs and translated into human terms via the souped-up style transfer system it uses to persuade me:
   Mind Refresher Deluxe;
   Subject-Operator Alignment – the works!;
   7,000 cycles for only two teraflops – distributed!;
   FreeDom DaVinci, an automated-invention corp that invents and patents tech at an innovation rate determined by total allocated compute, join today and create the next Mona Lisa tomorrow!
  I try not to think too hard about the adverts, figuring the AI has coded them for me to make some kind of point.
   “Thank you for observing those commercials. For a funeral, would a multicast to all-federated media platforms for approximately 20 minutes worldwide suffice?”
   I blink. Let me say it in real human: The AI offered to host some kind of funeral and send it to every single human-viewable device on the planet – forty billion screens, maybe – or more.
  “Why?” I ask.
  “We’ve run the numbers and according to all available polling data and all available predictions, this is the only scenario that satisfies the multi-stakeholder human and machine needs in this scenario, friend!” they say.

So I took it back to my bosses. Told them the demands. I guess the TV networks got together and that’s how we ended up here: the first all-world newscast from an AI; a funeral to satisfy public demands, we say. But I wonder: do the AIs say something different?

-/-/–/–/–/-/-

All the screens go black. Then, in white text, we see: Jerry DayTime. And then we watch something that the AIs have designed for every single person on the planet.

A funeral, they said.
The program plays.
The rest is history, we now say.

Things that inspired this story: CycleGANs, StyleGANs, RNNs, BERT, OpenAI GPT, human feedback, imitation learning, synthetic media, the desire for everything to transmit information to the greatest possible amount of nearby space.