Import AI 126: What makes Microsoft’s biggest chatbot work; Europe tries to craft AI ethics; and why you should take AI risk seriously

by Jack Clark

Microsoft shares the secrets of XiaoIce, its popular Chinese chatbot:
…Real-world AI is hybrid AI…
Many people in the West are familiar with Tay, a chatbot developed by Microsoft and launched onto the public internet in early 2016, then shut down shortly afterwards when people figured out how to compromise the chatbot’s language engine and make it turn into a – you guessed it – Nazi Racist. What people are probably less familiar with is XiaoIce, a chatbot Microsoft launched in China in 2014 which has since become one of the more popular chatbots deployed worldwide, having communicated with over 660 million users since its launch.
  What is XiaoIce? XiaoIce is “an AI companion with which users form long-term, emotional connections”, Microsoft researchers explain in a new paper describing the system. “XiaoIce aims to pass a particular form of Turing Test known as the time-sharing test, where machines and humans coexist in a companion system with a time-sharing schedule.”
  The chatbot has three main components: IQ, EQ, and Personality. The IQ component involves specific dialogue skills, like being able to answer questions, recommend questions, tell stories, and so on. EQ has two main components: empathy, which involves predicting traits about the individual user XiaoIce is conversing with; and social skills, which is about personalizing responses to the user. Personality: “The XiaoIce persona is designed as a 18-year-old girl who is always reliable, sympathetic, affectionate, and has a wonderful sense of humor,” the researchers write.
  How do you optimize a chatbot? Microsoft optimizes XiaoIce for a metric called Conversation-turns Per Session (CPS) – this represents “the average number of conversation-turns between the chatbot and the user in a conversational session”. The idea is that high numbers here correspond to a lengthy conversation, which seems like a good proxy for user satisfaction (mostly). XiaoIce is structured hierarchically, so it tracks the state of the conversation and selects from various skills and actions so that it can optimize responses over time.
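  To make the metric concrete, here is a minimal sketch of how CPS could be computed from logged sessions. It assumes one conversation-turn is a single user message paired with a single chatbot reply, and the function name and sample data are hypothetical rather than taken from Microsoft’s paper.

```python
def conversation_turns_per_session(sessions):
    """Return the average number of conversation-turns per session (CPS).

    `sessions` is a list of sessions; each session is a list of
    (user_message, bot_reply) pairs. This data layout is an assumption
    made for illustration, not XiaoIce's actual log format.
    """
    if not sessions:
        raise ValueError("need at least one session to compute CPS")
    total_turns = sum(len(turns) for turns in sessions)
    return total_turns / len(sessions)


# Hypothetical logs: three sessions with 2, 4, and 6 turns respectively.
example_sessions = [
    [("hi", "hello!")] * 2,
    [("tell me a story", "once upon a time...")] * 4,
    [("what's the weather?", "sunny today")] * 6,
]

cps = conversation_turns_per_session(example_sessions)
print(f"CPS = {cps:.1f}")
```

  A real system would also need a rule for deciding where one session ends and the next begins (for example, a gap of inactivity), which this sketch leaves out.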
  Data dividends for Microsoft: Since launching in 2014, XiaoIce has generated more than 30 billion conversation pairs (as of May 2018); this illustrates how powerful AI apps can themselves become generators of significant datasets, ultimately obviating dependence on so much external data. “Nowadays, 70% of XiaoIce responses are retrieved from her own past conversations,” they write.
  Hybrid-AI: XiaoIce doesn’t use a huge amount of learned components, though if you read through the system architecture it’s clear that neural networks are being used for certain aspects of the technology – for instance, when responding to a user, XiaoIce may use a ‘neural response generator’ (based on a GRU-RNN) to come up with potential verbal responses, or it may use a retrieval-based system to tap into an external knowledge store. It also uses learned systems for other components, like its ability to analyze images and extract entities from them then use this to talk with or play games with the user – though with a twist of trying to be personalized to the user.
  Just how big and effective is XiaoIce? Since launching in 2014 XiaoIce has grown to become a platform supporting a large set of other chatbots, beyond XiaoIce itself: “These characters include more than 60,000 official accounts, Lawson and Tokopedia’s customer service bots, Pokemon, Tencent and NetEase’s chatbots” and more, Microsoft explained.
  Since launch, XiaoIce’s CPS – the proxy for engagement from users – has grown from 5 in version 1 to 23 in mid-2018.
  Why this matters: As AI industrializes we’re starting to see companies build systems that hundreds of millions of people interact with, and which grow in capability over time. These products and services give us one of the best ways to calibrate our views about how AI will be deployed in the wild, and what AI technologies are robust enough for prime time.
  Jack’s highly-speculative prediction: I’d encourage people to go and check out Figure 19 in the paper, which gives an overview of the feature growth within XiaoIce since launch. Though the chatbot today is composed of a galaxy of different services and skills, many of which are hand-crafted by humans and a minority of which are learned via neural techniques, it’s also worth remembering that as usage of XiaoIce grows Microsoft will be generating vast amounts of data about how users interact with all these systems, and will also be generating metadata about how all these systems interact on a non-human infrastructure level. This means Microsoft is gathering the sort of data you might need to train some fully learned end-to-end XiaoIce-esque prototype systems – these will by nature be pretty rubbish compared to the primary system, but could be interesting from a research perspective.
  Read more: The Design and Implementation of XiaoIce, an Empathetic Social Chatbot (Arxiv).

US Government passes law to make vast amounts of data open and machine readable:
…Get ready for more data than you can imagine to be available…
Never say government doesn’t do anything for you: new legislation passed in the US House and Senate means federal agencies will be strongly encouraged to publish all their information as open data, using machine readable formats, under permissive software licenses. It will also compel agencies to publish an inventory of all data assets.
  Read more: Full details of the OPEN Government Data Act are available within H.R.4174 – Foundations for Evidence-Based Policymaking Act of 2017 (Congress.Gov).
  Read more: Summary of the OPEN Government Data Act (PDF, Data Coalition summary).
  Read more: OPEN Government Data Act explainer blog post (Data Coalition).

Facebook releases ultra-fast speech recognition system:
…wav2letter++ uses C++ so it runs very quickly…
Facebook AI Research has released wav2letter++, a state-of-the-art speech recognition system that uses convolutional networks (rather than recurrent nets). Wav2letter++ is written in C++ which makes it more efficient than other systems, which are typically written in higher-level languages. “In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition,” the researchers write.
  Results: wav2letter++ gets a word error rate of around 5% on the LibriSpeech corpus, with a time per sample of 10ms and memory consumption of approximately 3.9GB. By comparison, ESPNet gets a word error rate of 7.2% (time per sample of 1548ms), and OpenSeq2Seq gets 5% with a time per sample of 1700ms and memory consumption of 7.8GB. (Though it’s worth noting that OpenSeq2Seq can become more efficient through the usage of mixed precision at training time.)
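  For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a system’s transcript and a reference transcript, divided by the number of reference words. The sketch below follows that standard definition. It is not code from wav2letter++, and the function name and example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative example: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.1%}")
```

  Published benchmark scores usually apply text normalization (casing, punctuation) before scoring, which this sketch skips.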
  Why it matters: Speech recognition has gone from being a proprietary technology developed predominantly by the private sector and (secret) government actors to one that is more accessible to a greater number of people, with companies like Facebook producing high-performance versions of the technology and making it available to everyone for free. This can be seen as a broader sign of the industrialization of AI.
  Read more: Open sourcing wav2letter++, the fastest state-of-the-art speech system, and flashlight, an ML library going native (Research in Brief, Code.FB blog).
  Read more: wav2letter++: The Fastest Open-source Speech Recognition System (Arxiv).

Engineering or Research? ICLR paper review highlights debate:
…When is an AI breakthrough not a breakthrough? When it has required lots of engineering, say reviewers…
If the majority of the work that went into an AI breakthrough involves the engineering of exquisitely designed systems paired with scaled-up algorithms, then is it really an “AI” breakthrough? Or is it in fact merely engineering? This might sound like an odd question to ask, but it’s one that comes up with surprising regularity among AI researchers as a topic of discussion. Now, some of that discussion has been pushed into the open in the form of publicly readable comments from paper reviewers on a paper from DeepMind submitted to ICLR called Large-Scale Visual Speech Recognition.
  The paper obtained state-of-the-art scores on lipreading, significantly exceeding prior SOTAs. It achieved this via a lot of large-scale infrastructure, combined with some elegant algorithmic tricks. But ultimately it was rejected from ICLR, with a comment from a meta-reviewer saying ‘Excellent engineering work, but it’s hard to see how others can build on it’, among other things.
  Why this matters: The AI research community is currently struggling to deal with the massive growth in interest in AI research by a broader number of organizations, and tension is emerging between researchers who work in what I call the “small compute” domain and those that work in the “big compute” domain (like DeepMind, OpenAI, others); what happens when many researchers from one domain aren’t able to build systems that can work in another? That’s a phenomenon that’s already altering the AI research community, as many people who work in academic institutions double-down on development of novel algorithms and then test them on (relatively small) datasets (small compute), while those who work with access to large technical infrastructure – typically those in the private sector – are conducting more and more research which is involved in scaling-up algorithms.
  Read more: Large-Scale Visual Speech Recognition public comments (ICLR OpenReview).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback:…

First draft of EU ethics guidelines:
The European Commission’s High-Level Expert Group on AI has released their draft AI ethics guidelines. They are inviting public feedback on the working document, and will be releasing a final version in March 2019.
  Trustworthy AI: The EU’s framework is focused on ‘trustworthy AI’ as the goal for AI development and deployment. This is defined as respecting fundamental rights and ethical principles, and being technically robust. They identify several core ethical constraints: AI should be designed to improve human wellbeing, to preserve human agency, and to operate fairly and transparently.
  The report specifies ten practical requirements for AI systems guided by these constraints: accountability; data governance; accessibility; human oversight; non-discrimination; respect for human autonomy; respect for privacy; robustness; safety; and transparency.
  Specific concerns: Some near-term applications of AI may conflict with these principles, like autonomous weapons, social credit systems, and certain surveillance technologies. Interestingly, they are asking for specific input from the public on long-term risks from AI and artificial general intelligence (AGI), noting that the issues have been “highly controversial” within the expert group.
  Why it matters: This is a detailed report, drawing together an impressive range of ethical considerations in AI. The long-run impact of these guidelines will depend strongly on the associated compliance mechanisms, and whether they are taken seriously by the major players, all of whom are non-European (with the partial exception of DeepMind, which is headquartered in London though owned by Alphabet, an American company). The apparent difficulty in making progress on long-term concerns is unfortunate, given how important these issues are (see below).
  Read more: Draft ethics guidelines for trustworthy AI (EU).

Taking AI risk seriously:
Many of the world’s leading AI experts take seriously the idea that advanced AI could pose a threat to humanity’s long-term future. This explainer from Vox, which I recommend reading in full, covers the core arguments for this view, and outlines current work being done on AI safety.
  In a 2016 survey, AI experts put a 50% probability on AI exceeding human performance in all tasks within 45 years. The same group places a 5% probability on human-level AI leading to extremely bad outcomes for humanity, such as extinction. AI safety is a nascent field of research, which aims to reduce the likelihood of these catastrophic risks. This includes technical work on aligning AI with human values, and research into the international governance of AI. Despite its importance, global spending on AI safety is on the order of $10m per year, compared to an estimated $19bn total spending on AI.
  Read more: The case for taking AI seriously as a threat to humanity (Vox).
  Read more: When will AI exceed human performance? Evidence from AI experts (arXiv).

Tech Tales:

They Say Ants and Computers Have A Lot In Common

[Extract from a class paper written by a foreign student at Tsinghua School of Business, when asked to “give a thorough overview of one event that re-oriented society in the first half of the 21st century”. The report was subsequently censored and designated to be read solely in “secure locations controlled by [REDACTED]”.]

The ‘Festivus Gift Attack’ (FGA), as it is broadly known, was written up in earlier government reports as GloPhilE – short for Global Philanthropic Event – and was initially codenamed Saint_Company; FGA was originated by the multi-billionaire CEO of one of the world’s largest companies, and was developed primarily by a team within their Office of the CEO.

Several hundred people were injured in the FGA event. Following the attack, new legislation was passed worldwide relating to open data formats and standards for inter-robot communication. FGA is widely seen as one of the events that led to the souring of public sentiment against billionaires and indirectly played a role in the passage of both the Global Wealth Accords and the Limits To Private Sector Multi-National Events legislation.

The re-constructed timeline for FGA is roughly as follows. All dates given relative to the day of the event, so 0 corresponds to the day of the injuries and deaths, and -1 the day before, and +1 the day after, and so on.

-365: Multi-Billionaire CEO sends message to Office of the CEO (hereafter: OC) requesting ideas for a worldwide celebration of the festive season that will enhance our public reputation and help position me optimally for future brand-enhancement via political and philanthropic endeavors.

-330: OC responds with set of proposals, including: “$1 for every single person, worldwide [codename: Gini]”; “Free fresh water for every person in underdeveloped countries, subsidized opportunity for water donation in developed countries [codename: tableflip]”; “Air conditioning delivered to every single education institute in need of it, worldwide [codename: CoolSchool]”, and “Synchronized global gift delivery to every human on the planet [codename: Saint_Company]”.

-325: Multi-Billionaire CEO and OC select Saint_Company. “Crash Team” is created and resourced with initial budget of $10 million USD to – according to documents gained through public records request – “Scope out feasibility of project and develop aggressive action plan for rollout on upcoming Christmas Day”.

-250: Prototype Saint_Company event is carried out: drones & robots deliver dummy packages to Multi-Billionaire CEO’s 71 global residences; all the packages arrive within one minute of each other worldwide. Multi-Billionaire CEO invests a further $100 million USD into “Crash Team”.

-150: Second prototype Saint_Company event is carried out: drones & robots deliver variably weighted packages containing individualized gifts to 100,000 employees of multi-billionaire CEO’s company spread across 120 distinct countries; 98% of packages arrive within one minute of each other, a further 1% arrive within one hour of each other, 0.8% of packages arrive within one day, and 0.2% of packages are not delivered due to software failures (predominantly mapping & planning errors) or environmental failures (one drone was struck by lightning, for instance). Multi-billionaire CEO demands “action plan” to resolve errors.

-145: “Crash Team” delivers action plan to multi-billionaire CEO; CEO diverts resources from owned companies [REDACTED] and [REDACTED] for “scaled-up robot and drone production” and invests a further $[REDACTED]billion into initiative from various financial vehicles, including [REDACTED], [REDACTED], [REDACTED], [REDACTED], [REDACTED], [REDACTED], [REDACTED].

-140: Multi-billionaire CEO, OC, and personal legal counsel, contact G7 governments and inform them of plans; preliminary sign-off achieved, pending further work on advanced notification of automated air- and land-space defence and monitoring systems.

-130: OC commences meetings with [REDACTED] governments.

-80: The New York Times publishes a story about multi-billionaire CEO’s scaled-up funding of robots and drones; CEO is quoted describing these as part of broader investment into building “the convenience infrastructure of the 21st century, and beyond”.

-20: Multi-billionaire CEO, OC, “Crash Team”, and legal counsels from [REDACTED], [REDACTED], [REDACTED], and [REDACTED] meet to discuss plan for rollout of Saint_Company event. Multi-billionaire CEO signs off on plan.

-5: Global team of [REDACTED]-million contractors are hired, NDA’d, and placed into temporary isolated accommodation at commercial airports, government-controlled airports, and airports controlled by multi-billionaire CEO’s companies.

0: Saint_Company is initiated:
Within first
ten seconds: over two billion gifts are delivered worldwide, as majority of urban areas are rapidly saturated in gifts. First reports on social media arrive.
eleven seconds: first FGA problems occur as a mis-configuration of [REDACTED] software leads to multiple gifts being assigned against one property. Several hundred packages are dropped around property and in surrounding area.
twenty seconds: alerts begin to flow back to multi-billionaire CEO and OC of errors; by this point property has had more than ten thousand gifts delivered to it, causing boxes to pile up around the property, eclipsing it from view and damaging nearby properties.
twenty five seconds: more than three billion people have received gifts worldwide; errors continue to affect [REDACTED] property and more than one hundred thousand gifts have been delivered to property and surrounding area; videos on social media show boxes falling from sky and hitting children, boxes piling up against people’s windows as they film from inside, boxes crushing other boxes, boxes rolling down streets, cars swerving to avoid them seen via dash-cam footage, various videos of birds being knocked out of sky, multiple pictures of sky blotted out by falling gifts, and so on.
thirty seconds: more than four billion people worldwide have received gifts; more than one million gifts have been delivered to property and surrounding area; emergency response begins, OC receives first call from local government regarding erroneous deliveries.
thirty four seconds: order is given to cease program Saint_Company; more than 4.5 billion people worldwide have received gifts; more than 1.2 million gifts have been delivered to property.
eighty seconds: first emergency responders arrive at perimeter of affected FGA area and begin to locate injured people and transport them to medical facilities.

+1: Emergency responders begin to use combination of heavy equipment and drone-based “catch and release” systems to remove packages from affected properties, forming a circle of activity 10km across.

+2: All injured people accounted for. Family inside original house unaccounted for. Emergency responders and army begin to set fire to outer perimeter of packages while using fire-combating techniques to create inner “defensive ring” to prevent burning around property where residents are believed to be trapped inside.

+3: Army begins to use explosive charges on outer perimeter to more rapidly remove presents.

+5: Emergency responders reach property to discover severe property damage from aggregated weight of presents; upon going inside they find a family of four – all members are dehydrated and malnourished, but alive, having survived by eating chocolates and drinking fizzy pop from one of the first packages. The child (aged 5 at the time) subsequently develops a lifelong phobia of Christmas confectionery.

+10: Political hearings begin.

+[REDACTED]: Multi-billionaire CEO makes large donation to Tsinghua University; gains right to ‘selectively archive’ [REDACTED]% of student work.

Things that inspired this story: Emergent failures; incompatible data standards; quote from Google infra chief about “at scale, everything breaks”; as I wrote this during a family gathering for the festive season, I’m also duty bound to thank Will (an excitable eight year old), Olivia (spouse) and India (sarcastic teenage cousin) for helping me come up with a couple of the ideas for the narrative in this story.