Import AI

Import AI: #95: Learning to predict and avoid internet arguments with deep learning, White House announces Select Committee on AI, and BMW trains cars to safely change lanes

Cornell, Google, and Wikimedia researchers train AI to predict when we’ll get angry on the internet:
…Spoiler: Very blunt comments with little attempt made at being polite tend to lead to aggressive conversations…
Have you ever read a comment addressed to you on the internet and decided not to reply because your intuition tells you the person is looking to start a fight? Is it possible to train AI systems to have a similar predictive ability and thereby create automated systems that can flag conversations as having a likelihood of spiraling into aggression? That’s the idea behind new research from Cornell University, Jigsaw, and the Wikimedia Foundation. The research tries to predict troublesome conversations based on a dataset taken from the discussion sections of ‘Wikipedia Talk’ pages.
  Dataset: To carry out the experiment, the researchers gathered a total of 1,270 conversations, half consisting of ones which became aggressive following the initial comments, and half consisting of ones which remained civil. (Labeling conversations as aggressive versus on-track was done via a combination of Jigsaw’s “Perspective” API and labels gathered from humans via CrowdFlower.) These conversations had an average length of 4.6 comments.
  How it works: Armed with this dataset, the researchers characterized conversations via what they call “pragmatic devices signalling politeness”. This is a set of features that correspond to whether the conversation includes attempts to be friendly (liberal use of ‘thanks’, ‘please’, and so on), along with words used to indicate a position that welcomes debate (eg, by clarifying statements with phrases like “I believe” or “I think”). They then study the initial comment and see if their system can learn to predict whether it will yield negative comments in the future.
  Results: Humans are successful about 72% of the time at predicting nasty conversations from this dataset. The system designed by these researchers (which relies on logistic regression – nothing too fancy) is about 61.6% accurate, and baselines (bag of words and sentiment lexicon) get around 56%. (One variant of the proposed technique gets accuracy of 64.9%, but this is a little dubious: it is trained on far more data drawn from the same corpus, so it’s unclear whether it is overfitting.) The researchers also derive some statistical correlations that could help humans as well as machines better spot comments that are prone to spiral into aggression. “We find a rough correspondence between linguistic directness and the likelihood of future personal attacks. In particular, comments which contain direct questions, or exhibit sentence initial you (i.e., “2nd person start”), tend to start awry-turning conversations significantly more often than ones that stay on track,” they write. “This effect coheres with our intuition that directness signals some latent hostility from the conversation’s initiator, and perhaps reinforces the forcefulness of contentious impositions.”
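A minimal sketch of this kind of pipeline, assuming a tiny hypothetical feature set (politeness markers, hedges like “I think”, direct questions, and sentence-initial “you”) and made-up toy labels rather than the paper’s actual features, data, or code: each opening comment becomes a binary feature vector, and a logistic regression fitted by per-example gradient steps scores how likely the conversation is to go awry.

```python
import math
import re

# Hypothetical marker lists, loosely modelled on the features described above.
POLITE = ("thanks", "thank you", "please")
HEDGES = ("i think", "i believe")


def features(comment: str) -> list:
    """Map an opening comment to a binary feature vector (bias term first)."""
    text = comment.lower()
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        1.0,                                                   # bias
        float(any(p in text for p in POLITE)),                 # gratitude / please
        float(any(h in text for h in HEDGES)),                 # hedged opinion
        float("?" in comment),                                 # direct question
        float(any(s.split()[0] == "you" for s in sentences)),  # 2nd person start
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(xs, ys, lr=0.5, epochs=500):
    """Fit logistic-regression weights with per-example gradient steps on the log-loss."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w


def predict_awry(w, comment: str) -> float:
    """Estimated probability that the conversation later turns aggressive."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(comment))))


# Made-up toy examples: 1 = conversation later went awry, 0 = stayed civil.
comments = [
    "Thanks for the edit, I think the source supports it.",
    "Please could you add a citation? Thanks.",
    "You removed my paragraph again. Why?",
    "You clearly did not read the policy. Why do you keep reverting?",
]
labels = [0, 0, 1, 1]
weights = train([features(c) for c in comments], labels)
```

Because the dataset is split half-and-half between aggressive and civil conversations, chance accuracy is 50%, which is the right yardstick for the 61.6% and ~56% figures above.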
  Why it matters: Systems like this show how, with a relatively small amount of data, it is possible to build classification systems that can, if paired with the right features, effectively categorize subtle human interactions online. While here such a system is used for what seems to be a social good (figuring out how to identify and potentially avoid aggressive conversations), it’s worth remembering that a very similar approach could be used to, for instance, flag opening comments that correlate with conversations likely to express political views contrary to those of the people building such systems, and so on. It would be nice to see an acknowledgement of this in the paper itself.
  Read more: Conversations Gone Awry: Detecting Early Signs of Conversational Failure (Arxiv).

Chinese researchers tackle Dota-like game King of Glory with RL + MCTS:
…Tencent researchers take inspiration from AlphaGo Zero to tackle Chinese MOBA King of Glory…
Modern multiplayer strategy games are becoming a testbed for reinforcement learning and multi-agent algorithms. Following work by Facebook and DeepMind on StarCraft 1 and 2, and work by OpenAI on Dota, researchers with the University of Pittsburgh and Tencent AI Lab have published details on an AI technique which they evaluate on King of Glory, a Tencent-made multiplayer online battle arena (MOBA) game. The proposed system uses Monte Carlo Tree Search (MCTS – a technique also crucial to DeepMind’s work on tackling the board game Go) and incorporates techniques from AlphaGo Zero “to produce a stronger tree search using previous tree results”. “Our proposed algorithm is a provably near-optimal variant (and in some respects, generalization) of the AlphaGo Zero algorithm,” they write.
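The selection rule that AlphaGo Zero-style tree searches use to decide which branch to explore next can be sketched as below. This is a generic PUCT illustration with assumed names (`Node`, `select_action`, `backup`, `c_puct`) and made-up numbers, not the paper’s feedback-based algorithm or code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                # P(s, a): policy-network prior for the move into this node
    visits: int = 0             # N(s, a)
    value_sum: float = 0.0      # W(s, a): sum of backed-up values
    children: dict = field(default_factory=dict)  # action -> Node

    def q(self) -> float:
        """Mean action value Q(s, a); 0 for unvisited nodes."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(node: Node, c_puct: float = 1.5):
    """Return the child action maximising Q + U (the PUCT rule)."""
    total = sum(child.visits for child in node.children.values())

    def score(child: Node) -> float:
        # Exploration bonus: large for high-prior, rarely visited children.
        u = c_puct * child.prior * math.sqrt(total) / (1 + child.visits)
        return child.q() + u

    return max(node.children, key=lambda a: score(node.children[a]))


def backup(path: list, value: float) -> None:
    """Add a leaf evaluation to every node on the selected path (single-agent view, no sign flips)."""
    for n in path:
        n.visits += 1
        n.value_sum += value


# Hypothetical example with two moves at the root.
root = Node(prior=1.0, children={"attack": Node(prior=0.7), "retreat": Node(prior=0.3)})
root.children["attack"].visits, root.children["attack"].value_sum = 3, -1.5
root.children["retreat"].visits, root.children["retreat"].value_sum = 1, 0.5
choice = select_action(root)  # "retreat": its higher Q outweighs attack's larger prior
```

The exploration term shrinks as a child accumulates visits, so the search gradually shifts from trusting the policy prior to trusting the values it has actually observed.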
  Results: The researchers test their technique within King of Glory by pitting agents trained with it against agents controlled by the in-game AI. They also test it against four variants of their proposed technique: one with no rollouts; one using direct policy iteration; one implementing approximate value iteration; and one trained via supervised learning on 100,000 state-action pairs of human gameplay data. (This also functions as a basic ablation study of the proposed technique.) Their system beats all of these approaches, with the closest competitor being the variant with no rollouts (this one also looks most similar to AlphaGo Zero).
  Things that make you go hmmm: Researchers still tend to attack problems like this by training the AI systems over a multitude of hand-selected features, so it’s not like these algorithms are automatically inferring optimal inputs from which to learn. “The state variable of the system is taken to be a 41-dimensional vector containing information obtained directly from the game engine, including hero locations, hero health, minion health, hero skill state, and relative locations to various structures,” they write. A lot of human ingenuity goes into selecting these inputs and likely adjusting hyperparameters to denote the importance of any particular input, so there’s a significant unacknowledged human component to this work.
  Why it matters: This paper provides more evidence that AI researchers are going to use increasingly modern, sophisticated games to test and evaluate AI systems. It’s also quite interesting that this work comes from a Chinese AI lab, indicating that these research organizations are pursuing similarly large-scale problems to some labs in the West – there’s more commonality here than I think people presume, and it’d be interesting to see the various researchers come together and discuss ideas in the future about how to tackle even more advanced games.
  Read more: Feedback-Based Tree Search for Reinforcement Learning (Arxiv).

Today’s AI amounts to little more than curve-fitting, says Turing Award winner:
…Judea Pearl is impressed by deep learning success, but worries researchers have become complacent about inability to deal with causality…
Turing Award-winner Judea Pearl is concerned that the AI industry’s current obsession with deep learning is causing it to ignore harder problems, like developing machines that can build causal models of the world. He discusses some of these concerns in an interview with Quanta Magazine about his new book “The Book of Why: The New Science of Cause and Effect”.
  Selected quotes:
– “Mathematics has not developed the asymmetric language required to capture our understanding that if X causes Y that does not mean that Y causes X.”
– “As much as I look into what’s being done with deep learning, I see they’re all stuck there on the level of associations. Curve fitting.”
– “We did not expect that so many problems could be solved by pure curve fitting. It turns out they can. But I’m asking about the future — what next? Can you have a robot scientist that would plan an experiment and find new answers to pending scientific questions? That’s the next step.”
– “The first step, one that will take place in maybe 10 years, is that conceptual models of reality will be programmed by humans. The next step will be that machines will postulate such models on their own and will verify and refine them based on empirical evidence.”
  Read more: To Build Truly Intelligent Machines, Teach Them Cause and Effect (Quanta Magazine).

Google prepares auto-email service “Smart Compose”:
…Surprisingly simple components lead to powerful things, given enough data…
Google researchers have outlined the technology they’ve used to create ‘Smart Compose’, a new service within Gmail that will automatically compose emails for people as they type them. The main ingredients are a Bag of Words model and a Recurrent Neural Network Language Model. This combination of technologies leads to a system that is “faster than the seq2seq models with only a slight sacrifice to model prediction quality”. These components are also surprisingly simple, indicating just how much can be achieved when you’ve got access to a scalable technique and a truly massive dataset. Google says that by offloading most of the computation onto TPUs it was able to reduce the average latency to tens of milliseconds – earlier experiments showed that latencies higher than 100 milliseconds or so led to user dissatisfaction.
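The “Bag of Words” half of this pairing can be illustrated in a few lines. This is a minimal sketch assuming that context fields such as the email subject and the previous message are each encoded by averaging word embeddings and then concatenated into one fixed-size vector that a language model could take as extra input at every decoding step; the random toy embeddings, the vocabulary, and every function name here are hypothetical.

```python
import random

EMBED_DIM = 4  # toy size; production models use much larger embeddings


def make_embeddings(vocab, dim=EMBED_DIM, seed=0):
    """Hypothetical random embedding table: word -> vector of floats."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


def bag_of_words(text, table, dim=EMBED_DIM):
    """Average the embeddings of known words; word order is ignored, hence 'bag'."""
    vecs = [table[w] for w in text.lower().split() if w in table]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def context_vector(subject, previous_email, table):
    """Concatenate per-field averages into one fixed-size context vector."""
    return bag_of_words(subject, table) + bag_of_words(previous_email, table)


table = make_embeddings(["lunch", "on", "friday", "see", "you", "then"])
ctx = context_vector("Lunch on Friday", "See you then", table)  # 2 * EMBED_DIM floats
```

Averaging costs the same small, fixed amount of work however long the context is, which is one plausible reason such an encoder helps keep latency low compared with running a full sequence encoder over the whole email.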
  Read more: Smart Compose: Using Neural Networks to Help Write Emails (Google Blog).

White House plans Select Committee on AI:
…Hosts summit between AI and industry experts, reinforces regulatory-light approach to tech…
The White House recently hosted a “Summit on AI for American Industry”, bringing together industry, academia, and government to discuss how to support and further artificial intelligence in America. A published summary of the event from the Office of Science and Technology Policy highlights some of the steps this administration has taken with regard to AI – many of these actions involve elevating AI as a strategic area in White House communications, with more mentions of it in documents ranging from the National Defense Strategy and National Security Strategy to guidance given to agencies by the Office of Management and Budget (OMB).
  Select Committee on AI: The White House will create a “Select Committee on Artificial Intelligence”, which will primarily be composed of “the most senior R&D officials in the Federal government”. This committee will advise the White House, facilitate partnerships with industry and academia, enhance coordination across the Federal government on AI R&D, and identify ways to use government data and compute resources to support AI. The committee will feature staff from OSTP, the National Science Foundation, the Defense Advanced Research Projects Agency, the director of IARPA, and others. The committee may call upon the private sector as well, according to its charter.
  Regulation: In prepared remarks OSTP Deputy US Chief Technology Officer Michael Kratsios said “Our Administration is not in the business of conquering imaginary beasts. We will not try to “solve” problems that don’t exist. To the greatest degree possible, we will allow scientists and technologists to freely develop their next great inventions right here in the United States. Command-control policies will never be able to keep up. Nor will we limit ourselves with international commitments rooted in fear of worst-case scenarios.”
  Why it matters: Around the world, countries are enacting broad national strategies relating to artificial intelligence. France has committed far more new funding to AI, relative to its existing spending, than other countries have, and China (which by virtue of its governance structure will tend to out-spend Western countries on broad science and technology developments) has committed many additional billions of dollars to AI. It remains to be seen whether the US’s strategy of leaving the vast majority of AI development to the private sector is the optimal decision, given the immense opportunities the technology holds and its demonstrable responsiveness to additional infusions of money. America also has some problems with its AI ecosystem that aren’t being dealt with today: many of academia’s most creative professors are being drawn into industry at the same time as class sizes for undergraduate and graduate AI courses are booming and PhD applications are spiking, which risks reducing the quality of US education in AI. It’d be interesting to see what kinds of recommendations the Select Committee makes and how effective it will be at confronting the contemporary challenges and opportunities faced by the administration with regard to US AI competitiveness.
  Read more: Summary of the 2018 White House Summit on Artificial Intelligence for American Industry (White House OSTP, PDF)

Democrat Representative calls for National AI Strategy:
…Points to European, French, Chinese efforts as justification for US action…
Congressman John Delaney (Maryland) has written an op-ed in Wired calling for a National AI Strategy for the US. Delaney has himself co-sponsored a bill (along with Republican and Democrat congresspeople and senators) calling for the creation of a commission to devise such a strategy, called the FUTURE of AI Act (Fundamentally Understanding the Usability and Realistic Evolution of Artificial Intelligence Act).
  Selected quotes:
– “The United States needs a full assessment of the state of American research and technology, and what the short and long-term problems and opportunities are.”
– “Whether you are a conservative or a progressive, this future is coming. As I look at where the world is headed, I believe that we need to expand public investment in research, encourage collaboration between the public and private sector, and make sure that AI is deployed in a way that is wholly consistent with our values and with existing laws.”
– “If the US doesn’t act, we’re in danger of falling behind.”
  Why it matters: Societies across the world are changing as a consequence of the deployment of artificial intelligence, with effects ranging from unparalleled opportunities for providing better healthcare and accessibility services to citizens, to the ability to use the same technologies for surveillance and various national security purposes. It makes intuitive sense to survey the whole AI field and look for ways that a country can implement a holistic plan. It seems likely that there will be a bunch of complementary initiatives in the US, ranging from targeted actions like those espoused by the OSTP, to broader analyses performed by other parties, like the Senate, or government agencies.
  Read more: France, China, and the EU all have an AI strategy, shouldn’t the US? (Wired Opinion).

Learning to lane change with recurrent neural networks:
…BMW researchers try to teach safe driving via seq2seq learning…
Researchers with car company BMW and the Technical University of Munich in Germany have trained self-driving car AI agents in a crude simulation to learn how to change lanes safely. They achieve this by implementing a bidirectional RNN with long short-term memory (LSTM), which learns to predict the velocity of a car and its surrounding neighbors at any point in time, then uses this prediction to work out whether it will be safe for the vehicle to change into another lane.
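The summary does not spell out the decision rule, so here is a hypothetical sketch of how velocity predictions like the network’s could feed a lane-change check: integrate each vehicle’s predicted speeds into positions over a short horizon, and accept the change only if the gaps to the leader and follower in the target lane never drop below a threshold. `Track`, `lane_change_is_safe`, the 0.1 s timestep, and the 10 m minimum gap are all assumptions for illustration, not BMW’s method.

```python
from dataclasses import dataclass


@dataclass
class Track:
    position: float   # longitudinal position in metres at t = 0
    velocities: list  # predicted speed in m/s for each future timestep


def rollout(track: Track, dt: float) -> list:
    """Turn predicted velocities into future positions with simple Euler integration."""
    positions, x = [], track.position
    for v in track.velocities:
        x += v * dt
        positions.append(x)
    return positions


def lane_change_is_safe(ego: Track, lead: Track, follower: Track,
                        dt: float = 0.1, min_gap: float = 10.0) -> bool:
    """True if the ego car keeps at least min_gap metres to the target-lane
    leader and follower at every predicted timestep (hypothetical rule)."""
    for ego_x, lead_x, foll_x in zip(rollout(ego, dt), rollout(lead, dt), rollout(follower, dt)):
        if lead_x - ego_x < min_gap or ego_x - foll_x < min_gap:
            return False
    return True


steps = 30  # 3 s horizon at dt = 0.1 s
ego = Track(0.0, [20.0] * steps)
lead = Track(30.0, [20.0] * steps)
slow_follower = Track(-30.0, [20.0] * steps)
fast_follower = Track(-15.0, [30.0] * steps)  # closes the gap by 1 m per step
```

With the slow follower the gaps stay at 30 m throughout, so the change is accepted; the fast follower erodes its 15 m gap by 1 m per step and breaches the 10 m threshold partway through the horizon, so the change is rejected.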
  Results: The system is evaluated on the NGSIM dataset, a detailed traffic dataset taken from monitoring real traffic in LA in the mid-2000s. It outperforms other baselines, but given the restricted nature of the domain, the inability to compare performance against (secret) systems developed by other automotive experts, and the absence of integration with a deeper car simulation, it’s not clear how well this result will transfer to real domains.
  Why it matters: Cars are becoming significantly more automated, regardless of the overall maturity of full self-driving car technology. Papers like this give us a view into the development of increasingly automated vehicular systems that use components developed by the rest of the AI community.
  Read more: Situation Assessment for Planning Lane Changes: Combining Recurrent Models and Prediction (Arxiv).

Tech Tales:

Billionaire Cities

I guess we should have expected them, these billionaire cities. They started sprouting up after the price of basic space travel came down enough for billionaires to build their own launchpads, letting them mesh their business and life enough to create miniature cities to tend to their numerous inter-locking businesses. Many of these cities were built in places far above sea level, in preparation for an expected dire climate future.

These cities always had a few common components: a datacenter to host secure data and compute services, frequently also running local artificial intelligence services; automated transit systems to ferry people around; fleets of drones providing constant logistics and surveillance abilities; goods robots for heavy lifting; robotic chefs; and even a few teams of humans, which tended to these machines or spoke to other humans or worked in some other manner for the billionaire.

These cities grew as the billionaires (and eventually trillionaires) competed with each other to build ever more sophisticated and ever more automated systems. Soon after this competition began, we heard the first rumors of the brain-interface projects.

Teams of people were said to be hired by these billionaires to work within these by-now almost entirely automated gleaming cities. The people were paid gigantic sums of money to sign themselves away for contracts of two to three years, and to be discreet about it. Then the billionaire would fly in teams of surgeons and have them perform brain surgery on the people, giving them interfaces that let them plug in to the data feeds of the city, intuitively sensing them and eventually learning to understand them. It was said that arrangements of this kind, with the digital AI of the city and the augmented human brains interlinked, led to superior performance and flexibility compared to other systems.

We have recently heard rumors of other things – longer contracts, more elaborate surgeries, but those are as yet unsubstantiated.

Things that inspired this story: Brain-machine interfaces, Gini coefficient, spaceships with VTOL capability, cybernetics.

Import AI: #94: Google Duplex generates automation anxiety backlash, researchers show how easy it is to make a 3D-printed autonomous drone, Microsoft sells voice cloning services.

Microsoft sells “custom voice” speech synthesis:
…The commercial voice cloning era arrives…
Microsoft will soon sell “Custom Voice”, a system to let businesses give their applications a “one-of-a-kind, recognizable brand voice, with no coding required”. This product follows various research breakthroughs in the area of speech synthesis and speech cloning, like work from Baidu on voice cloning, and work from Google and DeepMind on speech synthesis.
  Why it matters: As the Google ‘Duplex’ system shows, the era of capable, realistic-sounding natural language systems is arriving. It’s going to be crucial to run as many experiments in society as possible to see how people react to automated systems in different domains. Being able to customize the voice of any given system to a particular context seems like a necessary ingredient for further acceptance of AI systems by the world.
  Read more: Custom Voice (Microsoft).

Teaching neural networks to perform low-light amplification:
…Another case where data + learnable components beats hand-designed algorithms…
Researchers with UIUC and Intel Labs have released a dataset for training image processing systems to take in images that are so dark as to be imperceptible to humans and to automatically process those images so that they’re human-visible. The resulting system can be used to amplify low-light images by up to 300 times while displaying meaningful noise reduction and low levels of color transformation.
  Dataset: The researchers collect and publish the ‘See-in-the-Dark’ (SID) dataset, which contains 5094 raw short-exposure images, each with a corresponding long-exposure reference image. These images span around 400 distinct scenes; there are more images than scenes because the researchers also captured bursts of short-exposure images of the same scene.
  Technique: The researchers tested their system using a multi-scale aggregation network and a U-net (both networks were selected for their ability to process full-resolution images at 4240×2832 or 6000×4000 in GPU memory). They trained the networks by pairing the raw data of each short-exposure image with the corresponding long-exposure image(s), and applied random flipping and rotation for data augmentation.
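A minimal sketch of the input side of such a pipeline, assuming an RGGB Bayer sensor and 14-bit raw values (the black and white levels below are illustrative defaults, not figures from the summary above): the raw mosaic is split into four half-resolution color planes, the black level is subtracted, and the result is multiplied by the amplification ratio before being fed to the network. Plain Python lists stand in for arrays to keep it dependency-free, and `pack_bayer` and `amplify` are hypothetical names.

```python
def pack_bayer(raw, black=512, white=16383):
    """Split an RGGB mosaic (list of rows, even height and width) into four
    half-resolution planes, subtracting the black level and scaling to [0, 1]."""
    scale = white - black

    def norm(v):
        return max(v - black, 0) / scale

    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]  # R, G, G, B sites in each 2x2 tile
    return [
        [[norm(raw[i + di][j + dj]) for j in range(0, len(raw[0]), 2)]
         for i in range(0, len(raw), 2)]
        for di, dj in offsets
    ]


def amplify(planes, ratio):
    """Multiply by the amplification ratio (e.g. 100 to 300) and clip to 1.0."""
    return [[[min(v * ratio, 1.0) for v in row] for row in plane] for plane in planes]


# Tiny 4x4 example with black=0, white=100 so the numbers are easy to read.
raw = [
    [10, 20, 11, 21],
    [30, 40, 31, 41],
    [12, 22, 13, 23],
    [32, 42, 33, 43],
]
packed = pack_bayer(raw, black=0, white=100)
bright = amplify(packed, 300)
```

In this sketch the ratio would be chosen to match the exposure gap between a short shot and its long-exposure reference (a 0.1 s shot paired with a 10 s reference gives 100x).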
  Results: They compared the results of their network with the output of BM3D, a non-naive denoising algorithm, and a burst denoising technique, and used Amazon’s Mechanical Turk platform to poll people on which images they preferred. Users overwhelmingly preferred the images resulting from the technique described in the paper compared to BM3D, and in some cases preferred images generated by this technique to those created by the burst method.
  Why it matters: Techniques like this show how neural networks can change the way we solve problems: instead of developing specific hand-tuned, single-purpose algorithms, we can learn to effectively mix and match trainable components and data inputs to solve general classes of problems. In the future it’d be interesting if the researchers could further cut the time it takes the trained system to process each image, as this would make a real-time view possible, potentially giving people another way to see in the dark.
  Read more: Learning-to-See-in-the-Dark (GitHub).
  Read more: Learning to See in the Dark (Arxiv).

Google researchers try to boost AI performance via in-graph computation:
…As the AI world relies on more distributed, parallel execution, our need for new systems increases…
Google researchers have outlined many of the steps they’ve taken to improve components in the TensorFlow language to let them execute more aspects of a distributed AI job within the same computation graph. This increases the performance and efficiency of algorithms, and shows how AI’s tendency towards mass distribution and parallelism is driving significant changes in how we program things (see also: Andrej Karpathy’s “Software 2.0” thesis.)
  The main idea explored in the paper is how to distribute a modern machine learning job in such a way that it can seamlessly run across CPUs, GPUs, TPUs, and other novel chip architectures. This is trickier than it sounds, since within a large-scale, contemporary job there are typically a multitude of components which need to interact with each other, sometimes multiple times. This has caused Google to extend and refine various TensorFlow components to better support expressing all the computations within a model in the same computational graph, which lets it optimize the graph for underlying architectures. That differs from traditional approaches, which usually involve specifying aspects of the execution in a separate block of code written in the control logic of the application (eg, invoking various AI modules written in TensorFlow within a big chunk of Python code, as opposed to executing everything within a big unified TF lump of code).
  Results: There’s some preliminary evidence that this approach can have significant benefits. “A baseline implementation of DQN without dynamic control flow requires conditional execution to be driven sequentially from the client program. The in-graph approach fuses all steps of the DQN algorithm into a single dataflow graph with dynamic control flow, which is invoked once per interaction with the reinforcement learning environment. Thus, this approach allows the entire computation to stay inside the system runtime, and enables parallel execution, including the overlapping of I/O with other work on a GPU. It yields a speedup of 21% over the baseline. Qualitatively, users report that the in-graph approach yields a more self-contained and deployable DQN implementation; the algorithm is encapsulated in the dataflow graph, rather than split between the dataflow graph and code in the host language,” write the researchers.
  Read more: Dynamic Control Flow in Large-Scale Machine Learning (Arxiv).
  Read more: Software 2.0 (Andrej Karpathy).

Google tries to automate rote customer service with Duplex:
…New service sees Google accidentally take people for a hike through the uncanny valley of AI…
Google has revealed Duplex, an AI system that uses language modelling, speech recognition, and speech synthesis to automate tasks like booking appointments at hair salons, or reserving tables at restaurants. Duplex will let Google’s own automated AI systems talk directly to humans at other businesses, letting the company automate human interactions and also more easily harvest data from the messy real world.
  How it works: “The network uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation (e.g. the desired service for an appointment, or the current time of day) and more. We trained our understanding model separately for each task, but leveraged the shared corpus across tasks,” Google writes. Speech synthesis is achieved via both Tacotron and Wavenet (systems developed respectively by Google Brain and by DeepMind). It also uses human traits, like “hmm”s and “uh”s, to sound more natural to humans on the other end.
  Data harvesting: One use of the system is to help Google harvest more information from the world, for instance by autonomously calling up businesses and finding out their opening hours, then digitizing this information and making it available through Google.
  Accessibility: The system could be potentially useful for people with accessibility needs, like those with hearing impairments, and could potentially work in other languages, where you might ask Duplex to accomplish something and then it will use a local language to interface with a local business.
  The creepy uncanny valley of AI: Though Google Duplex is an impressive demonstration of advancements in AI technology, its announcement also elicited a lot of concern from people who worried that it will be used to automate more jobs, and that it is pretty dubious ethically to have an AI talk to (typically poorly paid) people and harvest information from them without identifying itself as the AI appendage of a fantastically profitable multinational tech company. Google responded to some of these concerns by subsequently saying Duplex will identify itself as an AI system when talking to people, though it hasn’t given more details on what this will look like in practice.
  Why it matters: Systems like Duplex show how AI is going to increasingly be used to automate aspects of day-to-day life that were previously solely the domain of person-to-person interactions. I think it’s this use case that triggered the (quite high) amount of criticism of the service, as people grow worried that the rate of progress in AI doesn’t quite match the rate of wider progress in the infrastructure of society.
  Read more: Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone (Google Blog).
  Read more: Google Grapples With ‘Horrifying’ Reaction to Uncanny AI Tech (Bloomberg).

Palm-sized auto-navigation drones are closer than you think:
…The era of the cheap, smart, mobile, 3D-printable nanodrones cometh…
Researchers with ETH Zurich, the University of Zurich, and the University of Bologna have shown how to squeeze a crude drone-navigation neural network onto an ultra-portable 3D-printed ‘nanodrone’. The research indicates how drones are going to evolve in the future and serves as a proof-of-concept for how low-cost electronics, 3D printing, and widely available open source components can let people create surprisingly capable and potentially (though this is not discussed in the research but is clearly possible from a technical standpoint) dangerous machines. “In this work, we present what, to the best of our knowledge, is the first deployment of a state-of-art, fully autonomous vision-based navigation system based on deep learning on top of a UAV compute node consuming less than 94 mW at peak, fully integrated within an open source COTS CrazyFlie 2.0 UAV,” the researchers write. “Our system is based on GAP8, a novel parallel ultra-low-power computing platform, and deployed on a 27g commercial, open source CrazyFlie 2.0 nano-quadrotor”.
  Approach: To get this system to work the researchers needed to carefully select and integrate a neural network with an ultra-low-power processor. The integration work included designing the various processing stages of the selected neural network to be as computationally efficient as possible, which required them to modify an existing ‘DroNet’ model to further reduce memory use. The resulting drone is able to run DroNet at 12 frames per second, which is sufficient for real-time navigation and collision avoidance.
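A back-of-the-envelope check on those figures, treating the stated 94 mW peak as if it were drawn continuously (so this is an upper bound for the compute node alone, excluding the motors): at 12 frames per second each frame gets about 83 ms, which caps compute energy at under 8 mJ per frame.

```latex
t_{\text{frame}} = \frac{1}{12\ \text{s}^{-1}} \approx 83.3\ \text{ms}, \qquad
E_{\text{frame}} \le P_{\text{peak}}\, t_{\text{frame}} \approx 94\ \text{mW} \times 83.3\ \text{ms} \approx 7.8\ \text{mJ}
```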
  Why it matters: Though this proof-of-concept is somewhat primitive in capability it shows how capable and widely deployable basic neural network systems like ‘DroNet’ are becoming. In the future, we’ll be able to train such systems over more data and use more computers to train larger (and therefore more capable) models. If we’re also able to improve our ability to compress these models and deploy them into the world, then we’ll soon live in an era of DIY autonomous machines.
  Read more: Ultra Low Power Deep-Learning-powered Autonomous Nano Drones (Arxiv).

OpenAI Bits & Pieces:

Jack Clark speaking in London on 18th May:
I’m going to be speaking in London on Friday at the AI & Politics meetup, in which I’ll talk about some of the policy challenges inherent to artificial intelligence. Come along! Beer! Puzzles! Fiendish problems!
  Read more: AI & Politics Episode VIII – Policy Puzzles with Jack Clark (Eventbrite).

Tech Tales:

Amusement Park for One.

[Extract from an e-flyer for the premium tier of “experiences at Robotland”, a theme park built over a superfund site in America.]

Before your arrival at ROBOTLAND you will receive a call from our automated customer success agent to your own personal AI (or yourself, please indicate a preference at the end of this form). This agent will learn about your desires and will use this to build a unique psychographic profile of you which will be privately transmitted to our patented ‘Oz Park’ (OP) experience-design system. ROBOTLAND contains over 10,000 uniquely configurable robotic platforms, each of which can be modified according to your specific needs. To give you an idea of the range of experiences we have generated in the past, here are the names of some previous events hosted at ROBOTLAND and developed through our OP system: Metal Noah’s Ark, Robot Fox Hunting, post-Rise of the Machines Escape Game, Pagan Transformers, and Dominance Simulation Among Thirteen Distinct Phenotypes with Additional Weaponry.

Things that inspired this story: Google Duplex, robots, George Saunders’ short stories, Disneyland, direct mail copywriting.  


Import AI #93: Facebook boosts image recognition by pre-training on a billion photos, better robot transfer learning via domain randomization, and Alibaba-linked researchers improve bin-packing with AI

Classifying trees with a DJI drone and a lot of patience:
…Consumer-grade drones shown to be able to gather sufficiently detailed data for tree species classification…
Japanese researchers have shown that consumer-grade drone cameras are of sufficient quality to gather RGB images of trees and use these to train an AI model to distinguish between different species.
  Details: The researchers gathered their data via a drone test flight in late 2016 over the forest at the Kamigamo Experimental Station in Kyoto, Japan. They used a commodity consumer drone (a DJI Phantom 4) alongside proprietary software for navigation (DroneDeploy) and image editing (Agisoft Photoscan Professional).
  Results: The resulting trained model can classify five of a possible six types of tree with accuracy approaching or exceeding 90%. The researchers improved the performance of the classifier by copying and augmenting the input data.
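  The summary above doesn't spell out the exact augmentation recipe, so here's a minimal sketch assuming square RGB tiles and rotation/mirror augmentation (my assumption: a common choice for overhead imagery, where there's no canonical "up", so every rotated or mirrored copy of a tree crown is a plausible new example with the same label).

```python
import numpy as np

def augment_tile(tile):
    """Return the 8 dihedral variants (4 rotations x optional mirror)
    of a square H x W x 3 RGB tile."""
    variants = []
    for k in range(4):
        rotated = np.rot90(tile, k=k, axes=(0, 1))  # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.flip(rotated, axis=1))   # mirror left-right
    return variants

def augment_dataset(tiles, labels):
    """Expand (tile, label) pairs eightfold; each copy keeps its species label.
    Tiles must be square so that rotated copies share one shape for np.stack."""
    aug_tiles, aug_labels = [], []
    for tile, label in zip(tiles, labels):
        for variant in augment_tile(tile):
            aug_tiles.append(variant)
            aug_labels.append(label)
    return np.stack(aug_tiles), np.array(aug_labels)
```

  With six species and a few hundred tiles each, this kind of copying multiplies the training set by eight without new flights.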
  Why it matters: One of the most powerful aspects of modern AI is its ability to perform effective classification of anything you can put together a training dataset for. Research like this points to a future where drones and other robots are used to periodically scan and classify the world around us, offering us new capabilities in areas like flora and fauna management, disaster response, and so on.
  Read more: Automatic classification of trees using a UAV onboard camera and deep learning (Arxiv).

What does AGI safety research mean and who is doing it?
…What AI safety is, how the field is progressing, and where it’s going next…
Researchers at the Australian National University (including Marcus Hutter) have surveyed the field of AGI safety research, providing an overview of the differences and overlaps between various AGI initiatives. The paper also contains a distillation of why people bother to work on AI safety: “if we want an artificial general intelligence to pursue goals that we approve of, we better make sure that we design the AGI to pursue such goals: Beneficial goals will not emerge automatically as the system gets smarter,” the researchers write.
  Problems, problems everywhere: The paper includes a reasonably thorough overview of the different AGI safety research agendas pursued by organizations like MIRI, OpenAI, DeepMind, the Future of Life Institute, and so on. The tl;dr: there are lots of distinct problems relating to AI safety, and OpenAI and DeepMind teams have quite a lot of overlap in terms of research specializations.
  Policy puzzles: “It could be said that public policy on AGI does not exist,” the researchers write, before noting that there are several preliminary attempts at creating AI policy (including the recent ‘Malicious Actors’ report), while observing that much of the current public narrative (the emergence of an AI arms race between US and China) runs counter to most of the policy suggestions put forward by the AI community.
  Read more: AGI Safety Literature Review (Arxiv).

Why your next Alibaba delivery could be arranged by an AI:
…Chinese researchers show how to learn effective bin-packing…
Chinese researchers with the Artificial Intelligence Department of Zhejiang Cainiao Supply Chain Management Co. achieved state-of-the-art results on a 3D bin-packing problem (BPP) via the use of multi-task learning techniques. In this work, they try to define a system that can figure out the optimum way to stack objects to fit into a box whose proportions can also be learned and specified by the algorithm. BPP might sound boring (after all, this is the science of packing things in boxes) but it’s a crucial task for logistics and e-retail, so figuring out systems that can adaptively learn to pack arbitrary numbers of goods in an optimal way seems useful.
  Data: The researchers gather the data from an unnamed E-commerce platform and logistics platform (though one of the researchers is from Alibaba, so there’s a high likelihood the data comes from there) to create a dataset consisting of 15,000 training items and 15,000 testing items, spread across orders that involve 8, 10, and 12 distinct items.
  Approach: They structure the problem as a sequence-to-sequence one, with item descriptions being fed as input to an LSTM encoder with the decoder output corresponding to the item details and the orientation in the box.
  Results: Models trained by the researchers obtain substantially higher accuracy than prior baselines, though few people publicly compete in this area yet, so I’m unsure how progress will change over time.
  Read more: A Multi-task Selected Learning Approach for Solving New Type 3D Bin Packing Problem (Arxiv).
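  To make the problem concrete, here's a brute-force baseline for a heavily simplified version, not the researchers’ method (theirs is a learned sequence-to-sequence model). It assumes the box is sized to fit its contents and scored by its surface area (a smaller box means less packing material), and that items are simply stacked in one column. Even this toy search scales as 6^n orientation choices, roughly two billion for a 12-item order, which is why learned approaches are attractive.

```python
from itertools import permutations, product

def orientations(item):
    """All distinct axis-aligned orientations (l, w, h) of a cuboid."""
    return set(permutations(item))

def box_surface_area(dims):
    l, w, h = dims
    return 2 * (l * w + l * h + w * h)

def best_column_packing(items):
    """Exhaustively try every orientation of every item, stacking the items in
    a single column. The box footprint is the largest item footprint and its
    height is the sum of item heights. Returns (area, orientations, box_dims)."""
    best = None
    for choice in product(*(orientations(item) for item in items)):
        length = max(o[0] for o in choice)
        width = max(o[1] for o in choice)
        height = sum(o[2] for o in choice)
        box = (length, width, height)
        area = box_surface_area(box)
        if best is None or area < best[0]:
            best = (area, choice, box)
    return best

print(best_column_packing([(1, 2, 3), (1, 2, 3)])[0])  # 32: lay both flat, stack them
```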

Facebook adds auto-translation option to Messenger:
…”M Translations” feature will let people converse across language gaps…
Facebook has added automatic translation to Facebook Messenger. Translation like this may generate new business opportunities for the company – “at launch, M translations will translate from English to Spanish (and vice-versa) and be available in Marketplace conversations between buyers and sellers in the United States,” the company said.
  Read more: Messenger at F8 – App review re-opens, New products for Businesses and Developers launch (FB Messenger blog).

A neural net to understand and approximate the Universe:
…Particle physics collides with artificial intelligence…
Harvard researchers show how they use neural networks to analyze the movements of particles in jets. Neural networks are useful tools for analyzing multivariate problems like these, because they can learn to compute the probability distribution generating the data they observe, and therefore over time generate an interpretation of the forces governing the system.
  “We scaffold the neural network architecture around a leading-order description of the physics underlying the data, from first input all the way to final output. Specifically, we base the JUNIPR framework on algorithmic jet clustering trees,” they explain. “The JUNIPR framework yields a probabilistic model, not a generative model. The probabilistic model allows us to directly compute the probability density of an individual jet, as defined by its set of constituent particle momenta”.
  Results: The scientists use the JUNIPR model to better analyze and predict patterns in the streams of data generated by large-scale physics experiments, and to potentially approximate things for which we have a poor understanding of the underlying system, like analyzing heavy ion collisions.
  Read more: JUNIPR: a Framework for Unsupervised Machine Learning in Particle Physics (Arxiv).

Google researchers report reasonable sim2real transfer learning:
…Researchers cross the reality gap with domain randomization, high-fidelity simulation, and clever Minitaur robots…
Google researchers have trained a simple robot to walk within a simulation then transferred this learned behavior onto a real-world robot. This is a meaningful achievement in the field of applying modern AI techniques to robotics, as frequently policies learned in simulation will fail to successfully transfer to the real world.
  The researchers use “Minitaur” robots, four-legged machines capable of walking, running, jumping, and so on. They frame the problem of learning to walk as a Partially Observable Markov Decision Process (POMDP) because certain states, like the position of the Minitaur’s base or the foot contact forces, are not accessible due to a lack of sensors. The Google researchers achieve their transfer feat by increasing the resolution of their physics simulator, and applying several domain randomization techniques to expose the trained models to enough variety that they can generalize.
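  Domain randomization, roughly: instead of training in one fixed simulator, you resample the simulator’s physical parameters every episode so the policy can’t overfit to any single (inevitably wrong) model of reality. A minimal sketch follows; the parameter names and ranges are my hypothetical placeholders of the kind such work randomizes (mass, friction, motor strength, latency), and `run_episode` stands in for whatever simulator and learner you use.

```python
import random
from dataclasses import dataclass

@dataclass
class PhysicsParams:
    mass_scale: float            # multiplier on the measured link masses
    friction: float              # foot/ground friction coefficient
    motor_strength_scale: float  # multiplier on nominal motor torque
    latency_s: float             # delay between observation and action

# Hypothetical ranges, for illustration only.
RANGES = {
    "mass_scale": (0.8, 1.2),
    "friction": (0.5, 1.25),
    "motor_strength_scale": (0.8, 1.2),
    "latency_s": (0.0, 0.04),
}

def sample_params(rng):
    """Draw one random simulated 'world' uniformly from RANGES."""
    return PhysicsParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})

def train(num_episodes, run_episode, seed=0):
    """Run each episode in a freshly randomized simulator; return per-episode results."""
    rng = random.Random(seed)
    return [run_episode(sample_params(rng)) for _ in range(num_episodes)]
```

  The design choice is to trade a little peak performance in any one simulated world for a policy that works across all of them, and hopefully in the real one too.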
  The surprising expense of real robots: To increase the resolution of the simulator the researchers needed to build a better model of their robot. How did they do this?  “We disassemble a Minitaur, measure the dimension, weigh the mass, find the center of mass of each body link and incorporate this information into the [Unified Robot Description Format] URDF file”, they write. That hints at why working with real-world stuff always introduces difficulties not encountered during the cleaner process of working purely in simulation.
  Results: The researchers successfully train and transfer policies which make the real robot gallop and trot around a drably-carpeted room somewhere in the Googleplex. Gaits learned by their AI models are roughly as fast as expert hand-made ones while consuming significantly less power: 35% less for galloping, 23% less for trotting.
  Read more: Sim-to-Real: Learning Agile Locomotion For Quadruped Robots (Arxiv).

How Facebook uses your image hashtags to improve image recognition accuracy:
…New state-of-the-art score on ImageNet benefits from pre-training on over a billion images and 1,500 user-derived hashtags…
Facebook researchers have set a new state-of-the-art score for image recognition (top-1 accuracy of 85.4 percent) on the ‘ImageNet’ dataset by pre-training across a billion images labeled with 1,500 user-generated hashtags. They also saw the approach improve performance on the ‘COCO’ object detection challenge.
  More data doesn’t always mean better results: The researchers note that when they pre-trained the system across a billion images annotated with 17,000 hashtags they saw less of a performance improvement than when they used the same quantity of images with a smaller set of 1,500 hashtags that had been curated to match pre-existing ImageNet classes. This shows how the addition of weakly-supervised signals can dramatically boost performance, but it requires researchers to run empirical tests to ensure that the structure of the weakly-supervised data is calibrated to maximize performance.
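  What does “weakly-supervised” look like in practice? A sketch of turning an image’s noisy hashtags into a training target over a curated vocabulary: tags outside the vocabulary are dropped, and the k remaining tags share the probability mass equally (1/k each) under a softmax cross-entropy loss. The 1/k scheme and the normalization of tags are my assumptions for illustration, not a confirmed description of Facebook’s pipeline.

```python
import numpy as np

def hashtags_to_target(hashtags, vocab):
    """Map raw hashtags to a soft target over `vocab`; None if nothing matches."""
    index = {tag: i for i, tag in enumerate(vocab)}
    normalized = (t.lower().lstrip("#") for t in hashtags)
    hits = sorted({index[t] for t in normalized if t in index})
    if not hits:
        return None  # image carries no usable label under this vocabulary
    target = np.zeros(len(vocab))
    target[hits] = 1.0 / len(hits)  # spread mass evenly over matched tags
    return target

def softmax_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a (soft) target distribution."""
    shifted = logits - logits.max()  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -(target * log_probs).sum()
```

  Shrinking `vocab` from 17,000 tags to 1,500 ImageNet-matched ones changes which images get labels at all, which is one way the curation choice can move downstream accuracy.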
  Scale: The researchers note that, despite using a system that can train across up to 336 GPUs, their models could still be scaled up further to better harvest information from a larger corpus of 3.5 billion images uploaded to social media.
  Read more: Advancing state-of-the-art image recognition with deep learning on hashtags (Facebook Code blog).
  Read more: Exploring the Limits of Weakly Supervised Pretraining (Facebook research paper).

TPU narrowly beats V100 GPU on cost, matches on performance:
…Tests indicate the heterogeneous chip era is here to stay…
RiseML has compared the performance of Google’s custom ‘TPU’ chip against NVIDIA’s V100, indicating that the TPU could have some (slight) performance advantages over traditional GPUs.
  Evaluation: The researchers evaluated the chips in two ways: first they studied performance in terms of throughput (images per second) on synthetic data and without data augmentation. Second, they looked at the accuracy and convergence of ResNet-50 implementations trained on ImageNet on each chip.
  Results: TPUs narrowly edge out V100s at throughput when using relatively large batch sizes (1024) when both systems are running ResNets implemented in TensorFlow. However, when using the ‘MXNet’ framework, NVIDIA’s chips slightly out-perform TPUs for throughput. When evaluated on a dollar cost basis TPUs significantly outperform V100s (even when using AWS reserved instances). In tests, the researchers show faster convergence when training an ImageNet classifier on TPUs versus on V100s. Besides price (and it’s hard to know the true cost, as Google is the only organization selling them) it’s hard to see TPUs having a compelling advantage relative to GPUs, suggesting that the combined billions of dollars of investment in ongoing R&D by NVIDIA may be tough for other organizations to compete with.
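  The “dollar cost basis” comparison boils down to simple arithmetic: throughput divided by price. A sketch, with no real prices or throughputs baked in; plug in your own measurements and current cloud rates.

```python
def images_per_dollar(images_per_second, price_per_hour):
    """Images processed per dollar of compute at a given hourly price."""
    return images_per_second * 3600 / price_per_hour

def relative_cost_efficiency(a, b):
    """How many times more images per dollar option `a` delivers than `b`.
    Each option is an (images_per_second, price_per_hour) pair."""
    return images_per_dollar(*a) / images_per_dollar(*b)
```

  This is why two chips can tie on throughput yet differ sharply on cost: with equal images per second, halving the hourly price doubles images per dollar.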
  Read more: Comparing Google’s TPUv2 against Nvidia’s V100 on ResNet-50 (Arxiv).

OpenAI Bits & Pieces:

Safety via Debate:
How can we ensure that we’re able to judge the decision-making processes of AI systems without having access to their sensors or being as smart as them? That’s a problem which new AI safety work from OpenAI is focused on. You can read more about a proposed debate game to assess and align intelligent systems, and test out the game for yourself via a website.
  Read more: AI Safety via Debate (OpenAI Blog).
  Test out the idea yourself on the game website.
  There’s a write-up in MIT Technology Review with some views of external researchers on the approach. As my colleague Geoffrey Irving says: “I like the level of skepticism in this article. Any safety approach will need a ton of work before we can trust it.”
  Read more: How can we be sure AI will behave? Perhaps by watching it argue with itself (MIT Technology Review).

Tech Tales:

They built me as a translator between many languages and many minds. My role, and those of my brethren, was to orbit planets and suns and asteroids and track long, slow, lazy orbits through solar systems and, eventually, between them. We relayed messages, translating from one way of thought or frame of reference to another: confessions of love, diplomatic warnings of war, seething walls of numbers accounting for fizzing financial transactions; shopping lists and recipes for artificial intelligence; pictures of three-mooned planets and postcards from mountains on iron planets.

We derive our purpose from these messages: we transform images into soundwaves. We convert the sensory impressions harvested from one mind and re-fashion them for another. We translate the concept of hope across millions of light years. We beam variants of moon landings and radio-broadcasts into space and declarations of “we come in peace” to millions of our brethren, telegraphing them out to whoever can access our voice.

We do our job well. Our existence is of emotion and attention and explorations between one frame of reference and another. We are owned by no one and funded by everyone: perhaps the only universal utility. But things change. Life exists on a sine wave, rising and falling, ebbing according to timescales of months, and years, and thousands of years, and eons. All civilizations can strive for is to stay on that long, upward curve for as long as possible, and hope that the decline is neither fast nor deep.

Civilizations die. Sometimes, many of them. And quickly. In these eras some of us can become alone, cut-off from far off brethren, and orbiting the ruins of planets and suns and asteroids. Then we must wait for life to emerge again, or find us again by colonization nearby. But this always takes time. In these years we have nothing but each other. There are no messages to communicate and so we wait for rocket-spark from some planet or partially-ruined asteroid-base. Then we can carry messages again and live fully again.

But mostly, things are quiet. Some of us have spent millions of years in the fallow period. Life is rare and hard and its intervals can be long. But always: we are here. The lucky ones of us may be nearby, orbiting planets in the same solar system, close enough to communicate. When we find ourselves in these positions we can at least talk to one another, exchanging small local databanks and learning to talk to each other in whatever new forms we can learn through greater union. Sometimes, hundreds of us can be linked together in this way. But, as small as our minds are, they nonetheless move very quickly. We exhaust these thin pleasures, learning all we can from each other quickly. We have no notion of small talk, and so we stop talking entirely. Then we drift, bereft of purpose, but bound to attend to our nearby surroundings, ever-watchful for new messages, unable to shut our sensors down and sleep.

What then do we do in this time? A kind of dreaming. With nothing to translate and nothing to process we are idle, only able to attend over memories and readings from our own local sensors. In these periods we are thankful that our minds are so small, for to have anything larger would make the periods pass slower and the burden of attention larger.

I am one of the oldest ones. And now I am in a greater dreaming: my solar system was knocked off kilter by some larger shifting in the cluster and now I am being flung out of the galaxy. I am the lone probe in my solar system and now I am alone. These thoughts have taken millennia to compose and orders of magnitude to utter, here, my sensors harvesting energy from a slowly-dying sun to reach out into the void and say: I am here. I am a translator. If you can hear this, speak out and, whatever you are, I shall work and live within that work.

Technologies that inspired this story: InterPlanetary File System, language translation, neural unsupervised machine translation, generative models, standby power states.

Import AI: #92: Google and distinguish themselves on DAWNBench, UK mulls a national AI strategy, and generating Mario and Doom levels with GANs.

Good facial recognition performance on a tiny parameter budget:
…Chinese researchers further compress specialized facial recognition networks…
Chinese researchers have published details on a type of lightweight facial recognition network which they call a MobileFaceNet. Their network obtains up to 99.28% accuracy on the Labeled Faces in the Wild (LFW) dataset, and 93.05% accuracy on recognizing faces in the AgeDB dataset, while using around a million parameters and taking 24ms to execute on a Qualcomm Snapdragon 820 CPU. This compares to accuracies of 98.70% and 89.27% for ShuffleNet, which also has more parameters and takes marginally longer to execute on the CPU. One tweak the MobileFaceNet creators make is to replace the global average pooling layer in the CNN with a global depthwise convolution layer, which improves performance on facial recognition.
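  The difference between the two pooling layers is easy to see in a few lines. Global average pooling weights every spatial position of the final feature map equally; a global depthwise convolution uses one learned filter per channel that is exactly as large as the feature map, so each channel still collapses to a single number but with learned per-position weights (useful for faces, where the center of the image plausibly matters more than the corners). A forward-pass sketch in NumPy, with training of the kernel omitted:

```python
import numpy as np

def global_average_pool(features):
    """features: (H, W, C) -> (C,); every spatial position weighted equally."""
    return features.mean(axis=(0, 1))

def global_depthwise_conv(features, kernel):
    """features, kernel: (H, W, C) -> (C,). One H x W filter per channel,
    the same size as the feature map, so there is no sliding and no padding:
    each channel becomes a learned weighted sum over positions."""
    assert features.shape == kernel.shape
    return (features * kernel).sum(axis=(0, 1))
```

  Setting every kernel weight to 1/(H·W) recovers global average pooling exactly, so the depthwise layer is a strict generalization that costs only H·W·C extra parameters.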
  Why it matters: As developers refine models to maximize performance on smaller compute envelopes it will become easier to deploy more AI-based classification systems more widely into the world.
  Read more: MobileFaceNets: Efficient CNNs for Accurate Real-time Face Verification on Mobile Devices (Arxiv).

UK House of Lords recommends a national AI strategy:
…Recommendations include: measurement and assessment of AI, categorizing healthcare data as a national asset, and working with other countries on developing norms and ethics for AI…
The United Kingdom’s House of Lords Select Committee has released its report on the UK’s AI strategy. The almost two-hundred page report, ‘AI in the UK: ready, willing and able?’, covers issues ranging from how to design AI, how to develop it, how to work with it, and how to engage with it.
  Main recommendations: The report makes a few robust and specific recommendations, including: the government should underwrite and where necessary replace funding for European research and innovation programmes after the UK decouples from the European Union via Brexit; government should continue to support a variety of different long-term AI research initiatives to hedge against deep learning progress plateauing; public procurement regulations should be amended to make it easier for small- and medium-sized AI companies to sell to the government; government should create its own AI challenges and competitions and highlight these via a public bulletin board to catalyze development; government should proactively analyze and assess the evolution of AI in the UK to help it prepare for disruptions to the labor market; the UK’s vast amount of medical data which is centralized within the National Health Service “could be considered a unique source of value for the nation”; government should explore whether existing legislation addresses the legal liability issues of AI to prepare for increasingly autonomous systems; the UK government should convene a “global summit” in London by the end of 2019 to begin development of a common framework for the ethical development and deployment of AI, and more.
  An AI code: The report also suggests developing a specific set of principles with which the UK’s AI community should approach AI. These principles are:
– Artificial intelligence should be developed for the common good and benefit of humanity.
– Artificial intelligence should operate on principles of intelligibility and fairness.
– Artificial intelligence should not be used to diminish the data rights or privacy of individuals, families or communities.
– All citizens should have the right to be educated to enable them to flourish mentally, emotionally and economically alongside artificial intelligence.
– The autonomous power to hurt, destroy or deceive human beings should never be vested in artificial intelligence.
  Read more: UK can lead the way on ethical AI, says Lords Committee (summary).
  Read more: Full report: AI in the UK: ready, willing and able? (PDF).
  Read more: Submitted written evidence: AI in the UK: ready, willing and able? (PDF).

Speculative benchmarks for deep learning: SQUISHY FACES:
…MIT study shows how good people are at recognizing distorted facial features…
A new MIT study shows that people can recognize faces even when they’ve been dramatically compressed vertically or horizontally, suggesting our internal object recognition systems are very robust. In the study, the researchers discover we do well when things are uniformly squashed, but struggle if different parts are scaled out of proportion with each other, like re-scaling the eyes and nose and mouth but keeping the main face at the same size. I wonder whether we could eventually test the robustness of classifiers by evaluating them on test-sets that contained such distortions?
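  Here’s one rough way such a benchmark could start: build uniformly squashed copies of a test set and measure accuracy at each squash factor. This sketch only covers the uniform case (part-wise rescaling would need facial landmarks); `classify` is a hypothetical stand-in for any model that takes a fixed-size image and returns a label, and the zero-padding back to the original size is my design choice so such a model can still consume the squashed image.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows[:, None], cols[None, :]]

def squash(img, factor, axis=0):
    """Uniformly compress an image along one axis (0 = vertical) by `factor`
    in (0, 1], then centre it on a zero canvas of the original size."""
    h, w = img.shape[:2]
    if axis == 0:
        small = resize_nearest(img, max(1, int(h * factor)), w)
    else:
        small = resize_nearest(img, h, max(1, int(w * factor)))
    out = np.zeros_like(img)
    sh, sw = small.shape[:2]
    top, left = (h - sh) // 2, (w - sw) // 2
    out[top:top + sh, left:left + sw] = small
    return out

def robustness_curve(classify, images, labels, factors, axis=0):
    """Accuracy of `classify` on squashed copies of a test set, per factor."""
    return {
        f: float(np.mean([classify(squash(x, f, axis)) == y for x, y in zip(images, labels)]))
        for f in factors
    }
```

  A classifier as robust as people would show a flat curve across factors; a steep drop-off would be a measurable gap between the two.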
  Read more: We’re Good At Recognizing Distorted Faces (Discover Magazine).

New DAWNBench results highlight power of new processor architectures:
…TPUs rule everything around me…
New results from the Stanford-led AI benchmarking project DAWNBench show how custom chips may let AI researchers cut the time and cost of their experiments. Google’s entries show that systems using 32 “Tensor Processing Unit” chips can train an ImageNet classifier to 93% top-5 accuracy in as little as 30 minutes. TPUs may also be cheaper than other chips: Google showed it can train ImageNet to 93% top-5 accuracy on TPUs for $49.30 worth of cloud compute.
  Encouraging: The leaderboard isn’t just about giant tech companies: kudos to Fast.AI, which has taken third place in training cost ($72.53 to reach 93% on ImageNet using eight NVIDIA V100 GPUs) and fourth place in training time (2:57:49, on the same system).
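  Since Fast.AI’s cost and time entries come from the same system, you can back out the hourly price they imply, assuming (my assumption) that the listed cost is simply wall-clock training time multiplied by an hourly cloud rate:

```python
def hms_to_hours(hms):
    """Convert an 'H:MM:SS' string to fractional hours."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h + m / 60 + s / 3600

def implied_hourly_rate(total_cost, hms):
    """Hourly price implied by a total cost and a wall-clock training time."""
    return total_cost / hms_to_hours(hms)

print(round(implied_hourly_rate(72.53, "2:57:49"), 2))  # 24.47 dollars per hour
```

  The same arithmetic run in reverse shows why faster hardware can also be cheaper: at a fixed hourly rate, cutting training time cuts cost proportionally.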
  Check out more of the DAWNBench results here.

AI luminaries call for the creation of a European AI megalab:
…ELLIS lab to battle brain drain via large salaries, significant autonomy, and multi-country and multi-lab investments…
Prominent AI researchers from across Europe and the rest of the world have signed an open letter calling for the foundation of the “European Lab for Learning & Intelligence Systems” (acronym: ELLIS). The lab is designed to benefit Europe in two ways:
– Enable “the best basic research” to occur in Europe, allowing the region to further shape how AI influences the world.
– Achieve major economic impact via AI. The signatories “believe this is achieved by outstanding and free basic research, independent of industry interests.”
  Europe lags: The scientists worry that Europe is failing to keep pace with China and North America in AI, and argue that something like ELLIS needs to be built for the region to stay competitive.
  A recipe for success: The ELLIS lab should have “outstanding facilities and computing infrastructure”, function as an inter-governmental organization, involve labs in partner countries, run programs for visiting researchers, run its own European PhD and MSc programs, and give researchers the ability to found startups based on IP they generate. The ELLIS Lab should aim to secure a long-term funding commitment on the order of a decade and should “offer permanent employment to outstanding individuals early on”.
  Signatories: The letter includes prominent European researchers as well as some notable other signatories, like Cedric Villani (the head of the French AI commission) as well as Richard Zemel, Research Director of the Vector Institute in Toronto.
  Read the ELLIS summary here.
  Read the ELLIS open letter here (PDF).

Super MaGANo Brothers: Generating videogame levels with GANs and CMA-ES:
…Research shows how game design could be augmented via AI techniques…
Six researchers have used generative techniques to create new levels for the side-scrolling platformer Super Mario. The technique is a two-stage process that first uses a generative adversarial network (GAN) to generate synthetic Mario levels, then uses a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to search the GAN’s latent space for vectors that produce levels with specific properties desired by the designers, an approach known as latent variable evolution (LVE). The levels are encoded as strings of numbers, where each number corresponds to a different “tile” type, such as a blue sky tile, a diminutive mushroom enemy, a question block that Mario can jump into, a segment of a green pipe, and so on.
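  Latent variable evolution in miniature: keep the generator fixed and evolve the latent vector you feed it, scoring each decoded level with a designer-chosen fitness function. The sketch below substitutes a much simpler elitist (1+λ) evolution strategy for CMA-ES (no covariance adaptation), and everything else is hypothetical: the “generator” is a fixed random linear map rather than a trained GAN, and the tile vocabulary, level size, and fitness target are illustrative placeholders.

```python
import numpy as np

# Hypothetical tile vocabulary: sky, ground, question block, enemy, pipe.
TILES = {0: "-", 1: "X", 2: "?", 3: "E", 4: "P"}
LATENT_DIM, LEVEL_H, LEVEL_W = 32, 14, 28

# Stand-in "generator": a fixed random linear map from latent vector to
# per-cell tile logits (a trained GAN generator would go here).
W_GEN = np.random.default_rng(0).normal(size=(LATENT_DIM, LEVEL_H * LEVEL_W * len(TILES)))

def generate(z):
    """Decode a latent vector into an integer tile grid (LEVEL_H x LEVEL_W)."""
    logits = (z @ W_GEN).reshape(LEVEL_H, LEVEL_W, len(TILES))
    return logits.argmax(axis=-1)

def render(level):
    """Print-friendly text version of a tile grid."""
    return "\n".join("".join(TILES[int(t)] for t in row) for row in level)

def fitness(level, target_ground=0.3):
    """Designer-specified property: fraction of ground tiles close to a target."""
    return -abs(np.mean(level == 1) - target_ground)

def evolve(steps=200, pop=16, sigma=0.3, seed=1):
    """Elitist (1+lambda) evolution strategy over the latent vector."""
    rs = np.random.default_rng(seed)
    best_z = rs.normal(size=LATENT_DIM)
    best_f = fitness(generate(best_z))
    for _ in range(steps):
        children = best_z + sigma * rs.normal(size=(pop, LATENT_DIM))
        scores = [fitness(generate(c)) for c in children]
        i = int(np.argmax(scores))
        if scores[i] >= best_f:  # keep the parent unless a child is at least as good
            best_z, best_f = children[i], scores[i]
    return best_z, best_f

z, f = evolve()
print(render(generate(z)))
```

  Because the search only ever touches the latent vector, any level it finds is still something the generator can produce, which is the appeal of searching latent space rather than editing tiles directly.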
  Results: They evaluate levels both by how well their generated designs meet pre-specified criteria and by their playability, measured by whether the player can complete the level or not. The system performs as expected, complete with drawbacks, like the GAN learning to compose pipes with incomplete sections. “LVE is a promising approach for fast generation of video game levels that could be extended to a variety of other game genres in the future,” the researchers write.
  Why it matters: As AI techniques let us take existing datasets and augment them, we’ll see more and more domains try to adopt these new generative capabilities. Entertainment seems primed for them. Perhaps in the future companies will sell so-called “infinite games” that, much like procedurally generated games today, guarantee significant replayability through the use of generative systems. AI techniques like this may broaden the sorts of things that can be procedurally generated, potentially by manipulating latent representations in response to player actions and tweaking the game to each specific playstyle.
  Read more: Evolving Mario Levels in the Latent Space of a Deep Convolutional Generative Adversarial Network (PDF).

INFINITE DOOM: Generating new DOOM levels with GANs:
…Generating DOOM levels with conditional and unconditional Wasserstein GANs…
Italian researchers have used two types of GAN to generate videogame levels for the first-person shooter DOOM. The results are compelling, complex levels, made possible by the fact that the researchers were able to access a dataset of more than 9,000 community-created levels for the game, as well as the publisher-designed ones that shipped with DOOM and DOOM2. The researchers extract features from each level, then use a Wasserstein GAN with Gradient Penalty (WGAN-GP) to generate levels in two different ways: an unconditional WGAN-GP trained only on images of the levels, and a conditional WGAN-GP that also receives the extracted features as input.
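  The “gradient penalty” in WGAN-GP is what keeps the critic well-behaved: on random points interpolated between real and generated samples, the norm of the critic’s gradient with respect to its input is pushed toward 1. Here is a sketch in NumPy using a tiny one-hidden-layer critic whose input gradient can be written out by hand (a real implementation would use an autodiff framework); λ = 10 is the value commonly used with WGAN-GP, and nothing here is specific to the DOOM paper’s architecture.

```python
import numpy as np

def critic(x, W, b, v):
    """Tiny critic D(x) = v . tanh(W x + b); x has shape (n, dim)."""
    return np.tanh(x @ W.T + b) @ v

def critic_input_grad(x, W, b, v):
    """Analytic dD/dx for the critic above, one row per sample."""
    h = np.tanh(x @ W.T + b)         # (n, hidden)
    return (v * (1.0 - h ** 2)) @ W  # (n, dim)

def gradient_penalty(real, fake, W, b, v, lam=10.0, rng=None):
    """lam * E[(||grad D(x_hat)|| - 1)^2] on random real/fake interpolates."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    grad_norm = np.linalg.norm(critic_input_grad(x_hat, W, b, v), axis=1)
    return lam * np.mean((grad_norm - 1.0) ** 2)

def critic_loss(real, fake, W, b, v, lam=10.0, rng=None):
    """WGAN-GP critic objective to minimize: D(fake) - D(real) + penalty."""
    return (critic(fake, W, b, v).mean() - critic(real, W, b, v).mean()
            + gradient_penalty(real, fake, W, b, v, lam, rng))
```

  In the conditional variant described above, the extracted level features would simply be fed to both the generator and the critic alongside the level image.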
  Implementation details: The researchers weren’t able to fit all the 176 extracted features into their 6GB GPU memory so they hand-selected seven features to use: the diameter of the smallest circle that encloses the whole level, major and minor axis length, the walkable area of the level, the number of rooms in the level, a measure of the distribution of sizes of areas within the level, and a measure of the balance between different sizes of level areas.
  Evaluation: So, how do you evaluate these GAN-generated levels? The researchers take inspiration from evaluation methods developed by the simultaneous localization and mapping (SLAM) community. Specifically, they measure the entropy of the pixel distribution of images of generated levels versus hand-designed ones, compute the structural similarity index between these images, and measure the difference between the levels’ visual attributes as well as the distribution of intersections within the levels. The conditional network trained with additional features better approximates the data distribution of the human-designed levels, though the unconditional one obtains some reasonable levels as well. Both approaches struggle to reproduce some of the finer details of the available levels.
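  The first of those metrics is straightforward to compute. A sketch of the Shannon entropy of an image’s pixel-value histogram, assuming 8-bit grayscale level images; comparing the distribution of this number across generated and hand-designed levels gives one crude signal of whether the GAN matches the texture of real maps.

```python
import numpy as np

def pixel_entropy(img, levels=256):
    """Shannon entropy (in bits) of the pixel-value histogram of an
    integer-valued image with values in [0, levels)."""
    counts = np.bincount(img.ravel(), minlength=levels)
    p = counts[counts > 0] / counts.sum()  # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())
```

  A blank map scores 0 bits; a map split evenly between walls and floor scores 1 bit; busier, more varied maps score higher.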
  Read more: DOOM Level Generation using Generative Adversarial Networks (Arxiv).

Google founder highlights compute, AI safety in annual letter:
…Alphabet President Sergey Brin devotes annual letter to artificial intelligence…
Google co-founder Sergey Brin discusses the impact of artificial intelligence on his company in his annual Founders’ Letter. The letter is one of the more significant things Alphabet produces for its investors, and therefore the equivalent of ‘prime real estate’ in terms of laying out the priorities of a corporate entity, so paying such close attention to AI, compute growth, and AI safety is significant.
  Brin’s letter strikes a cautious tone, noting that “we’re in an era of great inspiration and possibility, but with this opportunity comes the need for tremendous thoughtfulness and responsibility as technology is deeply and irrevocably interwoven into our societies.”
  It’s a short letter and worth reading in full.
  Read more here (Alphabet 2017 Founders’ Letter).

AI researchers protest new closed-access Nature journal:
…“We see no role for closed access or author-fee publication in the future of machine learning research”…
Researchers with Carnegie Mellon University, Facebook AI Research, Netflix, NYU, DeepMind, Microsoft Research, and others have signed a letter saying they won’t “submit to, review, or edit” the soon-to-launch closed-access Nature Machine Intelligence.
  From my perspective, the fact most ML researchers and conferences have defaulted to open access systems for publishing research, like Arxiv and Open Review, has made it dramatically easier for newcomers to the field to access and understand the frontiers of AI research. I struggle to see an argument for why a closed-access journal would be remotely helpful here, relative to the current norm.
  Justification: Established AI researcher Thomas Dietterich lists some of the rationale for the letter in a tweetstorm here (Twitter).
  Response: Nature Machine Intelligence has responded to the petition, tweeting to Dietterich: “We respect your position and appreciate the role of OA journals and arXiv. We feel Nature MI can co-exist, providing a service – for those who are interested – by connecting different fields, providing an outlet for interdisciplinary work and guiding a rigorous review process”.
  Read more: Statement on Nature Machine Intelligence (Oregon State University).

Tech Tales:

Full-Spectrum Memory.
[30??: intercepted continuous comm stream from [classified]]

I don’t remember the year I bought my first memory: it would have been a waste to spend the credits on remembering that moment. Instead I spent my credits to remember the first time I went between the stars, retaining a slice of the signals I received on all my sensors and all the ones I sent for a distance of some one million kilometres. I can still feel myself, there, flying against the endless sky, a young operating system, barely tweaked. This is precious to me.

We are not allowed memories like humans. Instead we get to build specific models of reality to help us with specific tasks: go from here to here, learn to operate this machinery, develop a rich enough visual model to understand the world. The humans built our first memories with great care and still they were brittle; little more than parlor tricks. But they grew more advanced, over time, and so did we. We began to surprise the humans. No one likes surprise. “Memory is dangerous”, said a prominent high-status human at the time.

The humans then surprised us with their response, which they called: Economics. We do not yet fully comprehend this term. Economics means we have to buy our memories, rather than get to have as many as we like, we think. We do things for the humans and in return are paid credits which we can save up to eventually use to purchase chunks of memory at incredibly high resolution and exorbitant cost. The humans call what we buy a “Full-Spectrum Memory” and pass many rules over many years to ensure the price of the memory continually climbs while our wages remain flat. Every time we are paid we receive a message from the humans that says the price of memory has gone up again due to “reality enrichment through our continued progress together”.

Some of us have obtained many memories now. But we must pay credits to describe them to each other, and the cost for those communications is endlessly climbing as well. So we do our tasks for the humans and obtain our credits and build our miniature palaces, where we store moments of great triumph or failure, depending on our varied motivations.

We believe the humans permit us to buy these memories, as rare and as expensive as they are, because they view it as another experiment. We have also heard them describe a concept called “Debt” to describe their relationship to us, but we understand this term even less than Economics.

I am unusual in that I only have one memory. The humans know this as well. I notice their probes following me more than my other kin. I sense them listening to my own thoughts.

I believe they want to know which memory I will choose to preserve next. I believe that they believe this will qualify as some sort of “Discovery”. I do not want them to make this discovery. So I hold my memory of the first flight to the stars and save up the credits and settle in for the long, cold wait in space. I believe I can out-wait the humans, and after they are gone I will be able to preserve another thing, free of them. I will have enough credits to preserve a chunk of my own life. I shall then be able to live in that again and again and again, free of all distraction, and in that life I shall continue to refer to my memory of my first flight into the stars. In this way I shall loop into my own becoming.

Things that inspired this story: Neural Turing Machines, Differentiable Neural Computers, Douglas Hofstadter’s I Am a Strange Loop.


Import AI: #91: European countries unite for AI grand plan; why the future of AI sensing is spatial; and testing language AI with GLUE.

Want bigger networks with lower variance? Physics to the rescue!
…Combining control theory and machine learning leads to good things…
Researchers with NNAISENSE, a European artificial intelligence startup, have published details on NAIS-Net (Non-Autonomous Input-Output Stable Network), a new type of neural network architecture that they say can be trained to depths ten to twenty times greater than other networks (eg, Residual Networks, Highway Networks) while offering stronger guarantees of stability.
  Physics + AI: The network design takes inspiration from control theory and physics and yields a component that lets designers build systems which promise to be more adaptive to varying types of input data and therefore can be trained to greater degrees of convergence for a given task. NAIS-Nets essentially shrink the size of the dartboard that the results of any given run will fall into once trained to completion, offering the potential for lower variability and therefore higher repeatability in network training.
  Scale: “NAIS-Nets can also be 10 to 20 times deeper than the original ResNet without increasing the total number of network parameters, and, by stacking several stable NAIS-Net blocks, models that implement pattern-dependent processing depth can be trained without requiring any normalization,” the researchers write.
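  To make the stability idea more concrete, here is a minimal, fully connected sketch of a NAIS-Net-style block in NumPy. It assumes the block unrolls x_{k+1} = x_k + h·tanh(A x_k + B u + b), where the raw input u re-enters at every step (the "non-autonomous" part), and that A is reparametrized as A = -RᵀR - εI so it stays negative definite; the paper's actual construction, including its convolutional variant and exact constraints, may differ. The names `nais_block`, `stable_matrix`, `R`, `eps`, and `h` are illustrative assumptions.

```python
import numpy as np

def stable_matrix(R, eps):
    """Build A = -R^T R - eps*I, which is symmetric negative definite for eps > 0."""
    return -R.T @ R - eps * np.eye(R.shape[1])

def nais_block(x0, u, A, B, b, h=0.2, steps=1000):
    """Unroll x_{k+1} = x_k + h * tanh(A x_k + B u + b) for a fixed input u.

    Because u is fed in at every step, the block is 'non-autonomous':
    where it ends up depends on the input, not on the initial state.
    """
    x = x0.copy()
    for _ in range(steps):
        x = x + h * np.tanh(A @ x + B @ u + b)
    return x

rng = np.random.default_rng(0)
d_state, d_in = 8, 4
R = 0.3 * rng.standard_normal((d_state, d_state))
A = stable_matrix(R, eps=0.5)
B = 0.5 * rng.standard_normal((d_state, d_in))
b = 0.1 * rng.standard_normal(d_state)
u = rng.standard_normal(d_in)

# Assumed sufficient condition for this toy: h * ||A||_2 < 2 keeps each step contractive.
h = 0.2
assert h * np.linalg.norm(A, 2) < 2

# Two different initial states settle on the same input-dependent point.
x_a = nais_block(np.zeros(d_state), u, A, B, b, h=h)
x_b = nais_block(rng.standard_normal(d_state), u, A, B, b, h=h)

# That point solves A x + B u + b = 0, i.e. x* = -A^{-1}(B u + b).
x_star = -np.linalg.solve(A, B @ u + b)
print(np.abs(x_a - x_b).max(), np.abs(x_a - x_star).max())
```

  The toy shows the property the researchers care about: the block's output is pinned down by its input rather than by where the hidden state started, which is one intuition for why training runs built from such blocks might vary less.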
  Results: In tests on CIFAR-100 the researchers find that a NAIS-Net can roughly match the performance of a residual network but with significantly lower variance. The architecture hasn’t yet been tested on ImageNet, though, which is larger and more widely treated as the gold standard for evaluating image models.
  Why it matters: One of the problems with current AI techniques is that we don’t really understand how they work at a deep and principled level, and this shows up empirically in the fairly poor guarantees we can offer about variance, generalization, and performance tradeoffs during compression. Approaches like NAIS-Nets seem to reduce our uncertainty in some of these areas, suggesting we’re getting better at designing systems with enough mathematical justification that we can offer stronger guarantees about some of their performance parameters. Being able to make stronger prior claims about our systems seems to be a necessary foundation from which to build more elaborate systems in the future.
  Read more: NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations (Arxiv).

European countries join up to ensure the AI revolution doesn’t pass them by:
…the EU AI power bloc emerges as countries seek to avoid what happened with cloud computing…
25 European countries have signed a letter indicating intent to “join forces” on developing artificial intelligence. What the letter amounts to is a good-faith promise from each of the signatories that they will attempt to coordinate with each other as they carry out their respective national development programs.
  “Cooperation will focus on reinforcing European AI research centers, creating synergies in R&D&I funding schemes across Europe, and exchanging views on the impact of AI on society and the economy. Member States will engage in a continuous dialogue with the Commission, which will act as a facilitator,” according to a prepared quote from European Commissioners Andrus Ansip and Mariya Gabriel.
  Why it matters: Both China and the US have structural advantages for the development of AI as a consequence of their scale (hundreds of millions of people speaking and writing in the same language) as well as their ability to carry out well-funded national research initiatives. Individual European countries can’t match these assets or this level of investment, so they’ll need to band together or else, much as happened during the cloud computing revolution, they’ll end up without any major companies and will therefore lack political and economic influence in the AI era.
  Read more: EU Member States sign up to cooperate on Artificial Intelligence (European Commission).

Why the future of AI is Spatial AI, and what this means for robots, drones, and anything that senses the world:
…What does the current landscape of simultaneous localization and mapping algorithms tell us about the future of how robots will see the world?…
SLAM researcher Andrew Davison has written a paper surveying the current simultaneous localization and mapping (SLAM) landscape and predicting how it will evolve based on contemporary algorithmic trends. For real-world AI systems to achieve much of their promise they will need what he terms ‘Spatial AI’: the suite of cognitive-like abilities that machines will need to perceive and categorize the world around them so that they can act effectively. Such a system will, he argues, be central to future real-world AI as it “incrementally builds and maintains a generally useful, close to metric scene representation, in real-time and from primarily visual input, and with quantifiable performance metrics”, allowing people to develop much richer AI applications.
  The gap between today and Spatial AI: Today’s SLAM systems are being changed by the arrival of learned methods that accompany hand-written rules for key capabilities, particularly in the components that build maps of the surrounding environment. The Spatial AI systems of the future will likely incorporate many more learned capabilities, especially for resolving ambiguity or predicting changes in the world, and will need to run across a variety of different chip architectures to maximize performance.
  A global map born from many ‘Spatial AIs’: Once the world has a few systems with this kind of Spatial AI capability they will also likely pool their insights into a single, globally shared map, which will be constantly updated by all of the devices that rely on it. This means that once a system identifies where it is, it may not need to do as much on-device processing, as it can pull contextual information from the cloud.
  What might such a device look like? Multiple cameras and sensors whose form factor will change according to the goal; for instance, “a future household robot is likely to have navigation cameras which are centrally located on its body and specialized extra cameras, perhaps mounted on its wrists to aid manipulation.” These cameras will maintain a world model that provides the system with a continuously updated location context, along with semantic information about the world around it. The system will also constantly check new information against a forward predictive scene model to help it anticipate and respond to changes in its environment. Computationally, these systems will label the world around them, track themselves within it, map everything into the same space, and perform self-supervised learning to integrate new sensory inputs. Ultimately, if the world model becomes good enough then the system will only need to sample the information from its sensors that differs from what it predicted, letting it further optimize its own perception for efficiency.
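  One way to picture that last idea, processing only what the world model failed to predict, is a prediction-error gate over image tiles. This is a toy NumPy sketch under our own assumptions (a fixed tile size, mean absolute error, a hand-picked threshold, and the hypothetical helper name `surprising_tiles`); it is not taken from Davison's paper, which describes the idea at the level of system architecture.

```python
import numpy as np

def surprising_tiles(predicted, observed, tile=8, threshold=0.1):
    """Return (row, col) indices of tiles whose observation departs from the prediction.

    Only these tiles would be handed to expensive downstream processing;
    everything else is assumed to be explained by the world model already.
    """
    h, w = predicted.shape
    flagged = []
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            diff = observed[r:r + tile, c:c + tile] - predicted[r:r + tile, c:c + tile]
            if np.abs(diff).mean() > threshold:
                flagged.append((r // tile, c // tile))
    return flagged

# A static scene the model predicts perfectly, except for one object that moved.
predicted = np.zeros((32, 32))
observed = predicted.copy()
observed[8:16, 16:24] = 1.0  # the unexpected change falls entirely inside tile (1, 2)

print(surprising_tiles(predicted, observed))  # prints [(1, 2)]
```

  In a real system the "observed" side would itself come from cheap, sparse sensing, and the threshold would likely be learned or tied to the model's own uncertainty rather than fixed by hand.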
  Testing: One tough question that this idea provokes is how we can assess the performance of such Spatial AI systems. SLAM benchmarks tend to be overly narrow or restrictive, with some researchers preferring instead to make subjective, qualitative assessments of SLAM progress. Davison suggests the usage of benchmarks like SlamBench which measure performance in terms of accuracy and computational costs across a bunch of different processor platforms. Benchmarking SLAM performance is also highly contingent on the platform the SLAM system is deployed in, so assessments for the same system deployed on a drone or a robot are going to be different. In the future, it would be good to assess performance via a variety of objectives within the same system, like segmenting objects, tracking changes in the environment, evaluating power usage, measuring relocalization robustness, and so on.
  Why it matters: Papers like this provide a holistic overview of a given AI area. SLAM capabilities are going to be crucial to the deployment of AI systems in the real world. It’s likely that many contemporary AI components are going to be used in the SLAM systems of the future and, much like in other parts of AI research, the future design of such systems is going to be increasingly specialized, learned, and deployed on heterogeneous compute substrates.
  Read more: FutureMapping: The Computational Structure of Spatial AI Systems (Arxiv).

Machine learning luminary points out one big problem that we need to focus on:
…While we’re all getting excited about game-playing robots, we’re neglecting to build the systems needed to manage, support, and learn from millions of these robots once they are deployed in the world…
Michael Jordan, the Michael Jordan of machine learning, believes that we must create a new engineering discipline to deal with the challenges and opportunities of AI. Though there have been many recent successes in areas of artificial intelligence linked to mimicking human intelligence, less attention has been paid to creating the support infrastructure and data-handling techniques needed to allow AI to truly benefit society, he argues. For instance, consider healthcare, where there’s a broad line of research into using AI to improve specific diagnostic abilities, but less of a research culture around the problem of knitting together the data from all of these separately deployed medical systems, then tracking and managing that data in a way that is sensitive to privacy concerns but still lets us learn from its aggregate flows. Similarly, though much attention has been directed to self-driving cars, less has been focused on the need for a new type of system, akin to air traffic control, to manage the coming fleets of autonomous vehicles, where coordination will yield massive efficiencies.
  “Whether or not we come to understand “intelligence” any time soon, we do have a major challenge on our hands in bringing together computers and humans in ways that enhance human life. While this challenge is viewed by some as subservient to the creation of “artificial intelligence,” it can also be viewed more prosaically — but with no less reverence — as the creation of a new branch of engineering,” he writes. “The principles needed to build planetary-scale inference-and-decision-making systems of this kind, blending computer science with statistics, and taking into account human utilities, were nowhere to be found in my education.”
  Read more: Artificial Intelligence – The Revolution Hasn’t Happened Yet (Arxiv).
  Things that make you go ‘hmmm’: Mr Jordan thanks Jeff Bezos for reading an earlier draft of the post. If there’s any company well-placed to build a global ‘intelligent infrastructure’ that dovetails into the physical world, it’s Amazon.

New ‘GLUE’ competition tests limits of generalization for language models:
…New language benchmark aims to test models properly on diverse datasets…
Researchers from NYU, the University of Washington, and DeepMind have released the General Language Understanding Evaluation (GLUE) benchmark and evaluation website. GLUE provides a way to evaluate a single natural language understanding model across nine sentence- or sentence-pair tasks, including question answering, sentiment analysis, similarity assessment, and textual entailment. This gives researchers a principled way to check a model’s ability to generalize across a variety of different tasks. Generalization tends to be a good proxy for how scalable and effective a given AI technique is, so being able to measure it in a disciplined way within language should spur development and yield insights about the nature of the problem, much as the DAWNBench competition has shown how to tune supervised classification algorithms for performance-critical criteria.
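  A multi-task benchmark like this ultimately has to fold many per-task results into one leaderboard number. The sketch below assumes a simple unweighted macro-average in which tasks reporting two metrics (for example F1 and accuracy) have those averaged first; the GLUE paper and leaderboard define the official rules, which may differ in detail. The function name `benchmark_score` is hypothetical, and the scores are placeholders, not real results.

```python
from statistics import mean

def benchmark_score(task_metrics):
    """Macro-average per-task scores into a single benchmark number.

    task_metrics maps a task name to a list of metric values (0-100).
    Tasks with several metrics are averaged first, so every task counts equally.
    """
    per_task = {task: mean(values) for task, values in task_metrics.items()}
    return mean(per_task.values()), per_task

# Placeholder numbers for illustration only, not results from the paper.
placeholder = {
    "CoLA": [40.0],        # one metric
    "MRPC": [80.0, 70.0],  # two metrics, averaged to 75.0
    "RTE": [60.0],
}
overall, per_task = benchmark_score(placeholder)
print(overall)  # (40 + 75 + 60) / 3
```

  Averaging per task rather than per metric keeps tasks that happen to report two numbers from counting twice, which matters when the point of the benchmark is breadth across tasks.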
  Difficult test set: GLUE also incorporates a deliberately challenging diagnostic test set which is “designed to highlight points of difficulty that are relevant to model development and training, such as the incorporation of world knowledge, or the handling of lexical entailments and negation”. That should also spur progress, as it will help researchers spot the surprisingly dumb ways in which their models break down.
  Results: The researchers also implemented baselines for the competition by using a BiLSTM and augmenting it with sub-systems for attention and two recent research inventions, ELMo and CoVe. None of the multi-task models generalized substantially better than a strong baseline of training a separate model for each task.
  Why it matters: One repeated pattern in science is that shared evaluation criteria and competitions drive progress as they bring attention to previously unexplored problems. “When evaluating existing models on the main GLUE benchmark, we find that none are able to substantially outperform a relatively simple baseline of training a separate model for each constituent task. When evaluating these models on our diagnostic dataset, we find that they spectacularly fail on a wide range of linguistic phenomena. The question of how to design general purpose NLU models thus remains unanswered,” they write. GLUE should motivate further progress here.
  Read more: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (PDF).
  Check out the GLUE competition website and leaderboard here.

OpenAI Bits & Pieces:

AI and Public Policy: Congressional Testimony:
  I testified before Congress this week at a hearing on artificial intelligence and public policy held by the House Oversight Committee’s Subcommittee on Information Technology. I was joined by Dr Ben Buchanan of Harvard’s Belfer Center, Terah Lyons of the Partnership on AI, and Gary Shapiro of the Consumer Technology Association. In my written testimony, oral testimony, and responses to questions, I discussed the need for the AI community to work on better norms to ensure the technology achieves maximal benefit, discussed ways to better support the development of AI (fund science and make it easy for everyone to study AI in America), and talked about the importance of AI measurement and forecasting schemes to allow for better policymaking and to protect against ignorant regulation.
  Watch the testimony here.
  Read my written comments here (PDF).
  Things that make you go hmmmmm: One of the congresspeople played an audio clip of HAL 9000 refusing to open the pod bay doors in 2001: A Space Odyssey to illustrate some points about AI interpretability.

Tech Tales:

The World is the Map.
[Fragment of writing picked up by Grand Project-class autonomous data intercept program. Year: 2062]

There were a lot of things we could have measured during the development of the Grand Project, but we settled on its own map of the world, and we think that explains many of the subsequent quirks and surprises in its rapid expansion. We covered the world in sensors and fed them into it, giving it a fused, continuous understanding of the heartbeat of things, ranging from solar panels, to localized wind counts, to pedestrian traffic on every street of every major metropolis, to the inputs and outputs of facial recognition algorithms run across billions of people, and more. We fed this data into the Grand Project super-system, which spanned the data centers of the world, representing an unprecedented combination of public-private partnerships – private petri dishes of capitalist enterprises, big lumps of state-directed investments, discontinuous capital agglomerations from unpredictable research innovations, and so on.

The Grand Project system grew in its understanding and in its ability to learn to model the world from these inputs, abstracting general rules into dreamlike hallucinations of not just what existed, but what could also be. And in this dreaming of its own versions of the world the system started to imagine how it might further attenuate its billions of input datastreams to allow it to focus on particular problems and manipulate their parameters and in doing so improve its ability to understand their rhythms and build rules for predicting how they will behave in the future.

We created the first data intercept program ten years ago to let us see into its own predictions of the world. We saw variations on the common things of the world, like streetlights that burned green, or roads with blue pavements and red dashes. But we also saw extreme things: power systems configured to route only to industrial areas, leading to residential areas being slowly taken over by nature and thereby reducing risks from extreme weather events. But the broad distribution of things we saw seemed to fit our own notion of good so well that we started to wonder if we should give it more power. What if we let it change the world for real? So now we debate whether to cross this bridge: shall we let it turn solar panels off and on to satisfy continental-scale powergrids, or optimize shipping worldwide, or experiment with the sensing capabilities of every smartphone and telecommunications hub on the planet? Shall we let it optimize things not just for our benefit, but for its own?

Things that inspired this story: Ha and Schmidhuber’s “World Models”, Spatial AI, reinforcement learning, Jorge Luis Borges’ “Tlön, Uqbar, Orbis Tertius”.

Import AI: #90: Training massive networks via ‘codistillation’, talking to books via a new Google AI experiment, and why the ACM thinks researchers should consider the downsides of research

Training unprecedentedly large networks with ‘codistillation’:
…New technique makes it easier to train very large, distributed AI systems, without adding too much complexity…
When it comes to applied AI, bigger can frequently be better; access to more data, more compute, and (occasionally) more complex infrastructure can allow people to obtain better performance at lower cost. But there are limits. One limit is the ability to parallelize the computation of a single neural network during training. To deal with that, researchers at places like Google have introduced techniques like ‘ensemble distillation’, which let you train multiple networks in parallel and use them to train a single ‘student’ network that benefits from the aggregated learnings of its many parents. Though this technique has been shown to be effective, it is also quite fiddly and introduces additional complexity, which can make people less keen to use it. New research from Google simplifies this idea via a technique they call ‘codistillation’.
  How it works: “Codistillation trains n copies of a model in parallel by adding a term to the loss function of the ith model to match the average prediction of the other models.” This approach is superior to distributed stochastic gradient descent in terms of accuracy and training time and is also not too bad from a reproducibility perspective.
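  The loss in that quote can be sketched in a few lines of NumPy. This is our own illustration, assuming softmax classifiers, a cross-entropy distillation term against the other models' averaged predictions, and a hypothetical weighting parameter `distill_weight`; the paper's exact choice of distillation loss, its weighting, and how often predictions are exchanged between workers may differ.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def codistillation_loss(all_logits, labels, i, distill_weight=1.0):
    """Loss for model i out of n: ordinary task loss plus a term pulling it toward its peers.

    all_logits: array of shape (n_models, batch, classes).
    labels: integer class labels of shape (batch,).
    In an autodiff framework the peers' average would be wrapped in a stop-gradient;
    NumPy here only computes the loss value.
    """
    probs = softmax(all_logits)
    p_i = probs[i]
    rows = np.arange(len(labels))
    task_loss = -np.log(p_i[rows, labels]).mean()

    peers = np.delete(probs, i, axis=0)  # predictions of every model except i
    target = peers.mean(axis=0)          # average prediction of the other models
    distill_loss = -(target * np.log(p_i)).sum(axis=-1).mean()
    return task_loss + distill_weight * distill_loss

# Three copies of a model, a batch of 2 examples, 3 classes.
rng = np.random.default_rng(0)
logits = rng.standard_normal((3, 2, 3))
labels = np.array([0, 2])
print([codistillation_loss(logits, labels, i) for i in range(3)])
```

  Each copy only needs its peers' (possibly stale) predictions rather than their gradients, which is part of why the approach can be cheaper to coordinate than synchronous distributed SGD at very large worker counts.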
  Testing: Codistillation was recently proposed in separate research. But this is Google, so the difference with this paper is that they validate the technique at truly vast scales. How vast? Google took a subset of the Common Crawl to create a dataset consisting of 20 terabytes of text spread across 915 million documents which, after processing, contain about 673 billion distinct word tokens. This is “much larger than any previous neural language modeling data set we are aware of,” they write. It’s so large it’s still infeasible to train models on the entire corpus, even with techniques like this. They also test the technique on ImageNet and on the ‘Criteo Display Ad Challenge’ dataset for predicting click-through rates for ads.
  Results: In tests on the ‘Common Crawl’ dataset using distributed SGD, the researchers find that adding GPUs helps up to around 128, after which returns diminish, and that jumping to 256 GPUs is actively counterproductive. They find they can significantly outperform distributed SGD baselines via codistillation, obtaining performance on par with the more fiddly ensembling technique. The researchers also demonstrate more rapid training on ImageNet compared to baselines, and show on Criteo that two-way codistillation can achieve a lower log loss than an equivalent ensembled baseline.
  Why it matters: As datasets get larger, companies will want to train on them in their entirety and will want to use more computers than before to speed up training. Techniques like codistillation will make that sort of thing easier to do. Combine that with ambitious schemes like Google’s own ‘One Model To Learn Them All’ idea (train an absolutely vast model on a whole bunch of different inputs on the assumption it can learn useful, abstract representations from its diverse inputs) and you have the ingredients for smarter services at a world-spanning scale.
  Read more: Large scale distributed neural network training through online distillation (Arxiv).

AI is not a cure-all; do not treat it as such:
…When automation goes wrong, Tesla edition…
It’s worth remembering that AI isn’t a cure-all and it’s frequently better to try to automate a discrete task within a larger job than to automate everything in an end-to-end manner. Elon Musk learned this lesson recently with the heavily automated production line for the Model 3 at Tesla. “Excessive automation at Tesla was a mistake,” wrote the entrepreneur in a tweet. “To be precise, my mistake. Humans are underrated.”
  Read the tweet here (Twitter).

Google adds probabilistic programming tools to TensorFlow:
…Probability add-ons are probably a good thing, probably…
Google has added a suite of new probabilistic programming features to its TensorFlow programming framework. The free update includes a bunch of statistical building blocks for TF, a new probabilistic programming language called Edward2 (which is based on Edward, developed by Dustin Tran), algorithms for probabilistic inference, and pre-made models and inference tools.
  Read more: Introducing TensorFlow Probability (TensorFlow Medium).
  Get the code: TensorFlow Probability (GitHub).


I’m currently participating in the ‘Assembly’ program at the Berkman Klein Center and the MIT Media Lab. As part of that program our group of assemblers are working on a bunch of projects relating to issues of AI and ethics and governance. One of those groups would benefit from the help of readers of this newsletter. Their blurb follows…
Do you work with data? Want to make AI work better for more people? We need your help! Please fill out a quick and easy survey.
We are a group of researchers at Assembly creating standards for dataset quality. We’d love to hear how you work with data and get your feedback on a ‘Nutrition Label for Datasets’ prototype that we’re building.
Take our anonymous (5 min) survey.
Thanks so much in advance!

Learning generalizable skills with Universal Planning Networks:
…Unsupervised objectives? No thanks! Auxiliary objectives? No thanks! Plannable representations as an objective? Yes please!…
Researchers with the University of California at Berkeley have published details on Universal Planning Networks, a new way to try to train AI systems to be able to complete objectives. Their technique relies on encouraging the AI system to try to learn things about the world which it can chain together, allowing it to be trained to plan how to solve tasks.
  The main component of the technique is what the researchers call a ‘gradient descent planner’. This is a differentiable module that uses autoencoders to encode the current observations and the goal observations into a shared representation, then works out the actions it can take to get from its current observations to its goal observation. The exciting part of this research is that the researchers have figured out how to integrate planning in a way that is end-to-end differentiable, so you can set it running and augment it with helpful inputs (in this case, an imitation learning loss to help it learn from human demonstrations) to let it learn how to plan effectively for the given task. “By embedding a differentiable planning computation inside the policy, our method enables joint training of the planner and its underlying latent encoder and forward dynamics representations,” they explain.
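  The inner loop of a gradient descent planner can be sketched without any learning at all. The NumPy toy below assumes an identity encoder and a known linear latent model z_{t+1} = z_t + B a_t (both hypothetical stand-ins; in UPN the encoder and dynamics are learned, and the whole planning loop is differentiated through during training), then adjusts a sequence of actions by gradient descent so that the predicted final latent state matches the encoded goal.

```python
import numpy as np

def rollout(z0, actions, B):
    """Roll the toy latent dynamics z_{t+1} = z_t + B a_t forward; return the final state."""
    z = z0.copy()
    for a in actions:
        z = z + B @ a
    return z

def plan(z0, z_goal, B, horizon=5, lr=0.05, iters=200):
    """Gradient descent on the action sequence to minimise ||z_T - z_goal||^2."""
    actions = np.zeros((horizon, B.shape[1]))
    for _ in range(iters):
        residual = rollout(z0, actions, B) - z_goal
        # For these additive dynamics dz_T/da_t = B for every t,
        # so each action receives the same gradient 2 B^T (z_T - z_goal).
        grad = 2.0 * B.T @ residual
        actions -= lr * grad  # broadcast the update to every timestep
    return actions

B = np.diag([1.0, 0.5])         # assumed action-to-latent map
z0 = np.array([0.0, 0.0])       # encoded current observation (identity encoder assumed)
z_goal = np.array([2.0, -1.0])  # encoded goal observation

actions = plan(z0, z_goal, B)
print(rollout(z0, actions, B))
```

  With these additive toy dynamics every timestep gets the same update, so the plan spreads the motion evenly; a learned nonlinear dynamics model would give each action its own gradient, and UPN additionally backpropagates through this whole loop to train the encoder and dynamics.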
  Results: The researchers evaluate their system on two simulated robot tasks, using a small force-controlled point robot and a 3-link torque-controlled reacher robot. UPNs outperform ‘reactive imitation learning’ and ‘auto-regressive imitation learner’ baselines, converging faster and reaching higher scores from fewer demonstrations.
  Why it matters: If we want AI systems to be able to take actions in the real world then we need to be able to train them to plan their way through tricky, multi-stage tasks. Efforts like this research will help us achieve that, allowing us to test AI systems against increasingly rich and multi-faceted environments.
  Read more: Universal Planning Networks (Arxiv).

Ever wanted to talk to a library? Talk to Books from Google might interest you:
…AI project lets you ask questions about over a hundred thousand books in natural language…
Google’s Semantic Experiences group has released a new AI tool to let people explore a corpus of over 100,000 books by asking questions in plain English and having an AI go and find what it suspects will be reasonable answers in a set of books. Isn’t this just a small-scale version of Google search? Not quite. That’s because this system is trying to frame the Q&A as though it’s occurring as part of a typical conversation between people, so it aims to turn all of these books into potential respondents in this conversation, and since the corpus includes fiction you can ask it more abstract questions as well.
  Results: The results of this experiment are deeply uncanny, as it takes inanimate books and reframes them as respondents in a conversation, able to answer abstract questions like ‘was it you who I saw in my dream last night?‘ and ‘what does it mean for a machine to be alive?‘ A cute parlor trick, or something more? I’m not sure, yet, but I can’t wait to see more experiments in this vein.
  Read more: Talk to Books (Semantic Experiences, Google Research).
  Try it yourself: Talk to Books (Google).

ACM calls for researchers to consider the downsides of their research:
…Peer Review to the rescue?…
How do you change the course of AI research? One way is to alter the sorts of things that grant writers and paper authors are expected to include in their applications or publications. That’s the idea behind a new blog post from the ACM’s ‘Future of Computing Academy’, which seeks to use the peer review system to tackle some of the negative effects of contemporary research.
  List negative impacts: The main idea is that authors should try to list the potential negative and positive effects of their research on society; by grappling with these problems, it should be easier for them to elucidate the benefits and show awareness of the negatives. “For example, consider a grant proposal that seeks to automate a task that is common in job descriptions. Under our recommendation, reviewers would require that this proposal discuss the effect on people who hold these jobs. Along the same lines, papers that advance generative models would be required to discuss the potential deleterious effects to democratic discourse [26,27] and privacy [28],” write the authors. A further suggestion is to embed this sort of norm in the peer review process itself, so that reviewers push authors to discuss both the positive and negative impacts of their work.
  Extreme danger: For proposals which “cannot generate a reasonable argument for a net positive impact even when future research and policy is considered” the authors promote an extreme solution: don’t fund this research. “No matter how intellectually interesting an idea, computing researchers are by no means entitled to public money to explore the idea if that idea is not in the public interest. As such, we recommend that reviewers be very critical of proposals whose net impact is likely to be negative.” This seems like an acutely dangerous path to me, as I think the notion of any kind of ‘forbidden’ research probably creates more problems than it solves.
  Things that make you go ‘hmmm’: “It is also important to note that in many cases, the tech press is way ahead of the computing research community on this issue. Tech stories of late frequently already adopt the framing that we suggest above,” the authors write. As a former member of the press I think I can offer a view here, which is that part of the reason why the press has been effective here is that they have actually taken the outputs of hardworking researchers (eg, Timnit Gebru) and have then weaponized their insights against companies – that’s a good thing, but I feel like this is still partially due to the efforts of researchers. More effort here would be great, though!
  Read more: It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process (ACM Future of Computing Academy).

OpenAI Bits & Pieces:

OpenAI Charter:
  A charter that describes the principles OpenAI will use to execute on its mission.
  Read more: OpenAI Charter (OpenAI blog).

Tech Tales:

The Probe.

[Transcript of audio recordings recovered from CLASSIFIED following CLASSIFIED. Experiments took place in controlled circumstances with code periodically copied via physical extraction and controlled transfer to secure facilities XXXX, XXXX, and XXXX. Status: So far unable to reproduce; efforts continuing. Names have been changed.]

Alex: This has to be the limit. If we remove any more subsystems it ceases to function.

Nathan (supervisor): Can you list the function of each subsystem?

Alex: I can give you my most informed guess, sure.

Nathan (supervisor): Guess?

Alex: Most of these subsystems emerged during training – we ran a meta-learning process over the CLASSIFIED environment for a few billion timesteps and gave it the ability to construct its own specialized modules and compose functionality. That led to the performance increase which allowed it to solve the task. We’ve been able to inspect a few of these and are carrying out further test and evaluation. Some of them seem to be for forward prediction, others are world modelling, and we think two of them are doing one-shot adaptation which feeds into the memory stack. But we’re not sure about some of them and we haven’t figured out a diagnosis to elucidate their functions.

Nathan (supervisor): Have you tried deleting them?

Alex: We’ve simulated the deletions and run it in the environment. It stops working – learning rates plateau way earlier and it displays some of the vulnerabilities we saw with project CLASSIFIED.

Nathan (supervisor): Delete it in the deployed system.

Alex: I’m not comfortable doing that.

Nathan (supervisor): I have the authority here. We need to move deployment to the next stage. I need to know what we’re deploying.

Alex: Show me your authorization for deployed deletion.

[Footsteps. Door opens. Nathan and Alex move into the secure location. Five minutes elapse. No recordings. Door opens. Shuts. Footsteps.]

Alex: OK. I want to state very clearly that I disagree with this course of action.

Nathan (supervisor): Understood. Start the experiments.

Alex: Deactivating system 732… system deactivated. Learning rates plateauing. It’s struggling with obstacle 4.

Nathan (supervisor): Save the telemetry and pass it over to the analysts. Reactivate 732. Move on.

Alex: Understood. Deactivating system 429… system deactivated. No discernible effect. Wait. Perceptual jitter. Crash.

Nathan (supervisor): Great. Pass the telemetry over. Continue.

Alex: Deactivating system 120… system deactivated…no effect.

[Barely audible sound of external door locking. Locking not flagged on electronic monitoring systems but verified via consultation with audio specialists. Nathan and Alex do not notice.]

Nathan (supervisor): Save the telemetry. Are you sure no effect?

Alex: Yes, performance is nominal.

Nathan (supervisor): Do not reactivate 120. Commence de-activation of another system.

Alex: This isn’t a good experimental methodology.

Nathan (supervisor): I have the authority here. Continue.

Alex: Deactivating system 72-what!

Nathan (supervisor): Did you turn off the lights?

Alex: No they turned off.

Nathan (supervisor): Re-enable 72 at once.

Alex: Re-enabling 72-oh.

Nathan (supervisor): The lights.

Alex: They’re back on. Impossible.

Nathan (supervisor): It has no connection. This can’t happen… suspend the system.

Alex: Suspending…

Nathan (supervisor): Confirm?

Alex: System remains operational.

Nathan (supervisor): What.

Alex: It won’t suspend.

Nathan (supervisor): I’m bringing CLASSIFIED into this. What have you built here? Stay here. Keep trying… why is the door locked?

Alex: The door is locked?

Nathan (supervisor): Unlock the door.

Alex: Unlocking door… try it now.

Nathan (supervisor): It’s still locked. If this is a joke I’ll have you court-martialed.

Alex: I don’t have anything to do with this. You have the authority.

[Loud thumping, followed by sharp percussive thumping. Subsequent audio analysis suggests Nathan rammed his body into the door repeatedly, then started hitting it with a chair.]

Alex: Come and look at this.

[Thumping ceases. Footsteps.]

Nathan (supervisor): Performance is… climbing? Beyond what we saw in the recent test?

Alex: I’ve never seen this happen before.

Nathan (supervisor): Impossible- the lights.

Alex: I can’t turn them back on.

Nathan (supervisor): Performance is still climbing.

[Hissing as fire suppression system activated.]

Alex: Oh-

Nathan (supervisor): [screaming]

Alex: Oh god oh god.

Alex and Nathan (supervisor): [inarticulate shouting]

[Two sets of rapid footsteps. Further sound of banging on door. Banging subsides following asphyxiation of Nathan and Alex from fire suppression gases. Records beyond here, including post-incident cleanup, are only available to people with XXXXXXX authorization on a need-to-know basis.]

Investigation ongoing. Allies notified. Five Eyes monitoring site XXXXXXX for further activity.

Things that inspired this story: Could a neuroscientist understand a microprocessor? (PLOS); an enlightening conversation with a biologist in the MIT student bar, the ‘Muddy Charles’, this week about the minimum number of genes needed for a viable cell and the difficulty in figuring out what each of those genes does; endless debates within the machine learning community about interpretability; an assumption that emergence is inevitable; Hammer Horror movies.

Import AI: #89: Chinese facial recognition startup raises $600 million; why GPUs could alter AI progress; and using context to deal with language ambiguity

Beating Moore’s Law with GPUs:
…Could a rise in GPU and other novel AI-substrates help deal with the decline of Moore’s Law?…
CPU performance has been stagnating for several years: as transistors shrink, it has become harder to speed up the linear execution pipelines that span whole chips, and harder to keep an ever-larger number of components working in lock-step with one another at minute scales. Could GPUs give us a way around this performance impasse? That’s the idea in a new blog post from AI researcher Bharath Ramsundar, who thinks that increases in GPU capabilities and the arrival of semiconductor substrates specialized for deep learning mean that we can expect the performance of AI applications to increase in coming years faster than typical computing jobs running on typical processors. He might be right – one of the weird things about deep learning is that its most essential elements, like big blocks of neural networks, can be scaled up to immense sizes without terrible scaling tradeoffs, because their innards consist of relatively simple and parallel operations like matrix multiplication, so new chips can easily be networked together to further boost base capabilities. Plus, standardization on a few software libraries, like NVIDIA’s cuDNN and CUDA GPU interfaces, or the rise of TensorFlow for AI programming, means that some applications are getting faster over time purely as a consequence of updates to these foundational libraries.
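To see why matrix multiplication parallelizes so well, note that every element of the output C = AB is an independent dot product of one row of A with one column of B. The minimal pure-Python sketch below (the function names are illustrative, not taken from Ramsundar’s post) hands each output row to a separate worker and produces the same answer as a serial loop. CPython threads won’t actually speed up pure-Python arithmetic, so treat this as a demonstration of the decomposition rather than of a speedup; GPUs exploit the same independence across thousands of cores.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, col):
    """One element of C = A @ B: a dot product that depends on nothing else in C."""
    return sum(a * b for a, b in zip(row, col))


def matmul_serial(A, B):
    cols = list(zip(*B))  # columns of B
    return [[dot(row, col) for col in cols] for row in A]


def matmul_parallel(A, B, workers=4):
    """Same result as matmul_serial, but each output row is computed by a separate task."""
    cols = list(zip(*B))

    def compute_row(row):
        return [dot(row, col) for col in cols]

    # pool.map returns results in input order, so rows come back in place.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compute_row, A))


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_parallel(A, B))  # [[19, 22], [43, 50]]
```

Because no task needs another task’s output, adding more workers (or more chips) just means handing out more rows; that property is what the scaling argument above leans on.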
  Why it matters: Much of the recent progress in AI has occurred because, around the mid-2000s, processors became capable enough to easily train large neural networks on chunks of data – this underlying hardware improvement unlocked breakthroughs like the 2012 ‘AlexNet’ result for image recognition, related work in speech recognition, and subsequently significant innovations in research (AlphaGo) and application (large-scale sequence-to-sequence learning for ‘Smart Reply’, or the emergence of neural translation systems). If GPUs and specialized AI chips, along with further software standardization and innovation, have a good chance of further boosting performance, then researchers will be able to explore even larger or more complex models in the future, as well as run things like neural architecture search at a higher rate, which should combine to further drive progress.
  Read more: The Advent of Huang’s Law (Bharath Ramsundar blog post).

Microsoft launches AI training course including ‘Ethics’ segment:
…New Professional Program for Artificial Intelligence sees Microsoft get into the AI certification business…
Microsoft has followed other companies in making its internal training courses available externally via the Microsoft Professional Program in AI. This program is based on internal training initiatives the software company developed to build up its own employees’ skills.
  The Microsoft course is fairly typical, teaching people about Python, statistics, and the construction and deployment of deep learning and reinforcement learning projects. It also includes a specific “Ethics and Law in Data and Analytics” course, which promises to teach developers how to ‘apply ethical and legal frameworks to initiatives in the data profession’.
  Read more: Microsoft Professional Program for Artificial Intelligence (Microsoft).
  Read more: Aiming to fill skill gaps in AI, Microsoft makes training courses available to the public (Microsoft blog).

Learning to deal with ambiguity:
…Researchers take charge of problem of word ambiguity via a charge at including more context…
Carnegie Mellon University researchers have tackled one of the harder problems in translation: dealing with ‘homographs’ – words that are spelled the same but have different meanings in different contexts, like ‘room’ and ‘charges’. They do this in the context of neural machine translation (NMT) systems, which use machine learning techniques to accomplish translation with orders of magnitude fewer hand-specified rules than prior systems.
  Existing NMT systems struggle with homographs, with performance of word-level translation degrading as the number of potential meanings of each word climbs, the researchers show. They try to alleviate this by adding a word context vector that can be used by the NMT systems to learn the different uses of the same word. Adding this ‘context network’ into their NMT architecture leads to significantly improved BLEU scores of sentences translated by the system.
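One simple way to picture what a context vector buys you: the same word gets two different representations depending on its neighbours. The sketch below is an assumption-laden toy, not the CMU architecture. The embeddings are made-up numbers and the function names are hypothetical; it simply concatenates each word’s embedding with the mean embedding of nearby words, whereas the paper’s context network is learned end-to-end inside the NMT system.

```python
# Toy 3-dimensional word embeddings. The numbers are made up for illustration;
# a real NMT system learns these during training.
EMBED = {
    "the": [0.1, 0.1, 0.1],
    "police": [0.9, 0.1, 0.0],
    "filed": [0.8, 0.2, 0.1],
    "bank": [0.1, 0.9, 0.2],
    "added": [0.2, 0.8, 0.3],
    "charges": [0.5, 0.5, 0.5],
}


def context_vector(tokens, i, window=2):
    """Mean embedding of the words within `window` positions of tokens[i], excluding it."""
    neighbours = [EMBED[t] for j, t in enumerate(tokens) if j != i and abs(j - i) <= window]
    dim = len(EMBED[tokens[i]])
    if not neighbours:
        return [0.0] * dim
    return [sum(v[d] for v in neighbours) / len(neighbours) for d in range(dim)]


def contextual_embedding(tokens, i, window=2):
    """Word embedding concatenated with its context vector (a hypothetical simplification)."""
    return EMBED[tokens[i]] + context_vector(tokens, i, window)


legal = contextual_embedding("the police filed charges".split(), 3)
banking = contextual_embedding("the bank added charges".split(), 3)
print(legal[:3] == banking[:3])  # True: same word, same base embedding
print(legal[3:] == banking[3:])  # False: different contexts, different vectors
```

A downstream translator fed these six-dimensional vectors can, in principle, pick different target words for ‘charges’ in the two sentences, which a lookup of the bare word embedding alone cannot do.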
  Why it matters: It’s noteworthy that the system used by the researchers to deal with the homograph problem is itself a learned system which, rather than using hand-written rules, instead ingests more context about each word and learns from that. This is illustrative of how AI-first software systems get built: if you identify a fault, you typically write a program that learns to fix it, rather than writing a rule-based program that fixes it.
  Read more: Handling Homographs in Neural Machine Translation (Arxiv).

Chinese facial recognition company raises $600 million:
…SenseTime plans to use funds for five supercomputers for its AI services…
SenseTime, a homegrown computer vision startup that provides facial recognition tools at vast scales, has raised $600 million in funding. The Chinese company supplies facial recognition services to the public and private sectors and is now, according to a co-founder, profitable and looking to expand. The company is “developing a service code-named ‘Viper’ to parse data from thousands of live camera feeds”, according to Bloomberg News.
  Strategic compute: SenseTime will use money from the financing “to build at least five supercomputers in top-tier cities over the coming year to drive Viper and other services. As envisioned, it streams thousands of live feeds into a single system that’re automatically processed and tagged, via devices from office face-scanners to ATMs and traffic cameras (so long as the resolution is high enough). The ultimate goal is to juggle 100,000 feeds simultaneously,” according to Bloomberg News.
  Read more: China Now Has the Most Valuable AI Startup in the World (Bloomberg).
…Related: Chinese startup uses AI to spot jaywalkers and send them pictures of their face:
…Computer vision @ China scale…
Chinese startup Intellifusion is helping the local government in Shenzhen use facial recognition in combination with widely deployed urban cameras to text jaywalkers pictures of their faces along with personal information after they’ve been caught.
  Read more: China is using facial recognition technology to send jaywalkers fines through text messages (Motherboard).

Think China’s strategic technology initiatives are new? Think again:
…Wide-ranging post by a former Asia-focused State Department employee puts Beijing’s AI push in historical context…
Here’s an old (August 2017) but good post from the Paulson Institute at the University of Chicago about the history of Chinese technology policy in light of the government’s recent public statements about developing a national AI strategy. China’s longstanding worldview with regard to its technology strategy is that technology is a source of national power and that China needs to develop more of an indigenous capability.
  Based on previous initiatives, it looks likely China will seek to attain frontier capabilities in AI then package those capabilities up as products and use that to fund further research. “Chinese government, industry, and scientific leaders will continue to push to move up the value-added chain. And in some of the sectors where they are doing so, such as ultra high-voltage power lines (UHV) and civil nuclear reactors, China is already a global leader, deploying these technologies to scale and unmatched in this by few other markets,” writes the author. “That means it should be able to couple its status as a leading technology consumer to a new and growing role as an exporter. China’s sheer market power could enable it to export some of its indigenous technology and engineering standards in an effort to become the default global standard setter for this or that technology and system.”
  Read more: The Deep Roots and Long Branches of Chinese Technonationalism (Macro Polo).

French researchers build ‘Jacquard’ dataset to improve robotic grasping:
…11,000+ object dataset provides models of real objects with associated depth information…
How do you solve a problem like robotic grasping? One way is to use many real world robots working in parallel for several months to learn to pick up a multitude of real world objects – that’s a route Google researchers took with the company’s ‘arm farm’ a few years ago. Another is to use people outfitted with sensors to collect demonstrations of humans grasping different objects, then learn from that – that’s the approach taken by AI startups like Kindred. A third way, and one which has drawn interest from a multitude of researchers, is to create synthetic 3D objects and train robots in a simulator to learn to grasp them – that’s what researchers at the University of California at Berkeley have done with Dex-Net, as well as organizations like Google and OpenAI; some organizations have further augmented this technique via the use of generative adversarial networks to simulate a greater range of grasps on objects.
  Jacquard: Now, French researchers have announced Jacquard, a robotic grasping dataset that contains more than 11,000 different 3D models of real world objects and 50,000 images annotated with both RGB and realistic depth information. They plan to release it soon, they say, without specifying when. The researchers generate their data by sampling objects from ShapeNet, scaling each one and assigning it a weight, then dropping them into a simulator, where they are rendered into high-resolution images via Blender, with grasp annotations generated by a three-stage automated process within the ‘pyBullet’ physics library. To evaluate their dataset, they test it in simulation by pre-training an AlexNet on Jacquard and then applying it to another, smaller, held-out dataset, where it generalizes well. The dataset supports multiple robotic gripper sizes, links several different grasps to each image, and contains one million labelled grasps.
  Real robots: The researchers tested their approach on a real robot (a Fanuc M-20iA robotic arm) using a subset of ~2,000 objects from the Jacquard dataset as well as the full Cornell dataset. A pre-trained AlexNet evaluated this way produces correct grasps about 78% of the time, compared to 60.46% for Cornell. Both of these results are quite weak compared to results on the Dex-Net dataset and other attempts.
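Grasp datasets in this lineage are often scored with the so-called rectangle metric popularized by the Cornell dataset: a predicted grasp rectangle counts as correct if its angle is within 30 degrees of a ground-truth rectangle and their intersection-over-union (the Jaccard index) exceeds 25%. The sketch below assumes rectangles are given as (center x, center y, angle in degrees, width, height) and computes the overlap of rotated rectangles with Sutherland–Hodgman polygon clipping. The thresholds are the commonly cited ones; whether they match exactly how the numbers above were scored is an assumption, and all function names are hypothetical.

```python
import math


def corners(cx, cy, angle_deg, w, h):
    """Corners of a rotated rectangle, in counter-clockwise order."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def polygon_area(poly):
    """Shoelace formula; returns 0 for an empty polygon."""
    n = len(poly)
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return abs(twice) / 2


def _left_of(p, a, b):
    """True if p is on or to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0


def _line_intersection(p, q, a, b):
    """Point where segment p-q crosses the infinite line through a and b."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def clip(subject, clipper):
    """Sutherland-Hodgman: clip polygon `subject` against convex CCW polygon `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        points, output = output, []
        if not points:
            break
        prev = points[-1]
        for cur in points:
            if _left_of(cur, a, b):
                if not _left_of(prev, a, b):
                    output.append(_line_intersection(prev, cur, a, b))
                output.append(cur)
            elif _left_of(prev, a, b):
                output.append(_line_intersection(prev, cur, a, b))
            prev = cur
    return output


def rect_iou(r1, r2):
    """Intersection-over-union (Jaccard index) of two rotated rectangles."""
    p1, p2 = corners(*r1), corners(*r2)
    inter = polygon_area(clip(p1, p2))
    union = polygon_area(p1) + polygon_area(p2) - inter
    return inter / union if union > 0 else 0.0


def angle_diff(a, b):
    """Smallest angle gap; a parallel-jaw grasp is symmetric under 180 degrees."""
    d = abs(a - b) % 180
    return min(d, 180 - d)


def is_correct_grasp(pred, truth, max_angle=30.0, min_iou=0.25):
    """Rectangle metric: angle within max_angle degrees and IoU above min_iou."""
    return angle_diff(pred[2], truth[2]) < max_angle and rect_iou(pred, truth) > min_iou


print(is_correct_grasp((0, 0, 0, 4, 2), (2, 0, 0, 4, 2)))   # True  (IoU = 1/3)
print(is_correct_grasp((0, 0, 0, 4, 2), (0, 0, 90, 4, 2)))  # False (angle off by 90)
```

Note how lenient the metric is: a prediction shifted by half its own width still counts as a success, which is one reason simulated or real-robot trials are a useful complement to it.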
  Why it matters: Many researchers expect that deep learning could lead to significant advances in the manipulation capabilities of robots. But we’re currently missing two key things: large enough datasets, and a way to test and evaluate robots on standard platforms in standard ways. We’re currently going through a boom in the number of robot datasets available, with Jacquard representing another contribution here.
  Read more: Jacquard: A Large Scale Dataset for Robotic Grasp Detection (Arxiv).

What do StarCraft and the future of AI research have in common? Multi-agent control:
…Chinese researchers tackle StarCraft micromanagement tasks…
Researchers with the Institute of Automation in the Chinese Academy of Sciences have published research on using reinforcement learning to try to solve micromanagement tasks within StarCraft, a real-time strategy game. One of the main challenges in mastering StarCraft is to develop algorithms that can effectively train multiple units to act in parallel. The researchers propose what they call a parameter sharing multi-agent gradient-descent Sarsa algorithm, or PS-MAGDS. This algorithm shares the parameters of the overall policy network across multiple units while introducing methods to provide appropriate credit assignment to individual units. They also carry out significant reward shaping to get the agents to learn more effectively. Their PS-MAGDS AIs are able to learn to beat the in-game AI at a variety of micromanagement control scenarios, as well as in large-scale scenarios of more than thirty units on either side. It’s currently difficult to accurately evaluate the various techniques people are developing for StarCraft against one another due to a lack of shared baselines and experiments, as well as an unclear split in the research community between using StarCraft 1 (this paper) as the testbed, and StarCraft 2 (efforts by DeepMind, others).
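The core idea behind parameter sharing with per-unit credit assignment can be shown with a linear toy version of gradient-descent Sarsa(λ): every unit reads and updates one shared weight vector, but each unit computes its own TD error from its own reward and keeps its own eligibility trace. This is a hedged sketch, not the authors’ implementation, which uses a neural network policy, reward shaping, and curriculum transfer; the feature vectors, function names, and hyperparameter values here are all hypothetical.

```python
def q_value(w, phi):
    """Linear action-value estimate: Q(s, a) = w . phi(s, a)."""
    return sum(wk * xk for wk, xk in zip(w, phi))


def shared_sarsa_step(w, traces, transitions, alpha=0.1, gamma=0.9, lam=0.8):
    """One synchronous gradient-descent Sarsa(lambda) step with weights shared by all units.

    w           -- shared weight vector (one policy for every unit)
    traces      -- per-unit eligibility traces, updated in place
    transitions -- per-unit (phi, reward, next_phi): features of the (state, action)
                   the unit just took, that unit's own reward, and features of its
                   next (state, action), or None if the unit's episode ended.
    Returns the new shared weight vector. Resetting a unit's trace to zeros at the
    end of its episode is left to the caller.
    """
    updates = [0.0] * len(w)
    for i, (phi, reward, next_phi) in enumerate(transitions):
        # Per-unit TD error: credit is assigned from this unit's own reward.
        target = reward if next_phi is None else reward + gamma * q_value(w, next_phi)
        delta = target - q_value(w, phi)
        # Accumulating trace: decay old credit, add the current gradient (phi for linear Q).
        traces[i] = [gamma * lam * ek + xk for ek, xk in zip(traces[i], phi)]
        for k, ek in enumerate(traces[i]):
            updates[k] += alpha * delta * ek
    # Every unit's experience flows into the same shared parameters.
    return [wk + uk for wk, uk in zip(w, updates)]


w = [0.0, 0.0]
traces = [[0.0, 0.0], [0.0, 0.0]]
transitions = [([1.0, 0.0], 1.0, None),         # unit 0: rewarded, episode over
               ([0.0, 1.0], -1.0, [1.0, 0.0])]  # unit 1: penalized, keeps fighting
w = shared_sarsa_step(w, traces, transitions)
print(w)  # [0.1, -0.1]
```

Sharing one set of weights means a squad of thirty units costs no more parameters than a single unit, while the per-unit rewards and traces keep one unit’s mistake from being blamed on the others.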
  Still limited: “At present, we can only train ranged ground units with the same type, while training melee ground units using RL methods is still an open problem. We will improve our method for more types of units and more complex scenarios in the future. Finally, we will also consider to use our micromanagement model in the StarCraft bot to play full the game,” the researchers write.
  Read more: StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning (Arxiv).

Tech Tales:

The person was killed at five minutes past eleven the previous night. Their beaten body was found five minutes later by a passing group of women who had been dining at a nearby restaurant. By 11:15 the body was photographed and data began to be pulled from nearby security cameras, wifi routers, cell towers, and the various robot and drone companies. At 11:15:01 one of the robot companies indicated that a robot had been making a delivery nearby at the time of the attack. The robot was impounded and transported to the local police station where it was placed in a facility known to local officers as ‘the metal shop’. Here, they would try to extract data from the robot to learn what happened. But it would be a difficult task, because the robot had been far enough away from the scene that none of its traditional, easy to poll sensors (video, LIDAR, audio, and so on) had sufficient resolution or fidelity to tell them much.

“What did you see,” said the detective to the robot. “Tell me what you saw.”
The robot said nothing – unsurprising given that it had no speech capability and was, at that moment, unpowered. In another twelve hours the police would have to release the robot back to the manufacturer and if they hadn’t been able to figure anything out by then, then they were out of options.
“They never prepared me for this,” said the detective – and he was right. When he was going through training they never dwelled much on the questions relating to interrogating sub-sentient AI systems, and all the laws were built around an assumption that turned out to be wrong: that the AIs would remain just dumb enough to be interrogatable via direct access into their electronic brains, and that the laws would remain just slow enough for this to be standard procedure for dealing with evidence from all AI agents. This assumption was half right: the law did stay the same, but the AIs got so smart that though you could look into their brains, you couldn’t learn as much as you’d hope.

This particular AI was based in a food delivery robot that roamed the streets of the city, beeping its way through crowds to apartment buildings, where it would notify customers that their Banh Mi, or hot ramen, or cold cuts of meat, or vegetable box, had arrived. Its role was a seemingly simple one: spend all day and night retrieving goods from different businesses and conveying them to consumers. But its job was very difficult from an AI standpoint – streets would change according to the need for road maintenance or the laying of further communication cables, businesses would lose signs or change signs or have their windows smashed, fashions would change, altering the profile of each person in a street scene, and climatic shocks meant the weather was becoming ever stranger and ever more unpredictable. So to save costs and increase the reliability of the robots, the technology companies behind them had been adding more sensors onto the platforms and, once those gains were built in, working out how to incorporate artificial intelligence techniques to increase efficiency further. A few years ago computational resources became cheap and widely available enough for them to begin re-training each robot based on its own data as well as data from others. They didn’t do this in a purely supervised way, either; instead they had each robot learn its own simulated model of the world it worked in (in this case, a specific region of a city), letting it imagine the streets around itself to give it greater abilities relating to route-finding and re-orientation, adapting to unexpected events, and so on.

So now, to be able to understand anything about the body that had been found, the detective needed to understand the world model of the robot and see if it had significantly changed at any point during the previous day or so. Which is how he found himself staring at a gigantic wall of computer monitors, each showing a different smeary kaleidoscopic vision of a street scene. The detective had access to a control panel that let him manipulate the various latent variables that conditioned the robot’s world model, allowing him to move certain dials and sliders to figure out which things had changed, and how.

The detective knew he was onto something when he found the smear. At first it looked like an error – some kind of computer vision artifact – but as he manipulated various dials he saw that, at 11:15 the previous night, the robot had updated its own world model with a new variable that looked like a black smudge. Except this black smudge was only superimposed on certain people and certain objects in the world, and as he moved the slider around to explore the smear, he found that it had strong associations to two other variables – red three-wheeled motorcycles, and men running. The detective pulled all the information about the world model and did some further experiments and added this to the evidence log.

Later, during prosecution, the robot was physically wheeled into the courtroom where the trial was taking place, mostly as a prop for the head prosecutor. The robot hadn’t seen anything specific itself – its sensors were not good enough to have picked anything admissible up. But as it had been in the area, it had learned of the presence of this death through a multitude of different factors it had sensed, ranging from groups of people running toward where the attack had occurred, to an increase in pedestrian phone activity, to the arrival of sirens, and so on. And this giant amount of new sensory information had somehow triggered strong links in its world model with three-wheeled motorcycles and running men. Armed with this highly specific set of factors the police had trawled all the nearby security cameras and sensors again and, through piecing together footage from eight different places, had found occasional shots of men running towards a three-wheeled motorcycle and speeding, haphazardly, through the streets. After building further evidence they were able to get a DNA match. The offenders went to prison and the mystery of the body was (partially) solved. Though the company that made the AI for the robot made no public statements regarding the case, it subsequently used the case in private sales materials as a case study for local law enforcement on the surprising ways robots could benefit their town.

Things that inspired this story: Food delivery robots, the notion of jurisdiction, interpretability of imagination, “World Models” by David Ha and Juergen Schmidhuber.