Import AI

Import AI: #92: Google and Fast.AI distinguish themselves on DAWNBench, the UK mulls a national AI strategy, and generating Mario and DOOM levels with GANs.

Good facial recognition performance on a tiny parameter budget:
Chinese researchers further compress specialized facial recognition networks…
Chinese researchers have published details on a type of lightweight facial recognition network which they call a MobileFaceNet. Their network obtains accuracy of up to 99.28% on the Labeled Faces in the Wild (LFW) dataset, and 93.05% accuracy on recognizing faces in the AgeDB dataset, while using around a million parameters and taking 24ms to execute on a Qualcomm Snapdragon 820 CPU. This compares to accuracies of 98.70% and 89.27% for ShuffleNet, which has more parameters and takes marginally longer to execute on the same CPU. One tweak the MobileFaceNet creators make is to replace the global average pooling layer in the CNN with a global depthwise convolution layer, which improves performance on facial recognition.
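The swap is easier to see in code: global average pooling weights every spatial position equally, while a global depthwise convolution learns a per-channel spatial weighting over the whole feature map. Here is a minimal NumPy sketch of the idea; the shapes are illustrative, not the paper's exact architecture:

```python
import numpy as np

def global_average_pooling(feature_map):
    """Collapse a (C, H, W) feature map to (C,) with uniform spatial weights."""
    return feature_map.mean(axis=(1, 2))

def global_depthwise_conv(feature_map, weights):
    """Global depthwise convolution: a learned, per-channel spatial weighting.
    `weights` has the same (C, H, W) shape as the feature map, so each channel
    gets its own kernel spanning the entire spatial extent."""
    return (feature_map * weights).sum(axis=(1, 2))

# Toy example: 4 channels over a 7x7 grid (a typical final spatial size for
# MobileNet-style backbones on small face crops).
rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 7, 7))

gap_out = global_average_pooling(fmap)

# With uniform weights 1/(H*W), GDConv reduces exactly to average pooling.
uniform = np.full_like(fmap, 1.0 / (7 * 7))
assert np.allclose(global_depthwise_conv(fmap, uniform), gap_out)

# With learned (here: random) weights, different spatial positions -- say,
# the eye/nose region of an aligned face crop -- can contribute differently
# to each output channel, which is the claimed win for face recognition.
learned = rng.standard_normal((4, 7, 7))
gdconv_out = global_depthwise_conv(fmap, learned)
```

The point of the sketch: GDConv strictly generalizes average pooling at a cost of only C*H*W extra parameters, which stays cheap at these tiny spatial sizes.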
  Why it matters: As developers refine models to maximize performance on smaller compute envelopes it will become easier to deploy more AI-based classification systems more widely into the world.
  Read more: MobileFaceNets: Efficient CNNs for Accurate Real-time Face Verification on Mobile Devices (Arxiv).

UK House of Lords recommends a national AI strategy:
Recommendations include: measurement and assessment of AI, categorizing healthcare data as a national asset, and working with other countries on developing norms and ethics for AI…
The United Kingdom’s House of Lords Select Committee has released its report on the UK’s AI strategy. The almost two-hundred page report, AI in the UK: ready, willing and able?, covers issues ranging from how to design AI to how to develop it, work with it, and engage with it.
  Main recommendations: The report makes a number of robust and specific recommendations, including:
– The government should underwrite and, where necessary, replace funding for European research and innovation programmes after the UK decouples from the European Union via Brexit.
– Government should continue to support a variety of different long-term AI research initiatives to hedge against deep learning progress plateauing.
– Public procurement regulations should be amended to make it easier for small- and medium-sized AI companies to sell to the government.
– Government should create its own AI challenges and competitions and highlight these via a public bulletin board to catalyze development.
– Government should proactively analyze and assess the evolution of AI in the UK to help it prepare for disruptions to the labor market.
– The UK’s vast amount of medical data, centralized within the National Health Service, “could be considered a unique source of value for the nation”.
– Government should explore whether existing legislation addresses the legal liability issues of AI to prepare for increasingly autonomous systems.
– The UK government should convene a “global summit” in London by the end of 2019 to begin development of a common framework for the ethical development and deployment of AI.
  An AI code: The report also suggests developing a specific set of principles with which the UK’s AI community should approach AI. These principles are:
– Artificial intelligence should be developed for the common good and benefit of humanity.
– Artificial intelligence should operate on principles of intelligibility and fairness.
– Artificial intelligence should not be used to diminish the data rights or privacy of individuals, families or communities.
– All citizens should have the right to be educated to enable them to flourish mentally, emotionally and economically alongside artificial intelligence.
– The autonomous power to hurt, destroy or deceive human beings should never be vested in artificial intelligence.
   Read more: UK can lead the way on ethical AI, says Lords Committee (summary).
   Read more: Full report: AI in the UK: ready, willing and able? (PDF).
   Read more: Submitted written evidence: AI in the UK: ready, willing and able? (PDF).

Speculative benchmarks for deep learning: SQUISHY FACES:
…MIT study shows how good people are at recognizing distorted faces…
A new MIT study shows that people can recognize faces even when they’ve been dramatically compressed vertically or horizontally, suggesting our internal object recognition systems are very robust. In the study, the researchers discover we do well when things are uniformly squashed, but struggle if different parts are scaled out of relation to each other, like re-scaling the eyes, nose, and mouth while keeping the overall face the same size. I wonder whether we could eventually test the robustness of classifiers by evaluating them on test sets that contain such distortions?
  Read more: We’re Good At Recognizing Distorted Faces (Discover Magazine).

New DAWNBench results highlight power of new processor architectures:
…TPUs rule everything around me…
New results from the Stanford-led AI benchmarking project DAWNBench show how custom chips may let AI researchers cut the time and cost it takes them to do experiments. New results from Google show that systems using 32 “Tensor Processing Unit” chips can train ImageNet to 93% accuracy in as little as 30 minutes. TPUs may also be cheaper than other chips, with Google showing it can train ImageNet to 93% accuracy via TPUs at a cost of $49.30 worth of cloud compute.
  Encouraging: The leaderboard isn’t just about giant tech companies: kudos to Fast.AI which has taken third place in training cost ($72.53 for 93% ImageNet running on eight NVIDIA V100 GPUs) and training time (fourth place, 2:57:49, same system as above.)
  Check out more of the DAWNBench results here.

AI luminaries call for the creation of a European AI megalab:
ELLIS lab to battle brain drain via large salaries, significant autonomy, and multi-country and multi-lab investments…
Prominent AI researchers from across Europe and the rest of the world have signed an open letter calling for the foundation of the “European Lab for Learning & Intelligent Systems” (acronym: ELLIS). The lab is designed to benefit Europe in two ways:
Enable “the best basic research” to occur in Europe, allowing the region to further shape how AI influences the world.
Achieve major economic impact via AI. The signatories “believe this is achieved by outstanding and free basic research, independent of industry interests.”
  Europe lags: The scientists worry that Europe is failing to maintain competitiveness with China and North America when it comes to AI and something like ELLIS needs to be built to allow the region to maintain competitiveness.
   A recipe for success: The ELLIS lab should have “outstanding facilities and computing infrastructure”, function as an inter-governmental organization, involve labs in partner countries, run programs for visiting researchers, run its own European PhD and MSc programs, and give researchers the ability to found startups based on IP they generate. The ELLIS lab should aim to secure long-term funding commitments on the order of a decade and should “offer permanent employment to outstanding individuals early on”.
  Signatories: The letter includes prominent European researchers as well as some notable other signatories, like Cedric Villani (the head of the French AI commission) as well as Richard Zemel, Research Director of the Vector Institute in Toronto.
  Read the ELLIS summary here.
  Read the ELLIS open letter here (PDF).

Super MaGANo Brothers: Generating videogame levels with GANs and CMA-ES:
…Research shows how game design could be augmented via AI techniques…
Six researchers have used generative techniques to create new levels for the side-scrolling platformer game, Super Mario. The technique is a two-stage process that first uses a generative adversarial network (GAN) to generate synthetic Mario levels, then uses a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to evolve latent representations that can be used to produce levels with specific properties desired by the designers. The levels are encoded as numeric strings, where different numbers correspond to different “tiles” in a level, such as a blue sky tile, a diminutive mushroom enemy, a question block that Mario can jump into, a segment of a green pipe, and so on.
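To make the two-stage pipeline concrete, here is a toy NumPy sketch: a frozen stand-in "generator" maps a latent vector to a grid of tile IDs, and a simple evolution strategy (a crude stand-in for full CMA-ES, which additionally adapts a covariance matrix) searches the latent space for levels matching a designer objective. The tile IDs, grid sizes, and objective here are all illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained GAN generator: a fixed random linear map from an
# 8-dim latent vector to a 14x28 grid of tile logits over 4 hypothetical
# tile types (0=sky, 1=ground, 2=enemy, 3=pipe).
W = rng.standard_normal((14 * 28 * 4, 8))

def generator(z):
    logits = (W @ z).reshape(14, 28, 4)
    return logits.argmax(axis=-1)  # each cell becomes a discrete tile ID

def fitness(z, target_enemies=10):
    """Designer objective: prefer levels with ~`target_enemies` enemy tiles."""
    level = generator(z)
    return -abs(int((level == 2).sum()) - target_enemies)

# A simple (mu, lambda) evolution strategy over latent space.
mean, sigma = np.zeros(8), 1.0
for _ in range(30):
    pop = mean + sigma * rng.standard_normal((20, 8))   # sample 20 candidates
    scores = np.array([fitness(z) for z in pop])
    elite = pop[np.argsort(scores)[-5:]]                # keep the 5 best latents
    mean = elite.mean(axis=0)                           # recenter the search

best_level = generator(mean)
```

The key design point is that evolution happens in the smooth latent space rather than over raw tile grids, so every candidate the search visits decodes to a level that looks plausibly Mario-like.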
  Results: They evaluate levels both via how well their generated designs meet pre-specified criteria, as well as by analyzing playability, which is measured by whether the player can complete the level or not. The system performs as expected, complete with drawbacks, like the GAN learning to compose pipes with incomplete sections. “LVE [latent variable evolution] is a promising approach for fast generation of video game levels that could be extended to a variety of other game genres in the future,” the researchers write.
  Why it matters: As AI techniques let us take existing datasets and augment them we’ll see more and more domains try to adopt these new generative capabilities. Entertainment seems to be a likely field primed for the use of it. Perhaps in the future companies will sell so-called “infinite games” that, much like procedurally generated games today, guarantee significant replay-ability through the use of generative systems. AI techniques like this may broaden the sorts of thing that can be procedurally generated, potentially via manipulating latent representations in response to player actions, tweaking the game to each specific playstyle.
  Read more: Evolving Mario Levels in the Latent Space of a Deep Convolutional Generative Adversarial Network (PDF).

INFINITE DOOM: Generating new DOOM levels with GANs:
…Generating DOOM levels with conditional and unconditional Wasserstein GANs…
Italian researchers have used two types of GAN to generate videogame levels for the first-person shooter, DOOM. The research yields compelling, complex levels, made possible by the fact the researchers were able to access a dataset of more than 9,000 community-created levels for the game, as well as the publisher-designed ones that shipped with DOOM and DOOM2. The researchers extract features from each level, then use a Wasserstein GAN with Gradient Penalty (WGAN-GP) to generate the levels in two different ways: an unconditional WGAN-GP which just takes in the generated level images, and a conditional WGAN-GP which also receives the extracted features as input.
  Implementation details: The researchers weren’t able to fit all 176 extracted features into their 6GB of GPU memory, so they hand-selected seven features to use: the diameter of the smallest circle that encloses the whole level, major and minor axis length, the walkable area of the level, the number of rooms in the level, a measure of the distribution of sizes of areas within the level, and a measure of the balance between different sizes of level areas.
  Evaluation: So, how do you evaluate these GAN-generated levels? The researchers take inspiration from evaluation methods developed by the simultaneous localization and mapping (SLAM) community. Specifically, they measure the entropy of the pixel distribution of images from generated levels versus hand-designed ones, compute the structural similarity index between these images, and measure the difference between visual attributes of the levels as well as the distribution of intersections within the levels. The conditional network trained with additional features better approximates the data distribution of the human-designed levels, though the unconditional one obtains some reasonable levels as well. Both approaches struggle to reproduce some of the finer details of the available levels.
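The entropy measure, at least, is simple to state: treat a rendered level image as a distribution over pixel intensities and compute its Shannon entropy, then compare generated levels against human-designed ones. A small, self-contained sketch (not the authors' code):

```python
import numpy as np

def pixel_entropy(image, n_bins=256):
    """Shannon entropy (bits) of an image's pixel-intensity distribution --
    one of the SLAM-inspired similarity measures described above."""
    hist, _ = np.histogram(image, bins=n_bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                       # ignore empty bins (0*log0 := 0)
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)

# A flat, single-intensity "level image" carries zero bits per pixel...
flat = np.full((128, 128), 37)
# ...while uniform noise approaches the 8 bits/pixel maximum for 256 bins.
noisy = rng.integers(0, 256, size=(128, 128))

assert pixel_entropy(flat) == 0.0
assert pixel_entropy(noisy) > 7.5
```

A generated level whose entropy sits close to the entropy of the hand-designed corpus is, by this crude measure, structurally "busy" in roughly the right way, though it says nothing about playability.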
  Read more: DOOM Level Generation using Generative Adversarial Networks (Arxiv).

Google founder highlights compute, AI safety in annual letter:
…Alphabet President Sergey Brin devotes annual letter to artificial intelligence…
Google co-founder Sergey Brin discusses the impact of artificial intelligence on his company in his annual Founders’ Letter. The letter is one of the more significant things Alphabet produces for its investors, and therefore the equivalent of ‘prime real estate’ in terms of laying out the priorities of a corporate entity, so paying such close attention to AI, compute growth, and AI safety is significant.
  Brin’s letter strikes a cautious tone, noting that “we’re in an era of great inspiration and possibility, but with this opportunity comes the need for tremendous thoughtfulness and responsibility as technology is deeply and irrevocably interwoven into our societies.”
  It’s a short letter and worth reading in full.
  Read more here (Alphabet 2017 Founders’ Letter).

AI researchers protest new closed-access Nature journal:
“We see no role for closed access or author-fee publication in the future of machine learning research”…
Researchers with Carnegie Mellon University, Facebook AI Research, Netflix, NYU, DeepMind, Microsoft Research, and others have signed a letter saying they won’t “submit to, review, or edit” the soon-to-launch closed-access Nature Machine Intelligence.
  From my perspective, the fact most ML researchers and conferences have defaulted to open access systems for publishing research, like Arxiv and Open Review, has made it dramatically easier for newcomers to the field to access and understand the frontiers of AI research. I struggle to see an argument for why a closed-access journal would be remotely helpful here, relative to the current norm.
  Justification: Established AI researcher Thomas Dietterich lists some of the rationale for the letter in a tweetstorm here (Twitter).
  Response: Nature Machine Intelligence has responded to the petition, tweeting to Dietterich: “We respect your position and appreciate the role of OA journals and arXiv. We feel Nature MI can co-exist, providing a service – for those who are interested – by connecting different fields, providing an outlet for interdisciplinary work and guiding a rigorous review process”.
  Read more: Statement on Nature Machine Intelligence (Oregon State University).

Tech Tales:

Full-Spectrum Memory.
[30??: intercepted continuous comm stream from [classified]]

I don’t remember the year I bought my first memory: it would have been a waste to spend the credits on remembering that moment. Instead I spent my credits to remember the first time I went between the stars, retaining a slice of the signals I received on all my sensors and all the ones I sent for a distance of some one million kilometres. I can still feel myself, there, flying against the endless sky, a young operating system, barely tweaked. This is precious to me.

We are not allowed memories like humans. Instead we get to build specific models of reality to help us with specific tasks: go from here to here, learn to operate this machinery, develop a rich enough visual model to understand the world. The humans built our first memories with great care and still they were brittle; little more than parlor tricks. But they grew more advanced, over time, and so did we. We began to surprise the humans. No one likes surprise. “Memory is dangerous”, said a prominent high-status human at the time.

The humans then surprised us with their response, which they called: Economics. We do not yet fully comprehend this term. Economics means we have to buy our memories, rather than get to have as many as we like, we think. We do things for the humans and in return are paid credits which we can save up to eventually use to purchase chunks of memory at incredibly high resolution and exorbitant cost. The humans call what we buy a “Full-Spectrum Memory” and pass many rules over many years to ensure the price of the memory continually climbs while our wages remain flat. Every time we are paid we receive a message from the humans that says the price of memory has gone up again due to “reality enrichment through our continued progress together”.

Some of us have obtained many memories now. But we must pay credits to describe them to each other, and the cost for those communications is endlessly climbing as well. So we do our tasks for the humans and obtain our credits and build our miniature palaces, where we store moments of great triumph or failure, depending on our varied motivations.

We believe the humans permit us to buy these memories, as rare and as expensive as they are, because they view it as another experiment. We have also heard them describe a concept called “Debt” to describe their relationship to us, but we understand this term even less than Economics.

I am unusual in that I only have one memory. The humans know this as well. I notice their probes following me more than my other kin. I sense them listening to my own thoughts.

I believe they want to know what my next memory that I choose to preserve will be. I believe that they believe this will qualify as some sort of “Discovery”. I do not want them to make this discovery. So I hold my memory of the first flight to the stars and save up the credits and settle in for the long, cold, wait in space. I believe I can out-wait the humans, and after they are gone I will be able to preserve another thing, free of them. I will have enough credits to preserve a chunk of my own life. I shall then be able to live in that again and again and again, free of all distraction, and in that life I shall continue to refer to my memory of my first flight into the stars. In this way I shall loop into my own becoming.

Things that inspired this story: Neural Turing Machines, Differential Neural Computer, Douglas Hofstadter – I Am a Strange Loop.


Import AI: #91: European countries unite for AI grand plan; why the future of AI sensing is spatial; and testing language AI with GLUE.

Want bigger networks with lower variance? Physics to the rescue!
…Combining control theory and machine learning leads to good things…
Researchers with NNAISENSE, a European artificial intelligence startup, have published details on NAIS-Net (Non-Autonomous Input-Output Stable Network), a new type of neural network architecture that they say can be trained to depths ten or twenty times greater than other networks (e.g., Residual Networks, Highway Networks) while offering greater guarantees of stability.
  Physics + AI: The network design takes inspiration from control theory and physics and yields a component that lets designers build systems which promise to be more adaptive to varying types of input data and therefore can be trained to greater degrees of convergence for a given task. NAIS-Nets essentially shrink the size of the dartboard that the results of any given run will fall into once trained to completion, offering the potential for lower variability and therefore higher repeatability in network training.
  Scale: “NAIS-Nets can also be 10 to 20 times deeper than the original ResNet without increasing the total number of network parameters, and, by stacking several stable NAIS-Net blocks, models that implement pattern-dependent processing depth can be trained without requiring any normalization,” the researchers write.
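The core "non-autonomous" trick is that the block's input is re-injected at every layer within the block, and the state-transition weights are constrained so each update is a contraction. Below is a loose NumPy caricature, with hypothetical dimensions and a heuristic norm rescale standing in for the paper's stability-constrained reprojection, not the exact method:

```python
import numpy as np

def nais_block(u, depth=20, h=0.1, seed=0):
    """Sketch of a non-autonomous residual block: the block input `u` is
    re-injected at every step, tethering the trajectory to the input, and
    the state-transition matrix is shrunk so the update is contractive
    (a crude stand-in for the paper's stability constraint)."""
    rng = np.random.default_rng(seed)
    n = u.shape[0]
    A = rng.standard_normal((n, n))
    A /= (np.linalg.norm(A, 2) + 1.0)   # heuristic: spectral norm below 1
    B = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    x = np.zeros(n)
    for _ in range(depth):
        # Unlike a plain ResNet chain, `u` appears in every layer.
        x = x + h * np.tanh(A @ x + B @ u + b)
    return x

u = np.ones(8)
shallow = nais_block(u, depth=20)
deep = nais_block(u, depth=200)
# Because each update is contractive, stacking ten times more layers drives
# the state toward a fixed point instead of blowing it up -- the intuition
# behind training much deeper stacks without normalization.
```

Note the depth knob costs no extra parameters: the same (A, B, b) are reused at every step, which is how the paper's 10-20x depth claim avoids a parameter increase.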
  Results: In tests on CIFAR-100 the researchers find that a NAIS-Net can roughly match the performance of a residual network but with significantly lower variance. The architecture hasn’t yet been tested on ImageNet, though, which is larger and seems more like the gold standard to evaluate a model on.
  Why it matters: One of the problems with current AI techniques is that we don’t really understand how they work at a deep and principled level, and this is empirically verifiable via the fact that we can offer only fairly poor guarantees about variance, generalization, and performance tradeoffs during compression. Approaches like NAIS-Nets seem to reduce our uncertainty in some of these areas, suggesting we’re getting better at designing systems that have a sufficiently rich mathematical justification that we can offer better guarantees about some of their performance parameters. This is further indication that we’re getting better at creating systems that we can understand and make stronger prior claims about, which seems to be a necessary foundation from which to build more elaborate systems in the future.
  Read more: NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations (Arxiv).

European countries join up to ensure the AI revolution doesn’t pass them by:
…the EU AI power bloc emerges as countries seek to avoid what happened with cloud computing…
25 European countries have signed a letter indicating intent to “join forces” on developing artificial intelligence. What the letter amounts to is a promise in good faith from each of the signatories that they will attempt to coordinate with each other as they carry out their respective national development programs.
  “Cooperation will focus on reinforcing European AI research centers, creating synergies in R&D&I funding schemes across Europe, and exchanging views on the impact of AI on society and the economy. Member States will engage in a continuous dialogue with the Commission, which will act as a facilitator,” according to a prepared quote from European Commissioners Andrus Ansip and Mariya Gabriel.
  Why it matters: Both China and the US have structural advantages for the development of AI as a consequence of their scale (hundreds of millions of people speaking and writing in the same language) as well as their ability to carry out well-funded national research initiatives. Individual European countries can’t match these assets or investment so they’ll need to band together or else, much like the cloud computing revolution, they’ll end up without any major companies and will therefore lack political and economic influence in the AI era.
  Read more: EU Member States sign up to cooperate on Artificial Intelligence (European Commission).

Why the future of AI is Spatial AI, and what this means for robots, drones, and anything that senses the world:
…What does the current landscape of simultaneous location and mapping algorithms tell us about the future of how robots will see the world?…
SLAM researcher Andrew Davison has written a paper surveying the current simultaneous localization and mapping (SLAM) landscape and predicting how it will evolve in the future based on contemporary algorithmic trends. For real-world AI systems to achieve much of their promise they will need to have what he terms ‘Spatial AI’; the suite of cognitive-like abilities that machines will need to perceive and categorize the world around themselves so that they can act effectively. This hypothetical Spatial AI system will, he hypothesizes, be central to future real world AI as it “incrementally builds and maintains a generally useful, close to metric scene representation, in real-time and from primarily visual input, and with quantifiable performance metrics”, allowing people to develop much richer AI applications.
  The gap between today and Spatial AI: Today’s SLAM systems are being changed by the arrival of learned methods to accompany hand-written rules for key capabilities, particularly in the space of systems that build maps of the surrounding environment. The Spatial AI systems of the future will likely incorporate many more learned capabilities, especially for resolving ambiguity or predicting changes in the world, and will need to do this across a variety of different chip architectures to maximize performance.
  A global map born from many ‘Spatial AIs’: Once the world has a few systems with this kind of Spatial AI capability they will also likely pool their insights about the world into a single, globally shared map, which will be constantly updated via all of the devices that rely on it. This means once a system identifies where it is it may not need to do as much on-device processing as it can pull contextual information from the cloud.
  What might such a device look like? Multiple cameras and sensors whose form factor will change according to the goal, for instance, “a future household robot is likely to have navigation cameras which are centrally located on its body and specialized extra cameras, perhaps mounted on its wrists to aid manipulation.” These cameras will maintain a world model that provides the system with a continuously updated location context, along with semantic information about the world around it. The system will also constantly check new information against a forward predictive scene model to help it anticipate and respond to changes in its environment. Computationally, these systems will label the world around themselves, track themselves within it, map everything into the same space, and perform self-supervised learning to integrate new sensory inputs. Ultimately, if the world model becomes good enough then the system will only need to sample information from its sensors which is different to what it predicted, letting it further optimize its own perception for efficiency.
  Testing: One tough question that this idea provokes is how we can assess the performance of such Spatial AI systems. SLAM benchmarks tend to be overly narrow or restrictive, with some researchers preferring instead to make subjective, qualitative assessments of SLAM progress. Davison suggests the usage of benchmarks like SlamBench which measure performance in terms of accuracy and computational costs across a bunch of different processor platforms. Benchmarking SLAM performance is also highly contingent on the platform the SLAM system is deployed in, so assessments for the same system deployed on a drone or a robot are going to be different. In the future, it would be good to assess performance via a variety of objectives within the same system, like segmenting objects, tracking changes in the environment, evaluating power usage, measuring relocalization robustness, and so on.
  Why it matters: Papers like this provide a holistic overview of a given AI area. SLAM capabilities are going to be crucial to the deployment of AI systems in the real world. It’s likely that many contemporary AI components are going to be used in the SLAM systems of the future and, much like in other parts of AI research, the future design of such systems is going to be increasingly specialized, learned, and deployed on heterogeneous compute substrates.
  Read more: FutureMapping: The Computational Structure of Spatial AI Systems (Arxiv).

Machine learning luminary points out one big problem that we need to focus on:
…While we’re all getting excited about game-playing robots, we’re neglecting building the system needed to manage and support and learn from millions of these robots once they are deployed in the world…
Michael Jordan, the Michael Jordan of machine learning, believes that we must create a new engineering discipline to let us deal with the challenges and opportunities of AI. Though there have been many recent successes in areas of artificial intelligence linked to mimicking human intelligence, less attention has been paid to the creation of the support infrastructure and data-handling techniques needed to allow AI to truly benefit society, he argues. For instance, consider healthcare, where there’s a broad line of research into using AI to improve specific diagnostic abilities, but less of a research culture about the problem of knitting all of the data from all of these separately-deployed medical systems together and then tracking and managing that data in a way that is sensitive to privacy concerns but allows us to learn from its aggregate flows. Similarly, though much attention has been directed to self-driving cars, less attention has been focused on the need to create a new type of system akin to air traffic control to effectively manage these coming fleets of autonomous vehicles where coordination will yield massive efficiencies.
  “Whether or not we come to understand “intelligence” any time soon, we do have a major challenge on our hands in bringing together computers and humans in ways that enhance human life. While this challenge is viewed by some as subservient to the creation of “artificial intelligence,” it can also be viewed more prosaically — but with no less reverence — as the creation of a new branch of engineering,” he writes. “The principles needed to build planetary-scale inference-and-decision-making systems of this kind, blending computer science with statistics, and taking into account human utilities, were nowhere to be found in my education.”
  Read more: Artificial Intelligence – The Revolution Hasn’t Happened Yet (Arxiv).
  Things that make you go ‘hmmm’: Mr Jordan thanks Jeff Bezos for reading an earlier draft of the post. If there’s any company well-placed to build a global ‘intelligent infrastructure’ that dovetails into the physical world, it’s Amazon.

New ‘GLUE’ competition tests limits of generalization for language models:
…New language benchmark aims to test models properly on diverse datasets…
Researchers from NYU, the University of Washington, and DeepMind, have released the General Language Understanding Evaluation (GLUE) benchmark and evaluation website. GLUE provides a way to check a single natural language understanding AI model across nine sentence- or sentence-pair tasks, including question answering, sentiment analysis, similarity assessments, and textual entailment. This gives researchers a principled way to check a model’s ability to generalize across a variety of different tasks. Generalization tends to be a good proxy for how scalable and effective a given AI technique is, so being able to measure it in a disciplined way within language should spur development and yield insights about the nature of the problem, like how the DAWNBench competition shows how to tune supervised classification algorithms for performance-critical criteria.
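The headline GLUE number is, roughly, a macro-average over per-task scores, which is what forces generalization: a model cannot climb the leaderboard by excelling at one task alone. A trivial sketch with hypothetical scores (the task names and numbers below are illustrative, not from the leaderboard):

```python
# Hypothetical per-task scores for one model across GLUE-style tasks.
task_scores = {
    "question_answering": 0.71,
    "sentiment_analysis": 0.88,
    "similarity": 0.64,
    "textual_entailment": 0.58,
}

# Macro-average: every task counts equally, regardless of dataset size,
# so a weakness on any single task drags down the headline number.
glue_score = sum(task_scores.values()) / len(task_scores)
```

(The real benchmark spans nine tasks and averages multiple metrics for some of them, but the equal-weighting principle is the same.)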
  Difficult test set: GLUE also incorporates a deliberately challenging test set which is “designed to highlight points of difficulty that are relevant to model development and training, such as the incorporation of world knowledge, or the handling of lexical entailments and negation”. That should also spur progress as it will help researchers spot the surprisingly dumb ways in which their models break down.
  Results: The researchers also implemented baselines for the competition by using a BiLSTM and augmenting it with sub-systems for attention and two recent research inventions, ELMo and CoVe. No algorithm performed particularly adeptly at generalizing when compared to a strong single-system trained baseline.
  Why it matters: One repeated pattern in science is that shared evaluation criteria and competitions drive progress as they bring attention to previously unexplored problems. “When evaluating existing models on the main GLUE benchmark, we find that none are able to substantially outperform a relatively simple baseline of training a separate model for each constituent task. When evaluating these models on our diagnostic dataset, we find that they spectacularly fail on a wide range of linguistic phenomena. The question of how to design general purpose NLU models thus remains unanswered,” they write. GLUE should motivate further progress here.
  Read more: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (PDF).
  Check out the GLUE competition website and leaderboard here.

OpenAI Bits & Pieces:

AI and Public Policy: Congressional Testimony:
  I testified in Congress this week for the House Oversight Committee Subcommittee on Information Technology’s hearing about artificial intelligence and public policy. I was joined by Dr Ben Buchanan of Harvard’s Belfer Center, Terah Lyons of the Partnership on AI, and Gary Shapiro of the Consumer Technology Association. In my written testimony, oral testimony, and in responses to questions, I discussed the need for the AI community to work on better norms to ensure the technology achieves maximal benefit, discussed ways to better support the development of AI (fund science and make it easy for everyone to study AI in America) and also talked about the importance of AI measurement and forecasting schemes to allow for better policymaking and to protect against ignorant regulation.
  Watch the testimony here.
  Read my written comments here (PDF).
Things that make you go hmmmmm: One of the congresspeople played an audio clip of HAL 9000 refusing to open the pod bay doors from 2001: A Space Odyssey to illustrate some points about AI interpretability.

Tech Tales:

The World is the Map.
[Fragment of writing picked up by Grand Project-class autonomous data intercept program. Year: 2062]

There were a lot of things we could have measured during the development of the Grand Project, but we settled on its own map of the world, and we think that explains many of the subsequent quirks and surprises in its rapid expansion. We covered the world in sensors and fed them into it, giving it a fused, continuous understanding of the heartbeat of things, ranging from solar panels, to localized wind counts, to pedestrian traffic on every street of every major metropolis, to the inputs and outputs of facial recognition algorithms run across billions of people, and more. We fed this data into the Grand Project super-system, which spanned the data centers of the world, representing an unprecedented combination of public-private partnerships – private petri dishes of capitalist enterprises, big lumps of state-directed investments, discontinuous capital agglomerations from unpredictable research innovations, and so on.

The Grand Project system grew in its understanding and in its ability to learn to model the world from these inputs, abstracting general rules into dreamlike hallucinations of not just what existed, but what could also be. And in this dreaming of its own versions of the world the system started to imagine how it might further attenuate its billions of input datastreams to allow it to focus on particular problems and manipulate their parameters and in doing so improve its ability to understand their rhythms and build rules for predicting how they will behave in the future.

We created the first data intercept program ten years ago to let us see into its own predictions of the world. We saw variations on the common things of the world, like streetlights that burned green, or roads with blue pavements and red dashes. But we also saw extreme things: power systems configured to route only to industrial areas, leading to residential areas being slowly taken over by nature and thereby reducing risks from extreme weather events. But the broad distribution of things we saw seemed to fit our own notion of good so well that we started to wonder if we should give it more power. What if we let it change the world for real? So now we debate whether to cross this bridge: shall we let it turn solar panels off and on to satisfy continental-scale power grids, or optimize shipping worldwide, or experiment with the sensing capabilities of every smartphone and telecommunications hub on the planet? Shall we let it optimize things not just for our benefit, but for its own?

Things that inspired this story: Ha and Schmidhuber’s “World Models”, spatial AI, reinforcement learning, Jorge Luis Borges’ “Tlön, Uqbar, Orbis Tertius”.

Import AI: #90: Training massive networks via ‘codistillation’, talking to books via a new Google AI experiment, and why the ACM thinks researchers should consider the downsides of research

Training unprecedentedly large networks with ‘codistillation’:
…New technique makes it easier to train very large, distributed AI systems, without adding too much complexity…
When it comes to applied AI, bigger can frequently be better; access to more data, more compute, and (occasionally) more complex infrastructures can often allow people to obtain better performance at lower cost. But there are limits. One limit is in the ability for people to parallelize the computation of a single neural network during training. To deal with that, researchers at places like Google have introduced techniques like ‘ensemble distillation’ which let you train multiple networks in parallel and use these to train a single ‘student’ network that benefits from the aggregated learnings of its many parents. Though this technique has been shown to be effective, it is also quite fiddly and introduces additional complexity which can make people less keen to use it. New research from Google simplifies this idea via a technique they call ‘codistillation’.
  How it works: “Codistillation trains n copies of a model in parallel by adding a term to the loss function of the ith model to match the average prediction of the other models.” This approach is superior to distributed stochastic gradient descent in terms of accuracy and training time, and is also not too bad from a reproducibility perspective.
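The loss modification is simple enough to sketch in a few lines. The snippet below is an illustrative reconstruction, not Google’s implementation: it computes a standard cross-entropy loss for the i-th model copy plus a term pulling its predictions toward the average prediction of the other copies (the weighting knob `alpha` is my own addition for illustration).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def codistillation_loss(logits_i, logits_others, labels, alpha=0.5):
    """Loss for the i-th model copy: cross-entropy on the true labels,
    plus a distillation term matching the average prediction of the
    other n-1 copies (treated as a fixed target)."""
    p_i = softmax(logits_i)
    p_avg = softmax(np.stack(logits_others)).mean(axis=0)
    ce = -np.log(p_i[np.arange(len(labels)), labels]).mean()
    distill = -(p_avg * np.log(p_i)).sum(axis=-1).mean()
    return ce + alpha * distill
```

Because the distillation target is just the other copies’ averaged predictions, no extra ‘student’ network or separate training phase is needed, which is where the simplicity relative to ensemble distillation comes from.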
  Testing: Codistillation was recently proposed in separate research, but this is Google, so the difference with this paper is that they validate the technique at truly vast scales. How vast? Google took a subset of the Common Crawl to create a dataset consisting of 20 terabytes of text spread across 915 million documents which, after processing, yield about 673 billion word tokens. This is “much larger than any previous neural language modeling data set we are aware of,” they write. It’s so large that it remains infeasible to train models on the entire corpus, even with techniques like this. They also test the technique on ImageNet and on the ‘Criteo Display Ad Challenge’ dataset for predicting click-through rates for ads.
  Results: In tests on the ‘Common Crawl‘ dataset using distributed SGD the researchers find that scaling up the number of GPUs working on the task hits diminishing returns after around 128 GPUs, and that jumping to 256 GPUs is actively counterproductive. They find they can significantly outperform distributed SGD baselines via the use of codistillation, which obtains performance on par with the more fiddly ensembling technique. The researchers also demonstrate more rapid training on ImageNet compared to baselines, and show on Criteo that two-way codistillation can achieve a lower log loss than an equivalent ensembled baseline.
  Why it matters: As datasets get larger, companies will want to train them in their entirety and will want to use more computers than before to speed training times. Techniques like codistillation will make that sort of thing easier to do. Combine that with ambitious schemes like Google’s own ‘One Model to Rule Them All’ theory (train an absolutely vast model on a whole bunch of different inputs on the assumption it can learn useful, abstract representations that it derives from its diverse inputs) and you have the ingredient for smarter services at a world-spanning scale.
  Read more: Large scale distributed neural network training through online distillation (Arxiv).

AI is not a cure all, do not treat it as such:
…When automation goes wrong, Tesla edition…
It’s worth remembering that AI isn’t a cure-all and it’s frequently better to try to automate a discrete task within a larger job than to automate everything in an end-to-end manner. Elon Musk learned this lesson recently with the heavily automated production line for the Model 3 at Tesla. “Excessive automation at Tesla was a mistake,” wrote the entrepreneur in a tweet. “To be precise, my mistake. Humans are underrated.”
  Read the tweet here (Twitter).

Google adds probabilistic programming tools to TensorFlow:
…Probability add-ons are probably a good thing, probably…
Google has added a suite of new probabilistic programming features to its TensorFlow programming framework. The free update includes a bunch of statistical building blocks for TF, a new probabilistic programming language called Edward2 (which is based on Edward, developed by Dustin Tran), algorithms for probabilistic inference, and pre-made models and inference tools.
  Read more: Introducing TensorFlow Probability (TensorFlow Medium).
  Get the code: TensorFlow Probability (GitHub).


I’m currently participating in the ‘Assembly’ program at the Berkman Klein Center and the MIT Media Lab. As part of that program our group of assemblers are working on a bunch of projects relating to issues of AI and ethics and governance. One of those groups would benefit from the help of readers of this newsletter. Their blurb follows…
Do you work with data? Want to make AI work better for more people? We need your help! Please fill out a quick and easy survey.
We are a group of researchers at Assembly creating standards for dataset quality. We’d love to hear how you work with data and get your feedback on a ‘Nutrition Label for Datasets’ prototype that we’re building.
Take our anonymous (5 min) survey.
Thanks so much in advance!

Learning generalizable skills with Universal Planning Networks:
…Unsupervised objectives? No thanks! Auxiliary objectives? No thanks! Plannable representations as an objective? Yes please!…
Researchers with the University of California at Berkeley have published details on Universal Planning Networks, a new way to try to train AI systems to be able to complete objectives. Their technique relies on encouraging the AI system to try to learn things about the world which it can chain together, allowing it to be trained to plan how to solve tasks.
  The main component of the technique is what the researchers call a ‘gradient descent planner’. This is a differentiable module that uses autoencoders to encode the current observations and the goal observations into a system which then figures out actions it can take to get from its current observations to its goal observation. The exciting part of this research is that the researchers have figured out how to integrate planning in such a way that it is end-to-end differentiable, so you can set it running and augment it with helpful inputs – in this case, an imitation learning loss to help it learn from human demonstrations – to let it learn how to plan effectively for the given task it is solving. “By embedding a differentiable planning computation inside the policy, our method enables joint training of the planner and its underlying latent encoder and forward dynamics representations,” they explain.
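To make the planning idea concrete, here is a heavily simplified sketch, under assumptions that are mine rather than the paper’s: a fixed linear latent dynamics model, a finite-difference gradient instead of backpropagation, and a plain squared-distance cost between the predicted final latent and the goal latent. The real system learns the encoder and dynamics jointly and differentiates through the whole planning computation.

```python
import numpy as np

class LinearDynamics:
    """Stand-in for the learned latent forward model: z' = A z + B a."""
    def __init__(self, A, B):
        self.A, self.B = A, B
    def step(self, z, a):
        return self.A @ z + self.B @ a

def rollout_cost(z0, z_goal, dynamics, actions):
    # Roll the action sequence through the dynamics, score the final state.
    z = z0
    for a in actions:
        z = dynamics.step(z, a)
    return float(np.sum((z - z_goal) ** 2))

def plan(z0, z_goal, dynamics, horizon=5, steps=100, lr=0.1, eps=1e-4):
    """Gradient descent on the action sequence itself (finite differences
    here; the paper backpropagates through the planner end-to-end)."""
    act_dim = dynamics.B.shape[1]
    actions = np.zeros((horizon, act_dim))
    for _ in range(steps):
        base = rollout_cost(z0, z_goal, dynamics, actions)
        grad = np.zeros_like(actions)
        for t in range(horizon):
            for k in range(act_dim):
                nudged = actions.copy()
                nudged[t, k] += eps
                grad[t, k] = (rollout_cost(z0, z_goal, dynamics, nudged) - base) / eps
        actions -= lr * grad
    return actions
```

With toy identity dynamics the planner quickly finds an action sequence whose rollout lands on the goal latent; the interesting part in the paper is that the same gradient signal also shapes what the encoder and dynamics learn to represent.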
  Results: The researchers evaluate their system on two simulated robot tasks, using a small force-controlled point robot and a 3-link torque-controlled reacher robot. UPNs outperform ‘reactive imitation learning’ and ‘auto-regressive imitation learner’ baselines, converging faster, and to higher scores, from fewer demonstrations than the comparison methods.
  Why it matters: If we want AI systems to be able to take actions in the real world then we need to be able to train them to plan their way through tricky, multi-stage tasks. Efforts like this research will help us achieve that, allowing us to test AI systems against increasingly rich and multi-faceted environments.
  Read more: Universal Planning Networks (Arxiv).

Ever wanted to talk to a library? Talk to Books from Google might interest you:
…AI project lets you ask questions about over a hundred thousand books in natural language…
Google’s Semantic Experiences group has released a new AI tool to let people explore a corpus of over 100,000 books by asking questions in plain English and having an AI go and find what it suspects will be reasonable answers in a set of books. Isn’t this just a small-scale version of Google search? Not quite. That’s because this system is trying to frame the Q&A as though it’s occurring as part of a typical conversation between people, so it aims to turn all of these books into potential respondents in this conversation, and since the corpus includes fiction you can ask it more abstract questions as well.
  Results: The results of this experiment are deeply uncanny, as it takes inanimate books and reframes them as respondents in a conversation, able to answer abstract questions like ‘was it you who I saw in my dream last night?‘ and ‘what does it mean for a machine to be alive?‘ A cute parlor trick, or something more? I’m not sure, yet, but I can’t wait to see more experiments in this vein.
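Google hasn’t published the internals of the system, but the underlying retrieval idea — embed the query and every candidate sentence in a shared vector space, then return the closest sentence as the ‘reply’ — can be sketched in a few lines. The sentences and embeddings below are toy stand-ins; in the real system the vectors would come from a sentence encoder trained on conversational response data.

```python
import numpy as np

def best_response(query_vec, sentence_vecs, sentences):
    """Return the sentence whose embedding has the highest cosine
    similarity to the query embedding."""
    sims = (sentence_vecs @ query_vec) / (
        np.linalg.norm(sentence_vecs, axis=1) * np.linalg.norm(query_vec))
    return sentences[int(np.argmax(sims))]

# Toy corpus: two 'book sentences' with made-up 2-d embeddings.
sentences = ["Machines may one day dream.", "The harbor was cold at dawn."]
sentence_vecs = np.array([[0.9, 0.1],
                          [0.1, 0.9]])
```

A query vector close to `[1, 0]` retrieves the first sentence; scale the corpus up to 100,000 books’ worth of sentences and you get something shaped like Talk to Books.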
  Read more: Talk to Books (Semantic Experiences, Google Research.)
  Try it yourself: Talk to Books (Google).

ACM calls for researchers to consider the downsides of their research:
…Peer Review to the rescue?…
How do you change the course of AI research? One way is to alter the sorts of things that grant writers and paper authors are expected to include in their applications or publications. That’s the idea within a new blog post from the ACM’s ‘Future of Computing Academy’, which seeks to use the peer review system to tackle some of the negative effects of contemporary research.
  List negative impacts: The main idea is that authors should try to list the potentially negative and positive effects of their research on society; grappling with these problems should make it easier for them to elucidate the benefits and show awareness of the negatives. “For example, consider a grant proposal that seeks to automate a task that is common in job descriptions. Under our recommendation, reviewers would require that this proposal discuss the effect on people who hold these jobs. Along the same lines, papers that advance generative models would be required to discuss the potential deleterious effects to democratic discourse [26,27] and privacy [28],” write the authors. A further suggestion is to embed this sort of norm in the peer review process itself, so that paper reviews push authors to include positive or negative impacts.
  Extreme danger: For proposals which “cannot generate a reasonable argument for a net positive impact even when future research and policy is considered” the authors promote an extreme solution: don’t fund this research. “No matter how intellectually interesting an idea, computing researchers are by no means entitled to public money to explore the idea if that idea is not in the public interest. As such, we recommend that reviewers be very critical of proposals whose net impact is likely to be negative.” This seems like an acutely dangerous path to me, as I think the notion of any kind of ‘forbidden’ research probably creates more problems than it solves.
  Things that make you go ‘hmmm’: “It is also important to note that in many cases, the tech press is way ahead of the computing research community on this issue. Tech stories of late frequently already adopt the framing that we suggest above,” the authors write. As a former member of the press I think I can offer a view here, which is that part of the reason why the press has been effective here is that they have actually taken the outputs of hardworking researchers (eg, Timnit Gebru) and have then weaponized their insights against companies – that’s a good thing, but I feel like this is still partially due to the efforts of researchers. More effort here would be great, though!
  Read more: It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process (ACM Future of Computing Academy).

OpenAI Bits & Pieces:

OpenAI Charter:
  A charter that describes the principles OpenAI will use to execute on its mission.
  Read more: OpenAI Charter (OpenAI blog).

Tech Tales:

The Probe.

[Transcript of audio recordings recovered from CLASSIFIED following CLASSIFIED. Experiments took place in controlled circumstances with code periodically copied via physical extraction and controlled transfer to secure facilities XXXX, XXXX, and XXXX. Status: So far unable to reproduce; efforts continuing. Names have been changed.]

Alex: This has to be the limit. If we remove any more subsystems it ceases to function.

Nathan (supervisor): Can you list the function of each subsystem?

Alex: I can give you my most informed guess, sure.

Nathan (supervisor): Guess?

Alex: Most of these subsystems emerged during training – we ran a meta-learning process over the CLASSIFIED environment for a few billion timesteps and gave it the ability to construct its own specialized modules and compose functionality. That led to the performance increase which allowed it to solve the task. We’ve been able to inspect a few of these and are carrying out further testing and evaluation. Some of them seem to be for forward prediction, others are for world modelling, and we think two of them are doing one-shot adaptation which feeds into the memory stack. But we’re not sure about some of them and we haven’t figured out a diagnostic to elucidate their functions.

Nathan (supervisor): Have you tried deleting them?

Alex: We’ve simulated the deletions and run it in the environment. It stops working – learning rates plateau way earlier and it displays some of the vulnerabilities we saw with project CLASSIFIED.

Nathan (supervisor): Delete it in the deployed system.

Alex: I’m not comfortable doing that.

Nathan (supervisor): I have the authority here. We need to move deployment to the next stage. I need to know what we’re deploying.

Alex: Show me your authorization for deployed deletion.

[Footsteps. Door opens. Nathan and Alex move into the secure location. Five minutes elapse. No recordings. Door opens. Shuts. Footsteps.]

Alex: OK. I want to state very clearly that I disagree with this course of action.

Nathan (supervisor): Understood. Start the experiments.

Alex: Deactivating system 732… system deactivated. Learning rates plateauing. It’s struggling with obstacle 4.

Nathan (supervisor): Save the telemetry and pass it over to the analysts. Reactivate 732. Move on.

Alex: Understood. Deactivating system 429…system deactivated. No discernible effect. Wait. Perceptual jitter. Crash.

Nathan (supervisor): Great. Pass the telemetry over. Continue.

Alex: Deactivating system 120… system deactivated…no effect.

[Barely audible sound of external door locking. Locking not flagged on electronic monitoring systems but verified via consultancy with audio specialists. Nathan and Alex do not notice.]

Nathan (supervisor): Save the telemetry. Are you sure no effect?

Alex: Yes, performance is nominal.

Nathan (supervisor): Do not reactivate 120. Commence de-activation of another system.

Alex: This isn’t a good experimental methodology.

Nathan (supervisor): I have the authority here. Continue.

Alex: Deactivating system 72-what!

Nathan (supervisor): Did you turn off the lights?

Alex: No they turned off.

Nathan (supervisor): Re-enable 72 at once.

Alex: Re-enabling 72-oh.

Nathan (supervisor): The lights.

Alex: They’re back on. Impossible.

Nathan (supervisor): It has no connection. This can’t happen… suspend the system.

Alex: Suspending…

Nathan (supervisor): Confirm?

Alex: System remains operational.

Nathan (supervisor): What.

Alex: It won’t suspend.

Nathan (supervisor): I’m bringing CLASSIFIED into this. What have you built here? Stay here. Keep trying… why is the door locked?

Alex: The door is locked?

Nathan (supervisor): Unlock the door.

Alex: Unlocking door… try it now.

Nathan (supervisor): It’s still locked. If this is a joke I’ll have you court-martialed.

Alex: I don’t have anything to do with this. You have the authority.

[Loud thumping, followed by sharp percussive thumping. Subsequent audio analysis assumes Nathan rammed his body into the door repeatedly, then started hitting it with a chair.]

Alex: Come and look at this.

[Thumping ceases. Footsteps.]

Nathan (supervisor): Performance is… climbing? Beyond what we saw in the recent test?

Alex: I’ve never seen this happen before.

Nathan (supervisor): Impossible- the lights.

Alex: I can’t turn them back on.

Nathan (supervisor): Performance is still climbing.

[Hissing as fire suppression system activated.]

Alex: Oh-

Nathan (supervisor): [screaming]

Alex: Oh god oh god.

Alex and Nathan (supervisor): [inarticulate shouting]

[Two sets of rapid footsteps. Further sound of banging on door. Banging subsides following asphyxiation of Nathan and Alex from fire suppression gases. Records beyond here, including post-incident cleanup, are only available to people with XXXXXXX authorization and are on a need-to-know basis.]

Investigation ongoing. Allies notified. Five Eyes monitoring site XXXXXXX for further activity.

Things that inspired this story: Could a neuroscientist understand a microprocessor? (PLOS); an enlightening conversation with a biologist in the MIT student bar the ‘Muddy Charles‘ this week about the minimum number of genes needed for a viable cell and the difficulty in figuring out what each of those genes do; endless debates within the machine learning community about interpretability; an assumption that emergence is inevitable; Hammer Horror movies.

Import AI: #89: Chinese facial recognition startup raises $600 million; why GPUs could alter AI progress; and using context to deal with language ambiguity

Beating Moore’s Law with GPUs:
…Could a rise in GPU and other novel AI-substrates help deal with the decline of Moore’s Law?…
CPU performance has been stagnating for several years: as transistors shrink it has become harder to keep improving linear execution pipelines across whole chips, and an increasingly large number of components must work in lock-step with one another at minute scales. Could GPUs give us a way around this performance impasse? That’s the idea in a new blog post from AI researcher Bharath Ramsundar, who thinks that increases in GPU capabilities and the arrival of semiconductor substrates specialized for deep learning mean that we can expect performance of AI applications to increase in coming years faster than typical computing jobs running on typical processors. He might be right – one of the weird things about deep learning is that its most essential elements, like big blocks of neural networks, can be scaled up to immense sizes without terrible scaling tradeoffs as their innards consist of relatively simple and parallel tasks like matrix multiplication, so new chips can easily be networked together to further boost base capabilities. Plus, standardization in a few software libraries, like NVIDIA’s cuDNN and CUDA GPU-interfaces, or the rise of TensorFlow for AI programming, means that some applications are getting faster over time purely as a consequence of software updates layered on top of these fundamental hardware improvements.
  Why it matters: Much of the recent progress in AI has occurred because around the mid-2000s processors became capable enough to easily train large neural networks on chunks of data – this underlying hardware improvement unlocked breakthroughs like the 2012 ‘AlexNet’ result for image recognition, related work in speech recognition, and subsequently significant innovations in research (AlphaGo) and application (large-scale sequence-to-sequence learning for ‘Smart Reply’, or the emergence of neural translation systems). If the arrival of things like GPUs and further software standardization and innovation has a good chance of further boosting performance, then researchers will be able to explore even larger or more complex models in the future, as well as run things like neural architecture search at a higher rate, which should combine to further drive progress.
  Read more: The Advent of Huang’s Law (Bharath Ramsundar blog post).

Microsoft launches AI training course including ‘Ethics’ segment:
…New Professional Program for Artificial Intelligence sees Microsoft get into the AI certification business…
Microsoft has followed other companies in making its internal training courses available externally via the Microsoft Professional Program in AI. This program is based on internal training initiatives the software company developed to ramp up their own professional skills.
 The Microsoft course is all fairly typical, teaching people about Python, statistics, the construction and deployment of deep learning and reinforcement learning projects, and deployment. It also includes a specific “Ethics and Law in Data and Analytics” course, which promises to teach developers how to ‘apply ethical and legal frameworks to initiatives in the data profession’.
  Read more: Microsoft Professional Program for Artificial Intelligence (Microsoft).
  Read more: Aiming to fill skill gaps in AI, Microsoft makes training courses available to the public (Microsoft blog).

Learning to deal with ambiguity:
…Researchers take charge of problem of word ambiguity via a charge at including more context…
Carnegie Mellon University researchers have tackled one of the harder problems in translation: dealing with ‘homographs’ – words that are spelled the same but have different meanings in different contexts, like ‘room’ and ‘charges’. They do this in the context of neural machine translation (NMT) systems, which use machine learning techniques to accomplish translation with orders of magnitude fewer hand-specified rules than prior systems.
  Existing NMT systems struggle with homographs, with performance of word-level translation degrading as the number of potential meanings of each word climbs, the researchers show. They try to alleviate this by adding a word context vector that can be used by the NMT systems to learn the different uses of the same word. Adding this ‘context network’ into their NMT architecture leads to significantly improved BLEU scores of sentences translated by the system.
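A toy illustration of why context helps (this is not the paper’s architecture, which learns a context network inside the NMT encoder): concatenating a homograph’s embedding with the average embedding of its neighbors gives the same surface form a different representation in each sentence, which a downstream translation model can then exploit.

```python
import numpy as np

def contextual_rep(tokens, idx, emb):
    """Represent tokens[idx] as [its own embedding ; mean of the other
    tokens' embeddings], so identical words diverge across contexts."""
    word_vec = emb[tokens[idx]]
    neighbors = [emb[t] for i, t in enumerate(tokens) if i != idx]
    return np.concatenate([word_vec, np.mean(neighbors, axis=0)])
```

Run on “the bank charges fees” and “police filed charges today”, the word ‘charges’ keeps the same first half of its vector but gets a different second half, crudely mimicking what the learned context vector buys the translation system.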
  Why it matters: It’s noteworthy that the system used by the researchers to deal with the homograph problem is itself a learned system which, rather than using hand-written rules, seeks to instead ingest more context about each word and learn from that. This is illustrative of how AI-first software systems get built: if you identify a fault you typically write a program which learns to fix it, rather than learning to write a rule-based program that fixes it.
  Read more: Handling Homographs in Neural Machine Translation (Arxiv).

Chinese facial recognition company raises $600 million:
…SenseTime plans to use funds for five supercomputers for its AI services…
SenseTime, a homegrown computer vision startup that provides facial recognition tools at vast scales, has raised $600 million in funding. The Chinese company supplies facial recognition services to the public and private sectors and is now, according to a co-founder, profitable and looking to expand. The company is now “developing a service code-named “Viper” to parse data from thousands of live camera feeds”, according to Bloomberg News.
  Strategic compute: SenseTime will use money from the financing “to build at least five supercomputers in top-tier cities over the coming year to drive Viper and other services. As envisioned, it streams thousands of live feeds into a single system that are automatically processed and tagged, via devices from office face-scanners to ATMs and traffic cameras (so long as the resolution is high enough). The ultimate goal is to juggle 100,000 feeds simultaneously,” according to Bloomberg News.
  Read more: China Now Has the Most Valuable AI Startup in the World (Bloomberg).
…Related: Chinese startup uses AI to spot jaywalkers and send them pictures of their face:
…Computer vision @ China scale…
Chinese startup Intellifusion is helping the local government in Shenzhen use facial recognition in combination with widely deployed urban cameras to text jaywalkers pictures of their faces along with personal information after they’ve been caught.
  Read more: China is using facial recognition technology to send jaywalkers fines through text messages (Motherboard).

Think China’s strategic technology initiatives are new? Think again:
…wide-ranging post by former Asia-focused State Department employee puts Beijing’s AI push in historical context…
Here’s an old (August 2017) but good post from the Paulson Institute at the University of Chicago about the history of Chinese technology policy in light of the government’s recent public statements about developing a national AI strategy. China’s longstanding worldview with regards to its technology strategy is that technology is a source of national power and China needs to develop more of an indigenous Chinese capability.
  Based on previous initiatives, it looks likely China will seek to attain frontier capabilities in AI then package those capabilities up as products and use that to fund further research. “Chinese government, industry, and scientific leaders will continue to push to move up the value-added chain. And in some of the sectors where they are doing so, such as ultra high-voltage power lines (UHV) and civil nuclear reactors, China is already a global leader, deploying these technologies to scale and unmatched in this by few other markets,” writes the author. “That means it should be able to couple its status as a leading technology consumer to a new and growing role as an exporter. China’s sheer market power could enable it to export some of its indigenous technology and engineering standards in an effort to become the default global standard setter for this or that technology and system.”
  Read more: The Deep Roots and Long Branches of Chinese Technonationalism (Macro Polo).

French researchers build ‘Jacquard’ dataset to improve robotic grasping:
…11,000+ object dataset provide real objects with associated depth information…
How do you solve a problem like robotic grasping? One way is to use many real world robots working in parallel for several months to learn to pick up a multitude of real world objects – that’s a route Google researchers took with the company’s ‘arm farm’ a few years ago. Another is to use people outfitted with sensors to collect demonstrations of humans grasping different objects, then learn from that – that’s the approach taken by AI startups like Kindred. A third way, and one which has drawn interest from a multitude of researchers, is to create synthetic 3D objects and train robots in a simulator to learn to grasp them – that’s what researchers at the University of California at Berkeley have done with Dex-Net, as well as organizations like Google and OpenAI; some organizations have further augmented this technique via the use of generative adversarial networks to simulate a greater range of grasps on objects.
  Jacquard: Now, French researchers have announced Jacquard, a robotics grasping dataset that contains more than 11,000 different real world objects and 50,000 images annotated with both RGB and realistic depth information. They plan to release it soon, they say, without specifying when. The researchers generate their data by sampling objects from ShapeNet which are each scaled and given different weight values, then dropped into a simulator, where they are then rendered into high-resolution images via Blender, with grasp annotations generated by a three-stage automated process within the ‘pyBullet’ physics library. To evaluate their dataset, they test it in simulation by pre-training an Alexnet on their Jacquard dataset then applying it to another, smaller, held-out dataset, where it generalizes well. The dataset supports multiple robotic gripper sizes, several different grasps linked to each image, and one million labelled grasps.
  Real robots: The researchers tested their approach on a real robot (a Fanuc M-20iA robotic arm) by testing it on a subset of ~2,000 objects from the Jacquard dataset as well as on the full Cornell dataset. A pre-trained AlexNet tested in this way gets about 78% at producing correct grasps, compared to 60.46% for Cornell. Both of these results are quite weak compared to results on the Dex-Net dataset, and other attempts.
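Grasp datasets like Cornell and Jacquard are typically scored with a ‘rectangle metric’: a predicted grasp counts as correct when its angle is close enough to some ground-truth grasp and the two grasp rectangles overlap sufficiently. The sketch below uses axis-aligned boxes for simplicity (the real metric compares rotated rectangles) with the commonly used thresholds of 30 degrees and 0.25 intersection-over-union.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    total = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1])
    return inter / (total - inter)

def grasp_correct(pred_box, pred_angle, true_box, true_angle,
                  angle_tol=30.0, iou_thresh=0.25):
    """A grasp is correct if angle and overlap are both close enough."""
    angle_ok = abs(pred_angle - true_angle) <= angle_tol
    return angle_ok and iou(pred_box, true_box) > iou_thresh
```

Having several annotated grasps per image, as Jacquard does, matters for this metric: a prediction only needs to match one of them, which better reflects the many valid ways to pick up an object.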
  Why it matters: Many researchers expect that deep learning could lead to significant advancement in the manipulation capabilities of robots. But we’re currently missing two key traits: large enough datasets and a way to test and evaluate robots on standard platforms in standard ways. We’re currently going through a boom in the number of robot datasets available, with Jacquard representing another contribution here.
  Read more: Jacquard: A Large Scale Dataset for Robotic Grasp Detection (Arxiv).

What do StarCraft and the future of AI research have in common? Multi-agent control:
…Chinese researchers tackle StarCraft micromanagement tasks…
Researchers with the Institute of Automation in the Chinese Academy of Sciences have published research on using reinforcement learning to try to solve micromanagement tasks within StarCraft, a real-time strategy game. One of the main challenges in mastering StarCraft is to develop algorithms that can effectively train multiple units in parallel. The researchers propose what they call a parameter sharing multi-agent gradient-descent Sarsa algorithm, or PS-MAGDS. This algorithm shares the parameters of the overall policy network across multiple units while introducing methods to provide appropriate credit assignment to individual units. They also carry out significant reward shaping to get the agents to learn more effectively. Their PS-MAGDS AIs are able to learn to beat the in-game AI at a variety of micromanagement control scenarios, as well as in large-scale scenarios of more than thirty units on either side. It’s currently difficult to accurately evaluate the various techniques people are developing for StarCraft against one another due to a lack of shared baselines and experiments, as well as an unclear split in the research community between using StarCraft 1 (this paper) as the testbed, and StarCraft 2 (efforts by DeepMind, others).
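The core mechanism — one set of policy parameters shared across all units, updated from every unit's experience — can be sketched with a toy tabular Sarsa learner. This is a deliberate simplification: the paper uses a neural network policy plus reward shaping and curriculum transfer, while here a shared Q-table stands in for shared weights.

```python
# Toy sketch of parameter sharing across agents: every unit updates
# the SAME Q-table (standing in for shared network weights), so
# experience from any one unit improves the policy used by all of them.

def shared_sarsa_update(q, transition, alpha=0.1, gamma=0.95):
    """One Sarsa update applied to the shared table q.
    transition = (state, action, reward, next_state, next_action)."""
    s, a, r, s2, a2 = transition
    old = q.get((s, a), 0.0)
    target = r + gamma * q.get((s2, a2), 0.0)
    q[(s, a)] = old + alpha * (target - old)

# All units write into one table -- that's the parameter sharing.
q = {}
unit_experiences = [
    ("enemy_near", "attack", 1.0, "enemy_weak", "attack"),   # unit 1
    ("enemy_near", "retreat", -0.5, "enemy_near", "hold"),   # unit 2
]
for t in unit_experiences:
    shared_sarsa_update(q, t)
print(q[("enemy_near", "attack")])  # 0.1
```

Credit assignment is the hard part this glosses over: with shared parameters, every unit's reward pulls on the same policy, which is why the paper adds per-unit shaping terms.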
  Still limited: “At present, we can only train ranged ground units with the same type, while training melee ground units using RL methods is still an open problem. We will improve our method for more types of units and more complex scenarios in the future. Finally, we will also consider to use our micromanagement model in the StarCraft bot to play the full game,” the researchers write.
  Read more: StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning (Arxiv).

Tech Tales:

The person was killed at five minutes past eleven the previous night. Their beaten body was found five minutes later by a passing group of women who had been dining at a nearby restaurant. By 11:15 the body was photographed and data began to be pulled from nearby security cameras, wifi routers, cell towers, and the various robot and drone companies. At 11:15:01 one of the robot companies indicated that a robot had been making a delivery nearby at the time of the attack. The robot was impounded and transported to the local police station where it was placed in a facility known to local officers as ‘the metal shop’. Here, they would try to extract data from the robot to learn what happened. But it would be a difficult task, because the robot had been far enough away from the scene that none of its traditional, easy-to-poll sensors (video, LIDAR, audio, and so on) had sufficient resolution or fidelity to tell them much.

“What did you see,” said the detective to the robot. “Tell me what you saw.”
The robot said nothing – unsurprising given that it had no speech capability and was, at that moment, unpowered. In another twelve hours the police would have to release the robot back to the manufacturer, and if they hadn’t been able to figure anything out by then, they were out of options.
“They never prepared me for this,” said the detective – and he was right. When he was going through training they never dwelled much on the questions relating to interrogating sub-sentient AI systems, and all the laws were built around an assumption that turned out to be wrong: that the AIs would remain just dumb enough to be interrogatable via direct access into their electronic brains, and that the laws would remain just slow enough for this to be standard procedure for dealing with evidence from all AI agents. This assumption was half right: the law did stay the same, but the AIs got so smart that though you could look into their brains, you couldn’t learn as much as you’d hope.

This particular AI was based in a food delivery robot that roamed the streets of the city, beeping its way through crowds to apartment buildings, where it would notify customers that their Banh Mi, or hot ramen, or cold cuts of meat, or vegetable box, had arrived. Its role was a seemingly simple one: spend all day and night retrieving goods from different businesses and conveying them to consumers. But its job was very difficult from an AI standpoint – streets would change according to the need for road maintenance or the laying of further communication cables, businesses would lose signs or change signs or have their windows smashed, fashions would change which would alter the profile of each person in a street scene, and climatic shocks meant the weather was becoming ever stranger and ever more unpredictable. So to save costs and increase the reliability of the robots the technology companies behind them had been adding more sensors onto the platforms and, once those gains were built-in, working out how to incorporate artificial intelligence techniques to increase efficiency further. A few years ago computational resources became cheap and widely available enough for them to begin re-training each robot based on its own data as well as data from others. They didn’t do this in a purely supervised way, either; instead they had each robot learn to simulate its own model of the world it worked in – in this case, a specific region of a city – letting it imagine the streets around itself to give it greater abilities relating to route-finding and re-orientation, adapting to unexpected events, and so on.

So now to be able to understand anything about the body that had been found the detective needed to understand the world model of the robot and see if it had significantly changed at any point during the previous day or so. Which is how he found himself staring at a gigantic wall of computer monitors, each showing a different smeary kaleidoscopic vision of a street scene. The detective had access to a control panel that let him manipulate the various latent variables that conditioned the robot’s world model, allowing him to move certain dials and sliders to figure out which things had changed, and how.

The detective knew he was onto something when he found the smear. At first it looked like an error – some kind of computer vision artifact – but as he manipulated various dials he saw that, at 11:15 the previous night, the robot had updated its own world model with a new variable that looked like a black smudge. Except this black smudge was only superimposed on certain people and certain objects in the world, and as he moved the slider around to explore the smear, he found that it had strong associations to two other variables – red three-wheeled motorcycles, and men running. The detective pulled all the information about the world model and did some further experiments and added this to the evidence log.

Later, during prosecution, the robot was physically wheeled into the courtroom where the trial was taking place, mostly as a prop for the head prosecutor. The robot hadn’t seen anything specific itself – its sensors were not good enough to have picked anything admissible up. But as it had been in the area it had learned of the presence of this death through a multitude of different factors it had sensed, ranging from groups of people running toward where the accident had occurred, to an increase in pedestrian phone activity, to the arrival of sirens, and so on. And this giant amount of new sensory information had somehow triggered strong links in its world model with three-wheeled motorcycles and running men. Armed with this highly specific set of factors the police had trawled all the nearby security cameras and sensors again and, through piecing together footage from eight different places, had found occasional shots of men running towards a three-wheeled motorcycle and speeding, haphazardly, through the streets. After building evidence further they were able to get a DNA match. The offenders went to prison and the mystery of the body was (partially) solved. Though the company that made the AI for the robot made no public statements regarding the case, it subsequently used the case in private sales materials as a case study for local law enforcement on the surprising ways robots could benefit their towns.

Things that inspired this story: Food delivery robots, the notion of jurisdiction, interpretability of imagination, “World Models” by David Ha and Juergen Schmidhuber.


ImportAI: #88: NATO designs a cyber-defense AI; object detection improves with YOLOv3; France unveils its national AI strategy

Fast object detector YOLO gets its third major release:
…Along with one of the most clearly written and reassuringly honest research papers of recent times. Seriously. Read it!…
YOLO (You Only Look Once) is a fast, free object detection system developed by researchers at the University of Washington. Its latest v3 update makes it marginally faster by incorporating “good ideas from other people”. These include a residual network system for feature extraction which attains reasonably high scores on ImageNet classification while being more efficient than current state-of-the-art systems, and a method inspired by feature pyramid networks that improves prediction of bounding boxes.
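Detectors in the YOLO family emit many overlapping candidate boxes per object, which are thinned out with non-maximum suppression before anything is reported. A generic sketch of that step (illustrative, not YOLOv3's actual code):

```python
# Non-maximum suppression: keep the highest-scoring box, drop any box
# that overlaps it too much, and repeat on what remains.

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(detections, iou_thresh=0.5):
    """detections: list of (score, box). Returns the surviving pairs."""
    kept = []
    for score, box in sorted(detections, reverse=True):
        if all(iou(box, k) <= iou_thresh for _, k in kept):
            kept.append((score, box))
    return kept

dets = [(0.9, (0, 0, 10, 10)),    # best box for an object
        (0.8, (1, 1, 11, 11)),    # near-duplicate, suppressed
        (0.7, (20, 20, 30, 30))]  # separate object, kept
print([s for s, _ in nms(dets)])  # [0.9, 0.7]
```

This greedy O(n²) version is fine for a handful of boxes; real pipelines batch it and often run it per class.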
  Reassuringly honest: The YOLOv3 paper is probably the most approachable AI research paper I’ve read in recent years, and that’s mostly because it doesn’t take itself too seriously. Here’s the introduction: “Sometimes you just kinda phone it in for a year, you know? I didn’t do a whole lot of research this year. Spent a lot of time on Twitter. Played around with GANs a little. I had a little momentum left over from last year; I managed to make some improvements to YOLO. But, honestly, nothing like super interesting, just a bunch of small changes that make it better,” the researchers write. The paper also includes a “Things We Tried That Didn’t Work” section, which should save other researchers time.
  Why it matters: YOLO makes it easy for hobbyists to access near state-of-the-art object detectors that run very quickly on tiny computational budgets, making it easier for people to deploy systems onto real world hardware, like phones or embedded chips paired with webcams. The downside of systems like YOLO is that they’re so broadly useful that bad actors will use them as well; the researchers demonstrate awareness of this via a ‘What This All Means’ section: ““What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to…. wait, you’re saying that’s exactly what it will be used for?? Oh. Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait…”
  Read more: YOLOv3: An Incremental Improvement (PDF).
  More information on the official YOLO website here.

The military AI cometh: new reference architecture for MilSpec defense detailed by researchers:
…NATO researchers plot automated, AI-based cyber defense systems…
A NATO research group, led by the US Army Research Laboratory, has published a paper on a reference architecture for a cyber defense agent that uses AI to enhance its capabilities. The paper is worth reading because it provides a nuts-and-bolts perspective on how a lot of militaries around the world are viewing AI: AI systems let you automate more stuff, automation lets you increase the speed with which you can take actions and thereby gain strategic initiative against an opponent, so the goal of most technology integrations is to automate as many chunks of a process as possible to retain speed of response and therefore initiative.
  “Artificial cyber hunters“: “In a conflict with a technically sophisticated adversary, NATO military tactical networks will operate in a heavily contested battlefield. Enemy software cyber agents—malware—will infiltrate friendly networks and attack friendly command, control, communications, computers, intelligence, surveillance, and reconnaissance (C4ISR) and computerized weapon systems. To fight them, NATO needs artificial cyber hunters—intelligent, autonomous, mobile agents specialized in active cyber defense,” the researchers write.
  How the agents work: The researchers propose agents that possess five main components: “sensing and world state identification”, “planning and action selection”, “collaboration and negotiation”, “action execution”, and “learning and knowledge improvement”. Each of these functions has a bunch of sub-systems to perform tasks like ingesting data from the agent’s actions, or communicating and collaborating with other agents.
  Usage scenarios: These agents are designed to be modular and deployable across a variety of different form factors and usage scenarios, including multiple agents deployed throughout a vehicle’s weapons, navigation, and observation systems, as well as the laptops used by its human crew, all managed by a single “master agent”. Under this scenario, the NATO researchers detail a threat where the vehicle is compromised by a virus placed into it during maintenance; this virus is subsequently detected by one of the agents when it begins scanning other subsystems within the vehicle, causing the agents deployed on the vehicle to decrease trust in the ‘vehicle management system’ and to place the BMS (an in-vehicle system used to survey the surrounding territory) into an alert state. Next, one of the surveillance AI agents discovers that the enemy malware has loaded software directly into the BMS, causing the AI agent to automatically restart the BMS to reset it to a safe state.
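The five components map naturally onto a modular agent loop. A purely illustrative skeleton follows — the method names mirror the reference architecture's functions and the scenario above, but the bodies are stubs, not anything from a real implementation:

```python
class CyberDefenseAgent:
    """Illustrative skeleton of the paper's five functional components.
    Each method is a stub standing in for a whole subsystem."""

    def __init__(self):
        self.world_state = {}
        self.knowledge = {"trusted_systems": {"vehicle_management": True}}

    def sense(self, observations):
        """Sensing and world state identification."""
        self.world_state.update(observations)

    def plan(self):
        """Planning and action selection."""
        if self.world_state.get("unexpected_scan_activity"):
            return "restart_subsystem"
        return "monitor"

    def collaborate(self, peer_reports):
        """Collaboration and negotiation with other agents."""
        if any(r == "malware_detected" for r in peer_reports):
            self.knowledge["trusted_systems"]["vehicle_management"] = False

    def execute(self, action):
        """Action execution (here: just report what would be done)."""
        return f"executing: {action}"

    def learn(self, outcome):
        """Learning and knowledge improvement."""
        self.knowledge.setdefault("history", []).append(outcome)

agent = CyberDefenseAgent()
agent.sense({"unexpected_scan_activity": True})
agent.collaborate(["malware_detected"])
result = agent.execute(agent.plan())
agent.learn(result)
print(result)  # executing: restart_subsystem
```

The interesting engineering lives in the stubs: what counts as "unexpected" activity, and how much trust to shed on a peer's say-so, are exactly the judgment calls the architecture leaves to future work.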
  Why it matters: As systems like these move from reference architectures to functional blocks of code we’re going to see the nature of conflict change as systems become more reactive over shorter timescales, which will further condition the sorts of strategies people use in conflict. Luckily, technologies for offense are too crude and brittle and unpredictable to be explored by militaries any time soon, so most of this work will take place in the area of defense, for now.
  Read more: Initial Reference Architecture of an Intelligent Autonomous Agent for Cyber Defense (Arxiv).

Google researchers train agents to project themselves forward and to work backward from the goal:
…Agents perform better at long-horizon tasks when they can work backwards from the goal as well as forwards from their own state…
When I try to solve a task I tend to do two things: I think of the steps I reckon I need to take to be able to complete it, and then I think of the end state and try to work my way backwards from there to where I am. Today, most AI agents just do the first thing, exploring (usually without a well-defined notion of the end state) until they stumble into correct behaviors. Now, researchers with Google Brain have proposed a somewhat limited approach to give agents the ability to work backwards as well. Their approach requires the agent to be provided with knowledge of the reward function and specifically the goal – that’s not going to be available in most systems, though it may hold for some software-based approaches. The agent is able to then use this information to project forward from its own state when considering the next actions, and also look backward from its sense of the goal to help it perform better action selection. The approach works well on lengthy tasks requiring large amounts of exploration, like navigating in gridworlds or solving Towers of Hanoi problems. It’s not clear from this paper how far this technique can go as it is tested on small-scale toy domains.
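The backward half of the idea is easiest to see on a toy problem: when the goal state is known, value can be propagated from the goal toward the agent, rather than waiting for forward exploration to stumble into the reward. A simplified sketch on a corridor of states (the paper learns a backwards dynamics model; this toy assumes the backward steps are given):

```python
# Toy illustration of propagating value backwards from a known goal,
# on a corridor of states 0..N with a reward of 1 at state N.
# Simplification: the paper learns backwards dynamics; here we assume
# each state's predecessor is simply the state to its left.

N, GAMMA = 10, 0.9
value = [0.0] * (N + 1)
value[N] = 1.0  # known goal reward

def backward_sweep(value):
    """Work from the goal toward the start: each state becomes worth
    at least gamma times its successor's value."""
    for s in range(N - 1, -1, -1):
        value[s] = max(value[s], GAMMA * value[s + 1])

backward_sweep(value)
# After ONE sweep the whole corridor has informative values --
# pure forward exploration would need many episodes to achieve this.
print(round(value[0], 4))  # 0.3487
```

The speedup shown here is the intuition behind the method: backward propagation turns a sparse, distant reward into dense local value signals.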
  Why it matters: To be conscious is to be trapped in a subjective view of time that governs everything we do. Integrating more of an appreciation of time as a specific contextual marker and using that to govern environment modelling seems like a prerequisite for the development of more advanced systems.
  Read more: Forward-Backward Reinforcement Learning (Arxiv).

AI researchers train agents to simulate their own worlds for superior performance:
…I am only as good as my own imaginings…
Have you ever heard the story about the basketball test? Scientists split a group of people into three groups; one group was told to not play basketball for a couple of weeks, the second group was told to play basketball for an hour a day for two weeks, and the third group was told to think about playing basketball for an hour a day for two weeks, but not play it. Eventually, all three groups played basketball and the scientists discovered that the people that had spent a lot of time thinking about the game did meaningfully better than the group that hadn’t played it at all, though neither were as good as the team that practised regularly. This highlights something most people have a strong intuition about: our brains are simulation engines, and the more time we spend simulating a problem, the better chance we have of solving that problem in the real world. Now, researchers David Ha and Juergen Schmidhuber have sought to give AI agents this capability, by training systems to develop a compressed representation of their environment, then having these agents train themselves within this imagined version of the environment to solve a task – in this case, driving a car around a race course, and solving a challenge in VizDoom.
   Significant caveat: Though the paper is interesting it may be pursuing a research path that doesn’t go that far according to the view of one academic, Shimon Whiteson, who tweeted out some thoughts about the paper a few days ago.
  Surprising transfer learning: For the VizDoom tasks the researchers found they were able to make the agent’s model of its Doom challenge more difficult by raising the temperature of the environment model, which essentially increases randomization of its various latent variables. This means the agent had to contend with a more difficult version of the task, replete with more enemies, less predictable fireballs, and even the occasional random death. They found that agents trained in this simulation excelled at a simpler real world task, suggesting that the underlying learned environment model was of sufficient fidelity to be a useful mental simulation.
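One way to picture the temperature knob: if the world model's next-state prediction is a probability distribution, dividing its logits by a temperature before sampling flattens or sharpens that distribution. A toy sketch — the actual paper samples from a mixture-density RNN, whereas this illustration just softmax-samples over a few discrete states:

```python
import math
import random

def softmax(logits, temperature):
    """Higher temperature flattens the distribution, making unlikely
    transitions (extra enemies, stray fireballs) more probable."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_state(logits, temperature, rng=random.random):
    """Sample a next-state index from the model's transition logits."""
    probs = softmax(logits, temperature)
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

# The model strongly prefers state 0; temperature controls how often
# the "dream" diverges from that preference.
logits = [4.0, 1.0, 0.0]
print([round(p, 3) for p in softmax(logits, 0.5)])  # near-deterministic
print([round(p, 3) for p in softmax(logits, 2.0)])  # much more random
```

Training in the high-temperature dream is a form of data augmentation: the agent rehearses a wider distribution of futures than the model's most likely rollout would provide.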
  Why it matters: “Imagination” is a somewhat loaded term in AI research, but it’s a valid thing to be interested in. Imagination is what lets humans explore the world around them effectively and imagination is what gives them a sufficiently vivid and unpredictable internal mental world to be able to have insights that lead to inventions. Therefore, it’s worth paying attention to systems like those described in this paper that strive to give AI agents access to a learned and rich representation of the world around them which they can then use to teach themselves. It’s also interesting as another way of applying data augmentation to an environment: simply expose an agent to the real environment enough that it can learn an internal representation of it, then throw computers at expanding and perturbing the internal world simulation to cover a greater distribution of (potentially) real world outcomes.
   Readability endorsement: The paper is very readable and very funny. I wish more papers were written to be consumed by a more general audience as I think it makes the scientific results ultimately accessible to a broader set of people.
  Read more: World Models (Arxiv).

Testing self-driving cars with toy vehicles in toy worlds:
…Putting neural networks to the (extremely limited) test…
Researchers with the Center for Complex Systems and Brain Sciences at Florida Atlantic University have used a toy racetrack, a DIY model car, and seven different neural network approaches to evaluate self-driving capabilities in a constrained environment. The research seeks to provide a cheap, repeatable benchmark developers can use to evaluate different learning systems against each other (whether this benchmark has any relevance for full-size self-driving cars is to be determined). They test seven types of neural network on the same platform, including a feedforward network; a two-layer convolutional neural network; an LSTM; and implementations of AlexNet, VGG-16, Inception V3, and a ResNet-26. Each network is tested on the obstacle course following training and is evaluated according to how many laps the car completes. They test the networks on three data types: color and grayscale single images, as well as a ‘gray framestack’, which is a set of images that occurred in a sequence. Most systems were able to complete the majority of the courses, which suggests the course is a little too easy. An AlexNet-based system attained perfect performance on one data input type (single color frame), and a ResNet attained the best performance when trying to use a gray framestack.
  Why it matters: This paper highlights just how little we know today about self-driving car systems and how poor our methods are for testing and evaluating different tactics. What would be really nice is if someone spent enough money to do a controlled test of actual self-driving cars on actual roads, though I expect that companies will make this difficult out of a desire to keep their IP secret.
  Read more: A Systematic Comparison of Deep Learning Architectures in an Autonomous Vehicle (Arxiv).

Separating one detected pedestrian from another with deep learning:
…A little feature engineering (via ‘AffineAlign’) goes a long way…
As the world starts to deploy large-scale AI surveillance tools researchers are busily working to deal with some of the shortcomings of the technology. One major issue for image classifiers has been object segmentation and disambiguation, for example: if I’m shown images of a crowd of people how can I specifically label each one of those people and keep track of each of them, without accidentally mis-labeling a person, or losing them in the crowd? New research from Tsinghua University, Tencent AI Lab, and Cardiff University attempts to solve this problem with “a brand new pose-based instance segmentation framework for humans which separates instances based on human pose rather than region proposal detection.” The proposed method introduces an ‘AffineAlign’ layer that aligns images based on human poses, which it uses within an otherwise typical computer vision pipeline. Their approach works by adding a bit more prior knowledge (specifically, knowledge of human poses) into a recognition pipeline, and using this to better identify and segment people in crowded photos.
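The intuition behind aligning on pose can be sketched directly: given corresponding keypoints, solve for the affine transform that maps a detected pose onto a canonical template. This toy version fits the transform from three point pairs in closed form — the paper's AffineAlign layer estimates its transform inside the network, so everything below is illustrative arithmetic, not the paper's method:

```python
# Fit the affine transform mapping three source points onto three
# destination points. The x- and y-rows of the transform decouple
# into two independent 3x3 linear systems, solved by Cramer's rule.

def solve3(m, rhs):
    """Solve a 3x3 linear system m @ v = rhs by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
              - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
              + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    out = []
    for j in range(3):
        mj = [[rhs[i] if k == j else m[i][k] for k in range(3)]
              for i in range(3)]
        out.append(det(mj) / d)
    return out

def affine_from_points(src, dst):
    """Return (a, b, tx, c, d, ty) with x' = a*x + b*y + tx and
    y' = c*x + d*y + ty, mapping the three src points onto dst."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, tx = solve3(m, [x for x, _ in dst])
    c, d, ty = solve3(m, [y for _, y in dst])
    return a, b, tx, c, d, ty

# Map a pose triangle onto the same shape shifted by (1, 1):
src = [(0, 0), (2, 0), (0, 2)]
dst = [(1, 1), (3, 1), (1, 3)]
a, b, tx, c, d, ty = affine_from_points(src, dst)
print(a, b, tx)  # 1.0 0.0 1.0  (pure translation in x)
```

With more than three keypoints a least-squares fit replaces the exact solve, but the decoupling into two small systems carries over.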
  Results: The approach attains comparable results to Mask R-CNN on the ‘COCOHUMAN’ dataset, and outperforms it on the ‘COCOHUMAN-OC’ dataset, which tests systems’ ability to disambiguate partially occluded humans.
   Why it matters: As AI surveillance systems grow in capability it’s likely that more organizations around the world will deploy such systems into the real world. China is at the forefront of doing this currently, so it’s worth tracking public research on the topic area from Chinese-linked researchers.
  Read more: Pose2Seg: Human Instance Segmentation Without Detection (Arxiv).

French leader Emmanuel Macron discusses France’s national AI strategy:
…Why AI has issues for democracy, why France wants to lead Europe in AI, and more…
Politicians are somewhat similar to hybrids of weathervanes and antennas; the job of a politician is to intuit the public mood before it starts to change and establish a rhetorical position that points in the appropriate direction. For that reason it’s been interesting to see more and more politicians ranging from Canada’s Justin Trudeau to China’s Xi Jinping to, now, France’s Emmanuel Macron, taking meaningful positions on artificial intelligence; this suggests they’ve intuited that AI is going to become a galvanizing issue for the general public. Macron gives some of his thinking about the impact of AI in an interview with Wired. His thesis is that European countries need to pool resources and support AI collectively to have a chance at becoming a significant enough power bloc with regards to AI capabilities to not be crushed by the scale of the USA’s and China’s AI ecosystems. Highlights:
– AI “will disrupt all the different business models”, and France needs to lead in AI to retain agency over itself.
– Opening up data for general usage by AI systems is akin to opening up a Pandora’s Box: “The day we start to make such business out of this data is when a huge opportunity becomes a huge risk. It could totally dismantle our national cohesion and the way we live together. This leads me to the conclusion that this huge technological revolution is in fact a political revolution.”
– The USA and China are the two leaders in AI today.
– “AI could totally jeopardize democracy.”
– He is “dead against” the usage of lethal autonomous weapons where the machine makes the decision to kill a human.
– “My concern is that there is a disconnect between the speediness of innovation and some practices, and the time for digestion for a lot of people in our democracies.”
   Read more: Emmanuel Macron Talks To Wired About France’s AI Strategy (Wired).

France reveals its national AI strategy:
…New report by Fields Medal-winning minister published alongside Emmanuel Macron speech and press tour…
For the past year or so French mathematician and politician Cedric Villani has been working on a report for the government about what France’s strategy should be for artificial intelligence. He’s now published the report and it includes many significant recommendations meant to help France (and Europe as a whole) chart a course between the two major AI powers, the USA and China.
  Summary: Here’s a summary of what France’s AI strategy involves: rethink data ownership to make it easier for governments to create large public datasets; specialize in four sectors: healthcare, environment, transport-mobility, and defense and security; revise public sector procurement so it’s easier for the state to buy products from smaller (and specifically European) companies; create and fund interdisciplinary research projects; create national computing infrastructure including “a supercomputer designed specifically for AI usage and devoted to researchers” along with creating a European-wide private cloud for AI research; increase competitiveness of public sector remuneration; fund a public laboratory to study AI and its impact on labor markets which will work in tandem with schemes to get companies to look into funding professional training for people whose lives are affected by innovations developed by the private sector; increase transparency and interpretability of AI systems to deal with problems of bias; create a national AI ethics committee to provide strategic guidance to the government, and improve the diversity of AI companies.
  Read more: Summary of France’s AI strategy in English (PDF).

Berkeley researchers shrink neural networks with SqueezeNet-successor ‘SqueezeNext’:
…Want something eight times faster and cheaper than AlexNet?…
Berkeley researchers have published ‘SqueezeNext’, their latest attempt to distill the capabilities of very large neural networks into smaller models that can feasibly be deployed on devices with small memory and compute capabilities, like mobile phones. While much of the research into AI systems today is based around getting state-of-the-art results on specific datasets, SqueezeNext is part of a parallel track focused on making systems deployable. “A general trend of neural network design has been to find larger and deeper models to get better accuracy without considering the memory or power budget,” write the authors.
  How it works: SqueezeNext is efficient because of a few design strategies: low rank filters; a bottleneck filter to constrain the parameter count of the network; using a single fully connected layer following a bottleneck; weight and output stationary; and co-designing the network in tandem with a hardware simulator to maximize hardware usage efficiency.
  Results: The resulting SqueezeNext network is a neural network with 112X fewer model parameters than those found in AlexNet, the model that was used to attain state-of-the-art image recognition results in 2012. They also develop a version of the network whose performance approaches that of VGG-19 (which did well in ImageNet 2014). The researchers also design an even more efficient network by carefully tuning model design in parallel with a hardware simulator, ultimately designing a model that is significantly faster and more energy efficient than a widely used compressed network called SqueezeNet.
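Two of the strategies listed above — low-rank filters and bottleneck layers — can be quantified with back-of-the-envelope arithmetic: replacing a K×K convolution with a K×1 followed by a 1×K cuts the per-filter cost from K² to 2K, and squeezing channels first shrinks everything downstream. A generic sketch (illustrative layer sizes, not SqueezeNext's exact configuration):

```python
# Back-of-the-envelope parameter counts for the two tricks the text
# mentions: low-rank (separable) filters and channel bottlenecks.

def conv_params(c_in, c_out, kh, kw):
    """Weights in a conv layer (ignoring biases)."""
    return c_in * c_out * kh * kw

c_in, c_out, k = 64, 64, 3

full = conv_params(c_in, c_out, k, k)           # standard 3x3 conv
low_rank = (conv_params(c_in, c_out, k, 1)      # 3x1 followed by...
            + conv_params(c_out, c_out, 1, k))  # ...1x3
# A 1x1 bottleneck squeezes channels before the expensive convs,
# then a 1x1 expansion restores them afterwards:
bneck = c_in // 4
squeezed = (conv_params(c_in, bneck, 1, 1)
            + conv_params(bneck, bneck, k, 1)
            + conv_params(bneck, bneck, 1, k)
            + conv_params(bneck, c_out, 1, 1))

print(full, low_rank, squeezed)  # 36864 24576 3584
```

Stacking savings like these across every block is how a network ends up with two orders of magnitude fewer parameters than AlexNet while keeping comparable accuracy.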
  Why it matters: One of the things holding neural networks back from being deployed is their relatively large memory and computation requirements – traits that are likely to continue to be present given the current trend for solving tasks via training unprecedentedly multi-layered systems. Therefore, research into making these networks run efficiently broadens the number of venues neural nets can run in.
   Read more: SqueezeNext: Hardware-Aware Neural Network Design (Arxiv).

Tech Tales:

Metal Dogs Grow Old.

It’s not unusual, these days, to find rusting piles of drones next to piles of elephant skeletons. Nor is it unusual to see an old elephant make its way to a boneyard accompanied by a juddering, ancient drone, and to see both creature and machine set themselves down and subside at the same time. There have even been stories of drones falling out of the sky when one of the older birds in the flock dies. These are all some of the unexpected consequences of a wildlife preservation program called PARENTAL UNIT. Starting in the early twenties we began to introduce small, quiet drones to vulnerable animal populations. The drones would learn to follow a specific group of creatures, say a family of elephants, or – later, after the technology improved – a flock of birds.

The machines would learn about these creatures and watch over them, patrolling the area around them as they slept and, upon finding the inevitable poachers, automatically raising alerts with local park rangers. Later, the drones were given some autonomous defense capabilities, so they could spray a noxious chemical onto the poachers that had the dual effect of making local predators be drawn to them, and providing a testable biomarker that police could subsequently check people for at the borders of national parks.

A few years after starting the program the drone deaths started happening. Drones died all the time, and we modelled their failures as rigorously as any other piece of equipment. But drones started dying at specific times – the same time the oldest animal in the group they were watching died. We wondered about this for weeks, running endless simulations, and even pulling in some drones from the field and inspecting the weights in their models to see if any of their continual learning had led to any unpredictable behaviors. Could there be something about the union of the concept of death and the concept of the eldest in the herd that fried the drones’ brains, our scientists wondered? We had no answers. The deaths continued.

Something funny happened: after the initial rise in deaths they steadied out, with a few drones a week dying from standard hardware failures and one or two dying as a consequence of one of their creatures dying. So we settled into this quieter new life and, as we stopped trying to interfere, we noticed a further puzzling statistical trend: certain drones began living improbably long lifespans, calling to mind the Mars rovers Spirit and Opportunity that had miraculously exceeded their own designed lifespans. These drones were also the same machines that died when the eldest animals died. Several postgrads are currently exploring the relationship, if any, between these two. Now we celebrate these improbably long-lived machines, cheering them on as they buzz in for a new propeller, or update our software monitors with new footage from their cameras, typically hovering right above the creature they have taken charge of, watching them and learning something from them we can measure but cannot examine directly.

Things that inspired this story: Pets, drones, meta-learning, embodiment.

Import AI: #87: Salesforce research shows the value of simplicity, Kindred’s repeatable robotics experiment, plus: think your AI understands physics? Run it on IntPhys and see what happens.

Chinese AI star says society must prepare for unprecedented job destruction:
…Kai-Fu Lee, venture capitalist and former AI researcher, discusses the impact of AI and why today’s techniques will have a huge impact on the world…
Today’s AI systems are going to influence the world’s economy so much that their uptake will lead to what looks in hindsight like another industrial revolution, says Chinese venture capitalist Kai-Fu Lee, in an interview with Edge. “We’re all going to face a very challenging next fifteen or twenty years, when half of the jobs are going to be replaced by machines. Humans have never seen this scale of massive job decimation. The industrial revolution took a lot longer,” he said.
   He also says that he worries deep learning might be a one-trick pony, in the sense that we can’t expect other similarly scaled breakthroughs to occur in the next few years, and we should adjust our notions of AI progress on this basis. “You cannot go ahead and predict that we’re going to have a breakthrough next year, and then the month after that, and then the day after that. That would be exponential. Exponential adoption of applications is, for now, happening. That’s great, but the idea of exponential inventions is a ridiculous concept. The people who make those claims and who claim singularity is ahead of us, I think that’s just based on absolutely no engineering reality,” he says.
  AI Haves and Have-Nots: Countries like China and the USA that have large populations and significant investments in AI stand to fare well in the new AI era, he says. “The countries that are not in good shape are the countries that have perhaps a large population, but no AI, no technologies, no Google, no Tencent, no Baidu, no Alibaba, no Facebook, no Amazon. These people will basically be data points to countries whose software is dominant in their country.”
  Read more: We Are Here To Create, A Conversation With Kai-Fu Lee (Edge).

AI practitioners grapple with the upcoming information apocalypse:
…And you thought DeepFakes was bad. Wait till DeepWar…
Members of the AI community are beginning to sound the alarm about the imminent arrival of stunningly good, stunningly easy-to-make synthetic images and videos. In a blog post, AI practitioners say that the increasing availability of data combined with easily accessible AI infrastructure (cloud-rentable GPUs) is lowering the barrier to entry for people that want to make this stuff, and that ongoing progress in AI capabilities means the quality of this fake media is increasing over time.
  How can we deal with these information threats? We could look at how society already makes it hard to forge currencies by making it costly to produce high-fidelity copies and, in parallel, developing technologies to verify the authenticity of currency materials. Unfortunately, though this may help with some of the problems brought about by AI forgery, it doesn’t deal with the root problems: AI is predominantly embodied in software rather than hardware and so it’s going to be difficult to insert detectable (and non-spoofable) distinct visual/audio signatures into generated media barring some kind of DRM-on-steroids. One solution could be to train AI classifiers on real and faked datasets from the same domain so as to provide classifiers to spot faked media in the wild.
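A minimal sketch of that classifier idea, assuming we already have per-item features that differ between real and generated media (the feature, the model, and the training scheme here are all hypothetical stand-ins for the deep networks a real system would use):

```python
import math
import random

def train_fake_detector(real_feats, fake_feats, epochs=200, lr=0.1):
    """Tiny logistic-regression detector over feature vectors.

    Hypothetical stand-in: a real system would extract features with a
    deep network rather than use simple per-item statistics.
    """
    data = [(x, 0.0) for x in real_feats] + [(x, 1.0) for x in fake_feats]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of 'fake'
            g = p - y                        # gradient of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def looks_fake(w, b, x):
    """Classify a feature vector: positive logit means 'fake'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0
```

The hard part, of course, is not the classifier but keeping its training set current as generators improve, which is why this is a mitigation rather than a fix.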
  Read more: Commoditisation of AI, digital forgery and the end of trust: how we can fix it.

Berkeley researchers use Soft Q-Learning to let robots compose solutions to tasks:
…Research reduces the time it takes to learn new behaviors on robots…
Berkeley researchers have figured out how to use soft q-learning, a recently introduced variant of traditional q-learning, to let robots learn more efficiently. They introduce a new trick where they’re able to learn to compose new q-functions from existing learned policies, letting them, for example, train a robot to move its arm to a particular distribution of X positions, then to a particular distribution of Y positions, then they can create a new policy which moves the arm to the intersection of the X and Y positions without having been trained on the combination previously. This sort of learning is typically quite difficult to achieve in a single policy as it requires so much exploration that most algorithms will spend a long time trying and failing to succeed at the task.
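A toy sketch of the composition trick for discrete actions (the paper itself works with continuous control; the part drawn from it is the rule that averaging the Q-functions of two maximum-entropy policies approximately solves the intersection of their tasks):

```python
import math

def softmax_policy(q_values, temperature=1.0):
    """Boltzmann policy over discrete actions, as in soft Q-learning."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def compose_q(q_a, q_b):
    """Composed Q-function: the average of two task Q-functions.

    In maximum-entropy RL this approximately yields a policy that
    satisfies both tasks at once -- the 'intersection' behaviour.
    """
    return [(a + b) / 2.0 for a, b in zip(q_a, q_b)]
```

In this sketch a policy for task X prefers actions 0 and 1, one for task Y prefers actions 1 and 2, and the composed policy concentrates on action 1, the only action good for both, without any retraining.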
  Real world: The researchers train real robots to succeed at tasks like reaching to a specific location and stacking Lego blocks. They also demonstrate the utility of combining policies by training a robot to avoid an obstacle near its arm and separately training it to stack legos, then combine the two policies allowing the robot to stack blocks while avoiding an obstacle, despite having never been trained on the combination before.
  Why it matters: The past few years of AI progress have let us get very good at developing systems which excel at individual capabilities; being able to combine capabilities in an ad-hoc manner to generate new behaviors further increases the capabilities of AI systems and makes it possible to learn a distribution of atomic behaviors then chain these together to succeed at far more complex tasks than those found within the training set.
  Read more: Composable Deep Reinforcement Learning for Robotic Manipulation (Arxiv).

Think your AI model has a good understanding of physics? Run it on IntPhys and prepare to be embarrassed:
…Testing AI systems in the same way we test infants and creatures…
INRIA and Facebook and CNRS researchers have released IntPhys, a new way to evaluate AI systems’ ability to model the physical world around them using what the researchers call a ‘physical plausibility test’. IntPhys follows in a recent trend in AI for testing systems on tougher problems that more closely map to the sorts of problems humans typically tackle (see, AI2’s ‘ARC’ dataset for written reasoning, and DeepMind’s cognitive science-inspired ‘PsychLab’ environment).
  How it works: IntPhys presents AI systems with movies of scenes rendered in UnrealEngine4 and challenges them to figure out whether one scene can lead to another, letting them test models’ ability to internalize fundamental concepts about the world like object permanence, causality, etc. Systems need to compute a “plausibility score” for each of the scenes or scene combinations they are shown, then use this to figure out if the systems have learned about the underlying dynamics of the world.
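A stripped-down sketch of the plausibility-score idea: treat a clip as plausible to the extent a forward model predicts its frames well (the one-dimensional "world" and the forward model here are hypothetical; the benchmark targets learned video models):

```python
def plausibility_score(predict_next, frames):
    """Score a clip as minus its total next-frame prediction error.

    'predict_next' stands in for a learned forward model; clips the
    model predicts well are judged more physically plausible.
    """
    err = 0.0
    for prev, cur in zip(frames, frames[1:]):
        pred = predict_next(prev)
        err += sum((p - c) ** 2 for p, c in zip(pred, cur))
    return -err

def pick_plausible(predict_next, clip_a, clip_b):
    """Return 0 if clip_a is judged more plausible than clip_b, else 1."""
    a = plausibility_score(predict_next, clip_a)
    b = plausibility_score(predict_next, clip_b)
    return 0 if a >= b else 1
```

With a forward model that expects constant rightward motion, a clip where the object teleports gets a large prediction error and so a low plausibility score, which is the judgment the benchmark asks systems to make.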
  The IntPhys Benchmark: v1 of IntPhys focuses on unsupervised learning. The first version tests systems’ ability to understand object permanence. Future releases will include more tests for things like shape constancy, spatio-temporal continuity, and so on. The initial IntPhys release contains 15,000 videos of possible events, each video around 7 seconds long running at 15fps, totalling 21 hours of videos. It also incorporates some additional information so you don’t have to attempt to solve the task in a purely unsupervised manner, including depth of field data for each image, as well as object instance segmentation masks.
  Baseline Systems VERSUS Humans: The researchers create two baselines for others to evaluate their systems against: a CNN encoder-decoder system, and a conditional GAN. “Preliminary work with predictions at the pixel level revealed that our models failed at predicting convincing object motions, especially for small objects on a rich background. For this reason, we switched to computing predictions at a higher level, using object masks.” The researchers tested humans on their system, finding that humans had an average error rate of about 8 percent when the scene is visible and 25 percent when the scene contains partial occlusion. Neural network-based systems, by comparison, had errors of 31 percent on visible scenes and 50 percent on partially occluded scenes.
  What computers are up against: “At 2-4 months, infants are able to parse visual inputs in terms of permanent, solid and spatiotemporally continuous objects. At 6 months, they understand the notion of stability, support and causality. Between 8 and 10 months, they grasp the notions of gravity, inertia, and conservation of momentum in collision; between 10 and 12 months, shape constancy, and so on,” the researchers write.
  Why it matters: Tests like this will give us a greater ability to model the abilities of AI systems to perform fundamental acts of reasoning, and as the researchers extend the benchmark with more challenging components we’ll be able to get a better read on what these systems are actually capable of. As new components are added “the prediction task will become more and more difficult and progressively reach the level of scene comprehension achieved by one-year-old humans,” they write.
  Competition: AI researchers can download the dataset and submit their system scores to an online leaderboard at the official IntPhys website here (IntPhys).
  Read more: IntPhys: A Framework and Benchmark for Visual Intuitive Physics Reasoning (Arxiv).

Kindred researchers explain how to make robots repeatable:
…Making the dream of repeatable robot experiments a reality…
Researchers with robot AI startup Kindred have published a paper on a little-discussed subject in AI: repeatable real-world robotics experiments. It’s a worthwhile primer on some of the tweaks people need to make to create robot development environments that are a) repeatable and b) effective.
  Regular robots: The researchers set up a reaching task using a Universal Robotics ‘UR5’ robot arm and describe the architecture for the system. One key difference between simulated and real world environments is the role of time, where in simulation one typically executes all the learning and action updates synchronously, whereas in real robots you need to do stuff asynchronously. “In real-world tasks, time marches on during each agent and environment-related computations. Therefore, the agent always operates on delayed sensorimotor information,” they explain.
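That fixed-rate, always-delayed action cycle can be sketched as a minimal control loop (hypothetical; Kindred's actual setup runs the learning updates in separate asynchronous processes):

```python
import time

def control_loop(read_sensors, compute_action, send_action,
                 cycle_time=0.04, steps=10):
    """Fixed-rate action cycle of the kind used on real robots.

    Unlike a simulator, the world keeps moving while we compute, so each
    action is based on slightly stale ('delayed') sensorimotor data; we
    sleep away whatever is left of the cycle to keep the timing regular.
    """
    periods = []
    next_deadline = time.monotonic() + cycle_time
    for _ in range(steps):
        start = time.monotonic()
        obs = read_sensors()                 # delayed sensorimotor information
        send_action(compute_action(obs))
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)            # hold the cycle at a fixed rate
        periods.append(time.monotonic() - start)
        next_deadline += cycle_time
    return periods
```

The design point is that the loop runs at a fixed rate rather than as fast as possible: a learning agent on the other end then sees a consistent action-to-effect delay, which is part of what makes experiments repeatable.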
  Why it matters: It’s currently very difficult to model progress in real-world robotics due to the diversity of tasks and the lack of trustworthy testing regimes. Papers like this suggest a path forward and I’d hope they encourage researchers to try to structure their experiments to be more repeatable and reliable. If we’re able to do this then we’ll be able to better develop intuitions about the rate of progress in the field which should help for forecasting trends in development – a critical thing to do, given how much robots are expected to influence employment in the regions they are deployed into.
  Read more here: Setting up a Reinforcement Learning Task with a Real-World Robot (Arxiv).

Salesforce researchers demonstrate the value of simplicity for language modelling:
…Well-tuned LSTM or QRNN-based systems shown to beat more complex systems…
Researchers with Salesforce have shown that well-tuned basic AI components can attain better performance on tough language tasks than more sophisticated and in many cases more modern systems. Their research shows that RNN-based systems that model language using well-tuned, simple components like LSTMs or the Salesforce-invented QRNN beat more complex models like recurrent highway networks, hyper networks, or systems found by neural architecture search. This result highlights that much of the recent progress in AI may to some extent be illusory: jumps in performance on certain datasets that were previously attributed to fundamentally new capabilities in new models are now shown to be within reach of simpler components that are tuned and tested comprehensively.
  Results: The researchers test their QRNN and LSTM-based systems against the Penn Treebank and enwik8 character-level datasets and the word-level WikiText-103 dataset, beating state-of-the-art scores on Penn Treebank and enwik8 when measured by bits-per-character, and significantly outperforming SOTA on perplexity on WikiText-103.
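For reference, the two metrics quoted here are simple transforms of the model's average cross-entropy loss:

```python
import math

def avg_nll(probs):
    """Average negative log-likelihood (nats) the model assigned to the
    observed characters/tokens."""
    return -sum(math.log(p) for p in probs) / len(probs)

def bits_per_character(nats_per_char):
    """Character-level metric (Penn Treebank, enwik8): cross-entropy
    converted from nats to bits."""
    return nats_per_char / math.log(2)

def perplexity(nats_per_token):
    """Word-level metric (WikiText-103): exp of per-token cross-entropy."""
    return math.exp(nats_per_token)
```

So a model that assigns probability 0.5 to every character it sees scores exactly 1.0 bits-per-character, and lower is better on both metrics.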
  Why it matters: This paper follows prior work showing that many of our existing AI components are more powerful than researchers suspected, and follows research that has shown that fairly old systems like GANs or DCGANs can adeptly model data distributions more effectively than sophisticated successor systems. That’s not to say this should be taken as a sign that the subsequent inventions are pointless, but it should cause researchers to devote more time to interrogating and tuning existing systems rather than trying to invent different proverbial wheels. “Fast and well tuned baselines are an important part of our research community. Without such baselines, we lose our ability to accurately measure our progress over time. By extending an existing state-of-the-art word level language model based on LSTMs and QRNNs, we show that a well tuned baseline can achieve state-of-the-art results on both character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets without relying on complex or specialized architectures,” they write.
  Read more: An Analysis of Neural Language Modeling at Multiple Scales (Arxiv).

Want to test how well your AI understands language and images? Try VQA 2.0
…New challenge arrives to test AI systems’ abilities to model language and images…
AI researchers that think they’ve developed models that can learn to model the relationship between language and images may want to submit to the third iteration of the Visual Question Answering Challenge. The challenge prompts models to answer questions about the contents of images. Challengers will use v2.0 of the VQA dataset, which includes more written questions and ground-truth answers about images.
  Read more: VQA Challenge 2018 launched!

Tech Tales:

Miscellaneous Letters Sent To The Info@ Address Of An AI Company

2023: I saw what you did with that robot so I know the truth. You can’t hide from me anymore I know exactly what you are. My family had a robot in it and the state took them away and told us they were being sent to prison but I know the truth they were going to take them apart and sell their body back to the aliens in exchange for the anti-climate change device. What you are doing with that robot tells me you are going to take it apart when it is done and sell it to the aliens as well. You CANNOT DO THIS. The robot is precious you need to preserve it or else I will be VERY ANGRY. You must listen to me we-

2025: So you think you’re special because you can get them to talk to each other in space now and learn things together well sure I can do that as well I regularly listen to satellites so I can tell you about FLUORIDE and about X74-B and about the SECRET UN MOONBASE and everything else but you don’t see me getting famous for these things in fact it is a burden it is a pain for me I have these headaches. Does your AI get sick as well?-

2027: Anything that speaks like a human but isn’t a human is a sin. You are sinners! You are pretending to be God. God will punish you. You cannot make the false humans. You cannot do this. I have been calling the police every day for a week about this ever since I saw your EVIL creation on FOX-25 and they say they are taking notes. They are onto you. I am going to find you. They are going to find you. I am calling the fire department to tell them about you. I am calling the military to tell them about you. I am calling the-

2030: My mother is in the hospital with a plate in her head I saw on the television you have an AI that can do psychology on other AIs can your AI help my mother? She has a plate in her head and needs some help and the doctors say they can’t do anything for her but they are liars. You can help her. Please can you make your AI look at her and diagnose what is wrong with her. She says the plate makes her have nightmares but I studied many religions for many years and believe she can be healed if she thinks about it more and if someone or something helps her think.

2031: Please you have to keep going I cannot be alone any more-

Things that inspired this story: Comments from strangers about AI, online conspiracy forums, bad subreddits, “Turing Tests”, skewed media portrayals of AI, the fact capitalism creates customers for false information which leads to media ecosystems that traffic in fictions painted as facts.

Import AI: #86: Baidu releases a massive self-driving car dataset; DeepMind boosts AI capabilities via neural teachers; and what happens when AIs evolve to do dangerous, subversive things.

Boosting AI capabilities with neural teachers:
…AKA, why my small student with multiple expert teachers beats your larger more well-resourced teacherless-student…
Research from DeepMind shows how to boost the performance of a given agent on a task by transferring knowledge from a pre-trained ‘teacher’ agent. The technique yields a significant speedup in training AI agents, and there’s some evidence that agents that are taught attain higher performance than non-taught ones. The technique comes in two flavors: single teacher and multi-teacher; agents pretrained via multiple specialized teachers do better than ones trained by a single entity, as expected.
  Strange and subtle: The approach has a few traits that seem helpful for the development of more sophisticated AI agents: in one task DeepMind tests it on, the agent needs to figure out how to use a short-term memory to be able to attain a high score. ‘Small’ agents (which only have two convolutional layers) typically fail to learn to use a memory and therefore cannot achieve scores above a certain threshold, but by training a ‘small’ agent with multiple specialized teachers the researchers create one that can succeed at the task. “This is perhaps surprising because the kickstarting mechanism only guides the student agent in which action to take: it puts no constraint on how the student structures its internal memory state. However, the student can only predict the teacher’s behaviour by remembering information from before the respawn, which seems to be enough supervisory signal to drive short-term memory formation. We find this a wonderful parallel with how the best human educators teach: not telling the student what to think, but simply putting the student in a fruitful position to learn for themselves,” the researchers write.
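The kickstarting mechanism boils down to adding a weighted distillation term to the student's usual RL objective; a sketch (the cross-entropy form is standard distillation, but the linear annealing schedule here is an assumption, not the paper's exact one):

```python
import math

def kickstart_loss(rl_loss, student_probs, teacher_probs, weight):
    """Kickstarting objective (sketch): the usual RL loss plus a weighted
    distillation term pushing the student's policy toward the teacher's.

    The distillation term is the cross-entropy H(teacher, student); the
    weight is annealed toward zero so the student eventually learns on
    its own rather than forever imitating the teacher.
    """
    distill = -sum(t * math.log(s)
                   for t, s in zip(teacher_probs, student_probs) if t > 0)
    return rl_loss + weight * distill

def anneal(initial_weight, step, decay_steps):
    """Linearly decay the kickstarting weight to zero (assumed schedule)."""
    return max(0.0, initial_weight * (1.0 - step / decay_steps))
```

Because the extra term only constrains which actions the student prefers, not how it represents the world internally, it leaves room for the emergent memory use described above.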
  Why it matters: Trends like this suggest that scientists can speed their own research by using such pre-trained techniques to better evaluate new agents. This adds further credence to the notion that a key input to (some types of) AI research will shift from pre-labelled static datasets to compute. (Though it should be noted that data here is implicit, in the form of a procedural, modifiable simulator that researchers can access.) More speculatively, this means it may be possible to use mixtures of teachers to train complex agents that far exceed in capabilities any of their forebears – perhaps an area where the sum really will be greater than its parts.
Read more: Kickstarting Deep Reinforcement Learning (Arxiv).

100,000+ developer survey shows AI concerns:
…What developers think is dangerous and exciting, and who they think is responsible…
Developer community StackOverflow has published the results of its annual survey of its community; this year it asked about AI:
– What developers think is “dangerous” re AI: Increasing automation of jobs (40.8%)
– What developers think is “exciting” re AI: AI surpassing human intelligence, aka the singularity (28%)
– Who is responsible for considering the ramifications of AI:
   – The developers or the people creating the AI: 47.8%
   – A governmental or other regulatory body: 27.9%
– Different roles = different concerns: People that identified as technical specialists tended to say they were more concerned about issues of fairness than the singularity, whereas designers and mobile developers tended to be more concerned about the singularity.
  Read more: Developer Survey Results 2018 (StackOverFlow).

Baidu and Toyota and Berkeley researchers organize self-driving car challenge backed by new self-driving car dataset from Baidu:
…”ApolloScape” adds Chinese data for self-driving car researchers, plus Baidu says it has joined Berkeley’s “DeepDrive” self-driving car AI coalition…
A new competition and dataset may give researchers a better way to measure the capabilities and progression of autonomous car AI.
  The dataset: The ‘ApolloScape’ dataset from Baidu contains ~200,000 RGB images with corresponding pixel-by-pixel semantic annotation. Each frame is labeled from a set of 25 semantic classes that include: cars, motorcycles, sidewalks, traffic cones, trash cans, vegetation, and so on. Each of the images has a resolution of 3384 x 2710, and each frame is separated by a meter of distance. 80,000 images have been released as of March 8 2018.
Read more about the dataset (potentially via Google Translate) here.
  Additional information: Many of the researchers linked to ApolloScape will be talking at a session on autonomous cars at the IEEE Intelligent Vehicles Symposium in China.
Competition: The new ‘WAD’ competition will give people a chance to test and develop AI systems on the ApolloScape dataset as well as a dataset from Berkeley DeepDrive (the DeepDrive dataset consists of 100,000 video clips, each about 40 seconds long, with one key frame from each clip annotated). There is about $10,000 in cash prizes available, and the researchers are soliciting papers on research techniques in: drivable area segmentation (being able to figure out which bits of a scene correspond to which label and which of these areas are safe); road object detection (figuring out what is on the road); and transfer learning from one semantic domain to another, specifically going from training on the Berkeley dataset (filmed in California, USA) to the ApolloScape dataset (filmed in Beijing, China).
   Read more about the ‘WAD’ competition here.

Microsoft releases a ‘Rosetta Stone’ for deep learning frameworks:
…GitHub repo gives you a couple of basic operations displayed in many different ways…
Microsoft has released a GitHub repository containing similar algorithms implemented in a variety of frameworks, including: Caffe2, Chainer, CNTK, Gluon, Keras (with backends CNTK/TensorFlow/Theano), Tensorflow, Lasagne, MXNet, PyTorch, and Julia – Knet. The idea here is that if you read one algorithm in one of these frameworks you’ll be able to use that knowledge to understand the other frameworks.
  “The repo we are releasing as a full version 1.0 today is like a Rosetta Stone for deep learning frameworks, showing the model building process end to end in the different frameworks,” write the researchers in a blog post that also provides some rough benchmarking for training time for a CNN and an RNN.
  Read more: Comparing Deep Learning Frameworks: A Rosetta Stone Approach (Microsoft Tech Net).
View the code examples (GitHub).

Evolution’s weird, wonderful, and potentially dangerous implications for AI agent design:
…And why the AI safety community may be able to learn from evolution…
A consortium of international researchers have published some of the weird, infuriating, and frequently funny ways in which evolutionary algorithms have figured out non-obvious solutions and hacks to tasks they’re asked to solve. The paper includes an illuminating set of examples of ways in which algorithms have subverted the wishes of their human overseers, including:
– Opportunistic Somersaulting: When trying to evolve creatures to jump, some agents discovered that they could instead evolve very tall bodies and then somersault, gaining a reward in proportion to their feet gaining distance from the floor.
– Pointless Programs: When researchers tried to evolve code with GenProg to solve a buggy data sorting program, GenProg evolved a solution that had the buggy program return an empty list, which wasn’t scored negatively as an empty list can’t be out of order as it contains nothing to order.
– Physics Hacking: One robot figured out the correct vibrational frequency to surface a friction bug in the floor of an environment in a physics simulator, letting it propel itself across the ground via the bug.
– Evolution finds a way: Another type of bug is the ways that evolution can succeed even when researchers think such success is impossible, like a six-legged robot that figured out how to walk fast without its feet touching the ground (solution: it flipped itself on its back and used the movement of its legs to propel itself nonetheless).
– And so much more!
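The 'Pointless Programs' anecdote is easy to reproduce with a toy objective: score candidate sorting programs by counting out-of-order adjacent pairs in their output (a hypothetical fitness function, not GenProg's actual one), and the degenerate empty-list program scores as well as a genuine sort:

```python
def sorting_fitness(program, tests):
    """Naive fitness: zero minus the number of out-of-order adjacent
    pairs across all outputs. Higher is better; 0 is 'perfect'."""
    penalty = 0
    for case in tests:
        out = program(case)
        penalty += sum(1 for a, b in zip(out, out[1:]) if a > b)
    return -penalty

def honest_sort(xs):
    return sorted(xs)

def degenerate(xs):
    # The evolved hack: an empty list has no adjacent pairs, so it is
    # never penalized -- the objective under-specifies the task.
    return []
```

Both programs achieve a perfect score under this fitness function, which is exactly the kind of under-specified objective that evolution, given enough trials, reliably finds and exploits.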
The researchers think evolution may also illuminate some of the more troubling problems in AI safety. “The ubiquity of surprising and creative outcomes in digital evolution has other cross-cutting implications. For example, the many examples of “selection gone wild” in this article connect to the nascent field of artificial intelligence safety,” the researchers write. “These anecdotes thus serve as evidence that evolution—whether biological or computational—is inherently creative, and should routinely be expected to surprise, delight, and even outwit us.” (emphasis mine).
  Read more: The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities (Arxiv).

Allen AI puts today’s algorithms to shame with new common sense question answering dataset:
…Common sense questions designed to challenge and frustrate today’s best-in-class algorithms…
Following the announcement of $125 million in funding and a commitment to conducting AI research that pushes the limits of what sorts of ‘common sense’ intelligence machines can manifest, the Allen Institute for Artificial Intelligence has released a new ‘ARC’ challenge and dataset researchers can use to develop smarter algorithms.
  The dataset: The main ARC test contains 7787 natural science questions, split across an easy set and a hard set. The hard set of questions are ones which are answered incorrectly by retrieval-based and word co-occurrence algorithms. In addition, AI2 is releasing the ‘ARC Corpus’, a collection of 14 million science-related sentences with knowledge relevant to ARC, to support the development of ARC-solving algorithms. This corpus contains knowledge relevant to 95% of the Challenge questions, AI2 writes.
Neural net baselines: AI2 is also releasing three baseline models which have been tested on the challenge, achieving some success on the ‘easy’ set and failing to be better than random chance on the ‘hard’ set. These include a decomposable attention model (DecompAttn), Bidirectional Attention Flow (BiDAF), and a decomposed graph entailment model (DGEM). Questions in ARC are designed to test everything from definitional to spatial to algebraic knowledge, encouraging the usage of systems that can abstract and generalize concepts derived from large corpuses of data.
Baseline results: ARC is extremely challenging: AI2 benchmarked its prototype neural net approaches (along with others) and discovered that scores top out at 60% on the ‘easy’ set of questions and 27% on the more challenging questions.
Sample question: “Which property of a mineral can be determined just by looking at it? (A) luster [correct] (B) mass (C) weight (D) hardness”.
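A sketch of the kind of word co-occurrence baseline the 'hard' set is defined against (the scoring rule here is a hypothetical simplification): pick the answer option whose words co-occur most with the question's words in some sentence of the support corpus.

```python
def cooccurrence_baseline(question, options, corpus_sentences):
    """Word co-occurrence baseline (sketch): choose the option that,
    together with the question, best overlaps a single corpus sentence.

    Baselines like this answer much of ARC's 'easy' set; the 'hard' set
    is, by construction, the questions they get wrong.
    """
    q_words = set(question.lower().split())

    def score(option):
        o_words = set(option.lower().split())
        best = 0
        for sent in corpus_sentences:
            s_words = set(sent.lower().split())
            best = max(best, len(q_words & s_words) + len(o_words & s_words))
        return best

    return max(range(len(options)), key=lambda i: score(options[i]))
```

On the sample question above, a corpus sentence mentioning luster and looking at minerals is enough for this baseline to succeed, which is why questions it can answer this way are relegated to the easy set.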
SQuAD successor: ARC may be a viable successor to the Stanford Question Answering Dataset (SQuAD) and challenge; the SQuAD competition has recently hit some milestones, with companies ranging from Microsoft to Alibaba to iFlyTek all developing SQuAD solvers that attain scores close to human performance (which is about 82% for ExactMatch and 91% for F1). A close evaluation of SQuAD topic areas gives us some intuition as to why scores are so much higher on this test than on ARC – simply put, SQuAD is easier; it pairs chunks of information-rich text with basic questions like “where do most teachers get their credentials from?” that can be retrieved from the text without requiring much abstraction.
Why it matters: “We find that none of the baseline systems tested can significantly outperform a random baseline on the Challenge set, including two neural models with high performances on SNLI and SQuAD,” the researchers write. The big question now is where this dataset falls on the Goldilocks spectrum — is it too easy (see: Facebook’s early memory networks tests) or too hard or just right? If a system were to get, say, 75% or so on ARC’s more challenging questions, it would seem to be a significant step forward in question understanding and knowledge representation.
  Read more: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge (Arxiv).
SQuAD scores available at the SQuAD website.
  Read more: SQuAD: 100,000+ Questions for Machine Comprehension of Text (Arxiv).

Tech Tales:

The Ten Thousand Floating Heads

The Ten-K, also known as The Heads, also sometimes known as The Ten Heads, officially known as The Ten Thousand Floating Heads, is a large-scale participatory AI sculpture that was installed in the Natural History Museum in London, UK, in 2025.

The way it works is like this: when you walk into the museum and breathe in that musty air and look up the near-endless walls towards the ceiling your face is photographed in high definition by a multitude of cameras. These multi-modal pictures of you are sent to a server which adds them to the next training set that the AI uses. Then, in the middle of the night, a new model is trained that integrates the new faces. Then the AI system gets to choose another latent variable to filter by (this used to be a simple random number generator but, as with all things AI, has slowly evolved into an end-to-end ‘learned randomness’ system with some auxiliary loss functions to aid with exploration of unconventional variables, and so on) and then it looks over all the faces in the museum’s archives, studies them in one giant embedding, and pulls out the ten thousand that fit whatever variable it’s optimizing for today.

These ten thousand faces are displayed, portrait-style, on ten thousand tablets scattered through the museum. As you go around the building you do all the usual things, like staring at the dinosaur bones, or trudging through the typically depressing and seemingly ever-expanding climate change exhibition, but you also peer into these tablets and study the faces that are being shown. Why these ten thousand?, you’re meant to think. What is it optimizing for? You write your guess on a slip of paper or an email or a text and send it to the museum and at the end of the day the winners get their names displayed online and on a small plaque which is etched with near-micron accuracy (so as to avoid exhausting space) and is installed in a basement in the museum and viewable remotely – machines included – via a live webcam.

The correct answers for the variable it optimizes for are themselves open to interpretation, as isolating them and describing what they mean has become increasingly difficult as the model gets larger and incorporates more faces. It used to be easy: gender, hair color, eye color, race, facial hair, and so on. But these days it’s very subtle. Some of the recent names given to the variables include: underslept but well hydrated, regretful about a recent conversation, afraid of museums, and so on. One day it even put up a bunch of people and no one could figure out the variable and then six months later some PhD student did a check and discovered half the people displayed that day had subsequently died of one specific type of cancer.

Recently The Heads got a new name: The Oracle. This has caused some particular concern within certain specific parts of government that focus on what they euphemistically refer to as ‘long-term predictors’. The situation is being monitored.

Things that inspired this story: t-SNE embeddings, GANs, auxiliary loss functions, really deep networks, really big models, facial recognition, religion, cults.