Import AI

Import AI 146: Making art with CycleGANs; Google and ARM team-up on low-power ML; and deliberately designing AI for scary uses

Chinese researchers use AI to build an artist-cloning mind map system:
…Towards the endless, infinite AI artist…
AI researchers from JD and the Central Academy of Fine Arts in Beijing have built Mappa Mundi, software that lets people construct aesthetically pleasing ‘mind maps’ in the style of artist Qiu Zhijie. The software was built to accompany an exhibition of Qiu’s work.

How it works: The system has three main elements: a speech recognition module which pulls key words from speech; a topic expansion system which takes these words and pulls in other concepts from a rule-based knowledge graph; and software for image projection which uses any of 3,000 distinct painting elements to depict key words. One clever twist: the system automatically creates visual ‘routes’ between different words by analyzing their difference in the knowledge graph and using that to generate visualizations.
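To make the route-finding idea concrete, here is a toy sketch (the graph, the function names, and the concepts in it are all invented for illustration, not taken from the authors' system) of expanding a spoken keyword through a small rule-based graph and finding a path between two concepts:

```python
from collections import deque

# Toy, hand-written knowledge graph standing in for the paper's rule-based graph.
KNOWLEDGE_GRAPH = {
    "ocean": ["island", "voyage"],
    "island": ["map", "utopia"],
    "voyage": ["map"],
    "map": ["mind map"],
    "utopia": ["mind map"],
}

def expand_topic(start, max_depth=2):
    """Pull in neighboring concepts up to max_depth hops from the spoken keyword."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for neighbor in KNOWLEDGE_GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

def route_between(a, b):
    """Shortest path between two concepts; a path like this could drive the visual 'route'."""
    frontier, parents = deque([a]), {a: None}
    while frontier:
        node = frontier.popleft()
        if node == b:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return list(reversed(path))
        for neighbor in KNOWLEDGE_GRAPH.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                frontier.append(neighbor)
    return None

print(expand_topic("ocean"))
print(route_between("ocean", "mind map"))
```

In the real system each concept on such a path would be rendered with one of the 3,000 painting elements, which is what turns the graph walk into a visual route.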

A reflexive, reactive system: Mappa Mundi works in-tandem with human users, growing and changing its map according to their inputs. “The generated information, after being presented in our system, becomes the inspiration for artist’s next vocal input,” they write. “This artwork reflects both the development of artist’s thinking and the AI-enabled imagination”.

Why this matters: I’m forever fascinated by the ways in which AI can help us better interact with the world around us, and I think systems like ‘Mappa Mundi’ give us a way to interact with the idiosyncratic aesthetic space defined by another human.
  Read more: Mappa Mundi: An Interactive Artistic Mind Map Generator with Artificial Imagination (Arxiv).
  Read more about Qiu Zhijie (Center for Contemporary Art).

#####################################################

Using AI to simulate and see climate change:
…CycleGAN to the rescue…
In the future, climate change is likely to lead to catastrophic flooding around the world, drowning cities and farmland. How can we make this likely future feel tangible to people today? Researchers with the Montreal Institute for Learning Algorithms, ConscienceAI Labs, and Microsoft Research have created a system that can take in a Google Street View image of a house, then render an image showing what that house might look like under a predicted climate change future.

The resulting CycleGAN-based system does a decent job at rendering pictures of different houses under various flooding conditions, giving the viewer a more visceral sense of how climate change may influence where they live in the future.
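For readers who want a feel for the underlying technique, here is a minimal sketch of the cycle-consistency loss at the heart of CycleGAN, with the two generators stubbed out as single conv layers (this is a generic illustration of the method, not the authors' model or hyperparameters):

```python
import torch
import torch.nn as nn

# Stand-in generators: "normal house" -> "flooded house" and back. The real paper
# uses full CycleGAN generators and discriminators; these stubs just show the loss.
G_flood = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # normal -> flooded
G_dry = nn.Conv2d(3, 3, kernel_size=3, padding=1)     # flooded -> normal

l1 = nn.L1Loss()

def cycle_consistency_loss(real_normal, real_flooded, lam=10.0):
    """Core CycleGAN idea: translating to the other domain and back should
    reconstruct the input, so unpaired images can supervise each other."""
    recon_normal = G_dry(G_flood(real_normal))
    recon_flooded = G_flood(G_dry(real_flooded))
    return lam * (l1(recon_normal, real_normal) + l1(recon_flooded, real_flooded))

normal = torch.randn(1, 3, 64, 64)    # e.g. a Street View image of a house
flooded = torch.randn(1, 3, 64, 64)   # an (unpaired) image of a flooded scene
print(cycle_consistency_loss(normal, flooded).item())
```

The cycle term is what lets the system learn from unpaired photos of houses and floods, since nobody has "before and after" pairs of the same street.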

Why this matters: I’m excited to see how we use the utility-class artistic capabilities of modern AI tools to simulate different versions of the world for people, and I think being able to easily visualize the effects of climate change may help make more people aware of how delicate the planet is.
  Read more: Visualizing the Consequences of Climate Change Using Cycle-Consistent Adversarial Networks (Arxiv).

#####################################################

Google and ARM plan to merge low-power ML software projects:
…uTensor, meet TensorFlow Lite…
Google and chip designer ARM are planning to merge two open source frameworks for running machine learning systems on low-power ‘Arm’ chips. Specifically, uTensor is merging with Google’s ‘TensorFlow Lite’ software. The two organizations expect to work together to further increase the efficiency of running machine learning code on ARM chips.

Why this matters: As more and more people try to deploy AI to the ‘edge’ (phones, tablets, drones, etc), we need new low-power chips on which to run machine learning systems. We’ve got those chips in the form of processors from ARM and others, but we currently lack many of the programming tools needed to extract as much performance as possible from this hardware. Software co-development agreements, like the one announced by ARM and Google, help standardize this type of software, which will likely lead to more adoption.
  Read more: uTensor and Tensor Flow Announcement (Arm Mbed blog).

#####################################################

Microsoft wants your devices to record (and transcribe) your meetings:
…In the future, our phones and tablets will transcribe our meetings…
In the future, Microsoft thinks people attending the same meeting will take out their phones and tablets, and the electronic devices will smartly coordinate to transcribe the discussions taking place. That’s the gist of a new Microsoft research paper, which outlines a ‘Virtual Microphone Array’ made of “spatially distributed asynchronous recording devices such as laptops and mobile phones”.

Microsoft’s system can integrate audio from a bunch of devices spread throughout a room and use it to transcribe what is being said. The resulting system (trained on approximately 33,000 hours of in-house data) is more effective than single microphones at transcribing natural multi-speaker speech during the meeting; “there is a clear correlation between the number of microphones and the amount of improvement over the single channel system”, they write. The system struggles with overlapping speech, as you might expect.
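The paper's own pipeline does proper beamforming and neural acoustic modeling, but a crude way to get a feel for the 'virtual microphone' idea is to align each device's recording to a reference via cross-correlation and then average the streams; the sketch below is purely illustrative and runs on synthetic signals:

```python
import numpy as np

def align_and_sum(reference, others, max_lag=1600):
    """Crudely fuse asynchronous device recordings: shift each stream to best
    match the reference (via cross-correlation) and average the result."""
    fused = reference.astype(np.float64).copy()
    for stream in others:
        n = min(len(reference), len(stream))
        corr = np.correlate(reference[:n], stream[:n], mode="full")
        center = n - 1                                   # index of zero lag
        window = corr[center - max_lag:center + max_lag + 1]
        lag = int(np.argmax(window)) - max_lag
        shifted = np.roll(stream[:len(fused)], lag)      # align to the reference
        fused[:len(shifted)] += shifted
    return fused / (len(others) + 1)

# Synthetic example: the same "speech" captured by two devices with offset and noise.
t = np.linspace(0, 1, 16000)
clean = np.sin(2 * np.pi * 220 * t)
device_a = clean + 0.1 * np.random.randn(len(t))
device_b = np.roll(clean, 120) + 0.1 * np.random.randn(len(t))
virtual_mic = align_and_sum(device_a, [device_b])
```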

Why this matters: AI gives us the ability to approximate things, and research like this shows how the smart use of AI techniques can let us approximate the capabilities of dedicated microphones, piecing one virtual microphone together out of a disparate set of devices.
  Read more: Meeting Transcription Using Virtual Microphone Arrays (Arxiv).

#####################################################

One language model, trained in three different ways:
…Microsoft’s Unified pre-trained Language Model (UNILM) is a 3-objectives-in-1 transformer…
Researchers with Microsoft have trained a single, big language model with three different objectives during training, yielding a system capable of a broad range of language modeling and generation tasks. They call their system the Unified pre-trained Language Model (UNILM) and say this approach has two advantages relative to single-objective training:

  • Training against multiple objectives means UNILM is more like a 3-in-1 system, with different capabilities that can manifest for different tasks.
  • Parameter sharing during joint training means the resulting language model is more robust as a consequence of being exposed to a variety of different tasks under different constraints.

The model can be used for natural language understanding and generation tasks and, like BERT and GPT, is based on a ‘Transformer’ component. During training, UNILM is given three distinct language modelling objectives: bidirectional (predicting words based on those on the left and right; useful for general language modeling tasks, used in BERT); unidirectional (predicting words based on those to the left; useful for language modeling and generation, used in GPT2); and sequence-to-sequence learning (mapping sequences of tokens to one another, subsequently used in ‘Google Smart Reply’).
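One reason a single Transformer can serve all three objectives is that they differ mainly in the self-attention mask applied during training; a toy construction of the three mask patterns (my own sketch, not Microsoft's code) might look like this:

```python
import torch

def unilm_style_masks(src_len, tgt_len):
    """Toy attention masks for the three UNILM-style objectives.
    1 = position may be attended to, 0 = blocked."""
    total = src_len + tgt_len

    # Bidirectional (BERT-like): every token sees every token.
    bidirectional = torch.ones(total, total)

    # Unidirectional (GPT-like): each token sees only itself and tokens to its left.
    unidirectional = torch.tril(torch.ones(total, total))

    # Sequence-to-sequence: source tokens see the whole source; target tokens see
    # the whole source plus the already-generated part of the target.
    seq2seq = torch.zeros(total, total)
    seq2seq[:src_len, :src_len] = 1
    seq2seq[src_len:, :src_len] = 1
    seq2seq[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len))

    return bidirectional, unidirectional, seq2seq

bi, uni, s2s = unilm_style_masks(src_len=4, tgt_len=3)
print(s2s)
```

Because only the mask changes, the same parameters get trained on all three objectives, which is where the claimed robustness comes from.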

Results: The trained UNILM system obtains state-of-the-art scores on summarization and question answering tasks, and also sets state-of-the-art on text generation tasks (including the delightful recursive task of learning to generate appropriate questions that map to certain answers). The model also obtains a state-of-the-art score on the multi-task ‘GLUE’ benchmark (though note GLUE has subsequently been replaced by ‘SuperGLUE’ due to its creators thinking it is a little too easy).

Why this matters: Language modelling is undergoing a revolution as people adopt large, simple, scalable techniques to model and generate language. Papers like UNILM gesture towards a future where large models are trained with multiple distinct objectives over diverse datasets, creating utility-class systems that have a broad set of capabilities.
  Read more: Unified Language Model Pre-training for Natural Language Understanding and Generation (Arxiv).

#####################################################

AI… for Bad!
…A CHI workshop points to an intriguing direction for AI research…
This week, some researchers gathered together to prototype the ways in which their research could be used for evil. The workshop, ‘CHI4Evil: Creative Speculation on the Negative Effects of HCI Research’, was held at the ACM CHI Conference on Human Factors in Computing Systems, and was designed to investigate various ideas in HCI through the lens of designing deliberately bad or undesirable systems.

Why this matters: Prototyping the potential harms of technology can be pretty useful for calibrating thinking about threats and opportunities (see: GPT-2), and thinking about such harms through the lens of human-computer interaction (HCI, or CHI) feels likely to yield new insights. I’m excited for future “AI for Bad” conferences (and would be interested to co-organize one with others, if there’s interest).
  Read more: CHI4EVIL website.

#####################################################

Facial recognition is a hot area for venture capitalists:
…Chinese start-up Megvii raises mucho-moolah…
Megvii, a Chinese computer vision startup known by some as ‘Face++’, has raised $750 million in a funding round. Backers include the Bank of China Group Investment Ltd; a subsidiary of the Abu Dhabi Investment Authority; and Alibaba Group. The company plans to IPO soon.

Why this matters: China is home to numerous large-scale applications of AI for usage in surveillance, and is also exporting surveillance technologies via its ‘One Belt, One Road’ initiative (which frequently pairs infrastructure investment with surveillance).
  This is an area fraught with both risks and opportunities – the risks are that we sleepwalk into building surveillance societies using AI, and the opportunities are that (judiciously applied) surveillance technologies can sometimes increase public safety, given the right oversight. I think we’ll see numerous Chinese startups push the boundaries of what is thought to be possible/deployable here, so watching companies like Megvii feels like a leading indicator for what happens when you combine surveillance+society+capitalism.
  Read more: Chinese AI start-up Megvii raises $750 million ahead of planned HK IPO (Reuters).

#####################################################

Chatbot company builds large-scale AI system, doesn’t fully release it:
…Startup Hugging Face restricts release of larger versions of some models following ethical concerns…
NLP company Hugging Face has released a demo, tutorial, and open-source code for creating a conversational AI based on OpenAI’s Transformer-based ‘GPT2’ system.
   Ethics in action: The company said it decided not to release the full GPT2 model for ethical reasons – it thought the technology had a high chance of being used to improve spam-bots, or to perform “mass catfishing and identity fraud”. “We are aligning ourselves with OpenAI in not releasing a bigger model until they do,” the organization wrote.
  Read more: Ethical analysis of the open-sourcing of a state-of-the-art conversational AI (Medium).
  Read more about Hugging Face here (official website).

#####################################################

Tech Tales

The Evolution Game

They built the game so it could run on anything, which meant they had to design it differently to other games. Most games have a floor on their performance – some basic set of requirements below which you can’t expect to play. But not this game. Neverender can run on your toaster, or fridge, or watch, and it can just as easily run on your home cinema, or custom computer, and so on. Both the graphics and gameplay change depending on what it is running on – I’ve spent hours stood fiddling with the electronic buttons on my oven, using them to move a small character across a simplified Neverender gameboard, and I’ve also spent hours in my living room navigating a richly-rendered on screen character through a lush, Salvador Dali-esque horrorworld. I’m what some people call a Neverheader, or what others call a Nevernut. If you didn’t know anything about the game, you’d probably call me a superfan.

So I guess that’s why I got the call when Neverender started to go sideways. They brought me in and asked me to play it and I said “what else?”

“Just play it,” they said.

So I did. I sat in my living room surrounded by a bunch of people in suits and I played the game. I navigated my character past the weeping lands and up into eldritch keep and beyond, to the deserts of dream. But when I got to the deserts they were different: the sand dunes had grown in size, and some of them now hosted cave entrances. Strange lights shot out of them. I went into one and was killed almost instantly by a beam of light that caused more damage than all the weapons in my inventory combined. After I was reborn at the spawn point I proceeded more carefully, skirting these light-spewing entrances, and trying to walk further across the sand plains to whatever lay beyond.

The game is thinking, they tell me. In the same way Neverender was built to run on anything, its developers recently rolled out a patch that let it use anything. Now, all the game clients are integrated with the game engine and backend simulation environment, sharing computational resources with each other. Mostly, it’s leading to better games and more entertained players. But in some parts of the gameworld, things are changing that should not be changing: larger sand dunes with subterranean cities inside themselves? Wonderful! That’s the sort of thing the developers had hoped for. But having the caves be controlled by beams of light of such power that no one can go and play within them? That’s a lot less good, and something which no one had expected.

My official title now is “Explorer”, but I feel more like a Spy. I take my character and I run out into the edges of the maps of Neverender, and usually I find areas where the game is modifying itself or growing itself in some way. The code is evolving. One day we turned off the local sandbox systems, letting Neverender deploy more code, deeper into my home system. As I played the game the lights began to flicker, and when I was in a cave I discovered some treasure and the game automatically fired up some servers which we later discovered it was using to do high-fidelity modelling of the treasure.

The question we all ask ourselves, now, is whether the things Neverender is building within itself are extensions of the game, or extensions of the mind behind the game. We hear increasing reports of ‘ghosts’ seen across the game universe, and of intermittent cases of ‘kitchen appliance sentience’ in the homes of advanced players. We’ve even been told by some that this is all a large marketing campaign, and any danger we think is plausible is just a consequence of us having over-active imaginations. Nonetheless, we click and play and explore.

Things that inspired this story: Endless RPGs; loot; games that look like supercomputers such as Eve Online; distributed computation; relativistic ideas deployed on slower timescales.

Import AI 145: Testing general intelligence in Minecraft; Google finds a smarter way to generate synthetic data; who should decide who decides the rules of AI?

Think your agents have general intelligence? Test them on MineRL:
…Games as procedural simulators…
How can we get machines to learn to perform complicated, lengthy sequences of actions? One way is to have these machines learn from human demonstrations – this captures a hard AI challenge, in the form of requiring algorithms that can take in a demonstration as input but generalize to different situations. It also short-cuts another hard AI challenge: exploration, which is the art of designing algorithms that can explore enough of the problem space they can attempt to solve the task, rather than get stuck enroute.
  Now, an interdisciplinary team of researchers led by people from Carnegie Mellon University have created the MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors. This competition uses the ‘Minecraft’ computer game as a testbed in which to train smart algorithms, and ships with a dataset called MineRL-v0 which consists of 60 million state-action pairs of human demonstrations of tasks being solved within Minecraft.

The Challenge: MineRL will test the ability of agents to solve one main challenge, ‘ObtainDiamond’, which requires agents to go and find a diamond somewhere in the (procedurally generated) environment they’ve been dropped into. This is a hard task: “Diamonds only exist in a small portion of the world and are 2-10 times rarer than other ones in Minecraft,” the authors write. “Additionally, obtaining a diamond requires many prerequisite items. For these reasons, it is practically impossible for an agent to obtain a diamond via naive random exploration”.

Auxiliary Environments: ObtainDiamond is such a hard challenge that the competition organizers have released six additional ‘auxiliary environments’ to help people train AI systems capable of solving the challenge. These are designed to encourage the development of several skills necessary to being able to easily find a diamond: navigating the environment, chopping down trees, surviving in the environment, and obtaining three different items – a bed (which needs to be assembled out of three items), meat (of a specific animal), and a pickaxe (which is needed to mine the diamond).

The dataset: MineRL-v0 is 60-million state-action-(reward) sets, recorded from human demonstrations. The state includes things like what the player sees as well as their inventory and distances to objectives and attributes and so on.

Release plans: The researchers aim to soon release the full environment, data, and numerous algorithmic baselines. They will also publish further details of the competition as well, which will encourage contestants to submit via ‘NC6 v2’ instances on Microsoft’s ‘Azure’ cloud.
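Once the environment and dataset are out, usage will presumably follow the standard Gym pattern; the sketch below is speculative, and the environment id and data-loading calls (drawn from the minerl Python package) should be treated as assumptions rather than confirmed API:

```python
import gym
import minerl  # assumed package name; registers the MineRL environments with Gym

# Environment id and data interface are assumptions based on the competition write-up.
env = gym.make("MineRLObtainDiamond-v0")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random policy, purely illustrative
    obs, reward, done, info = env.step(action)

# Iterating over the human-demonstration state-action(-reward) pairs, e.g. for
# behavioral cloning on (state, action) before any reinforcement learning.
data = minerl.data.make("MineRLObtainDiamond-v0", data_dir="data")
for state, action, reward, next_state, done in data.batch_iter(
        batch_size=32, seq_len=64, num_epochs=1):
    pass
```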

Why this matters: Minecraft has many of the qualities that make for a useful AI research platform: its tasks are hard, the environments can be generated procedurally, and it contains a broad & complex enough range of tasks to stretch existing systems. I also think that its inherently spatial qualities are useful, and potentially let researchers specify even harder tasks requiring systems capable of hierarchical learning and operating over long timescales.
  Read more: The MineRL Competition on Sample Efficient Reinforcement Learning using Human Priors (Arxiv).

#####################################################

Who gets to decide the rules for AI?
…Industry currently has too much power, says Harvard academic…
The co-director for Harvard University’s Berkman Klein Center for Internet & Society is concerned that industry will determine the rules and regulations of AI, leading to significant societal ramifications.

In an essay in Nature, Yochai Benkler writes that: “Companies’ input in shaping the future of AI is essential, but they cannot retain the power they have gained to frame research on how their systems impact society or on how we evaluate the effect morally.”

One solution: Benkler’s main fix involves funding: “Organizations working to ensure that AI is fair and beneficial must be publicly funded, subject to peer review and transparent to civil society. And society must demand increased public investment in independent research rather than hoping that industry funding will fill the gap without corrupting the process.”
  I’d note the challenge here is that many governments are loath to invest in building their own capacity for technical evaluation, and tend to defer to industry inputs.

Why this matters: AI is sufficiently powerful to have political ramifications and these will have a wide-ranging effect on society – we need to ensure there is equitable representation here, and I think coming up with ways to do that is a worthy challenge.
  Read more: Don’t let industry write the rules for AI (Nature).

#####################################################

Anki shuts down:
…Maker of robot race cars, toys, shuts down…
Anki, an AI startup that most recently developed a toy/pet robot named ‘Cozmo’, has shut down. Cozmo was a small AI-infused robot that could autonomously navigate simple environments (think: uncluttered desks and tables), and could use its lovingly animated facial expressions to communicate with people. Unfortunately, like most robots that use AI, it was overly brittle and prone to confusing failures – I purchased one and found it frustrating: it responded inconsistently to voice commands and occasionally fell off the edge of my table (and, more frustratingly, fell off the same part of the table multiple times in a row).

Why this matters: Despite (pretty much) everyone + their children thinking it’d be cool to have lots of small, cute, pet robots wandering around the world, it hasn’t happened. Why is this? Anki gives us an indication: making consumer hardware is extremely difficult, and robots are particularly hard due to the combination of relatively low production volumes (making it hard to make them cheap) as well as the brittleness of today’s AI systems once they’re deployed in the messy real world.
  Read more: The once-hot robotics startup Anki is shutting down after raising more than $200 million (Vox / Recode).

#####################################################

Google gets its AI systems to generate their own synthetic data:
…What’s the opposite of garbage in / garbage out? Perhaps UDA in / Decent prediction out…
Google wants to spend dollars on compute to create more data for itself, rather than gathering ever more data from the world – that’s the idea behind ‘Unsupervised Data Augmentation’ (UDA), a new technique from Google that can automatically generate synthetic versions of unlabeled data, giving neural networks more information to learn from. Usage of the technique sets a new state-of-the-art on six language tasks and three vision tasks.

How it works: UDA “minimizes the KL divergence between model predictions on the original example and an example generated by data augmentation”, they write. By using a targeted objective, the technique generates valid, realistic perturbations of underlying data, which are also sufficiently diverse to help systems learn.
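A minimal sketch of that objective, with the augmentation function left abstract and all names my own rather than Google's code, combines a supervised term on labeled data with a KL consistency term on unlabeled data:

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, augment, lam=1.0):
    """Supervised loss plus a consistency loss between predictions on an
    unlabeled example and its augmented version, in the spirit of UDA."""
    # Standard supervised term on the (small) labeled set.
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # Consistency term: the prediction on the original example is treated as a
    # fixed target (no gradient), and the augmented prediction is pulled toward it.
    with torch.no_grad():
        p_orig = F.softmax(model(x_unlabeled), dim=-1)
    log_p_aug = F.log_softmax(model(augment(x_unlabeled)), dim=-1)
    consistency = F.kl_div(log_p_aug, p_orig, reduction="batchmean")

    return sup_loss + lam * consistency
```

The leverage comes from the unlabeled term: every augmented example is extra training signal that cost compute rather than labeling effort.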

Better data, better results: In tests, UDA leads to significant across-the-board improvements on a range of language tasks. It can also match (or approach) state-of-the-art performance on various tasks while using significantly less data. They also test their approach on ImageNet – a much more challenging task – in two ways: seeing how well it does at using a hybrid of labelled and unlabelled data (specifically: 10% labelled data, 90% unlabelled), and how well it does at automatically augmenting the full ImageNet dataset. UDA leads to a significant absolute performance improvement when using the hybrid of labelled and unlabelled data, and it also marginally improves performance on the full ImageNet set – impressive, considering how large and well-labelled the full dataset already is.

Why this matters: Being able to arbitrage compute for data changes the economic dynamics of developing increasingly powerful AI systems; techniques like UDA show how in the long term, compute could become as strategic (or in some cases more strategic) than data.
  Read more: Unsupervised Data Augmentation (Arxiv).

#####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

US seeking leadership in AI technical standards:
President Trump’s Executive Order on AI tasked the National Institute of Standards and Technology (NIST) with developing a plan for US engagement with the development of technical standards for AI. NIST have made a request for information, to understand how these standards might be developed and used in support of reliable, robust AI systems, and how the Federal government can play a leadership role in this process.

Why standards matter: A recent report from the Future of Humanity Institute looked at how technical standards for AI R&D might inform global solutions to AI governance challenges. Historically, international standards bodies have governed policy externalities in cybersecurity, sustainability, and safety. Given the challenges of trust and coordinating safe practices in AI development and deployment, standards setting could play an important role.
  Read more: Artificial Intelligence Standards – Request for Information (Gov).
  Read more: Standards for AI Governance (Future of Humanity Institute).

Eric Schmidt steps down from Alphabet Board:
  Former Google CEO and Chairman Eric Schmidt has stepped down from the Board at Alphabet, Google’s parent company. Diane Greene, former CEO of Google Cloud, has also stepped down. Robin L. Washington, CFO of pharmaceutical firm Gilead, joins in their place. Schmidt remains one of the company’s largest shareholders, behind founders Larry Page and Sergey Brin.
   Read more: Alphabet Appoints Robin L. Washington to its Board of Directors (Alphabet).

#####################################################

Tech Tales:

The Flower Garden

We began with a flower garden that had a whole set of machines inside of it as well as flower beds and foliage and all the rest of the things that you’d expect. It was optimized for being clean and neat and fitting a certain kind of design and so people would come to it and compliment it on its straight angles and uniquely textured and laid out flower beds. Over time, the owners of this garden started to add more automation to the garden – first, a solar panel, to slurp down energy from the sun in the day and then at night power the lights that gave illumination to the plants in the middle of the warm dark.

But as technology advanced so too did the garden – it gained cameras to monitor itself and then smart watering systems that were coupled with the cameras to direct different amounts of water to plants based not only on the schedule, but on what the AI system thought they needed to grow “best”. A little after that the garden gained more solar panels and some drones with delicate robot arms: the drones were taught how to maintain bits of the garden and they learned how to use their arms to take vines and move them so they’d grow in different directions, or to direct flowers so as not to crowd each other out.

The regimented garden was now perhaps one part machine for every nine parts foliage: people visited and marvelled at it, and wondered among themselves how advanced the system could become, and whether one day it could obviate the need for human gardeners and landscapers and designers at all.

Of course, eventually: the things did get smart enough to do this. The garden gradually, then suddenly, went from being cultivated by humans to being cultivated by machines. But, as with all things, change happened. It went out of fashion. People visited less often, then stopped visiting at all.

And so one day the garden was sold to a private owner (identity undisclosed). The day the owner took over they changed the objective of the AI system that tended to the garden – instead of optimizing for neatness and orderliness, optimize for growth.

Gradually, then suddenly, the garden filled with life. More and more flowers. More and more vines. Such fecund and dense life, so green and textured and strange. It was a couple of years before the first problem from this change: the garden started generating less power for its solar panels, as the greenery began to occlude things. For a time, the system optimised itself and the vegetation was moved – gently – by the drones, and arranged so as to not grow there. This conflicted with the goal of maximizing volume. So the AI system – as these things are wont to do – improvised a solution: one day one of the drones dropped down near one of the solar panels and used its arm to create a little gap between the panel and the ground. Then another drone landed and scraped some rocks into the space. In this way the drones slowly, then suddenly, changed the orientation of the panels, so they could acquire more light.

But the plants kept growing, so the machines took more severe actions: now the garden is famous not for its vegetation, but for the solar panels that have been raised by the interplay between machine and vegetation, as – lifted by various climbing vines – they rise in step with the growth of the garden. On bright days, elsewhere in the city, you can see the panels shimmer as winds shake the vines and other bits of vegetation they are attached to, causing them to cast flashes like the scales of some huge living fish, seen from a great distance.

Are the scales of a fish and its inner fleshy parts so different, as the difference between these panels and their drones, people wonder.

Things that inspired this story: Self-adapting systems; Jack and the Beanstalk; the view of London from Greenwich Observatory; drones; plunging photovoltaic prices; reinforcement learning; reward functions.

Import AI 144: Facial recognition sighted in US airports; Amazon pairs humans&AI for data labeling; Facebook translates videos into videogames

Amazon uses machine learning to automate its own data-labeling humans:
…We heard you like AI so much we put AI inside your AI-data-labeling system…
Amazon reveals ProductNet, an ImageNet-inspired dataset of products. ProductNet is designed to help researchers train models that have as subtle and thorough an understanding of products as equivalently-trained systems have of classes of images. The goal for Amazon is to better learn how to categorize products, and the researchers say that in tests the system can significantly improve the effectiveness of human data labelers.

Dataset composition: ProductNet consists of 3900 categories of product, with roughly 40-60 products for each category. “We aim at the diversity and representativeness of the products. Being representative, the labeled data can be used as reference products to power product search, pricing, and other business applications,” they write. “Being diverse, the models are able to achieve strong generalization ability for unlabeled data, and the product embedding is also able to represent richer information”.

ProductNet, what is it good for? ProductNet’s main purpose appears to be helping Amazon to develop better systems to help its human contractors more efficiently label data, and creating a system that can directly label itself.

Labelling: ProductNet is designed to be tightly-integrated with human workers, who can collectively help Amazon better label its various items while continuously calibrating the AI system. It works like this: they start off by using a basic system (eg, Inception-v4 trained on ImageNet for processing images, and fastText for processing text data) to search over unlabelled images, then the humans annotate these and the labels are fed back into the master model, which is then used to surface more specific products, which the humans then annotate, and so on.
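That loop is essentially model-assisted active labeling; here is a rough sketch of one round of it, where the model interface (score/update) and the human-annotation step are stand-ins I've invented for illustration rather than anything from the paper:

```python
import numpy as np

def labeling_round(model, unlabeled_items, ask_humans, per_round=100):
    """One round of the surface-annotate-retrain loop: the current model scores
    unlabeled products, the most promising candidates go to human annotators,
    and their labels are folded back into the model."""
    # Score each unlabeled item with the current model (e.g. similarity to an
    # existing category embedding); higher = more worth labeling this round.
    scores = np.array([model.score(item) for item in unlabeled_items])
    candidate_idx = np.argsort(-scores)[:per_round]
    candidates = [unlabeled_items[i] for i in candidate_idx]

    new_labels = ask_humans(candidates)       # human annotation step
    model.update(candidates, new_labels)      # fold labels back into the master model

    remaining = [item for i, item in enumerate(unlabeled_items)
                 if i not in set(candidate_idx)]
    return model, remaining

# Repeating labeling_round() grows the labeled set while the model gets better
# at surfacing the next batch of products worth annotating.
```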

  20X gain: In tests, Amazon says human annotators augmented via ProductNet can label 100 things to flesh out the edge of a model in about 30 minutes, compared with annotators without access to the model, who only manage around five data points during this time period. This represents a 20X gain through the use of the system, Amazon says.
  Read more: ProductNet: a Collection of High-Quality Datasets for Product Representation Learning (Arxiv).

#####################################################

AI + Facial Recognition + Airlines:
What does it mean when airlines use facial recognition instead of passports & boarding passes to let people onto planes? We can get a sense of the complex feelings this experience provokes by reading a Twitter thread from someone who experienced it, then questioned the airline (JetBlue) about its use of the tech.
 Read about what happens when someone finds facial recognition systems deployed at the boarding gate. (Twitter).

#####################################################

Down on the construction site: How to deploy AI in a specific context and the challenges you’ll encounter:
…AI is useless unless you can deploy it…
There’s a big difference between having an idea and implementing that idea; research from Iowa State University highlights this by discussing the steps needed to go from selecting a problem (for example: training image recognition systems to recognize images from construction sites) to solving that problem.

“Based on extensive literature review, we found that most of the studies focus on development of improved techniques for image analytics, but a very few look at the economics of final deployment and the trade-off between accuracy and costs of deployment,” the authors write. “This paper aims at providing the researchers and engineers a practical and comprehensive deep learning based solution to detect construction equipment from the very first step of development to the last step, which is deployment of the solution”.

Deployment – more than just a discrete step: The paper highlights the sorts of tradeoffs people need to make as they try to deploy systems, ranging from the lack of good open datasets for specific contexts (eg, here the users try to train a model for use on construction sites off of the comparatively small ‘AIM’ subset of ImageNet), to the need to source efficient models (they use MobileNet), to the need to customize those models for specific hardware platforms (Raspberry Pis, NVIDIA Jetsons, Intel Neural Compute Sticks, and so on).

Why this matters: As AI enters its deployment phase, research like this gives us a sense of the gulf between most research papers and actual deployable systems. It also provides a further bit of evidence in favor of ‘MobileNet’, which I’m seeing crop up in an ever-increasing number of papers concerned with deploying AI systems, as opposed to just inventing them.
  Read more: A deep learning based solution for construction equipment detection: from development to deployment (Arxiv).

#####################################################

Enter the AI-generated Dungeon:
… One big language model plus some crafted sentences = fun…
Language models have started to get much more powerful as researchers have combined flexible components (eg: Transformers) with large datasets to train big, effective general-purpose models (see: ULMFiT, GPT2, BERT, etc). Language models, much like image classifiers, have a range of uses, and so it’s interesting to see someone use a GPT2 model to create an online AI Dungeon game, where you navigate a scenario via reading blocks of text and picking options – the twist here is it’s all generated by the model.
  Play the game here: AI Dungeon.

#####################################################

Facebook wants to make videos into videogames:
…vid2game extracts playable characters from videos…
Facebook AI Research has published vid2game, an AI system that lets you select a person in a public video on the internet and develop the ability to control them, as though they are a character in a videogame. The approach also lets them change the background, so a tennis player can – for instance – walk off of the court and onto a (rendered) dirt road, et cetera.

The technique relies on two components: Pose2Pose and Pose2Frame; Pose2Pose lets you select a person in some footage and extract their pose information by building a 3D model of their body and using that to help you move them. Pose2Frame helps to match this body to a background, which lets you use this technology to take a person, control them, and change the context around them.

Why this matters: Systems like this show how we can use AI to (artificially) add greater agency to the world around us. This approach “paves the way for new types of realistic and personalized games, which can be casually created from everyday videos”, Facebook wrote.
  Read more: Vid2Game: Controllable Characters Extracted from Real-World Videos (Arxiv).
  Watch the technology work here (YouTube).

#####################################################

Making AI systems that can read the visual world:
…Facebook creates dataset and develops technology to help it train AI models to read text in pictures…
Researchers with Facebook AI Research and the Georgia Institute of Technology want to create AI systems that can look at the world around us – including the written world – and answer questions about it. Such systems could be useful to people with vision impairments who could interact with the world by asking their AI system questions about it, eg: what is in front of me right now? What items are on the menu in the restaurant? Which is the least expensive item on the menu in the restaurant? And so on.

If this sounds simple, why is it hard? Think about what you – a computer – are being asked to do when required to parse some text in an image in response to a question. You’re being asked to:

  • Know when the question is about text
  • Figure out the part of the image that contains text
  • Convert these pixel representations into word representations
  • Reason about both the text and the visual space
  • Decide on whether the answer to the question involves copying some text from the image and feeding it to the user, or whether the answer involves understanding the text in the picture and using that to further reason about stuff.

The TextVQA Dataset: To help researchers tackle this, the authors release TextVQA, a dataset containing 28,408 images from OpenImages, 45,336 questions associated with these images, and 453,360 ground truth answers.

Learning to read images: The researchers develop a model they call LoRRA, short for Look, Read, Reason & Answer. LoRRA staples together some existing Visual Question Answering (VQA) systems, with a dedicated optical character recognition (OCR) module. It also has an Answer Module, loosely modeled on Pointer Networks, which can figure out when to incorporate words the OCR module has parsed but which the VQA module doesn’t necessarily understand.
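The copy-versus-answer decision can be implemented by scoring a fixed answer vocabulary and the dynamically recognized OCR tokens in a single output space; the toy module below illustrates that idea (dimensions, names, and scoring scheme are mine, not the paper's):

```python
import torch
import torch.nn as nn

class ToyAnswerModule(nn.Module):
    """Score a fixed answer vocabulary and a variable number of OCR tokens
    jointly, so the argmax can either pick a vocabulary answer or copy text."""
    def __init__(self, hidden_dim, vocab_size):
        super().__init__()
        self.vocab_head = nn.Linear(hidden_dim, vocab_size)
        self.ocr_head = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, fused_features, ocr_features):
        # fused_features: [batch, hidden] from the question+image encoder.
        # ocr_features:   [batch, num_ocr_tokens, hidden] from the OCR module.
        vocab_scores = self.vocab_head(fused_features)                # [B, V]
        query = self.ocr_head(fused_features).unsqueeze(-1)           # [B, H, 1]
        copy_scores = torch.bmm(ocr_features, query).squeeze(-1)      # [B, N]
        return torch.cat([vocab_scores, copy_scores], dim=-1)         # [B, V + N]

module = ToyAnswerModule(hidden_dim=512, vocab_size=3000)
scores = module(torch.randn(2, 512), torch.randn(2, 10, 512))
print(scores.shape)  # indices >= 3000 mean "copy OCR token (index - 3000)"
```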

Human accuracy versus machine accuracy: Human accuracy on the TextVQA dataset is about 85.01%, the researchers say. Meanwhile, the best-performing model the researchers develop (based on LoRRA) obtains a top accuracy of 26.56% – suggesting we have a long way to go before we get good at this.

Why this matters: Building AI systems that can ingest enough information about the world to be able to augment people seems like one of the more immediate, high-impact uses of the technology. The release of a new dataset here should encourage more progress on this important task.
  Read more: Towards VQA Models that can Read (Arxiv).
  Get the dataset: TextVQA (Official TextVQA website).

#####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Russia calls for international agreements on military AI:
Russia’s security chief has spoken publicly about the need for international regulation of military uses of AI and emergent technologies, which he said could be as dangerous as weapons of mass destruction. He said it is necessary to “activate the powers of the global community, chiefly at the UN”, to develop an international regulatory framework. This comes as a surprise, given that Russia has previously been among the countries resisting moves towards international agreements on lethal autonomous weapons.
   Read more: Russia’s security chief calls for regulating use of new technologies in military sphere (TASS).

#####################################################

OpenAI Bits & Pieces:

Making music with OpenAI MuseNet:
We’ve been experimenting with big, generative models (see: our language work on GPT-2). We’re interested in how we can explore generations in a variety of mediums to better understand how to build more creative systems. To that end, we’ve developed MuseNet, a Transformer-based system that can generate 4-minute musical compositions with 10 different instruments, combining styles from Mozart to the Beatles.
  Listen to some of the samples and try out MuseNet here (OpenAI Blog).

Imagining weirder things with Sparse Transformers:
We’ve recently developed the Sparse Transformer, a Transformer-based system that can extract patterns from sequences 30x longer than possible previously. What this translates to is generically better generative models, and gives us the ability to extract more subtle features from bigger chunks of data.
  Generative Modeling with Sparse Transformers (OpenAI Blog).
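The efficiency gain comes from letting each position attend to a structured subset of earlier positions rather than all of them; a toy version of a strided sparsity pattern (my own illustration, not OpenAI's kernels) looks like this:

```python
import torch

def strided_sparse_mask(seq_len, stride):
    """Toy strided sparse-attention pattern: position i attends to the previous
    `stride` positions (a local window) plus earlier positions spaced `stride`
    apart, instead of attending to every previous position."""
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i in range(seq_len):
        local_start = max(0, i - stride + 1)
        mask[i, local_start:i + 1] = True                 # local window
        mask[i, torch.arange(i, -1, -stride)] = True      # strided positions
    return mask

mask = strided_sparse_mask(seq_len=16, stride=4)
print(mask.int())
# In attention: scores.masked_fill(~mask, float("-inf")) before the softmax,
# so the cost grows roughly with seq_len * (window + seq_len / stride)
# rather than seq_len squared.
```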

#####################################################

Tech Tales:

Keep It Cold

We were at a bar in the desert with a man who liked to turn things cold. He had a generator with him hooked up to a refrigeration unit and it was all powered by some collection of illicit jerry cans of gasoline. We figured he must have fished these out of the desert on expeditions. And now here he was, presiding over a temporary bar in a small town rife with other collectors of banned things, who fanned out into the desertified surrounding areas of burnt or drying suburbs, in search of the things we could once make but could no longer.

The shtick of this guy – and his bar – was that he could make cold drinks, and not any kind of cold drinks but “sub-zero drinks”, powered by another system which seemed to be a hybrid of a chemistry set and a gun. Something about it helped cool water down even more while keeping it flowing, so that it came out cold enough you could feel it chill the air if you held your eye up close enough. It was a gloriously wasteful, indulgent, expensive enterprise, but he wasn’t short of customers. Something about a really cold drink is universal, I guess.

We were talking about the past: what it had been like to have a ‘rainy season’ where the rains were kind and were not storms. What it meant when a ‘dry season’ referred to an unusual lack of humidity, rather than guaranteed city-eating fires. We talked about the present, a little, but moved on quickly: the past and the future are interesting, and the present is a drag.

We were mid-way through talking about the future – could we get off planet? What was going to happen to the middle of Africa due to desertification? How were the various walls and nation-severing barriers progressing? – when the generator stopped working. The man went outside and fussed for a bit and when he came back in he said: “I guess that’s the last ice on planet earth”.

We all laughed but there was a pretty good chance he was right. That’s how things were in those days, before civilization moved into whatever it is now – we lived in a time where whenever a thing stopped working you could credibly think “maybe that’s the last human civilization will see of that?”. Imagine that.

Things that inspired this story: The increasingly real lived&felt reality of massive and globally distributed climate change; refrigerators; tinkerers; Instagram-restaurants at the end of time and space.

Import AI 143: Predicting car accident risks by looking at the houses people live in; why data matters as much as compute; and using capsule networks to generate synthetic data

Predicting car accident risks from Google Street View images:
The surprising correspondences between different types of data…
Researchers with the University of Warsaw and Stanford University have shown how to use pictures of people’s houses to better predict the chances of a person getting into a car accident. (Import AI administrative note – standard warnings about ‘correlation does not imply causation’ apply).

For the project, the researchers analyzed 20,000 addresses of insurance company clients – a random sample of an insurer’s portfolio collected in Poland between January 2012 and December 2015. For each address, they collect an overhead Google satellite view and a Google Street View image of the property, and humans then annotate the image with labels relating to the type of property, its age, its condition, the estimated wealth of its residents, and the type and density of buildings in the neighborhood. They subsequently test these variables and find that five of the seven are significantly predictive with regard to the insurance prediction problem.

  “Despite the high volatility of data, adding our five simple variables to the insurer’s model improves its performance in 18 out of 20 resampling trials and the average improvement of the Gini coefficient is nearly 2 percentage points,” they write.
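For context, the Gini coefficient used here is the standard discrimination metric in insurance modeling and can be computed from risk scores as 2*AUC - 1; a minimal sketch on toy data (not the authors' pipeline) of comparing a baseline model against one augmented with extra variables:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def gini(y_true, y_score):
    """Gini coefficient as commonly used for insurance risk models: 2*AUC - 1."""
    return 2 * roc_auc_score(y_true, y_score) - 1

# Toy comparison: baseline insurer model vs. baseline + house-image variables.
y = np.random.binomial(1, 0.1, size=5000)                 # did a claim occur?
baseline_scores = np.random.rand(5000)
# Fake "augmented" scores that carry a little genuine signal, for illustration only.
augmented_scores = 0.7 * baseline_scores + 0.3 * (y + 0.5 * np.random.rand(5000))
print("Gini improvement:", gini(y, augmented_scores) - gini(y, baseline_scores))
```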

Ultimately, they show that – to a statistically significant extent – “features visible on a picture of a house can be predictive of car accident risk, independently from classically used variables such as age, or zip code”.

Why this matters: Studies like this speak to the power of large-scale data analysis, highlighting how data that is innocuous at the level of the individual can become significant when compared and contrasted with a vast amount of other data. The researchers acknowledge this, noting that:  “modern data collection and computational techniques, which allow for unprecedented exploitation of personal data, can outpace development of legislation and raise privacy threats”.
  Read more: Google Street View image of a house predicts car accident risk of its resident (Arxiv).

#####################################################

Your next pothole could be inspected via drone:
…Drones + NVIDIA cards + smart algorithms = automated robot inspectors…
Researchers with HKUST Robotics Institute have created a prototype drone system that can automatically analyze a road surface. For the project, the researchers developed a dense stereo vision algorithm which the UAV uses to process road images in real time, automatically identifying disparities across the road surface.

Hardware: To accomplish this, they use a ZED stereo camera mounted on a DJI Matrice 100 drone, which itself has an NVIDIA Jetson TX2 GPU installed onboard for real-time processing.
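The paper contributes its own dense stereo algorithm, but the basic object it produces, a disparity map from a pair of stereo frames, can be sketched with OpenCV's classic block matcher (parameters and file paths below are illustrative only):

```python
import cv2

# Left/right frames from a stereo camera such as the ZED mentioned above
# (paths are placeholders for wherever the rectified frames live).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classic block-matching stereo: for each pixel, find how far the matching patch
# has shifted between the two views; larger disparity = closer surface.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype("float32") / 16.0  # fixed-point -> pixels

# Potholes and other road-surface damage show up as local deviations from the
# smooth disparity of a flat road plane.
cv2.imwrite("disparity.png", cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX))
```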

Why this matters: AI approaches make it cheap for robots to automatically sense and analyze aspects of the world, and experiments like this suggest that we’re rapidly approaching the era when we’ll start to automate various types of surveillance (both for civil and military purposes) via drones.
  Read more: Real-Time Dense Stereo Embedded in a UAV for Road Inspection (Arxiv).
  Get the datasets used in the experiment here (Rui Fan, HKUST, personal website).
  Check out a video of the drone here (Rui Fan, YouTube).

#####################################################

Train AI to watch over the world with the iWildCam dataset:
…Monitoring the planet with deep learning-based systems…
Researchers with the California Institute of Technology have published the iWildCam dataset to help people develop AI systems that can automatically analyze wildlife seen in camera traps spread across the American Southwest. They’ve also created a challenge based around the dataset, letting researchers compete in developing AI systems capable of automatically monitoring the world.

Testing generalization: “If we wish to build systems that are trained once to detect and classify animals, and then deployed to new locations without further training, we must measure the ability of machine learning and computer vision to generalize to new environments,” the researchers write.

Common nuisances: There are six problems relating to the data gathered from the traps: variable illumination, motion blur, size of the region of interest (eg, an animal might be small and far away from the camera), occlusion, camouflage, and perspective.

iWildCam: The images come from cameras installed across the American Southwest, consisting of 292,732 images spread between 143 locations. iWildCam is designed to capture the complexities of the datasets that human biologists need to deal with: “therefore the data is unbalanced in the number of images per location, distribution of species per location, and distribution of species overall”, they write.

Why this matters: Datasets like this – and AI systems built on top of it – will be fundamental to automating the observation and analysis of the world around us; given the increasingly chaotic circumstances of the world, it seems useful to be able to have machines automatically analyze changes in the environment for us.
   Read more: The iWildCam 2018 Challenge Dataset (Arxiv).
   Get the dataset: iWildCam  2019 challenge (GitHub).

#####################################################

Compute may matter, but so does data, says Max Welling:
…”The most fundamental lesson of ML is the bias-variance tradeoff”…
A few weeks ago Richard Sutton, one of the pioneers of reinforcement learning, wrote a post about the “bitter lesson” of AI research (Import AI #138), namely that techniques which use huge amounts of computation and relatively simple algorithms are better to focus on. Now, Max Welling, a researcher with the University of Amsterdam, has written a response claiming that data may be just as important as compute.

  “The most fundamental lesson of ML is the bias-variance tradeoff: when you have sufficient data, you do not need to impose a lot of human generated inductive bias on your model,” he writes. “However, when you do not have sufficient data available you will need to use human-knowledge to fill the gaps.”

Self-driving cars are a good example of a place where compute can’t solve most problems, and you need to invest in injecting stronger priors (eg, an understanding of the physics of the world) into your models, Welling says. He also suggests generative models could help fill in some of these gaps, especially when it comes to generalization.

Ultimately, Welling ends up somewhere between the ‘compute matters’ versus the ‘strong priors matter’ (eg, data) arguments. “I would say if we ever want to solve Artificial General Intelligence (AGI) then we will need model-based RL,” he writes. “We cannot answer the question of whether we need human designed models without talking about the availability of data.”

Why this matters: There’s an inherent tension in AI research between bets that revolve predominantly around compute and those that revolve around data. That’s likely because different bets encourage different research avenues and different specializations. I do worry about a world where people that do lots of ‘big compute’ experiments end up speaking a different language to those without, leading to different priors when approaching the question of how much computation matters.
  Read more: Do we still need models or just more data and compute? (Max Welling, PDF).

#####################################################

Want to train AI on something but don’t have much data? There’s a way!
…Using Capsule Networks to generate synthetic data…
Researchers with the University of Moratuwa want to be able to teach machines to recognize handwritten characters using very small amounts of data, so have implemented an approach based on Capsule Networks – a recently-proposed technique promoted by deep learning pioneer Geoff Hinton – that lets them learn to classify handwritten letters from as few as 200 examples.

The main way they achieve this is by synthetically augmenting these small datasets using some of the idiosyncratic traits of capsule networks – namely, their ability to learn data representations that are more robust to transforms, as a consequence of their technical implementation of things like ‘routing by agreement’. The researchers use these traits to directly manipulate the sorts of data representations being produced on exposure to the data to algorithmically generate handwritten letters that look similar to those in the training dataset, but are not identical; this generates additional data that the system can be trained on, without needing to collect more data from (expensive!) reality.

“By adding a controlled amount of noise to the instantiation parameters that represent the properties of an entity, we transform the entity to characterize actual variations that happen in reality. This results in a novel data generation technique, much more realistic than augmenting data with affine transformations,” they write. “The intuition behind our proposed perturbation algorithm is that by adding controlled random noise to the values of the instantiation vector, we can create new images, which are significantly different from the original images, effectively increasing the size of the training dataset”.
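A rough sketch of that perturbation idea, with the capsule network and decoder treated as generic modules (this is my paraphrase of the technique, not the authors' code):

```python
import torch

def perturb_and_decode(capsule_net, decoder, image, noise_scale=0.2, n_new=4):
    """Generate synthetic training images by perturbing the instantiation vector
    of the winning class capsule and re-decoding it back to image space."""
    with torch.no_grad():
        capsules = capsule_net(image)                 # [1, num_classes, capsule_dim]
        class_idx = capsules.norm(dim=-1).argmax(-1)  # the class capsule for this image
        vector = capsules[0, class_idx]               # its instantiation parameters

        synthetic = []
        for _ in range(n_new):
            noise = noise_scale * (torch.rand_like(vector) - 0.5)   # controlled perturbation
            perturbed = capsules.clone()
            perturbed[0, class_idx] = vector + noise
            synthetic.append(decoder(perturbed))      # decode back to a new image
    return synthetic
```

Each decoded image keeps the class identity but varies the properties the instantiation vector encodes (slant, stroke thickness, and so on), which is what makes the augmentation more realistic than simple affine transforms.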

How well does it work? The researchers test their approach by evaluating how well TextCaps can learn to classify images when trained on full datasets and 200-sample-size datasets from EMNIST, MNIST and the much more visually complex Fashion MNIST; TextCaps is able to exceed state-of-the-art when trained on the full data of three variants of EMNIST and gets close to this using just 200 samples, and approaches SOTA on MNIST and Fashion MNIST (though does very badly on Fashion MNIST when using just 200 samples, likely because of the complexity).

Why this matters: Approaches like this show how as we develop increasingly sophisticated AI systems we may be able to better deal with some of the limitations imposed on us by reality – like a lack of large, well-labeled datasets for many things we’d like to use AI on (for instance: learning to spot and classify numerous handwritten languages for which there are relatively few digitized examples). “We intend to extend this framework to images on the RGB space, and with higher resolution, such as images from ImageNet and COCO. Further, we intend to apply this framework on regionally localized languages by extracting training images from font files,” they write.
  Read more:  TextCaps: Handwritten Character Recognition with Very Small Datasets (Arxiv).
  Read more: Understanding Hinton’s Capsule Networks (Medium).
  Read more: How Capsules Work (Medium).
  Read more: Understanding Dynamic Routing between Capsules (Capsule Networks explainer on GitHub).

#####################################################

Want to test language progress? Try out SuperGLUE:
…Step aside GLUE – you were too easy!…
Researchers with New York University have had to toss out a benchmark they developed last year and replace it with a harder one, due to the faster-than-expected progress in certain types of language modelling. The ‘SuperGLUE’ benchmark is a sequel to GLUE and has been designed to include significantly harder tasks than those which were in GLUE.

New tasks to frustrate your systems: Tasks in SuperGLUE include: CommitmentBank, where the goal is to judge how committed an author is to a specific clause within a sentence; the Choice of Plausible Alternatives (COPA), in which the goal is to pick the more likely sentence given two options; the Gendered Ambiguous Pronoun Coreference Task (GAP), where systems need to ‘determine the referent of an ambiguous pronoun’; the Multi-Sentence Reading Comprehension dataset, a true-false question-answering task; RTE, a textual-entailment task which was in GLUE 1.0; WiC, which challenges systems to do word-sense disambiguation; and the Winograd Schema Challenge, a reading comprehension task designed specifically to test for world modeling or the lack of it (eg, systems that think large objects can go inside small objects, and vice versa).

PyTorch toolkit: The researchers plan to release a toolkit based on PyTorch and software from AllenNLP which will include pretrained models like OpenAI GPT and Google BERT, as well as designs to enable rapid experimentation and prototyping. As with GLUE, there will be an online leaderboard that people can compete on.

Why this matters: Well-designed benchmarks are one of the best tools we have available to us to help judge AI progress, so when benchmarks are rapidly obviated via progress in the field it suggests that the field is developing quickly. The researchers believe SuperGLUE is sufficiently hard that it’ll take a while to solve, so think “there is plenty of space to test new creative approaches on a broad suite of difficult NLP tasks with SuperGLUE.”
  Read more: Introducing SuperGLUE: A New Hope Against Muppetkind (Medium).
  Read more: SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (PDF).

#####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

European Commission releases pilot AI ethics guidelines:
Last year, the European Commission announced the formation of the High-Level Expert Group on AI, a core component of Europe’s AI strategy. The group released draft ethics guidelines in December (see Import #126), and embarked on a consultation process with stakeholders and member states. This month they released a new draft, and will be running a pilot program through 2019.

   Key requirements for trustworthy AI: The guidelines lay out 7 requirements: Human agency and oversight; Technical robustness and safety; Privacy and data governance; Transparency; Diversity, non-discrimination and fairness; Societal and environmental wellbeing; Accountability.

  International guidelines: The report makes clear the Commission’s ambition to play a leading role in developing internationally-agreed AI ethics guidelines.

  Why it matters: The foregrounding of AI safety (‘technical robustness and safety’ in the language of the guidelines) is good news. The previous draft revealed that long-term concerns had proved highly controversial among the experts, and asked specifically for consultation input on these issues. This latest draft suggests that the public and other stakeholders take these concerns seriously.
  Read more: Communication – Building Trust in Human Centric AI (EC).

Microsoft refuses to sell face recognition due to human rights concerns:
In a talk at Stanford, Microsoft President Brad Smith described recent deals Microsoft had declined due to ethical concerns. He revealed that the company refused to provide face recognition technology to a California law enforcement agency, after concluding the proposed roll-out would have disproportionately impacted women and ethnic minorities. The company also declined a deal with a foreign country to install face recognition across the nation’s capital, due to concerns that it would have suppressed freedom of assembly.
  Read more: Microsoft turned down facial-recognition sales on human rights concerns (Reuters)

#####################################################

Tech Tales:

Until Another Dream

I get up and I hunt down the things that are too happy or too sad and I take them out of the world. This is a civil-general world and by decree we cannot have extremes. So I take their broken shapes with me and I put them in a chest in the basement of my simulated castle. Then I take my headset off and I go to my nearby bar and the barman calls me “dreamkiller” as his way of being friendly.
What dreams did you kill today, dreamkiller?
You still dream about that butterfly with the face of a kitten you whacked?
Ever see any more of those sucking-face spiders?
What happened to the screaming paving slabs, anyway?
You get the picture.

The thing about today is everyone is online and online is full of so much money that it’s just like real life: most people don’t see the most extreme parts of it, and by a combination of market pressures and human preferences, some people get paid to either erase the extremes or hide them away.

After the bar I go home and I get into bed and my muscle memory has me pick up the headset and have it almost on my head before my conscious brain kicks in – what some psychologists call The Supervisor. “Do I really want to do this?” my supervisor asks me. “Why not go to bed?”

I don’t answer myself directly, instead I slide the headset on, turn it on, and go hunting. There have been reports of unspeakably cute birds carrying wicker baskets containing smaller baby birds in the south quadrant. Meanwhile up in the north there’s some kind of parasite that eats up the power sub-systems of the zones, projecting worms onto all the simulated telescreens.

My official job title is Reality Harmonizer and my barman calls me Dreamkiller and I don’t have a name for myself: this is my job and I do it not willingly, but because my own tastes and habits compel me to do it. I have begun to wonder if real-life murderers and murder-police are themselves people that take off their headsets at night and go to bars. I have begun to wonder whether they themselves find themselves in the middle of the night choosing between sleep and a kind of addictive duty. I believe the rules change when fairytales are real.

Things that inspired this story: MMOs; the details change but the roles are always the same; detectives; noir; feature-space.

Import AI 142: Berkeley spawns cheap ‘BLUE’ arm; Google trains neural nets to prove math theorems; seven questions about GANs

Google reveals HOList, a platform for doing theorem proving research with deep learning-based methods:
…In the future, perhaps more math theorems will be proved by AI systems than humans…
Researchers with Google want to develop and test AI systems that can learn to solve mathematical theorems, so have made tweaks to theorem proving software to make it easier for AI systems to interface with. In addition, they’ve created a new theorem proving benchmark to spur development in this part of AI.

HOL Light: The software they base their system on is called HOL Light. For this project, they develop “an instrumented, pre-packaged version of HOL Light that can be used as a large scale distributed environment of reinforcement learning for practical theorem proving using our new, well-defined, stable Python API”. This software ships with 41 “tactics”, which are basically algorithms used to help prove math theorems.

Benchmarks: The researchers have also released a new benchmark on HOL Light, and they hope this will “enable research and measuring progress of AI driven theorem proving in large theories”. The benchmarks are initially designed to measure performance on a few tasks, including: predicting the same methodologies used by humans to create a proof; and trying to prove certain subgoals or aspects of proofs without access to full information.

DeepHOL: They design a neural network-based theorem prover called DeepHOL which tries to concurrently encode the goals and premises while generating a proof. “In essence, we propose a hybrid architecture that both predicts the correct tactic to be applied, as well as rank the premise parameters required for meaningful application of tactics”. They test out a variety of different neural network-based approaches within this overall architecture and train them via reinforcement learning, with the best system able to prove 58% of the proofs in the training set – no slam-dunk, but very encouraging considering these are learning-based methods.
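  To illustrate the shape of that hybrid architecture, here is a toy sketch under my own assumptions (not Google's DeepHOL code): a shared goal encoder feeds two heads, one classifying over the 41 tactics and one scoring (goal, premise) pairs.

```python
# A rough sketch (not Google's DeepHOL) of the hybrid idea the paper describes:
# one network encodes the goal, then one head predicts which tactic to apply
# and another scores candidate premises for that tactic.
import torch
import torch.nn as nn

class ToyProverPolicy(nn.Module):
    def __init__(self, vocab_size=1000, dim=128, num_tactics=41):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)   # bag-of-tokens encoder
        self.tactic_head = nn.Linear(dim, num_tactics)  # which tactic to apply
        self.premise_head = nn.Bilinear(dim, dim, 1)    # score (goal, premise) pairs

    def forward(self, goal_tokens, premise_tokens):
        g = self.embed(goal_tokens)     # [batch, dim]
        p = self.embed(premise_tokens)  # [batch, dim]
        return self.tactic_head(g), self.premise_head(g, p)

policy = ToyProverPolicy()
goal = torch.randint(0, 1000, (4, 16))      # 4 goals, 16 tokens each
premises = torch.randint(0, 1000, (4, 16))  # one candidate premise per goal
tactic_logits, premise_scores = policy(goal, premises)
print(tactic_logits.shape, premise_scores.shape)  # [4, 41] and [4, 1]
```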

Why this matters: Theorem proving feels like a very promising way to test the capabilities of increasingly advanced machines, especially if we’re able to develop systems that start to generate new proofs. This would be a clear validation of the ability of AI systems to create novel scientific insights in a specific domain, and I suspect it would give us better intuitions about AI’s ability to transform science more generally as well. “We hope that our initial effort fosters collaboration and paves the way for strong and practical AI systems that can learn to reason efficiently in large formal theories,” they write.
  Read more: HOList: An Environment for Machine Learning of Higher-Order Theorem Proving (Extended Version).

#####################################################

Think GANs are interesting? Here are seven underexplored questions:
…Googler searches for the things we know we don’t know…
Generative adversarial networks have become a mainstay component of recent AI research given their utility in creative applications, where you need to teach a neural network about some data well enough that it can generate synthetic data that looks similar to the source, whether videos or images or audio.

But GANs are quite poorly understood, so researcher Augustus Odena has published an essay on Distill listing seven open questions about GANs.

The seven questions: These are:
– What are the trade-offs between GANs and other generative models?
– What sorts of distributions can GANs model?
– How can we scale GANs beyond image synthesis?
– What can we say about the global convergence of the training dynamics?
– How should we evaluate GANs and when should we use them?
– How does GAN training scale with batch size?
– What is the relationship between GANs and adversarial examples?

Why this matters: Better understanding how to answer these questions will help researchers better understand the technology, which will allow us to make better predictions about the economic costs of training GAN systems and the likely failures to expect, and will point to future directions for work. It’s refreshing to see researchers publish exclusively about the problems and questions related to a technique, and I hope to see more scholarship like this.
  Read more: Open Questions about Generative Adversarial Networks (Distill).

#####################################################

Human doctors get better with aid of AI-based diagnosis system:
…MRNet dataset, competition, and research, should spur research into aiding clinicians with pre-trained medical-problem-spotting neural nets…
Stanford University researchers have developed a neural network-based technique to assess knee MR scans for abnormalities and a few specific diagnoses (eg, ligament tears). They find that clinicians who have access to this model have a lower rate of mistaken diagnoses than those without access to it. When using this model, “for every 100 healthy patients, ~5 are saved from being unnecessarily considered for surgery,” they write.

MRNet dataset: Along with their research, they’ve also released an underlying dataset: MRNet, a collection of 1,370 knee MRI exams performed at Stanford University Medical Center, spread across normal and abnormal knees.

Competition: “We are hosting a competition to encourage others to develop models for automated interpretation of knee MRs,” the researchers write. “Our test set (called internal validation set in the paper) has its ground truth set using the majority vote of 3 practicing board-certified MSK radiologists”.

Why this matters: Many AI systems are going to augment rather than substitute for human skills, and I expect this to be especially frequent in medicine, where we can expect to give clinicians more and more AI advisor systems to use when making diagnoses. In addition, datasets are crucial to the development of more sophisticated medical AI systems and competitions tend to drive attention towards a specific problem – so the release of both in addition to the paper should spur research in this area.
  Read more and register to download the dataset here: MRNet Dataset (Stanford ML Group).
  Read more about the underlying research: MRNet: Deep-learning-assisted diagnosis for knee magnetic resonance imaging (Stanford ML Group).

#####################################################

As AI hype fades, applications arrive:
…Now we’ve got to superhuman performance we need to work on human-computer interaction…
Jeffrey Bigham, a human-computer interaction researcher, thinks that AI is heading into an era of less hype – and that’s a good thing. This ‘AI autumn’ is a natural successor to the period we’re currently in, since we’re moving from the development to the deployment phase of many AI technologies.

Goodbye hype, hello applications:
“Hype deflates when humans are considered,” Bigham writes. “Self-driving cars seem much less possible when you think about all the things human drivers do in addition to the driving on well-known roads in good lighting conditions. They find passengers, they get gas, they fix the car sometimes, they make sure drunk passengers aren’t in danger, they walk elderly passengers into the hospital, etc”.

Why this matters: “If hype is at the rapidly melting tip of the iceberg, then the great human-centered applied work is the super large mass floating underneath supporting everything,” he writes. And, as most people know, working with humans is challenging and endlessly surprising, so the true test of AI capabilities will be to first reach human parity at certain things, then be deployed in ways that make sense to humans.
  Read more: The Coming AI Autumn (Jeffrey Bigham blog).

#####################################################

Berkeley researchers design BLUE, a (deliberately) cheap robot for AI research:
…BLUE promises human-like capabilities in a low-cost platform…
Berkeley researchers have developed the Berkeley robot for Learning in Unstructured Environments (BLUE), a robotic arm designed for AI research and deployments. The robot was developed by a team of more than 15 researchers over the last three years. It is designed to cost around $5,000 to build when manufactured in batches of 1,500 units, and many design choices have been constrained by the goal of making it both cheap to build and safe to operate around humans.

The robot can be used to train AI approaches on a cheap robotics platform, and works with teleoperation systems so it can be trained directly from human behaviors.

BLUE has seven degrees of freedom, distributed across three joints in the shoulder, one in the elbow, and three in the wrist. When designing BLUE, the researchers optimized for a “useful” robot – this required sufficient precision to be human-like (in this case it can move with a precision of around 4 millimeters, far less precise than ultra-precise industrial robots), a design cheap enough to be manufactured at scale, and the capability to handle a general class of manipulation tasks in unconstrained environments (aka, the opposite of a factory production line).

Low-cost design: The BLUE robots use quasi-direct drive actuation (QDD), an approach that has most recently become popular in legged locomotion systems. They also designed a cheap, parallel jaw gripper (“we chose parallel jaws for their predictability, robustness, simplicity (low cost), and ease of simulation”).

Why this matters: In recent years, techniques based on deep learning have started to give robots unprecedented perception and manipulation capabilities. One thing that has held back deployment, though, is the absence of cheap robot platforms which researchers can experiment with. BLUE seems to have the nice properties of being built by researchers to reflect AI needs, while also being designed to be manufactured at scale. “Next up for the project is continued stress testing and ramping manufacturing,” they write. “The goal is to get these affordable robots into as many researchers’ hands as possible”.
  Read more: Project Blue (Berkeley website).
  Read the research paper: Quasi-Direct Drive for Low-Cost Compliant Robotic Manipulation (Arxiv).

#####################################################

Network architecture search gets more efficient with Single-Path NAS:
…Smarter search techniques lower the computational costs of AI-augmented search…
Researchers with Carnegie Mellon University, Microsoft, and the Harbin Institute of Technology have figured out a more efficient way to get computers to learn how to design AI systems for deployment on phones.

The approach, called Single-Path NAS, makes it more efficient to spend compute searching for more sophisticated AI models. The key technical trick is, at each layer of the network, to search over “an over-parameterized ‘superkernel’ in each ConvNet layer”. What this means in practice is that the researchers have made it more efficient to rapidly iterate through different types of AI component at each layer of the network, making their approach more efficient than other NAS techniques.
  “Without having to choose among different paths/operations as in multi-path methods, we instead solve the NAS problem as finding which subset of kernel weights to use in each ConvNet layer”, they explain.
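  Here's a toy sketch of the superkernel idea, with the details made up for illustration rather than taken from the authors' code: a single over-parameterized 5x5 convolution kernel whose outer ring is gated by a learned threshold, so 'choosing an architecture' reduces to choosing which subset of one kernel's weights to keep.

```python
# A toy sketch of a "superkernel" (assumptions mine, not the authors' code):
# one over-parameterized 5x5 kernel where a learned gate decides whether to
# use only the inner 3x3 weights or the full 5x5, so the search happens over
# subsets of a single kernel rather than over separate candidate paths.
import torch
import torch.nn as nn

class SuperKernelConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 5, 5) * 0.01)
        self.threshold = nn.Parameter(torch.zeros(1))  # learned decision variable

    def forward(self, x):
        inner = torch.zeros_like(self.weight)
        inner[:, :, 1:4, 1:4] = self.weight[:, :, 1:4, 1:4]  # 3x3 subset
        outer = self.weight - inner                           # extra 5x5 ring
        # Soft gate on the outer ring; at deployment time this would be
        # rounded to 0 (use 3x3 only) or 1 (use the full 5x5).
        gate = torch.sigmoid(outer.abs().sum() - self.threshold)
        kernel = inner + gate * outer
        return nn.functional.conv2d(x, kernel, padding=2)

layer = SuperKernelConv(16, 32)
print(layer(torch.randn(1, 16, 56, 56)).shape)  # torch.Size([1, 32, 56, 56])
```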

Hardware-aware: The researchers add a constraint during training that lets them optimize for the latency of the resulting architecture – this lets them automatically search for an architecture that best maps to the underlying hardware capabilities.

Testing, testing, testing: They test their approach on a Pixel 1 phone – a widely-used premium Android phone, developed by Google. They benchmark by using Single-Path NAS to design networks for image classification on ImageNet and compare it against state-of-the-art systems designed by human researchers as well as ones discovered via other neural architecture search techniques.  

  Results: Their approach gets an accuracy of 74.96%, which they claim is “the new state-of-the-art ImageNet accuracy among hardware-efficient NAS methods”. Their system also takes about 8 epochs to train, compared to hundreds (or thousands) for other methods.

Why this matters: Being able to offload the cost of designing new network architectures from human designers to machine designers has the potential to further accelerate AI research progress and AI application deployment. This sort of technique fits into the broader trend of the industrialization of AI (which has been covered in this newsletter in a few different ways) – worth noting that the authors of this technique are spread across multiple companies and institutions, from CMU, to Microsoft, to the Harbin Institute of Technology in Harbin, China.
  Read more: Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 hours (Arxiv).
  Get the code: Single-Path-NAS (GitHub).

#####################################################

How should the Department of Defense use Artificial Intelligence? Tell them your thoughts:
…Can the DoD come up with principles for how it uses AI? There are ways you can help…
The Defense Innovation Board, an advisory committee to the Secretary of Defense, is trying to craft a set of AI principles that the DoD can use as it integrates AI technology into its systems – and it wants help from the world.

  Share your thoughts at Stanford this month: If you’re in the Bay Area, you may want to come to ‘The Ethical and Responsible Use of Artificial Intelligence for the Department of Defense (DoD)’ at Stanford University on April 25th 2019, where you can give thoughts on how the DoD may want to consider using (or not using) AI. You can also submit public comments online.

  Why this matters: Military organizations around the world are adopting AI technology, and it’s unusual to see a military organization publicly claim to be so interested in the views of people outside its own bureaucracy. I think it’s worth people submitting thoughts here (especially if they’re constructively critical), as this will provide us evidence for how or if the general public can guide the ways in which these organizations use AI.
  Read more about the AI Principles project here (DiB website).

#####################################################

OpenAI Bits & Pieces:

OpenAI Five wins matches against pros, cooperates with humans:
  This weekend, OpenAI’s neural network-based system for playing Dota 2, OpenAI Five, beat a top professional team in San Francisco. Additionally, we showed how the same system can play alongside humans.
  OpenAI Five Arena: We also announced OpenAI Five Arena, a website which people can use to play with or against our Dota 2 agents. Sign up via: arena.openai.com. Wish us luck as we try to play against the entire internet next week.

#####################################################

Tech Tales:

The Big Art Machine

The Big Art Machine, or as everyone called it, The BAM, was a robot about thirty feet tall and a hundred and fifty feet long. It looked kind of like a huge, metal centipede, except instead of having a hundred legs, it had a hundred far more sophisticated appendages – each able to manipulate the world around it, and change its own dimensions through a complicated series of interlocking, metal plates.

The BAM worked like this: you and a hundred or so of your friends would pile into the machine and each of you would sit in a small, sealed room housed at the intersection between each of its hundred appendages and its main body. Each of these rooms contained a padded chair, and each chair came with a little swing-out screen, and on this screen you’d see two movie clips of how your appendage could move – you’d pick whichever one you preferred, then it’d show you another one, and so on.

The BAM was a big AI thing, essentially. Each of the limbs started out dumb and uncoordinated, and at first people would just focus on calibrating their own appendage, then they’d teach their own appendage to perhaps strike the ground, or try and pull something forward, or so on. There were no rules. Mostly, people would try to get the BAM to walk or – very, very occasionally – run. After enough calibration, the operators of each of the appendages would get a second set of movies on their screen – this time, movies of how their appendage plus another appendage elsewhere on the BAM might move together. In this way, the crowd would over time select harmonious movements, built out of idiosyncratic underlays.

So hopefully this gives you an idea for how difficult it was to get the BAM to do anything. If you’ve ever hosted a party for a hundred people before and tried to get them to agree on something – music, a drinking game, even just listening to one person give a speech – then you’ll know how difficult getting the BAM to do anything is. Which is why we were so surprised that one day a team of people got into the BAM and, after the first few hours of aimless clanking and probing, it started to walk, then it started to run, and then… we lost it.

Some people say that they taught it to swim, and took it into the ocean. Others say that it’s not beyond the realms of feasibility that it was possible to teach the thing to fly – though the coordination required and the time it would take to explore its way to such a particular combination of movements was so lengthy that many thought it impossible. Now, we tell stories about the BAM as a lesson in collective action and calibration, and children when they learn about it in school immediately dream of building machines in which thousands of people work together, calibrating around some purpose that comes from personal chaos.

Things that inspired this story: Learning from human preferences; heterogeneous data; the beautiful and near-endless variety of ways in which humans approach problems; teamwork; coordination; inverse reinforcement learning; robotics, generative models.

Import AI 141: AIs play Doom at thousands of frames per second; NeurIPS wants reproducible research; and Google creates & scraps AI ethics council.

75 seconds: How long it takes to train a network against ImageNet:
…Fujitsu Research claims state-of-the-art ImageNet training scheme…
Researchers with Fujitsu Laboratories in Japan have further reduced the time it takes to train large-scale, supervised learning AI models; their approach lets them train a residual network to around 75% accuracy on the ImageNet dataset after 74.7 seconds of training time. This is a big leap from where we were in 2017 (an hour), and is impressive relative to late-2018 performance (around 4 minutes: see issue #121).

How they did it: The researchers trained their system across 2,048 Tesla V100 GPUs via the Amazon-backed MXNet deep learning framework. They used a large mini-batch size of 81,920, and also implemented layer-wise adaptive rate scaling (LARS) and a ‘warm-up’ period to increase learning efficiency.
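  For readers unfamiliar with LARS, here's a simplified sketch of the update rule (the standard formulation, not Fujitsu's implementation): each layer's learning rate is scaled by the ratio of its weight norm to its update norm, which keeps step sizes sensible when mini-batches get enormous.

```python
# A simplified sketch of layer-wise adaptive rate scaling (LARS), one of the
# ingredients cited for stable very-large-batch training. Hyperparameters and
# structure are illustrative, not Fujitsu's implementation.
import torch

def lars_step(params, base_lr=0.1, trust_coef=0.001, weight_decay=1e-4):
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            update = p.grad + weight_decay * p
            # Scale the learning rate per layer so the step size stays
            # proportional to that layer's weight norm.
            local_lr = trust_coef * p.norm() / (update.norm() + 1e-9)
            p.add_(update, alpha=-(base_lr * float(local_lr)))

model = torch.nn.Linear(10, 10)
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
lars_step(model.parameters())
```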

Why it matters: Training large models on distributed infrastructure is a key component of modern AI research, and the reduction in ImageNet training times is striking – I think this is emblematic of the industrialization of AI, as people seek to create systematic approaches to efficiently training models across large numbers of computers. This trend ultimately speeds up research that relies on large-scale experimentation, and can unlock new paths of research.
  Read more: Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds (Arxiv).

#####################################################

Ian ‘GANfather’ Goodfellow heads to Apple:
…Machine learning researcher swaps Google for Apple…
Ian Goodfellow, a machine learning researcher who developed an AI approach called generative adversarial networks (GANs), is leaving Google for Apple.

Apple’s deep learning training period: For the past few years, Apple has been trying to fill its ranks with more prominent people working on its AI projects. In 2016 it hired Russ Salakhutdinov, a researcher from CMU who had formerly studied under Geoffrey Hinton in Toronto, to direct its AI research efforts. Russ helped build up more of a traditional academic ML group at Apple, and Apple lifted its customary veil of secrecy a bit with the Apple Machine Learning Journal, a blog that details some of the research done by the secretive organization. Most recently, Apple hired John Giannandrea from Google to help lead its AI strategy. I hope Ian can push Apple towards being more discursive and open about aspects of its research, and I’m curious to see what happens next.

Why this matters: Two of Ian’s research interests – GANs and adversarial examples (manipulations made to data structures that cause neural networks to misclassify things) – have significant roles in AI policy, and I’m wondering if Apple might explore this more through proactive work (making things safer and better) along with policy advocacy.
  Read more: One of Google’s top A.I. people has joined Apple (CNBC).

#####################################################

World’s most significant AI conference wants more reproducible research:
…NeurIPS 2019 policy will have knock-on effect across wider AI ecosystem…
The organizing committee for the Neural Information Processing Systems Conference (NeurIPS, formerly NIPS), has made two changes to submissions for the AI conference: A “mandatory Reproducibility Checklist”, along with “a formal statement of expectations regarding the submission of code through a new Code Submission Policy”.

Reproducibility checklist: Those submitting papers to NeurIPS will fill out a reproducibility checklist, originally developed by researcher Joelle Pineau. “The answers will be available to reviewers and area chairs, who may use this information to help them assess the clarity and potential impact of submissions”.

Code submissions: People will be expected (though not forced – yet) to submit code along with their papers, if they involve experiments that relate to a new algorithm or a modification of an existing one. “It has become clear that this topic requires we move at a careful pace, as we learn where our “comfort zone” is as a community,” the organizers write.

  Non-executable: Code submitted to NeurIPS won’t need to be executable – this helps researchers whose work depends either on proprietary code (for instance, it plugs into a large-scale, proprietary training system, like those used by large technology companies), or who depend on proprietary datasets.

Why this matters: Reproducibility touches on many of the anxieties of current AI research relating to the difference in resources between academic researchers and those at corporate labs. Having more initiatives around reproducibility may help to close this divide, especially done in a (seemingly quite thoughtful) way that lets corporate researchers do things like publishing code without needing to worry about leaking information about internal proprietary infrastructure.
  Read more: Call for Papers (NeurIPS Medium page).
  Check out the code submission policy here (Google Doc).

#####################################################

Making RL research cheaper by using more efficient environments:
…Want to train agents on a budget? Comfortable with your agents learning within a retro hell? Then ViZDoom might be the right choice for you…
A team of researchers from INRIA in France have developed a set of tasks that demand “complex reasoning and exploration”, which can be run within the ViZDoom simulator at around 10,000 environment interactions per second; the goal of the project is to make it easier for people to do reinforcement learning research without spending massive amounts of compute.
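  If you want a rough sense of what 'environment interactions per second' means in practice, here's a hedged sketch using the vizdoom Python package (the scenario path and settings are placeholders, and actual throughput depends heavily on hardware and rendering options): step a scenario with random actions for ten seconds and count the steps.

```python
# A rough throughput check (not the authors' benchmark) using the vizdoom
# Python package: step a scenario with random actions and count environment
# interactions per second. "basic.cfg" is a placeholder path to one of the
# example scenarios that ship with ViZDoom.
import random
import time
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("basic.cfg")   # placeholder path to a bundled scenario
game.set_window_visible(False)  # headless for speed
game.init()

actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # the three buttons in basic.cfg
steps, start = 0, time.time()
game.new_episode()
while time.time() - start < 10.0:
    if game.is_episode_finished():
        game.new_episode()
    game.make_action(random.choice(actions))
    steps += 1
print("interactions per second:", steps / (time.time() - start))
game.close()
```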

Extending ViZDoom: ViZDoom is an implementation of the ancient first-person shooter game, Doom. However, one drawback is that it ships with only eight different scenarios to train agents in. To extend this, the researchers have developed four new scenarios designed to “test navigation, reasoning, and memorization”, variants of which can be procedurally generated.

Scenarios for creating thinking machines: These four scenarios include a navigation task called Labyrinth; Find and return, where the agent needs to find an object in the maze then return to its starting point; Ordered k-item, where the agent needs to collect a few different items in a predefined order; and Two color correlation, where an agent needs to explore a maze to find a column at its center, then pick up objects which are the same color as the column.

Spatial reasoning is… reassuringly difficult: “The experiments on our proposed suite of benchmarks indicate that current state-of-the-art models and algorithms still struggle to learn complex tasks, involving several different objects in different places, and whose appearance and relationships to the task itself need to be learned from reward”.
  Read more: Deep Reinforcement Learning on a Budget: 3D Control and Reasoning Without a Supercomputer (Arxiv).

######################################################

Facebook wants to make smart robots, so it built them a habitat:
…New open source research platform can conduct large-scale experiments, running 3D world simulators at thousands of frames per second…
A team from Facebook, Georgia Institute of Technology, Simon Fraser University, Intel Labs, and Berkeley, have released Habitat, “a platform for embodied AI research”. The open source software is designed to help train agents for navigation and interaction tasks in a variety of domains, ranging from 3D environment simulators like Stanford’s ‘Gibson’ system or Matterport 3D to fully synthetic datasets like SUNCG.

  “Our goal is to unify existing community efforts and to accelerate research into embodied AI,” the researchers write. “This is a longterm effort that will succeed only by full engagement of the broader research community. To this end, we have opensourced the entire Habitat platform stack.”

  Major engineering: The Habitat simulator can support “thousands of frames per second per simulator thread and is orders of magnitude faster than previous simulators for realistic indoor environments (which typically operate at tens or hundreds of frames per second)”. Speed matters here, because the faster you can run your simulator, the more experience you can collect at each computational timestep. Faster simulators mean it’s cheaper and quicker to train agents.

  Using Habitat to test how well an agent can navigate: The researchers ran very large-scale tests on Habitat with a simple task: “an agent is initialized at a random starting position and orientation in an environment and asked to navigate to target coordinates that are provided relative to the agent’s position; no ground-truth map is available and the agent must use only its sensory input to navigate”. This is akin to waking up in a mansion with no memory and needing to get to a specific room…except in this world you do this for thousands of subjective years, since Facebook trains its agents for a little over 70 million timesteps in the simulator.
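  A tiny sketch of the PointGoal-style observation described in that quote – variable names and frame conventions are mine, not Habitat's sensor code – computes the goal as a (distance, angle) pair in the agent's own frame:

```python
# A tiny sketch of a point-goal observation: the target is given to the agent
# as a vector relative to its own pose, here in 2D polar form. This is an
# illustration, not Habitat's sensor implementation.
import numpy as np

def pointgoal_observation(agent_pos, agent_heading, goal_pos):
    """Return (distance, angle) to the goal in the agent's own frame."""
    delta = np.asarray(goal_pos) - np.asarray(agent_pos)
    distance = np.linalg.norm(delta)
    world_angle = np.arctan2(delta[1], delta[0])
    relative_angle = (world_angle - agent_heading + np.pi) % (2 * np.pi) - np.pi
    return distance, relative_angle

# Agent at the origin facing along +x; goal is 3m ahead and 4m to the left.
print(pointgoal_observation((0.0, 0.0), 0.0, (3.0, 4.0)))  # (5.0, ~0.927 rad)
```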

  PPO outperforms hand-coded SLAM approach: They find in tests that they can develop an AI agent based on a proximal policy optimization (PPO) policy trained via reinforcement learning which outperforms hand-coded ‘SLAM’ systems which implement “a classical robotics navigation pipeline including components for localization, mapping, and planning”.

Why this matters: Environments frequently contribute to the advancement of AI research, and the need for high-performance environments has been accentuated by the recent trend for using significant computational resources to train large, simple models. Habitat seems like a solid platform for large-scale research, and Facebook plans to add new features to it, like physics-based interactions within the simulator and supporting multiple agents concurrently. It’ll be interesting to see how this develops, and what things they learn along the way.
  Read more: Habitat: A Platform for Embodied AI Research (Arxiv).

######################################################

People want their AI assistants to be chatty, says Apple:
…User research suggests people prefer a chattier, more discursive virtual assistant…
Apple researchers want to build personal assistants that people actually want to use, so as part of that they’ve conducted research into how users respond to chatty or terse/non-chatty personal assistants, and how they respond to systems that try to ‘mirror’ the human they are interacting with.

Wizard-of-Oz: Apple frames this as a Wizard-of-Oz study, which means there is basically no AI involved: Apple instead had 20 people (three men and seventeen women – the lack of gender balance is not explained in the paper) take turns sitting in a room, where they would utter verbal commands to a simulated virtual assistant, which was in fact an Apple employee sitting in another room. The purpose of this type of study is to simulate the interactions that may occur between humans and AI systems, to help researchers figure out what they should build next and how users might react to what they build.

Study methodology: They tested people against three systems: a chatty system, a non-chatty system, and one which tried to mirror the chattiness of the user.

  When testing the chatty vs non-chatty systems, Apple asked some human users to make a variety of verbal requests relating to alarms, calendars, navigation, weather, factual information, and searching the web. For example, a user might say “next meeting time”, and the simulated agent could respond with (chatty) “It looks like you have your next meeting after lunch at 2 P.M.”, or (non-chatty) “2 P.M.” Participants then classified the qualities of the responses into categories like: good, off topic, wrong information, too impolite, or too casual.

Talk chatty to me: The study finds that people tend to prefer chatty assistants to non-chatty ones, and have a significant preference for agents whose chattiness mirrors the chattiness of the human user. “Mirroring user chattiness increases feelings of likability and trustworthiness in digital assistants. Given the positive impact of mirroring chattiness on interaction, we proceeded to build classifiers to determine whether features extracted from user speech could be used to estimate their level of chattiness, and thus the appropriate chattiness level of a response”, they explain.
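  As an illustration of the sort of classifier that quote gestures at (the features and training examples below are invented for the sketch, not Apple's), one could estimate chattiness from shallow features of the utterance:

```python
# A deliberately simple sketch: estimate whether a user utterance is "chatty"
# from shallow text features (word count, politeness/filler words). The
# features and training data here are made up for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(utterance: str) -> list:
    words = utterance.lower().split()
    fillers = sum(w.strip("?,.!") in {"please", "hey", "hi", "thanks", "maybe"} for w in words)
    return [len(words), fillers]

train_utterances = [
    "next meeting time",
    "hey, could you please tell me when my next meeting is?",
    "weather",
    "hi! what's the weather looking like today, please?",
]
train_labels = [0, 1, 0, 1]  # 0 = terse, 1 = chatty

clf = LogisticRegression().fit(np.array([features(u) for u in train_utterances]), train_labels)
print(clf.predict(np.array([features("could you maybe set an alarm for me please?")])))
```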

Why this matters: Today’s virtual assistants contain lots and lots of hand-written material and/or specific reasoning modules (see: Siri, Cortana, Alexa, the Google Assistant). Many companies are trying to move to systems where a larger and larger chunk of the capabilities come from behaviors that are learned from interaction with users. To be able to build such systems, we need users that want to talk to their systems, which will generate the sorts of lengthy conversational interactions needed to train more advanced learning-based approaches.

  Studies like this from Apple show how companies are thinking about how to make personal assistants more engaging: primarily, this makes users feel more comfortable with the assistants, but as a secondary effect it can bootstrap the generation of data from which to learn from. There also may be stranger effects: “People not only enjoy interacting with a digital assistant that mirrors their level of chattiness in its responses, but that interacting in this fashion increases feelings of trust”, the researchers write.
  Read more: Mirroring to Build Trust in Digital Assistants (Arxiv).

######################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Google scraps AI ethics council:
Google’s new AI ethics council has been cancelled, just over a week after its launch (see #140).

What went wrong: There was significant backlash from employees over the appointments. ~2,500 employees signed a petition to remove the president of the conservative Heritage Foundation, Kay Coles James, from the council. Her appointment had been described as bringing “diversity of thought” to the panel, but employees pointed to James’ track record of political positions described as anti-trans, anti-LGBT and anti-immigrant. There was also anger at the appointment of Dyan Gibbens, CEO of a drone company. A third appointee, Alessandro Acquisti, resigned from the body, saying it was not the right forum for engaging with ethical issues around AI.

What next: In a statement, the company says it is “going back to the drawing board,” and “will find different ways of getting outside opinions on these topics.”

Why it matters: This is an embarrassing outcome for Google, whose employees have again demonstrated their ability to force change at the company. Over and above the issues with appointments, there were reasons to be skeptical of the council as a supervision mechanism – the group was going to meet only four times in person over the next 12 months, and it is difficult to imagine the group being able, in this time, to understand Google’s activities enough to provide any meaningful oversight.
  Read more: Google cancels AI ethics board in response to outcry (Vox).

######################################################

Balancing openness and values in AI research:
The Partnership on AI and OpenAI organized an event with members of the AI community to explore openness in AI research. In particular, participants considered how to navigate the tension between openness norms and minimizing risks from unintended consequences and malicious uses of new technologies. Some of the impetus for the event was OpenAI’s partial release of the GPT-2 language model. Participants role-played an internal review board of an AI company, deciding whether to publish a hypothetical AI advance which may have malicious applications.

Key insights: Several considerations were identified: (1) Organizations should have standardized risk assessment processes; (2) The efficacy of review processes depends on time-frames, and on whether other labs are expected to publish similar work. It is unrealistic to think that one lab could unilaterally prevent publication, so it is better to think of decisions as delaying (not preventing) the dissemination of information; (3) AI labs could learn from the ‘responsible disclosure’ process in computer security, where vulnerabilities are disclosed only after there has been sufficient time to patch security issues; (4) It is easier to mitigate risks at an early design stage of research than once research has been completed.

Building consensus: A survey after the event showed consensus across the community that there should be standardized norms and review parameters across institutions. There was not consensus, however, on what these norms should be. PAI identify 3 viewpoints among respondents: one group believed openness is generally the best norm; another believed pre-publication review processes might be appropriate; another believed there should be sharing within trusted groups.
  Read more: When Is It Appropriate to Publish High-Stakes AI Research? (PAI).
  Read more: ATEAC member Joanna Bryson has written a post reflecting on the dissolution of the board, called: What we lost when we lost Google ATEAC (Joanna Bryson’s blog).

######################################################

Amazon shareholders could block government face recognition contract:
The SEC has ruled that Amazon shareholders can vote on two proposals to stop sales of face recognition technologies to law enforcement. The motions, put forward by activist shareholders, will be considered at the company’s annual shareholder meeting. One asks Amazon to stop sales of their Rekognition technology to government unless the company’s board determines it does not pose risks to human and civil rights. The other requests that the board commissions an independent review of the technology’s impacts on privacy and civil liberties. While the motions are unlikely to pass, they put further pressure on the company to address these long-running concerns.
  Read more: Amazon has to let shareholders vote on government Rekognition ban, SEC says (The Verge).
  Read more: A win for shareholders in effort to halt sales of Amazon’s racially biased surveillance tech (OpenMIC).

######################################################

Tech Tales:

Joy Asteroid

The joy asteroid landed in PhaseSpace at two in the morning, pacific time, in March 2025. Anyone with a real-world location that corresponded to the virtual asteroid was inundated with notifications for certain types of gameworld-enhancing augmentations, in exchange for the recording and broadcast of fresh media related to the theme of ‘happiness’.

Most people took the deal, and suddenly a wave of feigned happiness spread across the nearby towns and cities as people posed in bedrooms and parks and cars and trains in exchange for trinkets, sometimes mentioned and sometimes not. This triggered other performances of happiness and joy, entirely disconnected from any specific reward – though some who did it said they hoped a reward would magically appear, as it had done for the others.

Meanwhile, in PhaseSpace, the joy persisted, warping most of the rest of virtual reality with it. Joy flowed from PhaseSpace via novel joy-marketing mechanisms, all emanating from a load of financial resources that seemed to have been embedded in the asteroid.

All of this happened in about an hour, and after that people started to work out what the asteroid was. Someone on a social network had already used the term ’emergent burp’, and this wasn’t so far from the truth – something in the vast, ‘world modelling’ neural net that simultaneously modeled various real & virtual simulations while doing forward prediction and planning had spiraled into an emergent fault, leading to an obsession with joy – a reward loop had suddenly appeared within the large model, diverging the objective. Most of this happened because of sloppy engineering – many safety protocols these days either have humans periodically calibrating the machines, or are based on systems with stronger guarantees.

The joy loop was eventually isolated, but rather than completely delete it, the developers of the game cordoned off the environment and moved it onto separate servers running on a separate air-gapped network, and created a new premium service for ‘a visit to the land of joy’. They claim to have proved that their networking system will prevent the joy bug from emanating, but they continue to feed it more compute, as people come back with wild tales of lands of bus-sized birds and two-headed sea lions, and trees that grow from the bottom of fat, winged clouds.

The company that operates the system is currently alleged to be building systems to provide ‘live broadcasts’ from the land of joy, to satisfy the demands of online influencers. I don’t want them to do this but I cannot stop them – and I know that if they succeed, I’ll tune in.

Things that inspired this story: virtual reality; imagining an economic ‘clicker-game’-esque version of P

Import AI 140: Surveilling a city via the ‘CityFlow’ dataset; 25,000 images of Chinese shop signs; and the seven traps of AI ethics

NVIDIA’s ‘CityFlow’ dataset shows how to do citywide-surveillance:
…CityFlow promises more efficient, safer transit systems… as well as far better surveillance systems…
Researchers with NVIDIA, San Jose State University, and the University of Washington, have released CityFlow, a dataset to help researchers develop algorithms for surveilling and tracking multiple cars as they travel around a city.

The CityFlow Dataset contains 3.25 hours of video collected from 40 cameras distributed across 10 intersections in a US city. “The dataset covers a diverse set of location types, including intersections, stretches of roadways, and highways”. CityFlow contains over 229,680 bounding boxes across 666 vehicles, which include cars, buses, pickup trucks, vans, SUVs, and so on. Each video has a resolution of at least 960 pixels, and “the majority” have a frame rate of 10 FPS.
  Sub-dataset: CityFlow ReID: The researchers have created a subset of the data for the purpose of re-identifying pedestrians and vehicles as they disappear from the view of one camera and re-appear in another. This subset of the data includes 56,277 bounding boxes.
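  For readers new to re-identification, here's a generic sketch of how re-ID is often evaluated – embed each vehicle crop with a CNN and rank a gallery of crops from other cameras by similarity to the query. This is a stand-in baseline (assuming a recent torchvision), not the CityFlow reference code, and the blank images stand in for real crops.

```python
# A minimal re-identification sketch: embed vehicle crops with a CNN backbone
# and rank gallery crops by cosine similarity to the query crop. Generic
# baseline for illustration, not the CityFlow reference implementation.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet18(weights=None)  # torchvision >= 0.13 API
backbone.fc = torch.nn.Identity()         # use pooled features as the embedding
backbone.eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def embed(image: Image.Image) -> torch.Tensor:
    with torch.no_grad():
        feats = backbone(preprocess(image).unsqueeze(0))
        return torch.nn.functional.normalize(feats, dim=1)

# In practice these would be vehicle crops from different cameras.
query = embed(Image.new("RGB", (128, 128)))
gallery = torch.cat([embed(Image.new("RGB", (128, 128))) for _ in range(5)])
ranking = (gallery @ query.T).squeeze(1).argsort(descending=True)
print("gallery indices ranked by similarity:", ranking.tolist())
```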

Baselines: CityFlow ships with a set of baselines for the following tasks:

  • Pedestrian re-identification.
  • Vehicle re-identification.
  • Single-camera tracking of a distinct object.
  • Multi-camera tracking of a given object.

Why this matters – surveillance, everywhere: It would be nice to see some discussion within the paper about the wide-ranging surveillance applications and implications of this technology. Yes, it’ll clearly be used to improve the efficiency (and safety!) of urban transit systems, but it will also be plugged into local police services and national and international intelligence-gathering systems. This has numerous ramifications and I would be excited to see more researchers take the time to discuss these aspects of their work.
  Read more: CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification (Arxiv).

#####################################################

SkelNetOn challenges researchers to extract skeletons from images, point clouds, and parametric representations:
…New dataset and competition track could make it easier for AI systems to extract more fundamental (somewhat low-fidelity) representations of the objects in the world they want to interact with…
A group of researchers from multiple institutions have announced the ‘SkelNetOn’ dataset and challenge, which seeks to “utilize existing and develop novel deep learning architectures for shape understanding”. The challenge involves the geometric modelling of objects, which is a useful problem to work on as techniques that can solve it naturally generate “a compact and intuitive representation of the shape for modeling, synthesis, compression, and analysis”.

Three challenges in three domains: Each SkelNetOn challenge ships with its own dataset of 1,725 paired images/point clouds/parametric representations of objects and skeletons.

Why this matters: Datasets contribute to broader progress in AI research, and being able to smartly infer 2D and 3D skeletons from images will unlock applications, ranging from Kinect-style interfaces that rely on the computer knowing where the user is, to being able to cheaply generate (basic) skeletal models for use in media production, for example video games.
  The authors “believe that SkelNetOn has the potential to become a fundamental benchmark for the intersection of deep learning and geometry understanding… ultimately, we envision that such deep learning approaches can be used to extract expressive parameters and hierarchical representations that can be utilized for generative models and for proceduralization”.
  Read more: SkelNetOn 2019 Dataset and Challenge on Deep Learning for Geometric Shape Understanding (Arxiv).

#####################################################

Want over 25,000 images of Chinese shop signs? Come get ’em:
…ShopSign dataset took more than two years to collect, and includes five hard categories of sign…
Chinese researchers have created ShopSign, a dataset of images of shop signs. Chinese shop signs tend to be set against a variety of backgrounds with varying lengths, materials used, and styles, the researchers note; this compares to signs in places like the USA, Italy, and France, which tend to be more standardized, they explain. This dataset will help people train automatic captioning systems that work against (some) Chinese signs, and could lead to secondary applications, like using generative models to create synthetic Chinese shop signs.

Key statistics:

  • 25,362: Chinese shop sign images within the dataset.
  • 4,000: Images taken at night.
  • 2,516: Pairs of images where signs have been photographed from both an angle and a front-facing perspective.
  • 50: Different types of camera used to collect the dataset, leading to natural variety within images.
  • 2.4 years: Time it took to collect the dataset.
  • >10: Locations of images, including Shanghai, Beijing, inner Mongolia, Xinjiang, Heilongjiang, Liaoning, Fujian, Shangqiu, Zhoukou, as well as several urban areas in Henan Province.
  • 5: “special categories”; these are ‘hard images’ which are signs against wood, deformed, exposed, mirrored, or obscure backdrops.
  • 196,010: Lines of text in the dataset.
  • 626,280: Chinese characters in the dataset.

Why this matters: The creation of open datasets of images not predominantly written in English will help to make AI more diverse, making it easier for researchers from other parts of the world to build tools and conduct research in contexts relevant to them. I can’t wait to see ShopSigns for every language, covering the signs of the world (and then I hope someone trains a Style/Cycle/Big-GAN on them to generate synthetic street sign art!).
  Get the data: The authors promise to share the dataset on their GitHub repository. As of Sunday March 31st the images had yet to be uploaded there. Check out the GitHub repository here.
  Read more: ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views (Arxiv).

#####################################################

Stanford (briefly) sets state-of-the-art for GLUE language modelling challenge:
…Invest in researching new training signals, not architectures, say researchers…
Stanford University researchers recently set a new state-of-the-art on a multi-task natural language benchmark called GLUE, obtaining a score of 83.2 on GLUE on 20th of March, compared to 83.1 for the prior high score and 87.1 for human baseline performance.

Nine tasks, one benchmark: GLUE consists of nine natural language understanding tasks and was introduced in early 2018. Last year, systems from OpenAI (GPT) and Google (BERT) led the GLUE leaderboard; the Stanford system uses BERT in conjunction with additional supervision signals (supervised learning, transfer learning, multi-task learning, weak supervision, and ensembling) in a ‘Massive Multi-Task Learning (MMTL) setting’. The resulting model obtains state-of-the-art scores on four of GLUE’s nine tasks, and sets the new overall state-of-the-art.
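  The basic pattern behind that MMTL setting is a shared pretrained encoder with one small head per task; here's a hedged sketch of that pattern using the transformers library (task names, label counts, and the training-step function are placeholders, not the team's Snorkel MeTaL code):

```python
# A hedged sketch of the shared-encoder / per-task-head pattern behind massive
# multi-task learning, assuming a recent version of the transformers library.
# Data loading is omitted; the heads and tasks are placeholders.
import torch
import torch.nn as nn
from transformers import BertModel

encoder = BertModel.from_pretrained("bert-base-uncased")
heads = nn.ModuleDict({
    "rte": nn.Linear(encoder.config.hidden_size, 2),   # entailment / not
    "mnli": nn.Linear(encoder.config.hidden_size, 3),  # a related auxiliary task
})
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(heads.parameters()), lr=2e-5)

def train_step(task, input_ids, attention_mask, labels):
    # Every task's batch updates the shared encoder plus that task's own head.
    pooled = encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output
    loss = nn.functional.cross_entropy(heads[task](pooled), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```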

RTE: The researchers detail how they improved performance on RTE (Recognizing Textual Entailment), one of GLUE’s nine tasks. The goal of RTE is to figure out whether a sentence is implied by the preceding one; for example, in the pair “The cat sat on the mat. The dog liked to sit on the mat, so it barked at the cat.”, the second sentence is related to the first.

Boosting performance with five supervisory signals:

  • 1 signal: Supervised Learning [SL]: Score: 58.9
    • Train a standard biLSTM on the ‘RTE’ dataset, using ELMo embeddings and an attention layer.
  • 2 signals: Transfer Learning [TL] (+ SL): Score: 76.5
    • Fine-tune a linear layer on the RTE dataset on top of a pre-trained BERT module. “By first pre-training on much larger corpora, the network begins the fine-tuning process having already developed many useful intermediate representations which the RTE task head can then take advantage of”.
  • 3 signals: Multi-Task Learning [MTL] (+ TL + SL): Score 83.4
    • Train multiple additional linear layers against multiple tasks similar to RTE, as well as RTE itself, using a pre-trained BERT module with each layer having its own task-specific interface to the relevant dataset. Train across all these tasks for ten epochs, then fine-tune on individual tasks for an additional 5 epochs.
  • 4 signals: Dataset Slicing [DS] (+ TL + SL + MTL): Score 84.1
    • Identify parts of the dataset the network has trouble with (eg, consistently low performance on RTE examples that have rare punctuation), then train additional task heads on top of these subsets of the data.
  • 5 signals: Ensembling [E] (+ DS + TL + SL + MTL): Score 85.1
    • Mush together multiple different models trained with slightly different properties (eg, one trained purely on lowercased text, another that preserves upper-casing, or ones with different training/validation set splits). Averaging the probabilities of these models’ predictions together further increases the score.

Why this matters: Approaches like this show how researchers are beginning to figure out how to train far more capable language systems using relatively simple, task-agnostic techniques. The researchers write: “we believe that it is supervision, not architectures, that really move the needle in ML applications”.
  (In a neat illustration of the rate of progress in this domain, shortly after the Stanford researchers submitted their high-scoring GLUE system, they were overtaken by a system from Alibaba, which obtained a score of 83.3.)
  Details of the Stanford team’s ‘Snorkel MeTaL’ submission here.
  Check out the ‘GLUE’ benchmark here (GLUE official site).
  Read the original GLUE paper here (Arxiv).
  Read more: Massive Multi-Task Learning with Snorkel MeTaL: Bringing More Supervision to Bear (Stanford Dawn).

#####################################################

The seven traps of AI ethics:
…Common failures of reasoning when people explore this issue…
As researchers try to come up with principles to apply when seeking to build and deploy AI systems in an ethical way, what problems might they need to be aware of? That’s the question that researchers from Princeton try to answer in a blog about seven “AI ethics traps” that people might stumble into.

The seven deadly traps:

  • Reductionism: reducing AI ethics to a single constraint, for instance fairness.
  • Simplicity: Overly simplifying ethics, eg via creating checklists that people formulaically follow.
  • Relativism: Placing so much importance on the diversity of views people have about AI ethics that it becomes difficult to distill or collapse these views down to a smaller core set of concerns.
  • Value Alignment: Ascribing one set of values to everyone in an attempt to come up with a single true value for people (and the AI systems they design) to follow, and failing to entertain other equally valid viewpoints.
  • Dichotomy: Presenting ethics as binary, eg “ethical AI” versus “unethical AI”.
  • Myopia: Using AI as a catch-all term, leading to fuzzy arguments.
  • Rule of Law reliance: Framing ethics as a substitute for regulations, or vice versa.

Why this matters: AI ethics is becoming a popular subject, as people reckon with the significant impacts AI is having on society. At the same time, much of the discourse about AI and ethics has been somewhat confused, as people try to reason about how to solve incredibly hard, multi-headed problems. Articles like this suggest we need to define our terms better when thinking about ethics, and indicate that it will be challenging to work out what and whose values AI systems should reify.
  Read more: AI Ethics: Seven Traps (Freedom To Tinker).

#####################################################

nuTonomy releases a self-driving car dataset:
…nuScenes includes 1,000 scenes from cars driving around Singapore and Boston…
nuTonomy, a self-driving car company (owned by APTIV), has published nuScenes, a multimodal dataset that can be used to develop self-driving cars.

Dataset: Data within nuScenes consists of over 1,000 distinct scenes of about 20 seconds in length each, with each scene accompanied by data collected from five radar, one lidar, and six camera-based sensors on a nuTonomy self-driving vehicle.

  The dataset consists of ~5.5 hours of footage gathered in Boston and Singapore, and includes scenes in rain and snow. nuScenes is “the first dataset to provide 360° sensor coverage from the entire sensor suite. It is also the first AV dataset to include radar data and the first captured using an AV approved for public roads.” The dataset is inspired by the self-driving car dataset KITTI, but has 7X the total number of annotations, nuTonomy says.

Interesting scenes: The researchers have compiled particularly challenging scenes, which include things like navigation at intersections and construction sites, the appearance of rare entities like ambulances and animals, as well as potentially dangerous situations like jaywalking pedestrians.

Challenging tasks: nuScenes ships with some in-built tasks, including calculating the bounding boxes, attributes, and velocities of 10 classes of object in the dataset.
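For readers who want to poke at the data, the accompanying 'nuscenes-devkit' Python package exposes the scenes, samples, and annotations described above. Below is a minimal sketch of iterating over it; the version string, data root, and sensor channel name reflect assumptions about a local install of the mini split, not details from the paper.

```python
# pip install nuscenes-devkit   (sketch assumes the v1.0-mini split is downloaded locally)
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-mini", dataroot="/data/nuscenes", verbose=True)

# Each ~20-second scene is a chain of keyframe 'samples'.
for scene in nusc.scene[:3]:
    print(scene["name"], "-", scene["description"])
    sample_token = scene["first_sample_token"]
    while sample_token:
        sample = nusc.get("sample", sample_token)
        # 'data' maps sensor channels (6 cameras, 1 lidar, 5 radars) to sample_data records.
        lidar = nusc.get("sample_data", sample["data"]["LIDAR_TOP"])
        # 'anns' lists the 3D box annotations used for the detection tasks.
        print(f"  {lidar['filename']}  ({len(sample['anns'])} annotated boxes)")
        sample_token = sample["next"]  # empty string marks the end of the scene
```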

Why this matters: Self-driving car datasets are frustratingly rare, given the high commercial value placed on them. nuScenes will give researchers a better sense of the kind of data required for developing and deploying self-driving car technology.
  Read more: nuScenes: A multimodal dataset for autonomous driving (Arxiv).
  Navigate example scenes from nuScenes here (nuScenes website).
  nuScenes GitHub here (GitHub).
  Register and download the full dataset from here (nuScenes website).

#####################################################

Google’s robots learn to (correctly) throw things at a rate of 500 objects per hour:
…Factories of the future could have more in common with food fights than conveyor belts…
Google researchers have taught robots to transport objects around a (crudely simulated) warehouse by throwing them from one container to another. The resulting system demonstrates the power of contemporary AI techniques when applied to modern robots, but its failure rate is still too high for practical deployment.

Three modules, one robot: The robot ships with a perception module, a grasping module, and a throwing module.
  The perception module helps the robot see objects and compute 3D information about them.
  The grasping module predicts how likely a candidate grasp is to succeed.
  The throwing module predicts "the release position and velocity of a predefined throwing primitive for each possible grasp" with the aid of a handwritten physics controller; it combines the controller's estimate with a residual signal it learns on top of it to arrive at the appropriate release velocity.

Residual physics: The system learns to throw the object by using a handwritten physics controller along with a function that learns residual signals on the control parameters of the robot. Using this method, the researchers generate a “wider range of data-driven corrections that can compensate for noisy observations as well as dynamics that are not explicitly modeled”.
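A toy version of the residual physics idea, with hypothetical names and a same-height, fixed-angle ballistic model standing in for the paper's controller: the analytic physics supplies a first estimate of the release velocity, and the learned component only has to predict a small correction on top of it.

```python
import math

G = 9.81                              # gravity, m/s^2
RELEASE_ANGLE = math.radians(45)      # fixed throwing primitive (assumption for this sketch)

def physics_release_speed(target_distance_m: float) -> float:
    """Ideal release speed from projectile motion, assuming release and landing
    happen at the same height: d = v^2 * sin(2*theta) / g."""
    return math.sqrt(target_distance_m * G / math.sin(2 * RELEASE_ANGLE))

def residual_model(visual_features) -> float:
    """Stand-in for the learned network that predicts a velocity correction
    from visual observations of the grasped object; here it is just a stub."""
    return 0.0

def release_speed(target_distance_m: float, visual_features=None) -> float:
    # Residual physics: analytic estimate + learned correction.
    return physics_release_speed(target_distance_m) + residual_model(visual_features)

print(release_speed(0.8))  # a bin ~0.8m away -> roughly 2.8 m/s release speed
```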

The tricks needed for self-supervision: The paper discusses some of the tricks implemented by the researchers to help the robots learn as much as possible without the need for human intervention. "The robot performs data collection until the workspace is void of objects, at which point n objects are again randomly dropped into the workspace," they write. "In our real-world setup, the landing zone (on which target boxes are positioned) is slightly tilted at a 15° angle adjacent to the bin. When the workspace is void of objects, the robot lifts the bottomless boxes such that the objects slide back into the bin".

How well does it throw? They test on a ‘UR5’ robotic arm that uses an RG2 gripper “to pick and throw a collection of 80+ different toy blocks, fake fruit, decorative items, and office objects”. They test their system against three basic baselines, as well as humans. These tests indicate that the so-called ‘residual physics’ technique outlined in the paper is the most effective, compared to purely regression or physics-based baselines.
  The robot slightly exceeds human performance at gripping and throwing items: humans achieved a mean successful throw rate of 80.1% (plus or minus around 10 percentage points), versus 82.3% for the robot system outlined here.
  This system can pick up and throw up to 514 items per hour (not counting the ones it fails to pick up). This outperforms other techniques, like Dex-Net or Cartman.

Why this matters: TossingBot shows the power of hybrid AI systems which pair learned components with hand-written algorithms that incorporate domain knowledge (eg, a physics controller). This provides a counterexample to some of the 'compute is the main factor in AI research' arguments that have been made by people like Rich Sutton. However, it's worth noting that many of TossingBot's capabilities themselves depend on cheap computation, given the extensive period for which the system is trained in simulation prior to real-world deployment.

Additionally, manufacturers transporting objects around a factory would likely demand a success rate of 99.9% (or even 99.99%), rather than roughly 80%, suggesting that this system's 82.3% success rate puts it some way off from real-world practical usage – if you deployed it today, you'd expect nearly one out of every five products to not make it to their designated location (and they would probably incur damage along the way).
  Read more: TossingBot: Learning to Throw Arbitrary Objects with Residual Physics (Arxiv).
  Check out videos and images of the robots here (TossingBot website).

######################################################

UK government wants swarms of smart drones:
…New grant designed to make smart drones that can save people, or confuse and deceive them…
The UK government has awarded "the largest single contract" yet put out by its startup-accelerator-for-weapons, the 'Defence and Security Accelerator' (DASA).

The grant will be used to develop swarms of drones for applications like medical assistance, logistics resupply, explosive ordnance detection and disposal, confusion and deception, and situational awareness.

The project will be led by Blue Bear Systems Research Ltd, a UK defence development company that has worked on a variety of unmanned aerial vehicles, like a backpack-sized 'iStart' drone, a radiation-detecting 'RISER' copter, and more.

Why this matters: Many of the world's militaries are betting their strategic future on the development of increasingly capable drones, ideally ones that work together in large-scale 'swarms' of teams and sub-teams. Research like this indicates how advanced these systems are becoming, and the decision to procure initial prototype systems via a combination of public- and private-sector cooperation seems representative of the way such systems will be built & bought in the future.
  I’ll be curious if in a few years we see larger grants for funding the development of autonomous capabilities for the sorts of hardware being developed here.
  Read more: £2.5m injection for drone swarms (UK Ministry of Defence press release).
  More about Blue Bear Systems Research here (BBSR website).

######################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Google appoints AI ethics council:
Google has announced the creation of a new advisory council to help the company implement the AI principles it announced last year. The council, which is made up of external appointees, is intended to complement Google's "internal governance structure and processes".

Who they are: Of the 8 inaugural appointees, 5 are from academia, 2 are from policy, and 1 is from industry. Half of the council are drawn from outside the US (UK, South Africa, Hong Kong).

Why it matters: This seems like a positive move, insofar as it reflects Google taking ethical issues seriously. With so few details, though, it is difficult to judge whether this will be consequential – e.g. we do not know how the council will be empowered to affect corporate decisions.
  Read more: An external advisory council to help advance the responsible development of AI (Google).

DeepMind’s Ethics Board:
A recent profile of DeepMind co-founder Demis Hassabis has revealed new details about Google's 2014 acquisition of the company. As part of the deal, both parties agreed to an 'Ethics and Safety Review Agreement', designed to ensure the parent company was not able to unilaterally take control of DeepMind's intellectual property. Notably, if DeepMind succeeds in its mission of building artificial general intelligence, the agreement gives ultimate control of the technology to an Ethics Board. The members of the board have not been made public, though they are reported to include the three co-founders.
  Read more: DeepMind and Google: the battle to control artificial intelligence (Economist).

######################################################

Tech Tales:

Alexa I’d like to read something.

Okay, what would you like to read?

I’d like to read about people who never existed, but who I would be proud to meet.

Okay, give me a second… did you mean historical or contemporary figures?

Contemporary, but they can have died when I was younger. Contemporary within a generation.

Okay, and do you have any preference on what they did in their life?

I’d like them to have achieved things, but to have reflected on what they had achieved, and to not feel entirely proud.

Okay. I feel I should ask at this stage if you are okay?

I am okay. I would like to read this stuff. Can you tell me what you’ve got for me?

Okay, I’m generating a list… okay, here you go. I have the following titles available, and I have initiated processes to create more. Please let me know if any are interesting to you:

  • Great joke! Next joke!, real life as a stand-up comedian.
  • Punching Dust, a washed-up boxer tells all.
  • Here comes another one, the architect of mid-21st Century production lines.
  • Don’t forget this! Psychiatry in the 21st century.
  • I have a bridge to sell you, confessions of a con-artist.

And so I read them. I read about a boxer whose hands no longer worked. I read about a comedian who was unhappy unless they were on stage. I read about the beautiful Sunday-Sudoko process of automating humans. I read about memory and trauma and their relationships. And I read about how to tell convincing lies.

They say Alexa’s next ability will be to “learn operator’s imagination” (LOI), and once this arrives Alexa will ask me to tell it stories, and I will tell it truths and lies and in doing so it will shape itself around me.

Things that inspired this story: Increasingly powerful language models; generative models; conditional prompts; personal assistants such as Alexa; memory as therapy; creativity as therapy; the solipsism afforded to us by endlessly customizable, generative computer systems.