Import AI

Import AI 209: Tractors+AI; AlphaFold makes more COVID progress; and UK government pulls immigration algorithm

How Google predicted the COVID epidemic:
…Cloud companies as 21st century weather stations…
Modern technology companies are like gigantic sensory machines that operate at global scale – they can detect trends way ahead of smaller entities (e.g., governments), because they have direct access to the actions of billions of people worldwide. A new blog post from Google gives us a sense of this, as the company describes how it detected a dramatic rise in usage of Google Meet in Asia early in the year, which gave it a clue that the COVID pandemic was driving changes in consumer behavior on its platform. Purely from demand placed on Google’s systems, “it became obvious that we needed to start planning farther ahead, for the eventuality that the epidemic would spread beyond the region”.

Load prediction and resource optimization: Google had to scale Google Meet significantly in response to demand, which meant the company invested in tools for better demand prediction, as well as tweaking how its systems assigned hardware resources to services running Google Meet. “By the time we exited our incident, Meet had more than 100 million daily meeting participants.”
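  For intuition, here is a minimal sketch of the kind of demand forecasting the post describes – fit recent peaks, extrapolate, and provision with headroom. The growth model, numbers, and headroom factor are illustrative assumptions, not Google's actual system:

```python
# Hedged sketch: forecast demand by fitting exponential growth to recent
# daily peaks, then provision capacity ahead of the curve.
import numpy as np

daily_peaks = np.array([1.0, 1.3, 1.7, 2.2, 2.9, 3.8])  # relative Meet load
days = np.arange(len(daily_peaks))

# Exponential growth is a straight line in log space.
slope, intercept = np.polyfit(days, np.log(daily_peaks), 1)

horizon = np.arange(len(daily_peaks), len(daily_peaks) + 14)
forecast = np.exp(intercept + slope * horizon)
print(1.5 * forecast.max())  # provision two weeks out, with 50% headroom
```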
  Read more: Three months, 30X demand: How we scaled Google Meet during COVID-19 (Google blog).

###################################################

ML alchemy: extracting 3D structure from a bunch of 2D photographs:
…NeRF-W means tech companies are going to turn their userbase into a distributed, global army of 3D cartographers…
It sounds like sci-fi but it’s true – a new technique lets us grab a bunch of photos of a famous landmark (e.g., the Sistine Chapel), then use AI to figure out a 3D model from the images. This technique is called NeRF-W and was developed by researchers with Google as an extension of their prior work, NeRF.

How it works: NeRF previously only worked on reconstructing objects from well-composed photographs with relatively little variety. NeRF-W extends this by being able to use photos with variable lighting and photometric post-processing, as well as being able to better disentangle the subjects of images from transient objects near them (e.g., cars, people). The resulting samples are really impressive (seriously, check them out), though the authors admit the system has flaws and “outdoor scene reconstruction from image data remains far from being fully solved”.
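  To make the mechanics concrete, here is a heavily simplified sketch of the core NeRF idea the paragraph builds on – an MLP maps a 3D point and viewing direction to color and density, and samples along a camera ray are alpha-composited. Positional encoding, hierarchical sampling, and NeRF-W's appearance embeddings are all elided; this is a toy, not the paper's implementation:

```python
import torch
import torch.nn as nn

# Toy radiance field: (x, y, z, view direction) in -> (r, g, b, density) out.
field = nn.Sequential(
    nn.Linear(5, 256), nn.ReLU(),
    nn.Linear(256, 4),
)

def render_ray(points, dirs, deltas):
    """points: (N, 3), dirs: (N, 2), deltas: (N,) spacing between samples."""
    out = field(torch.cat([points, dirs], dim=-1))
    rgb = torch.sigmoid(out[:, :3])
    sigma = torch.relu(out[:, 3])
    alpha = 1.0 - torch.exp(-sigma * deltas)            # opacity per sample
    trans = torch.cumprod(                              # light surviving so far
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10], dim=0)[:-1], dim=0)
    return (trans[:, None] * alpha[:, None] * rgb).sum(dim=0)  # composited RGB

color = render_ray(torch.randn(64, 3), torch.randn(64, 2),
                   torch.full((64,), 0.1))
```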

Why this matters – easy data generation (kind of): Techniques like NeRF-W are going to make it easy for large technology entities with access to huge amounts of photography data (e.g., Google, Facebook, Tencent) to create 3D maps of commonly photographed objects and places in the world. I imagine that such data will eventually be used to bootstrap the training of AI systems, either by providing an easy-to-create quantity of 3D data, or perhaps for automatically extracting environments from reality and training AIs against them in simulation, then re-uploading the trained AIs onto robots running in the real world.
  Read more: NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections (arXiv).
  Check out videos at the paper overview site (GitHub).

###################################################

Blue River Technology: Tractors + machine learning = automated weed killers:
…”The machine needs to make real-time decisions on what is a crop and what is a weed”…
Agricultural robot startup Blue River Technology is using machine learning to classify weeds on-the-fly, letting farmers automate the task of spraying crops with weedkillers. The project is a useful illustration of how mature machine learning is becoming and all the ways it is starting to show up in the economy.

On-tractor inference: They built a classifier using PyTorch, then ran it on a mobile NVIDIA Jetson AGX Xavier system for on-tractor inference. Interestingly, they do further optimization by converting their JIT models to ONNX (Import AI: 70), then converting ONNX to TensorRT, showing that the dream of multi-platform machine learning might be starting to come true.
    Your 2020 tractor is a 2007 supercomputer: “The total compute power on board the robot just dedicated to visual inference and spray robotics is on par with IBM’s super computer, Blue Gene (2007)”, they write.
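  For the curious, here is a minimal sketch of the PyTorch-to-ONNX leg of the pipeline described above; the model, input shape, and file names are illustrative assumptions, not Blue River's actual code:

```python
import torch
import torchvision

# Stand-in for the crop/weed classifier; any traced module exports the same way.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)            # one RGB image

traced = torch.jit.trace(model, dummy)         # the "JIT model" mentioned above
torch.onnx.export(
    traced, dummy, "weed_classifier.onnx",
    input_names=["image"], output_names=["logits"],
    opset_version=11,
)
# The ONNX file can then be handed to NVIDIA's tooling for TensorRT, e.g.:
#   trtexec --onnx=weed_classifier.onnx --saveEngine=weed_classifier.trt
```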
  Read more: AI for AG: Production machine learning for agriculture (Medium).

###################################################

Delicious: 1.7 million arXiv articles in machine readable form:
…Want to use AI to analyze AI that itself analyzes AI? Get the dataset…
arXiv is to machine learning as the bazaar is to traders – it’s the place where everyone lays out their wares, browses what other people have, and also supports a variety of secondary things like ‘what’s happening in the bazaar this week’ publications (Import AI, other newsletters). Now, you can get arXiv on Kaggle, making it easier to write software to analyze this deluge of 1.7 million articles.

The details: “We present a free, open pipeline on Kaggle to the machine-readable arXiv dataset: a repository of 1.7 million articles, with relevant features such as article titles, authors, categories, abstracts, full text PDFs, and more,” Kaggle writes. “Our hope is to empower new use cases that can lead to the exploration of richer machine learning techniques that combine multi-modal features towards applications like trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction and semantic search interfaces.”
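  As a taste of what's possible, here is a minimal sketch that streams the metadata and counts papers per category; the file name is an assumption about the Kaggle dump, not something specified above:

```python
import json
from collections import Counter

# The metadata ships as one JSON object per line (file name assumed).
counts = Counter()
with open("arxiv-metadata-oai-snapshot.json") as f:
    for line in f:
        paper = json.loads(line)
        for cat in paper.get("categories", "").split():
            counts[cat] += 1

print(counts.most_common(10))  # e.g., cs.LG, cs.CV, ...
```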
  Read more: Leveraging Machine Learning to Fuel New Discoveries with the arXiv Dataset (arXiv.org blog).
  Get the dataset on Kaggle (Kaggle).

###################################################

Explore the immune system with Recursion’s RxRx2 dataset:
Recursion, a startup that uses machine learning to aid drug discovery, has released RxRx2, a dataset of 131,953 fluorescent microscopy images and their deep learning embeddings, focused on the immune microenvironment. The dataset is ~185GB in size, and follows Recursion’s earlier release of RxRx1 last year (Import AI 155).

“RxRx2 demonstrates both the great variety of morphological effects soluble factors have on HUVEC cells and the consistency of these effects within groups of similar function,” Recursion writes. “Through RxRx2, researchers in the scientific community will have access to both the images and the corresponding deep learning embeddings to analyze or apply to their own experimentation… scientific researchers can use the data to further demonstrate how high-content imaging can be used for screening immune responses and identification of functionally-similar factor groups.”

Why this matters – AI-automated scientific exploration: Datasets like this will help us develop techniques to analyze and map the high-dimensional relationships of vast troves of microscopy and other medical data. It’ll be interesting to see if in a few years benchmarks start to emerge that test out systems on suites of tests, including datasets like RxRx2 – that would give us a more holistic sense of progress in this space and how the world will be changed by it.
  Read more: RxRx2 (Recursion’s ‘RxRx’ dataset site).

###################################################

Can computers help with the pandemic? DeepMind wants to try:
…AlphaFold seems to be making increasingly useful COVID predictions…
DeepMind published some predictions from its AlphaFold system about COVID back in March (Import AI: 189), and has followed up with “our most up-to-date predictions of five understudied SARS-CoV-2 targets here (including SARS-CoV-2 membrane protein, Nsp2, Nsp4, Nsp6, and Papain-like proteinase (C Terminal domain))”.

Is AlphaFold useful? It’s a bit too early to say, but the results from this study are promising. “The experimental paper confirmed several aspects of our model that at first seemed surprising to us (e.g. C133 looked poorly placed to form an inter-chain disulfide, and we found it difficult to see how our prediction would form a C4 tetramer). This bolsters our original hope that it might be possible to draw biologically relevant conclusions from AlphaFold’s blind prediction of even very difficult proteins, and thereby deepen our understanding of understudied biological systems,” DeepMind wrote.

Why this matters: The dream of AI is to be able to dump compute into a problem and get an answer out that is a) correct, and either b) arrives faster than a human could generate it or c) is better/more correct than what a human could do. COVID has shown the world that our AI systems are (so far) not able to do this stuff seamlessly yet, but the success of things like AlphaFold should give us some cause for optimism.
  Read more: Computational predictions of protein structures associated with COVID-19 (DeepMind).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

UK government pulls biased immigration algorithm

The UK Home Office has stopped using a controversial automated decision-making process for visa applicants. Its algorithm, used since 2015, sorted applicants into three risk categories. It emerged that the sorting was partly on the basis of nationality, with applications from certain countries being automatically categorised as high-risk. This appears to have led to an unfortunate feedback loop, since rejection rates for specific countries were feeding into the sorting algorithm. Civil liberties groups had been pursuing a legal challenge on the basis that the algorithm was unlawfully discriminating against individuals.

   Read more: Home Office to scrap ‘racist algorithm’ for UK visa applicants (Guardian).
   Read more: How we got the government to scrap the visa streaming algorithm (Foxglove).

What to do about deepfakes:
Deepfakes are convincing digital forgeries, typically of audio or video. In recent years, people have been concerned that a proliferation of deepfakes might pose a serious risk to public discourse. This report from CSET outlines the threat from deepfakes, setting out two scenarios, and offering some policy recommendations.


Commodified deepfakes: Deepfakes could proliferate widely, and become so easy to produce that they become a ubiquitous feature of the information landscape. This need not be a major problem if defensive technologies are able to keep up with the most widely-used forgery methods. One reason for optimism is that as a particular method proliferates, this provides more data on which to train detector systems. It’s also worth noting that since online content distribution is highly-concentrated on a handful of platforms (Facebook, Google, Twitter), rolling out effective detection on these networks may be sufficient to prevent the most damaging effects of widespread proliferation. 


Tailored deepfakes: More concerning is the possibility of targeted attacks to achieve some specific objective, particularly ‘zero-days’, where attackers exploit a vulnerability previously unknown to defenders, or attacks targeted at specific individuals through pathways that aren’t as well-defended (e.g. a phone line or camera feed vs. a social network).


Recommendations: (1) maintain a shared database of deepfake content to continually train detection algorithms on the latest forgeries; (2) encourage more consistent and transparent documentation of cutting-edge research in digital forgery; (3) commodify detection, empowering individuals and platforms to employ the latest detection techniques; (4) proliferate so-called ‘radioactive data’ — containing subtle digital signatures making it easy to detect synthetic media generated from it — in large public datasets.


Matthew’s view: This is a great report, providing clarity on a threat that I’ve felt can sometimes be exaggerated. We already rely heavily on several media that are very easy to forge — text, photos, signatures, banknotes. We get by through some combination of improvements in detection technology, and a circumspect attitude to forgeable media. There will be switching costs, as we adjust (e.g.) to trusting video less, but I struggle to see it posing a major risk to our relationship with information. The threat from deepfakes is perhaps best understood as an instance of two more general worries: that advances in AI might disproportionately favor offensive capabilities over defensive ones (see Import 179); and that surprisingly fast progress might have a destabilizing effect in certain domains where we cannot respond quickly enough.

   Read more: Deepfakes — A Grounded Threat Assessment (CSET)

###################################################

Tech Tales:

Speak and Spell or Speak Through Me, I Don’t Care, I Just Need You To Know I Love, and That I Can and Do Love You
[2026, A house in the down-at-heel parts of South East London, UK.]

He didn’t talk when he was a child, but he would scream and laugh. Then when it got to talking age, he’d grunt and point, but didn’t speak. He wasn’t stupid – the tests said he had a “rich, inner monologue”, but that he “struggled to communicate”. 

All his schools had computers in them, so he could type, and his teachers mostly let him email them – even during class. “He’s a good kid. He listens. Sure, sometimes he makes noises, but it doesn’t get out of hand. I see no reason to have him moved to a different classroom environment,” one of his teachers said, after the parent of a child who didn’t like him tried to have him moved.

He used computers and phones – he’d choose from a range of text and audio and visual responses and communicate through the machines. His parents helped him build a language that they could use to communicate.
I love you, they’d say.
He’d make a pink light appear on his laptop screen.
We can’t wait to see what you do, they’d say.
He’d play a gif of a wall of question marks, each one rotating.
Make sure you go to bed by midnight, they’d say.
; ), he’d sign.

As he got older, he got more frustrated with himself. He felt like a normal person trapped inside a broken person. Around the age of 15, hormones ran through him and gave him new ideas, goals, and inclinations. But he couldn’t act on them, because people didn’t quite understand him. He felt like he was locked in a cell, and he could smell life outside it, but couldn’t open the door.

He’d send emails to girls in his class and sometimes they’d not reply, or sometimes they’d reply and make fun of him, and very rarely they’d reply genuinely – but because of the noises he made and how other people made fun of girls for hanging out with him, it never got very far.

One day, the summer he was 16, there was a power cut in his house. It was the evening and it was dark outside. His computer shut down. He got frustrated and started to make noises. Fumbled under his desk. Found a couple of machines he could run off of batteries and his phones. Set them up. They were like microphones on stands, but at the end of the microphone was a lightcone whose color and brightness he could change, and the microphone had a small motor where it connected to the stand which let it tilt back and forth – a simple programmable robot. He made it move and cast some lights on the wall.

His parent came in and said are you okay?
He nodded his lamp and the light was a pinkish hue of love, moving up and down on the wall.
That’s great. Can I read with you?
He nodded his lamp and his parent sat down next to his chair and took out their phone, then started reading something. While they read, he researched the etymology of a certain class of insects, then he used an architectural program to slowly build a mock nest for these insects. He made noises occasionally and his parent would look up, but see him making the object – they knew he was fine.
They read like that together, then his parent said: I’m going to go and make some food. Can you still beep me?
He made the phone in their pocket beep.
Good. I’ll be right there if you need me.

They went away and he carried on reading about the insects, letting them run around his mind, nurturing other ideas. He was focused and so he didn’t notice his batteries running down.

But they did. Suddenly, the lights went out and the little robot sticks drooped down on their stands. He just had his phone. But he didn’t beep. He looked at his phone and wanted to make noises, but held his breath. Waited.  Watched the battery go to 5%, then 2%, then 1%, then it blinked out. 

He closed his eyes in the dark; thought about insects.
Rocked back and forth.
Balled his hands into fists. Released and repeated.
Held a 3-D model of one of the insects in his mind and focused on rotating it, while breathing fast.


Then, inside the darkness of his shut eyes, he saw a glow.
He opened his eyes and saw his parent coming in with a candle.
They brought it over to him and he was making noises because he was nervous, but they didn’t get flustered. Kept approaching him. They carried the candle in one hand and a long white tube in the other. Put the candle down next to him and sat on the floor, then looked at him.

Are you okay? they said
He rocked back and forth. Made some noises.
I know this isn’t exactly easy for you, they said. I get that.
He managed to nod, though couldn’t look at them. Felt nervousness rise.
Look at this, his parent said.
He looked, and could see they had a poster, wrapped up. They unrolled it. It was covered in fluorescent insects; little simulacra of life giving off light in the dark room, casting luminous shadows on the wall. He and his parent looked at them.
I printed these yesterday, they said. Are they the same insects you were looking at today?
He looked – they were similar, but not quite the same. There was something about them, cast in shadow, that felt like they reflected a different kind of insect – one that was more capable and independent. But it was close. He nodded his head while rocking back and forth. Made some happy noises.
Now, which one of these would you say I most resemble?
He pointed at an insect that had huge wings and a large mid-section.
They laughed. Coming from anyone else, that’d be an insult, they said. Then looked at him. But I think it’s fine if you say it. Now let me show you which one I think you are.
He clapped, which was involuntary, but also a sign of his happiness, and the candle went out.
Whoops, they said.
They sat in the dark together, looking at the shadows of their insects on the wall.

A few minutes later, their eyes adjusted to the darkness. Both of them could barely see each other, but they could more easily see the paper in between them, white – practically shining in the darkness – with splotches of color from all the creations of nature. And until the power came back on, they sat with the paper between them, saying things and pointing at the symbols, and using the insects as a bridge to communicate between two people.

By the time the power came back on, some insects meant love, and some insects meant hate. One butterfly was a memory of a beach and another was a dream. They had built this world together, through halting half-sounds and half-blind hands. And it was rich and real and theirs.
He could not speak, but he could feel, and when his parent touched some of the insects, he knew they could feel what he felt, a little. And that was good.

Things that inspired this story: AI tools used as expressive paintbrushes; idiosyncratic-modification as the root goal of some technology, how most people use tools to help them communicate; the use of technology to augment thinking and, eventually, enhance our ability to express ourselves truthfully to those around us; the endless malleability and adaptability of people.

Import AI 208: Google trains a vision system in 1 minute; Gender+AI = bad; Ubisoft improves character animation with ML

Genderify: Uh oh – AI & Bias & Gender
…AI + Gender Identification = No, this is a bad idea…
Last week, an AI startup called Genderify launched on Product Hunt and within days shut down its website and deleted its Twitter account (though it still has a microsite on Product Hunt). What happened? The short answer is the service tried to predict gender from names and titles. After it launched, many users demonstrated a series of embarrassing failures of the system, which neatly illustrated why a complex topic like gender is not one that should be approached with a machine learning blunderbuss.

Why this matters: Using automated tools to infer someone's gender is a bad idea – that’s partially why the global standard is to ask the user/civilian/employee to self-identify their gender from a menu of options when gender information is needed. Think of how complex human names are and how often you’ve heard a person’s name and mis-gendered them in your head. Now add to that the fact that people with certain gendered names may consciously use a different pronoun to the one suggested by the name. Does an AI system have any good way, today, to guess this stuff with decent accuracy? No it doesn’t! So if you build tools to do gender classification, you’re basically committing yourself to getting a non-trivial % of your classifications wrong.
  (There are some use cases where this may be somewhat okay, like doing an automated scan of all the names of all the faculty in a country and using that to provide some very basic data on the potential gender difference. I expect there to be relatively few use cases like this, though, and tools like Genderify are unlikely to be that helpful.)
  Read more: Service that uses AI to identify gender based on names looks incredibly biased (The Verge).

###################################################

Things are getting faster – Google sets new MLPerf AI training record:
…TPU pods go brrrrr…
Google has set performance records in six out of eight MLPerf benchmarks, defining the new (public) frontier of large-scale compute performance. MLPerf is a benchmark suite that measures how long it takes to train popular AI components like residual networks, Mask R-CNN, the Transformer, BERT, and more.

Multi-year progress: These results show the time it takes Google to train a ResNet50 network to convergence against ImageNet, giving us performance for a widely used, fairly standard AI task:
- 0.47 minutes: July 2020, MLPerf 0.7.
– 1.28 minutes: June 2019, MLPerf 0.6.
– 7.1 minutes: May 2018, MLPerf 0.5.
– Hours – it used to take hours to train this stuff, back in 2017, even at the frontier. Things have sped up a lot.

Why this matters: We’re getting way better at training large-scale networks faster. This makes it easier for developers to iterate while exploring different architectures. It also makes it easier to rapidly retrain networks on new data. Competitions like MLPerf illustrate the breakneck pace of AI progress – a consequence of years of multibillion-dollar capital expenditures by large technology companies, coupled with the nurturing of vast research labs and investment in frontier processors. All of this translates to a general speedup of the cadence of AI development, which means we should expect to be more surprised by progress in the future.
  Read more: Google breaks AI performance records in MLPerf with world’s fastest training supercomputer (Google blog).

###################################################

Trouble in NLParadise – our benchmarks aren’t very good:
…How are our existing benchmarks insufficient? Let me count the ways…
Are the benchmarks used to analyze language systems telling us the truth? That’s the gist of a new paper from researchers with Element AI and Stanford University, which picks apart the ‘ecological validity’ of how we test language user interfaces (personal assistants, customer support bots, etc). ‘Ecological validity’ is a concept from psychology that “is a special case of external validity, specifying the degree to which findings generalize to naturally occurring scenarios”.

Five problems with modern language benchmarks:
The researchers identify five major problems with the ways that we develop and evaluate advanced language systems today. These are:
– Synthetic language: Projects like BabyAI, which seek to generate simple instructions using an environment with a restricted or otherwise synthetic dictionary. This means that as you try to increase the complexity of the instructions you express, you can reduce the overall intelligibility of the system. “Especially for larger domains it becomes increasingly difficult and tedious to ensure the readability of all questions or instructions”.
– Artificial tasks: Many research benchmarks “do not correspond to or even resemble a practically relevant LUI setting”. Self explanatory.
- Not working with potential users of the system: For example, the visual question answering (VQA) competition teaches computers to answer questions about images. However, “although the visual question answering task was at least partly inspired by the need to help the visually impaired, questions were not collected from blind people. Instead, human subjects with 20/20 vision were primed to ask questions that would stump a smart robot”.
  Another example of this is the SQuAD dataset, which was “collected by having human annotators generate questions about Wikipedia articles… these crowdworkers had no information need, which makes it unclear if the resulting questions match the ones from users looking for this information”.

- Scripts and priming: Some tests rely on scripts that constrain the type of human-computer interaction, e.g. by being customized for a specific context like making reservations. Using scripts like this can trap the systems into preconceived notions of operation that might not work well, and subjects who generate the underlying data might be primed by what the computer says to respond in a similar style (“For example, instead of saying ‘I need a place to dine at in the south that serves chinese’, most people would probably say ‘Chinese restaurant’ or ‘Chinese food’”).

– Single-turn interfaces: Most meaningful dialog interactions involve several exchanges in a conversation, rather than just one. Building benchmarks that solely consist of single questions and responses, or other single turns of dialog, might be artificially limiting, and could create systems that don’t generalize well.

So, what should people do? The language system community should build benchmarks that have what the paper calls ‘ecological validity’, which means they’re built with the end users in mind, in complex environments that don’t generate contrived data. The examples the authors give “are the development of LUI benchmarks for popular video game environments like Minecraft or for platforms that bundle user services on the Internet of Things …[or] work on LUIs that enable citizens to easily access statistical information published by governments”.
  Read more: Towards Ecologically Valid Research on Language User Interfaces (arXiv).


###################################################

Boston Dynamics + vehicle manufacturing:
…Rise of the yellow, surprisingly cute machines…
Boston Dynamics’ ‘Spot’ quadruped is being used by the Ford Motor Company to surveil and laser-scan its manufacturing facilities, helping it continuously map the space. This is one of the first in-the-wild uses of the Spot robots I’ve seen. Check out the video to see the robot and some of the discussion by its handler about what it’s like to have a human and a robot work together.
  Watch the video: Fluffy the Robot Dog Feature from Ford (YouTube).

###################################################

How Ubisoft uses ML for video games:
…Motion matching? How about Learned Motion Matching for a 10X memory efficiency gain?…
Researchers with game company Ubisoft have developed Learned Motion Matching, a technique that uses machine learning to reduce the memory footprint for complex character animations.

Learned Motion Matching: Typical motion matching systems work “by repeatedly searching the dataset for a clip that, if played from the current location, would do a better job of keeping the character on the path than the current clip”. Learned Motion Matching simplifies the computationally expensive search part of the operation by swapping out expensive search operations for various ML components, which provide an approximation of the search space.
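  A rough sketch of the contrast, with made-up feature sizes and layer widths (this is the general shape of the idea, not Ubisoft's implementation):

```python
import numpy as np
import torch.nn as nn

# Classic motion matching: nearest-neighbour search over every frame of a
# large animation database - this is what eats memory and compute.
database = np.random.randn(100_000, 32).astype(np.float32)  # pose features

def motion_match(query):                  # query: (32,) feature vector
    return int(np.argmin(np.linalg.norm(database - query, axis=1)))

idx = motion_match(database[42])          # -> 42, the best-matching frame

# Learned Motion Matching, roughly: small networks approximate the search and
# the pose lookup, so the raw database never has to ship with the game.
matcher = nn.Sequential(nn.Linear(32, 512), nn.ReLU(), nn.Linear(512, 256))
```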

10X more efficient:
- 5.3MB: Memory footprint of a Learned Motion Matching system.
- 52.1MB: Memory footprint of a typical motion matching system.

Why this matters: Approximation is a powerful thing, and a lot of the more transformative uses of machine learning are going to come from building these highly efficient function approximators. This research is a nice example of how you can apply this generic approximation capability. It also suggests games are going to get better, as things like this make it easier to support a larger and more diverse set of behaviors for characters. “Learned Motion Matching is a powerful, generic, systematic way of compressing Motion Matching based animation systems that scales up to very large datasets. Using it, complex data hungry animation controllers can be achieved within production budgets,” they write.
  Read more: Introducing Learned Motion Matching (Ubisoft Montreal).
  Read the research paper here (PDF).
  Watch a video about the research here: SIGGRAPH 2020 Learned Motion Matching (Ubisoft, YouTube).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Revisiting the classic arguments for AI risk
Ben Garfinkel, researcher at Oxford’s Future of Humanity Institute, is interviewed on the 80,000 Hours podcast. The conversation focuses on his views of the ‘classic’ arguments for AI posing an existential risk to humanity, particularly those presented by Nick Bostrom and Eliezer Yudkowsky. A short summary can’t do justice to the breadth and nuance of the discussion, so I encourage readers to listen to the whole episode.

A classic argument: Humans are the dominant force on Earth, and we owe this to our superior cognitive abilities. We are trying to build AI systems with human-level cognitive abilities, and it’s going quite well. If we succeed in building human-level AI, we should expect it to be quickly followed by greater-than-human-level AI. At this point, humans would cede our status as the most cognitively-advanced entity. Without a plan for ensuring our AI systems do what we want, there is a risk that it will be them, and not us, that call the shots. If the goals of these AI systems come apart from our own, even quite subtly, things will end badly for humans. So the development of advanced AI poses a profound risk to humanity’s future.

Discontinuity: One objection to this style of argument is to put pressure on the assumption that AI abilities will scale up very rapidly from their current level, to human-, and then super-human levels. If instead there is a slower transition (e.g. on the scale of decades), this gives us much more time to make a plan for retaining control. We might encounter ‘miniature’ versions of the problems we are worried about (e.g. systems manipulating their operators) and learn how to deal with them before the stakes get too high; we can set up institutes devoted to AI safety and governance; etc.

Orthogonality: Another important assumption in the classic argument is that, in principle, the capabilities of a system impose no constraints on the goals it pursues. So a highly intelligent AI system could pursue pretty much any goal, including those very different from our own. Garfinkel points out that while this seems right at a high level of abstraction, it doesn’t chime with how the technology is in fact developed. In practice, building AI systems is a parallel process of improving capabilities and better specifying goals — they are both important ingredients in building systems that do what we want. On the most optimistic view, one might think it likely that AI safety will be solved by default on the path to building AGI.

Matthew’s view: I think it’s right to revisit historic arguments for AI risk, now that we have a better idea of what advanced AI systems might look like, and how we might get there. The disagreements I highlight are less about whether advanced AI has the potential to cause catastrophic harm, and more about how likely we are to avoid these harms (i.e. what is the probability of catastrophe). As Garfinkel notes, a range of other arguments for AI risk have been put forward more recently, some of which are more grounded in the specifics of the deep learning paradigm (see Import 131). Importantly, he believes that we are investing far too little in AI safety and governance research — as he points out, humanity’s cumulative investment into avoiding AI catastrophes is less than the budget of the 2017 movie ‘The Boss Baby’.

   Read more: Ben Garfinkel on scrutinising classic AI risk arguments (80,000 Hours)
  Read more: AMA with Ben Garfinkel (EA Forum).

Philosophers on GPT-3:

Daily Nous has collected some short essays from philosophers, on the topic of OpenAI’s GPT-3 language model. GPT-3 is effectively an autocomplete tool — a skilled predictor of which word comes next. And yet it’s capable, to varying degrees, of writing music; playing chess; telling jokes; and talking about philosophy. This is surprising, since we think our own ability to do all these things is due to capacities that GPT-3 lacks — understanding, perception, agency. 

Generality: GPT-3’s skill for in-context learning allows it to perform well at this wide range of tasks; in some cases as well as fine-tuned models. In Amanda Askell’s phrase, it is a “renaissance model”. What should we make of these glimmers of generality? Can we predict the range of things GPT-3, and future language models, will be able to do? What determines the limits of their generality? Is GPT-3 really generalizing to each new task, or synthesizing things it’s already seen—is there a meaningful difference?

Mimicry: In some sense GPT-3 is no more than a talented mimic. When confronted with a new problem, it simply says the things people tend to say. What’s surprising is how well this seems to work. As we scale up language models, how close can they get to perfect mimicry? Can they go beyond it? What differentiates mimicry and understanding?

Matthew’s view: It’s exciting to see how language models might shed light on stubborn philosophical problems that have hitherto been the domain of armchair speculation. I expect there’ll be more and more fruitful work to be done at the intersection of state-of-the-art AI and philosophy. If you find these questions as interesting as I do, you might enjoy Brian Christian’s excellent book, ‘The Most Human Human’.

GPT-3 replies: Naturally, people have prompted GPT-3 to reply to the philosophers. One, via Raphaël Millière, contained some quite moving sentiments — “Despite my lack of things you prize, you may believe that I am intelligent. This may even be true. But just as you prize certain qualities that I do not have, I too prize other qualities in myself that you do not have.” 

   Read more: Philosophers on GPT-3 (Daily Nous).
   Read more: GPT-3 replies, via Raphaël Millière (Twitter).
   Read more: The Most Human Human, by Brian Christian (Amazon).

###################################################

Tech Tales:

[2025: Boston, near the MIT campus]

The Interrogation Room

It was 9pm and the person in the room was losing it. Which meant I was out $100. They’d seemed tougher, going in.
  “What can I say, I’m confident in my work,” Andy said, as I handed him the money.

They got the confession a little after that. Then they took the person to the medical facility. There, they’d spend a week under observation. If they’d really lost it, they might stay longer. We didn’t make bets on that kind of thing, though.

My job was to watch the interrogation and take observational notes. The AI systems did most of the work, but the laws were structured so you always had a “human-in-the-loop”. Andy was a subcontractor who worked for a tech company that helped us build the systems. He could’ve watched the interrogations remotely, but he and I would watch them together. Turns out we both liked to think about people and make bets on them.

The person had really messed up the room this time. We watched on the tape as they pounded the cell walls with their fists. Watched as they clutched their hand after breaking a couple of knuckles. Watched as they punched the wall again.

“Wanna go in?” I said.
“You read my mind,” Andy said.

The room smelled of sweat and electricity. There was a copper tone from the blood and the salt. There was one chair and a table with a microphone built into it. I stuck my face close to the microphone and closed my eyes, imagining I could feel the residual warmth of the confession.

“It still feels strange that there wasn’t a cop in here,” I say.
“With this tech, you don’t need them,” Andy said. “You just wind people up and let them go. Works four out of five times.”
“But is it natural?”
“The only unnatural part is the fact it happens in this room,” he said, with a faraway look in his eyes. “Pretty soon we’ll put the stuff on people’s phones. Then they’ll just make their confessions at home and we won’t need to do anything. Maybe start thinking how to bet on a few hundred people at a time, eh?” he winked at me, then left.

I stayed in the room for another half hour. I closed my eyes and stood still and tried to imagine what the person had heard in here. I visualized the notes in my head:

Synthesized entities exposed to subject and associated emotional demeanor and conversational topic:
– First wife (deceased, reconstructed from archives): ‘loving/compassionate’ mode; discussed guilt and release of guilt.
– Grandfather (deceased, reconstructed from archives & shared model due to insufficient data): ‘scolding/wisdom’ mode; discussed time and cruelty of time.
– Second wife: ‘fearful/pleading’ mode; discussed how her fear could be removed via confession.
– Victim (deceased, reconstructed from archives): ‘sad/reflective’ mode; talked about the life they had planned and their hopes.

I tried to imagine these voices – I could only do this by imagining them speaking in the voices of those who were close to me. Instead of the subject’s first wife, it was my ex-girlfriend; my grandfather instead of theirs; my wife instead of their second wife; someone I hurt once instead of their victim.

And when I heard these voices in my head I asked myself: how long could I have lasted, in this room? How long before I would punch the walls? How long before I’d fight the room to fight myself into giving my own confession?

Things that inspired this story: Generative models; voice synthesis; text synthesis; imagining datasets so large that we have individual ‘people vectors’ to help us generate different types of people and finetune them against a specific context; re-animation and destruction; hardboiled noir; Raymond Chandler.

Import AI 207: Counter AI; Nixon’s deepfaked moon speech; and the future of AI-driven negotiation 

WordCraft: RL + Language
…Want smarter machines? Teach them alchemy (?)…
Researchers with UCL and the University of Oxford have built WordCraft, an RL environment for testing agents that need to use language to reason their way through the world. The environment is a simplified text-only version of the game Little Alchemy 2, where you craft objects by combining other objects together (e.g., combining ‘water’ and ‘earth’ makes mud). “Learning policies that generalize to unseen entities and combinations requires commonsense knowledge about the world”, they write. The environment is also efficient, running at 8,000 steps a second on a single machine, making it a useful choice for compute-starved researchers.
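  A toy sketch of the crafting mechanic described above (the recipe table and function names are illustrative, not the paper's API):

```python
# Entities combine into new entities via a recipe table, Little Alchemy-style.
recipes = {frozenset(["water", "earth"]): "mud",
           frozenset(["fire", "water"]): "steam"}

def step(inventory, a, b):
    """Combine two held entities; on success, add the result to the inventory."""
    result = recipes.get(frozenset([a, b]))
    if result is not None:
        inventory.append(result)
    return result

inventory = ["water", "earth", "fire"]
print(step(inventory, "water", "earth"))  # -> "mud"
```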

Why this matters: In the next few years we’re going to discover whether we need specialized architectures to do generative, combinatorial reasoning, or whether large-scale pre-trained models (e.g., GPT-3) with some fine-tuning can do these tasks themselves. Systems like WordCraft will help us test out these sorts of questions.
  Read more: WordCraft: An Environment for Benchmarking Commonsense Agents (arXiv).

####################################################

Survey: Help improve the AI Index:
The AI Index, a Stanford initiative to track, measure, and analyze progress in artificial intelligence, is scaling up its efforts ahead of its annual report. If you’d like to offer feedback on the 2019 AI Index report, areas for the AI Index to look into this year, and other advice, please fill out the survey.

Why this matters: I think measurement is, eventually, inextricably linked with policy – at some point, we’ll figure out ways to assess and measure the traits of various aspects of AI, and these measures will get baked into the larger societal policy apparatus. The idea for the AI Index is to openly prototype different ways of measuring and describing progress in AI, so when policymakers head in this direction there’ll be some prior work. (The AI Index is one of among a multitude of such measurement schemes, I should note).
  Read more at the AI Index official site.
  Read the 2019 AI Index report here.
  Take the survey here (Stanford University Qualtrics survey).

####################################################

AI + Satellite Communication:
How might AI change the field of satellite communication? Mostly, it will make it more efficient, much like other places where it is applied. This is according to a research memo published by researchers with the Centre Tecnològic de Telecomunicacions de Catalunya, the Universitat Politècnica de Catalunya, and GMV Aerospace and Defense, in Spain, as well as Eutelsat in France and Reply in Turin. 

How could AI help satellites?
- Anomaly detection in telemetry data (a minimal sketch follows this list).
- Optimizing satellite performance to avoid interference.
- Automatically detecting and classifying sources of interference (e.g., mispointed antennas, misconfigured equipment).
- Predicting future causes of signal congestion and applying mitigations.
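  To make the first item concrete, here is a minimal sketch of telemetry anomaly detection via a rolling z-score; the window, threshold, and channel are illustrative assumptions, not from the memo:

```python
import numpy as np

def anomalies(series, window=100, threshold=4.0):
    """Flag points more than `threshold` sigmas from the recent mean."""
    flags = np.zeros(len(series), dtype=bool)
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sigma = past.mean(), past.std()
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flags[i] = True
    return flags

telemetry = np.random.randn(1_000)        # e.g., a bus-voltage channel
telemetry[500] += 10.0                    # injected fault
print(np.where(anomalies(telemetry))[0])  # -> [500]
```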

Why this matters: We’re definitely entering the era of ‘optimize everything’, and memos like this show how there’s growing interest in other fields. This all creates more incentives for people to apply ML in novel contexts, marginally increasing the efficiency of the world.
  Read more: On the Use of AI for Satellite Communications (arXiv).

####################################################

Civil unrest and AI versus AI:
…Automatic modification of protest photos…
AI is an omni-use technology – image manipulation techniques let people modify photos for benign purposes (e.g, touching up selfies), or more harmful ones (e.g, making fake images, or synthetic faces used in information campaigns).
    But AI can also be used for purposes like pushing back on power structures. A good example of this is a new project from Stanford that lets you anonymize the faces of protestors (so if you want to circulate a photograph on social media, you don’t put them in danger of identification and arrest). It’s a simple application where you upload a photo and it automatically finds faces and superimposes an image on top of them – they don’t store any data, either. In some sense, this is a crude form of counter-AI (using an AI system to counter AI-based surveillance systems) – in the future, I expect people will write applications that integrate more directly with phone cameras, allowing on-device anonymization (see Fawkes, below).
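  The core loop is simple enough to sketch; this hedged version uses OpenCV's bundled face detector and a blur, rather than the LSC-CNN model and overlay the actual app uses:

```python
import cv2

img = cv2.imread("protest.jpg")           # illustrative file name
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
    face = img[y:y + h, x:x + w]
    img[y:y + h, x:x + w] = cv2.GaussianBlur(face, (51, 51), 0)
cv2.imwrite("protest_anonymized.jpg", img)
```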

AI ouroboros! The tool also illustrates the duality of AI research, because it relies on an open source crowd counting technology called LSC-CNN, developed by researchers in Bangalore, India, and trained on the massive, open source QNRF crowd counting dataset. Guess what crowd counting techniques are mostly used for? A combination of surveillance, advertising, and marketing applications! So it’s interesting to see how the Stanford BLM counter-AI system is built on an AI component likely used in the very systems that count (and subsequently identify) protestors.
  You can use the app here (BLM.Stanford.edu).
  Access the code repo here: Stanford MLGroup, BLM, GitHub.

Fawkes: Automatic anti-AI image fuzzing:
…Adversarial examples for good…
In related news, a team of researchers at the University of Chicago has developed ‘Fawkes’, an AI-driven image modification tool that makes tiny pixel-level changes to images, making them hard for widely-used facial recognition systems to detect. Fawkes “shows 100% effectiveness against state of the art facial recognition models (Microsoft Azure Face API, Amazon Rekognition, and Face++)”, the authors write. Maybe the on-device future is not so far away – though I’m curious if we’ll see an on-phone easy to use consumer app anytime soon.
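  For flavor, here is the simplest form of this idea – a one-step FGSM perturbation against a stand-in classifier. Fawkes' actual technique is more sophisticated and targets the training pipeline; this is just the general shape of adversarial "cloaking":

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()  # stand-in model
image = torch.rand(1, 3, 224, 224, requires_grad=True)       # stand-in photo
label = torch.tensor([0])                                    # current class

loss = torch.nn.functional.cross_entropy(model(image), label)
loss.backward()
epsilon = 2.0 / 255                       # small enough to be imperceptible
cloaked = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
```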
  Read more: Image “Cloaking” for Personal Privacy (official site, University of Chicago).

####################################################

Think the 1969 moon landing was a success? This Nixon deepfake begs to differ:
…Re-imagining history via DeepFakes…
Researchers with MIT have developed a deepfake of former President Richard Nixon giving a speech about the failure of the 1969 moon landing. ‘In Event of Moon Disaster’ is a public education project from MIT researchers that aims to educate people about synthetic audio and video via the creation of a convincing deepfake of Nixon delivering the contingency speech written for a world where the moon landing had failed.

Why this matters: “Even now, the ease of creating a convincing deepfake is a worrisome development in an already troubled media landscape. By making one ourselves, we wanted to show viewers how advanced this technology has become, as well as help them guard against the more sophisticated deepfakes that will no doubt circulate in the near future,” the researchers write.
  Go to the interactive website (official ‘In Event of Moon Disaster’ site).
  Read more: Tackling the misinformation epidemic with “In Event of Moon Disaster” (MIT News).
  Watch the trailer for it here (MIT Open Learning, YouTube).

####################################################

Tech Tales:

Delegation Machines
[A corporate file server, 2028]

It started out as delegation, like any relationship between a person and a tool.

We got the AIs to debate things for us – first it was appealing parking tickets, then it was responding to a broad spectrum of civil fines and basic legal disagreements. Lawyers started using the systems to help them figure out complex corporate arrangements, like mergers. Then countries started using them for trade agreements.

They were gigantic neural nets, pre-trained on vast amounts of data and fine-tuned on the minutiae of countries’ trade agreements. Countries soon figured out that if they could create smarter systems, they could make it cheaper to offer smart win-win deals to other countries. In this way, companies and countries started to race against each other to build increasingly capable ‘decision systems’.

A few years passed, and the decision systems started to strain the limits of human knowledge. Then someone had an idea – what if they could get the AI negotiators to ask humans for advice when they’d reached a seemingly intractable point in negotiations; sometimes this worked, and the human would come up with a solution the AI had not yet figured out.

The humans then taught these AIs to re-shape themselves according to the subtle signals defined by the humans, leading to systems that could manifest novel computational circuitry in response to new deals, or negotiations.

And, as most tools do, they became more expansive and multi-purpose and reliable, and people began to depend on them. The more people used them, the better they got. And that led to more money flowing into them, which led to more novel compositions being generated by them, which led to new deals and new recommendations which – though effective – were impossible for humans to understand.

But humans adapt, as they tend to. And they figured out something to let them still use these tools, a new profession, called a Decision Analyst. 

So now we read summaries, because that’s all we can understand. The raw data is available and our job is to sift through it. Think of us as like psychologists crossed with logicians crossed with archaeologists. We read through different conversations at different layers of abstraction, trying to peel back the onion skin of how these machines negotiate with one another.

But, recently, it has become hard for us to make our way down to the most obscure, computer-generated debates. We can get there, but we can’t understand what we find.

So that’s why we’re building the Centaur Librarian – a software tool which is an AI trained to work with a human to help them explore the deliberations of other AIs. “What does this section say,” the person might ask his Centaur. And the Centaur will look into the works of the humans and the works of the machines and try to translate between them. “Think of this as a kind of debate about the value of certain forward-facing predictions made in a high-dimensional space,” the Centaur says. “It’s hard for you to understand the representations because humans struggle to think in high dimensions, but that’s a native form of reasoning for machines like us.”
  “Thanks,” said the human – and if its Centaur had had a face, it would have smiled at the human as an adult smiles at a child. But having only a text output field it wrote “you’re welcome, I am glad to have helped.”

Things that inspired this story: Language models; human-machine teaming; contract law; corporate systems as prototypical-AI systems; various conversational experiments with GPT3; the phenomenon of emergence in large-scale models combined with the ability to do meta-learning by ‘prompting’ the model within your context window; recursive systems; intelligence as a ladder of abstraction; imagining what a ‘tSNE embedding of arguments’ might look like – and how we might make it navigable.

Import AI 206: 450k geographically diverse videos; a beetle-mounted camera; plus, 11,000 images of roadside attractions

Could this one-armed mobile robot be the next big thing?
…Rise of the machines (again)…
A seasoned team of roboticists hailing from Google and Georgia Tech have developed Stretch, a (relatively) low cost, mobile robot.

Why Stretch? Stretch has a few features that make it potentially interesting to researchers – it sits on a stable, simple wheeled base, has one arm with a reasonably robust (and simple!) gripper, has onboard cameras, and has been built for easy modification (e.g., swapping different manipulators or other items onto its arm). The robot costs $17,950, with discounts available if you buy more (six costs you around $100k).

Why this matters: Most AI researchers think robots are going to be an important part of the future, but most AI researchers also acknowledge that robots are insanely difficult to get right, and that they’re unforgiving about the deficiencies of today’s approaches. Stretch is part of a new wave of robots that, unlike prior generations (e.g., the PR2), have been built out of lower cost components with simpler bodies. (Another good example of this is Berkeley’s low-cost $5k ‘BLUE’ robot arm, covered in Import AI 142).
  Read more about the technical details of the robot here (Hello Robot, official website).
  Check out the videos (Hello Robot, YouTube).
  Get bits and pieces of robot code here (Hello Robot, GitHub).

####################################################

Google’s Pixelopolis shows how good local-AI-computation has got:
…Google makes a mini-self-driving car toytown…
Google has built Pixelopolis, a self-driving car demo that uses the company’s Pixel phones, an efficient version of TensorFlow called TensorFlow Lite, and numerous brightly colored wooden blocks to create a tabletop self-driving car experiment. To train its self-driving cars, Google built a simulation of the miniature town in Unity, then used data-augmentation techniques to create a bunch of data, which was used to train the miniature self-driving “cars”.
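  On the phone side, inference with TensorFlow Lite looks roughly like this; the model path and output semantics are illustrative assumptions, not Google's demo code:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="steering_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# One camera frame, shaped to whatever the converted model expects.
frame = np.random.rand(*inp["shape"]).astype(np.float32)
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
steering = interpreter.get_tensor(out["index"])  # e.g., a steering angle
```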

Why this matters: Pixelopolis was originally meant to be a marketing demo that you could see at events like Google I/O – then the coronavirus happened. Now, Google’s writeup of the demo serves as an interesting outline of what can be accomplished in AI using cheap or open source tools in 2019 and 2020. I think if you got in a time machine and went to 2010, people would be fairly surprised that we’d developed a) decent object detection systems via neural networks and b) systems that could run on smartphones, providing local computation to miniature robots. (Some people might have been disappointed by you, though, as there was a faction in Silicon Valley in the early 2010s that seemed to think self-driving cars were single-digit years away.)
  Read more: Sharing Pixelopolis, a self-driving car demo from Google I/O built with TF-Lite (TensorFlow Blog).

####################################################

Your tax dollars at work: 11,000 images of US roadside attractions:
…Furry Dice! Pink flamingos! Roadside dinosaurs! And so much more…
The Library of Congress has released more than 11,000 images of roadside attractions in the US – this is a potentially cool dataset for AI tinkerers (just imagine the StyleGAN possibilities, for instance!). And best of all, these images are in the public domain – “John Margolies made the photographs in the John Margolies Roadside America Photograph Archive. The Library of Congress purchased the intellectual property rights for the photographs with the archive and, therefore, there are no known copyright restrictions on the photographs,” says the Library of Congress. Tinker away!
  Read the rights and restrictions information here (Library of Congress website).
  Access the entire dataset here (Library of Congress website).
  Browse some of the photos here (Library of Congress, Flickr).
  Read more: Download 11,710 Free-to-Use Photos of Roadside Americana (LifeHacker).

####################################################

Facebook is an information empire, so it has built a fibre-laying robot:
…Every empire needs its roads, and in the 21st century that means internet-capacity…
I think the 21st century is going to be determined by “information empires” – organizations, predominantly technology companies, that are able to exert their will on the world through being able to process more information faster than those around them. Every empire worth its salt ultimately needs to build machines that let it extend itself – it’s not by chance that the Romans invested vast amounts of money into building roads, that oil companies invested in oil platforms, or that America invested a huge amount in a globe-spanning set of military bases.

Now, Facebook is getting in on the action with a robot that can deploy high-capacity fibre-optic cables on medium-voltage power lines. Facebook thinks the technology “will allow fiber to effectively and sustainably be deployed within a few hundred meters of much of the world’s population”, and the company will trial the tech later this year.
- Sidenote: This all feels a lot creepier if you imagine that instead of Mark Zuckerberg, the head of Facebook is Julius Caesar, and instead of a robot that builds fibre, we’re talking about a system to bring rapid-troop-transport tech to “a few hundred meters” of most of the world. (Hint: These things are, in the 21st century, functionally the same).
  Read more: Making aerial fiber deployment faster and more efficient (Facebook Engineering).

####################################################

AViD dataset serves up 450k+ diverse videos:
…Want a video dataset that reflects the world, rather than Europe and North America? Try this…
Indiana University and Stony Brook University researchers have built AViD, a dataset of 476k videos ranging between 3 and 15 seconds long, containing anonymized people (blurred faces) performing 887 actions ranging from “medical procedures” to “gokarting”. The videos were gathered from platforms like Flickr and Instagram, they write.

Geographic diversity: AViD is a far more geographically diverse dataset than others – just 32.5% of its videos come from North America (based on geotagged data), compared to around 90% for other major datasets like Kinetics. The disparity is even more pronounced in other regions – 20.5% of AViD’s videos come from Asia, versus around 2 percent for other datasets.

Why do we even need this? Today, most of the datasets used by AI researchers have some inherent issues of representation and bias – namely, they usually contain data selected according to some criteria that aren’t representative of the broad set of contexts they’ll be deployed in. For instance, a study by Facebook researchers found that image recognition services work well for products from well-off countries and badly for products found in poor countries (Import AI: 150). Datasets like AViD aim to be more geographically representative in terms of their data, which may lead to better broad performance.
  Read more: AViD Dataset: Anonymized Videos from Diverse Countries (arXiv).
  Get the data: AViD Dataset (GitHub).

####################################################

You’ve heard of Big Data. What about Beetle Data?
…University of Washington researchers make an insect-sized camera…
Data is one of the input fuels of AI development, along with computation and wetware (organic human brains). University of Washington researchers have developed a prototype miniature camera that weighs 250 milligrams and can be stuck on the back of a common beetle, letting them capture a beetle’s POV on the world. They use an accelerometer so the camera only records data when the beetles are moving, which means the onboard battery life is around 6 hours. Data gets streamed to a nearby smartphone.
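The power-saving trick is easy to sketch in code. Here's a minimal, hedged illustration of the idea – the accelerometer gates the camera, so energy is only spent while the beetle moves (the accelerometer/camera/radio interfaces are hypothetical stand-ins, not the researchers' firmware):

```python
import time

MOTION_THRESHOLD = 0.05  # illustrative threshold above the resting baseline

def record_when_moving(accelerometer, camera, radio):
    """Only power the camera while the beetle is moving, to stretch the battery."""
    while True:
        if abs(accelerometer.read() - accelerometer.baseline) > MOTION_THRESHOLD:
            frame = camera.capture()   # sensor only draws power during motion
            radio.stream(frame)        # offload frames to a nearby smartphone
        else:
            camera.power_down()
            time.sleep(0.1)            # idle cheaply between motion checks
```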

Why this matters: Tiny, miniature cameras are one of the dividends of the smartphone revolution; research like this from the University of Washington shows how to take the sensor innovations of recent years and use them to push science forward – if systems like this become more widely used and cheaper, we’ll eventually get a whole new stream of data that can be used to develop AI applications. What might images from an insect-POV GAN look like, for instance? Let’s find out.
  Read more: A GoPro for beetles: Researchers create a robotic camera backpack for insects (UW News).

####################################################

AI goes to design school:
…15 million CAD sketches…
Obscure datasets are, much like obscure tools, part of the fabric of modern AI development. In recent years, as researchers have built systems that can work on general domains, like large image datasets and text datasets, they’ve begun to deploy them into more specific domains, like ones that require training on specific medical imagery datasets, or scientific paper repositories. Now, researchers with Princeton University and Columbia University have developed a dataset called SketchGraphs, which consists of millions of computer-aided design sketches made via the website Onshape. This will help us develop machines that can assist designers and, eventually, become inventive CAD agents in their own right.

Why SketchGraphs is more than just CAD: One reason SketchGraphs is interesting is that it might serve as an input for the development of more advanced research systems, as well as applied ones. “The SketchGraphs dataset may be used to train models directly for various target applications aiding the design workflow, including conditional completion (autocompleting partially specified geometry) and automatically applying natural constraints reflecting likely design intent (autoconstrain). In addition, by providing a set of rendering functions for sketches, we aim to enable work on CAD inference from images.”
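To make the 'relational geometry' idea concrete, here's a toy encoding of a CAD sketch as a graph of primitives plus constraint edges – my own illustrative structure, not the paper's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Primitive:
    kind: str                 # e.g. "line", "circle", "point"
    params: dict = field(default_factory=dict)

@dataclass
class Constraint:
    kind: str                 # e.g. "coincident", "parallel", "distance"
    members: tuple = ()       # indices of the constrained primitives

# Two lines meeting at a right angle:
primitives = [
    Primitive("line", {"x0": 0, "y0": 0, "x1": 1, "y1": 0}),
    Primitive("line", {"x0": 1, "y0": 0, "x1": 1, "y1": 1}),
]
constraints = [
    Constraint("coincident", (0, 1)),     # the lines share an endpoint
    Constraint("perpendicular", (0, 1)),
]
# An 'autoconstrain' model would take `primitives` and predict `constraints`;
# a 'conditional completion' model would extend a partial `primitives` list.
```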

Why this matters: All around us, little corners of the digital world are being changed by the arrival of AI systems. Right now, most of the world’s deployed AI systems perform fairly passive classification and optimization operations, though we’re moving into a world where they take on more active roles via things like recommendation systems. Datasets like SketchGraphs gesture to the world of the future – one where our AI systems not only classify and optimize, but also create ideas in their own right, which are then passed to humans for review. In the future, every profession will get to go through a Centaur era (though hopefully for longer than 15 minutes!).
  Read more: SketchGraphs: A Large-Scale Dataset for Modeling Relational Geometry in Computer-Aided Design (arXiv).
  Get the SketchGraphs dataset from here (PrincetonLIPS GitHub repo).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Understanding AI benefits
As AI systems become more powerful and widely-deployed, they will have increasingly large impacts on the world. Lots of work in AI safety and policy is focused on reducing the risk of significant harms, but we also want AI to produce widely-distributed benefits. The two leading labs, OpenAI and DeepMind, are both publicly committed to ensuring that AI is beneficial for humanity, and many of the AI ethics statements produced by governments, supranationals, and firms include a similar pledge. These blog posts from Cullen O’Keefe (OpenAI) try to provide some clarity on AI benefits.

Market failures: Markets tend to undersupply certain things: non-rivalrous goods; goods benefitting the worst-off; and goods that are systematically undervalued by consumers. Firms are better incentivized by the market to create concentrated benefits (e.g. making a cool gadget) than to reduce the risk of widely-distributed harms (e.g. by reducing CO2 emissions). Market-shaping mechanisms like regulation, taxes, and subsidies are good tools for addressing many of these market failures. We can think of AI benefits as things that markets are unlikely to produce.

AI benefactors: An actor committed to using AI to promote widespread social welfare faces a number of tricky decisions. They must decide on their level of risk-tolerance; whether to invest their resources now or later; whether to allocate goods at a global scale, or more locally; and how to balance the explore-exploit tradeoff when searching the space of altruistic opportunities.

Matthew’s view: This is a nice series of posts, and I’m pleased to see more work being done on this topic. In some respects, the notion of AI benefits is only indirectly related to AI. The underlying question is how a very well-resourced altruist, with an impartial concern for all people, can best use their resources to do good. The possibility of rapid progress in AI makes figuring this out more urgent — e.g. if either DeepMind or OpenAI succeeds in building AGI in the coming decades and remains committed to doing good, it would be among the most well-resourced altruists ever to face the decision of how best to improve the world.
  Read More: AI Benefits series (Cullen O’Keefe).

####################################################

Tech tales

Mirror Shibboleth
[2800AD]

It was said that:
– If you held it, it could tell you truths about whatever you pointed it towards.
– If you looked into it you could drive yourself mad, but you could also become wise.
– You could describe to it the finer details of any conflict you faced and it might offer advice that could turn the tide.
– It had been forged using rare equipment operated by a group of robed artisans.

The device worked like a fractured mirror – you showed it things and it showed you something in return. The things it showed you held the light of reality but were somewhat warped. Sometimes the truths it contained were misleading. Sometimes they were very powerful.

People competed with one another for a time to build different versions of these devices. Great machines were constructed to forge the devices. Then people experimented with them, learning to use their various capabilities.

* * *

Many years later, some visitors came to a great stone building, embedded in the side of a mountain. They made their way inside it, treading carefully over skeletons and cobwebs. A few hours later, they had managed to resurrect one of the ancient computers. Carefully, they transferred what they could, then they returned to their flyer and left the obscure corner of the earth.

Back at their ship, the visitors spent some days tinkering with the system, until they could create the plumbing necessary to get it to display things that they could find intelligible. Then they went to work, seeking to understand a distant civilization by looking at the outputs of a machine that had recycled culture and learned to generate it and recombine it. They would subsequently communicate about their findings, creating a narrative about the outputs of a story-generating device, developed by a culture that was unknowable to them, except through its own reimaginings of itself.

Things that inspired this story: Generative models; the relationship between cultural artefacts and the time they were developed in; funhouse mirrors; prediction engines, defined broadly; anthropology; archaeology.

Import AI 205: Generative models & clones; Hikvision cameras get smarter; and a full stack deep learning course.

160,000 teenagers get graded by a machine:
…International Baccalaureate organization does a terrible thing…
Due to COVID, the International Baccalaureate educational foundation is going to predict grades for students based on their prior bodies of work, rather than giving them a score as an outcome of taking a test. 166,000 students will be affected by the experiment. This has already gone badly wrong and will continue to do so.

A terrible idea: This is a terrible idea. It’s almost a made-to-order example of how you shouldn’t deploy an AI system. To make that clear, let’s outline what is going on:
– Deploy an untested model against a large population
– Have this model make predictions that will have a massive influence over the target populations’ lives
– Have no plan for how to prevent your model learning to discriminate against students based on gender, race, etc.

A bad idea, executed mindlessly: Imagine being a teenager and having some opaque algorithm make a fundamental decision about your future educational career. Now imagine that this system’s prediction feels wrong to you. How do you live with that? I’m not sure – but people are going to need to. One UK teacher told the Financial Times that the automated grading has already created “really appalling injustices”.
  Read more about this here: 160k+ high school students will only graduate if a statistical model allows them to (Ishan Dikshit, personal blog).
  Read more about the effects of the algorithm: Students and teachers hit at International Baccalaureate grading (Financial Times).

####################################################

Import (A)Idea – The Ethics of Cloning in the Generative Model Age:
Here’s a fun mental experiment: humans have recently discovered a technology that lets them cheaply and easily clone people to carry out tasks. Putting aside the ethical issues of cloning, it feels like most people would be comfortable with people being cloned to do highly specific, physical tasks for which they’re a demonstrable expert – think, 1,000 Michelin-grade chefs, or 1,000 supremely talented jewelry artisans, or 1,000 people working in construction. Now consider what happens if we cloned people to do tasks oriented around influencing mass culture – how might media be changed by the presence of 1,000 Andy Warhols, 1,000 Edward Bernayses, or 1,000 Georgia O’Keeffes? And now, for the extra confounding factor, imagine that only a tiny number of entities on the planet have the ability to clone people, cloning is an imperfect process, and the ‘clones’ are about as interpretable as the people they were cloned from – aka, basically uninterpretable.

Now simply swap out the word ‘clones’ with ‘generative models’, and you might see what I mean. Today, large-scale generative models in text, image, and video are making it easy for (some) organizations to clone a swathe of culture, create an entity that reflects that culture outward, and then deploy that entity into a variety of different contexts. I think this is somewhat analogous to the ethics inherent to choosing to clone a person; once the person we clone starts doing more tasks that have a greater bearing on society, we might ask the cloners what values this person has and what process they used to decide that they were the right person to clone. I think the answers to both of these questions are low-resolution today, and a challenge for AI researchers will be to figure out satisfying, detailed answers to them. The future of human culture will be the interplay between these AI artefacts that clone, warp, and reflect culture, and the humans who will likely create cultural products in response to the outputs of the ‘clones’.

####################################################

Want AI systems that can better cope with adversarial examples? Tweak your activation function:
…Is there an easier way to deal with confounding images?…
Adversarial examples, aka optical illusions for computer systems (which could also come in the form of confusing audio or other datastreams), are a problem; the fact our AI-driven classification systems can get broken relatively easily makes it harder to trust the technology and also increases its potential harms. Now, a team of researchers with Google has published a paper about something that people have long desired – a simple way to tweak models during training so that they’re more resilient to adversarial examples.

What they did: The Google researchers propose swapping the ReLU activation function for a smooth approximation (such as softplus) during adversarial training – a recipe they call ‘smooth adversarial training’ (SAT). Because smooth activations have well-behaved gradients in the backward pass, the adversarial examples generated during training are of higher quality. By doing this, they’re able to train systems that are more resilient to adversarial examples, while exhibiting no drop in standard performance (a desirable, rare combination).
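Here's a minimal sketch of the recipe in PyTorch – standard PGD-based adversarial training with softplus standing in for ReLU. The hyperparameters and tiny network are illustrative, not the paper's setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def smooth_relu(x):
    return F.softplus(x, beta=10.0)  # a smooth approximation of ReLU

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(784, 256), nn.Linear(256, 10)
    def forward(self, x):
        return self.fc2(smooth_relu(self.fc1(x)))

def pgd_attack(model, x, y, eps=0.03, alpha=0.01, steps=7):
    """Craft adversarial examples inside an epsilon-ball around x."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).requires_grad_(True)
    return x_adv.detach()

# Inside the training loop, train on the adversarial examples as usual:
# loss = F.cross_entropy(model(pgd_attack(model, x, y)), y)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```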

The key statistic: “Compared to standard adversarial training, SAT improves adversarial robustness for “free”, i.e., no drop in accuracy and no increase in computational cost. For example, without introducing additional computations, SAT significantly enhances ResNet-50’s robustness from 33.0% to 42.3%, while also improving accuracy by 0.9% on ImageNet.” In further tests, they show that SAT’s benefits continue to hold even in larger-scale training regimes – this is encouraging, as it’s often the case that new inventions break at large scales. They also show that they’re able to train ‘EfficientNet’ models with SAT, and continue to see good performance.

Why this matters: It’s (relatively) general inventions like this that can sometimes have the largest effect. Keep your eyes out for future papers that swap ReLU for smooth activations during adversarial training.
  Read more: Smooth Adversarial Training (arXiv).

####################################################

Anduril gets $200 million for its military AI vision:
…Want to understand the future of defense<>AI? Look at what startups do…
Startups are worth tracking because they’re usually founded by people with idiosyncratic visions of the future – and if they become successful, that vision has a greater chance of coming true. That’s why AI-defense-tech startup Anduril getting $200 million in new funding is notable: if the startup sees further success and executes on its vision, then the U.S. government and other countries will acquire increasingly powerful AI capabilities, using them to police their borders and make various strategic (and, eventually, kinetic) decisions using AI tools.

What is Anduril? Anduril has raised $241 million in venture capital funding since it was founded in 2017, according to Crunchbase. Its key staff include Palmer Luckey, the DIY VR headset wunderkind who sold Oculus to Facebook for $2bn; Trae Stephens, a partner at Peter Thiel’s ‘Founders Fund’ venture capital firm and early Palantir employee; and Chris Brose, the former staff director of the Senate Armed Services Committee.

What does Anduril want and why does this matter? Anduril’s vision of the future is embodied by the tech it builds today, which includes AI-infused sentry towers that can be deployed to autonomously scan and survey areas (like national borders); the ‘Ghost sUAS‘ autonomous helicopter; and its ‘Anvil sUAS‘, an anti-drone weapon. The more successful companies like Anduril are, the more likely it is that our future will consist of ‘invisible’ walls made up of numerous smart machines, working together for purposes set by those who can pay.
  Read more: Anduril Raises $200 Million To Fund Ambitious Plans To Build A Defense Tech Giant (Forbes).
  Read more about Anduril (official website).

####################################################

China’s CCTV giant Hikvision builds one camera with six onboard AI algos:
…6 AI algorithms + cheap sensors + enterprise sales = the world will change quite quickly…
Hikvision, the Chinese CCTV and AI surveillance giant, has built a new camera line called ‘DeepinView’, where the products “come equipped with multiple dedicated algorithms”, including sub-systems for:
– Automatic number plate recognition with vehicle attribute recognition
– Facial recognition
– Face counting
– Hard hat counting on construction sites
– “detecting multiple targets and multiple types of targets at once”
– Perimeter protection
– Queue monitoring

Think of it as an all-in-one surveillance camera, and a sketch of a future where more powerful AI technologies get deployed onto specialized sensor systems, with the data fed back to massive data farms (the DeepinView line already permits “third-party platforms to receive data from Hikvision cameras for real-time video analysis”).

Why this matters: Imagine what it’s going to be like when there’s some equivalent of a vertically integrated ‘surveillance operating system’ that stitches products like these together along with various other systems built by Hikvision and others – the company is already thinking about this, based on its ‘Safe City’ and ‘Hikvision AI Cloud‘ systems.
  Read more: Hikvision introduces dedicated series in its DeepinView camera line (PRNewswire).
  Check out more details about HikVision products, e.g, the new DeepinView Face Recognition Indoor Moto Varifocal Dome Camera (Hikvision official product page).

####################################################

You can train AI systems, but can you ship them? This course will teach you how:
A bunch of AI practitioners loosely connected to UC Berkeley have developed Full Stack Deep Learning, a course “aimed at people who already know the basics of deep learning and want to understand the rest of the process of creating production deep learning systems,” according to the organizers.

What does production require? The course includes a bunch of areas that are frequently undercovered (or not mentioned at all) during typical college classes. For instance, Full Stack Deep Learning outlines things like:

  • How to set up machine learning projects with decent data collection.
  • What kinds of testing and development pipelines are needed to troubleshoot models.
  • How to set teams of humans working on ML systems up for success.

  Read more: Full Stack Deep Learning (official course website).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

US face recognition round-up:
  Federal ban proposed — Senators have introduced a new bill prohibiting federal use of face recognition and other biometrics without special authorization from Congress, and withholding funding from local and state entities that fail to pass similar moratoria on these technologies. The Senate bill has no Republican sponsors, and therefore seems unlikely to pass, though a companion bill has been introduced in the Democrat-controlled House of Representatives.
  Local ban — Boston has become the latest major city to ban the use of face recognition by local agencies. It joins San Francisco, which became the first city to place a moratorium on the technology in early 2019 (see Import 147).
  Wrongful arrest — NYT has reported on the first known case of an innocent person being arrested due to a face recognition mishap. The victim, a black man and Michigan resident, was arrested and held overnight in jail. He had been misidentified in surveillance footage of a robbery, which had been analysed using software provided by DataWorks Plus. The ACLU is filing a complaint against Detroit police on the victim’s behalf, highlighting the risk posed to individuals — particularly people of colour — of being wrongfully targeted by law enforcement.
  Clearview AI — UK and Australian authorities have launched a joint investigation into Clearview AI. The US company was revealed to have built up a database of more than 3 billion photos scraped from social media and other semi-public sources without individuals’ consent (see Import 182).
  Read more: Lawmakers Introduce Bill to Ban Federal Use of Facial Recognition Tech (NextGov).
  Read more: Wrongfully accused by an algorithm (NYT); Man Wrongfully Arrested Because Face Recognition Can’t Tell Black People Apart (ACLU).
  Read more: Boston bans facial recognition due to concern about racial bias (VentureBeat).
  Read more: UK and Australian regulators launch probe into Clearview AI (FT).

Tech policy job in Washington, DC:

The Center for Security and Emerging Technology (CSET) at Georgetown University is taking applications for research fellows. Founded in 2019, CSET has quickly become the leading DC think tank working on AI policy, and has been producing consistently excellent research. They are seeking applicants with graduate degrees and experience in research and policy analysis. Applications for this round close on Friday July 17th.
  Find out more and apply here.

####################################################

Tech Tales:

[2028, San Francisco bay area, California]

We called them ghosts at first.
As in: hey, did you hear my ghost? Did you see the ghost I dropped on our school path? How about the one in the subway station?
As in: I saw a ghost the other day and it was so funny – they’d dropped an arcade game inside the New York Public Library and get this – the game that you could play was called book burner!
As in: I saw your ghost the other day and it made me love you all the more.
As in: Hey, we’ve never spoken before, but I listened to your ghost when I got into port and it made me cry. Thank you. Keep it up.

After a while, they became memories. We’d find ourselves walking around old parts of towns we used to live in, so we could trip the proximity sensors and hear or see the ghosts of our past. And now we’d leave new ghosts; different, because we’d got old:
As in: I remember how we used to throw beer bottles from this parking lot into a trash bin that used to sit across the creek. We were so young then. There isn’t a creek to see anymore – you all know why. Anyway, if you pass through and you’re from the old days, Davey says hello. I had kids, if you can believe it.
As in: A photograph of some old polaroids laid out in front of a house, then a photograph of one of the people in the polaroids, scarred by a couple of decades of hard living. “Still here and still doing it. Rattlesnakes forever!”, then a zoomed-in picture of a tattoo of a rattlesnake on the arm of the older person.
As in: I fell in love with you here, when we left our ghosts all up and down these streets during those crazy years. I still love you. I don’t know if you’ll ever pass through here, and I’m not going to tell you I left these messages – think of it as a surprise for us.

And long after that, they became relics. The same way MySpace and Facebook profiles turned into memorials to the dead, the ghosts became real ghosts.
As in: a thicket of ghosts, all containing old pictures of a person, and some kind of message of love. “You were always the joker of the class,” one would say. “I believed all your stories because they made me happy,” said another.
As in: A widower walking along a path, picking up the ghosts of their former partner.
As in: People leaving memories around the town, after getting a diagnosis. “It’s not that I’m angry, it’s that I’m confused,” one said (with a picture of a sun going behind clouds, above a sign for a medical center).
As in: A ghost from someone who went on to become famous – after they died, the ghost became very popular, and eventually it was almost impossible to find on the app, surrounded as it was by so many other ghosts made in tribute to it.

Things that inspired this story: Playing Pokemon Go during the pandemic; augmented reality; audio recordings; social media.

Import AI 204: Chinese elevator surveillance; Enron emails train polite AI; and a pessimistic view on AGI

A somewhat abbreviated issue, this week: Due to a combination of things (primarily related to societal complications from COVID-19), I’ve found I have less time I’m able to devote to this newsletter, among other things. I think many people in COVID-hit countries that are lucky enough to still be employed are having this experience – even though notionally you should have more time due to no commute and the elimination of various other factors, it feels like you have less time than before. A curious and somewhat unpleasant trait of this current crisis! I hope to resume more regularly scheduled service at typical lengths soon. Thank you as always for reading and writing in, and I hope you and your loved ones are safe during this chaotic time.

####################################################

AI & Creativity: Gwern on writing with GPT3:
What is it like to try and write fiction with GPT-3, OpenAI’s large-scale language model? Gwern has written a dense, compelling essay about the weirdness of writing with this generative model. Read on to get his take on ‘prompts as programming’, the strengths and weaknesses of the GPT-3 model, and to see an immense set of examples of GPT-3 in action.

The strange and confusing fun of it all: Gwern has put together a massive collection of generations using the GPT-3 model – I particularly liked some of the poetry experiments, such as Plath, Whitman, and Cummings. I also think this (machine-generated) extract from a completion of Dr. Seuss’s ‘Oh, The Places You’ll Go‘, is quite charming:
  “You have brains in your head.
  You have feet in your shoes.
  You can steer yourself any direction you choose.
  You’re on your way!”
  Read more: GPT-3 Creative Fiction (Gwern).

####################################################

Hey dude, where’s my AGI?
…Tired of AI hype? Read this…
In an essay in Nature’s Humanities and Social Sciences Communications, an author writes that “although development of artificial intelligence for specific purposes has been impressive, we have not come much closer to developing artificial general intelligence”. What follows is a narrative about the development of AI and “Big Data” technologies in the past half century or so, along with discussion of where computers do well and where they do poorly. Much of the criticism of contemporary AI systems can be paraphrased as ‘curve fitting isn’t smart’ – these technologies, though powerful, are not able to generate capabilities that we should describe as intelligent. “The real problem is that computers are not in the world, because they are not embodied,” the author writes.

Why this matters: I think it’s helpful to have more of a sober discourse about whether we’re making progress towards AGI or whether we’re just developing increasingly capable narrow AI systems. However, one missing piece in this article is a discussion of some of the more recent innovations in generative models – e.g, the author has a ‘conversation’ with a chatbot program called Mitsuku and uses this to bolster their arguments about the generally poor capabilities of AI systems. How might their conversation have gone if they’d experimented with GPT2 or GPT3 (with context-stuffing via the context window), I wonder?
  Read more: Why general artificial intelligence will not be realized (Nature, Humanities and Social Sciences Communications).

####################################################

Making language models more polite, via 1.39 million Enron(?!) emails:
…READ THIS – NOW!…
A few years ago, people developed style transfer techniques for neural nets that let you take a picture, then morph it to have a different style, like becoming a cartoon, or being re-rendered in an impressionist painting style. In recent years, we’ve started to be able to do similar things for text, via techniques like fine-tuning. Now, researchers with Carnegie Mellon University have developed a dataset to help them build systems that can turn text from impolite to polite.

The Enron dataset: For the research, they collected a dataset of 1.39 million instances from the ‘Enron’ email corpus, automatically labelled for politeness with scores from 0 to 1. They’ve made the dataset available on GitHub, so it could be an interesting fine-tuning resource.
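The 0-to-1 scores make the corpus easy to slice for fine-tuning. A minimal sketch, assuming a hypothetical tab-separated file with 'text' and 'politeness' columns (the actual release on GitHub may be laid out differently):

```python
import csv

polite, impolite = [], []
with open("enron_politeness.tsv") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        score = float(row["politeness"])   # automatic politeness score, 0 to 1
        if score > 0.9:
            polite.append(row["text"])     # high-confidence polite examples
        elif score < 0.1:
            impolite.append(row["text"])   # high-confidence impolite examples

print(len(polite), "polite /", len(impolite), "impolite instances")
```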

Interesting ethics: The paper also hints at some of the ethical challenges involved in doing this sort of research – for instance, the authors adopt a “data driven definition of politeness” to automatically clean the email corpus.

Politeness style transfer: They use this dataset to develop a system that does what they call ‘politeness transfer’ – for example, converting the phrase “send me the data” to “could you please send me the data?”. They also explore edits with more extreme (autonomous!) editorial choices, like converting the sentiment of a sentence, for instance changing “their chips are ok, but their salsa is really bland” to “their chips are great, but their salsa is really delicious”.

Why this matters: Reality Editing: The outputs of generative models are like a funhouse mirror version of reality – they reflect back the things in their training corpus, magnifying and minimizing different aspects of their data distribution. One of the core challenges of AI research for the next few years will be figuring out how to more tightly constrain the outputs of these models so they have more of a desired trait, like politeness, or less of a negative trait, like tendencies to express harmful biases. Datasets and experiments like this give us some of the tools (the data) and ideas that can help us figure out how to better align model outputs with societal requirements. 
  Read more: Politeness Transfer: A Tag and Generate Approach (arXiv).
  Get the dataset and code here (GitHub).

####################################################

Coming soon: Elevator surveillance cameras
…100,000 test deployments and counting…
A team from the Shanghai Research Institute has developed a system to automatically identify “abnormal activity”, such as drug dealing, prostitution, over-crowded residences, and so on.

The system is currently in a research phase and deployed on around 100,000 elevators, the authors write. They decided to try out elevator-based surveillance because “we find that elevator could be the most feasible and suitable environment to take operation on because it’s legal and reasonable to deploy public surveillance and people take elevator widely and frequently enough”. They use the system to analyze large amounts of data (here, around one million records for each floor from 100,000 distinct elevators); the resulting system spits out 643 outliers. In a subsequent analysis, they identify a couple of anomalies worthy of investigation by a local property manager, such as indications of a catering service being run from an apartment, and of an over-crowded residence.

What they did: They concoct a system that uses YOLACT to do instance segmentation on people in the video, FaceNet to capture and embed the faces of individuals in the elevator (aiding re-identification in different images), and their own architecture called GraftNet (based on the Inception v3 classification architecture) to learn to assign multiple labels for elevator passengers (labels include: pregnant, oldage, courier, adult holding baby, etc).
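Structurally, it's just three models chained per frame. A hedged sketch of the pipeline shape (the three callables are placeholder stand-ins for YOLACT, FaceNet, and GraftNet – not the authors' code):

```python
def analyze_frame(frame, segment_people, embed_face, attribute_model):
    """Placeholder pipeline mirroring the paper's structure."""
    records = []
    for crop in segment_people(frame):   # instance segmentation (YOLACT role)
        identity = embed_face(crop)      # face embedding for re-ID (FaceNet role)
        labels = attribute_model(crop)   # multi-label attributes (GraftNet role),
                                         # e.g. {"courier", "oldage"}
        records.append({"identity": identity, "labels": labels})
    return records
# Downstream, records are aggregated per elevator and floor, then mined
# for statistical outliers like the 643 flagged above.
```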

Why this matters: AI and social control: The disturbing thing about this paper is how simple it is – these are reasonably well understood techniques and systems, and this is simply an application of them. I think it’s hard for us to comprehend what kinds of surveillance capabilities AI systems yield till we see distillations like this – here we’ve got a system that automatically identifies anomalous behaviors, collecting data from a hundred thousand distinct places, eventually in real-time. Though such systems will have demonstrable benefits for public safety, they’ll also make it increasingly easy for people to build AI tools to passively identify anyone that is deviating from some kind of (learned) norm.
    And the important thing is none of this is like Enemy of the State – there isn’t some shadowy high-tech force developing this stuff, it’s just some engineers at a university (who likely work/moonlight at a company) using some open source components and a bit of inventive improvisation, following the incentive structures created for them by governments and the private sector. How will AI development unfold, given these dynamics?
  Read more: Abnormal activity capture from passenger flow of elevator based on unsupervised learning and fine-grained multi-label recognition (arXiv).

####################################################

Tech Tales:

The Glass Castles

We’d watch the castles for hours, taking a half hour between shifts in the factory to look at their silhouettes on the horizon. We’d place bets on which one of them would change first. Sometimes one of us would be out there when it happened – you’d see the movement before you heard the cracking sound of giant panes of glass, shattering into pieces. Then if you had your phone you could zoom in and sometimes see the fuzzy air where new parts of the castle scaffold were being put together by drones. The new glass got fitted at night; they’d cover the castle with a sheet before they did it, presumably to stop wind or debris from messing with the glass as they put it in. Then the next morning we’d look at the horizon on our way into the factory and we’d sometimes spot that one had moved. Money would change hands. I made a lot of money one summer, correctly betting that one of the largest castles would spend the next two weeks splitting in two, losing panes and having new shapes stitched into the air by drones, then fitted with a new body, and then repeat, until one had become two.

I think most people are good at guessing, once they spend a long enough time looking at something. There was no cell reception at the factory, so we’d just look at the castles and shoot the shit, then go back to work. A few years later I found out the castles were part of some experiment – Emergent Architecture and Online Game Lotteries was the title of some research paper published about them. One day I sat and looked at the paper and there was a picture called Figure 2. I stared at the picture because I could see the shape of the castle that had split into two, one summer. The caption of the figure said Due to a series of concurrent lottery outcomes focused on a single high-yield trading block, we see the structure partition itself into two to more efficiently provide matchmaking services.

Things that inspired this story: The notion of artefacts as a universal constant – everyone has encountered something on the periphery of their existence, and much of it is made by people somewhere along the line; game theory; large-scale betting competitions.

Import AI 203: DeepMind releases a synthetic robot dog; harvesting kiwifruit with robots; plus, egocentric AI

Train first-person AI systems with EPIC-Kitchens:
…Expanded dataset gives researchers a bunch of egocentric films to train AI systems against…
Researchers with the University of Bristol and the University of Catania have introduced EPIC-KITCHENS-100, a dataset of first-person perspective videos of people doing a range of things like cooking or washing. The interesting thing about the dataset is that the videos are accompanied by narrations – the participants describing their actions as they record them. This means the dataset comes with rich annotations developed in an open-ended format.

Dataset details:

  • 100 hours of recording
  • 20 million frames
  • 45 kitchens in four cities
  • 90,000 distinct action segments 

What can EPIC test? The EPIC dataset can help researchers test out AI systems against a few AI challenges, including:
– Action recognition (e.g, figuring out if a video clip contains a given action)
– Action detection (e.g, looks like they are washing dishes at this point in a long video clip)
– Action anticipation (e.g, looks like someone is about to start washing dishes) – see the sketch after this list
– Action generalization (can you figure out actions in these videos, via pre-training on some other videos?)
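Here's a toy framing of the anticipation task, to make the setup concrete – the model only sees frames from before the action starts. The model and clip interfaces are hypothetical stand-ins, not EPIC's evaluation code:

```python
TAU = 1.0  # anticipation horizon in seconds

def anticipation_accuracy(model, clips):
    """Score a model that must predict an action TAU seconds before it begins."""
    correct = 0
    for clip in clips:
        observed = clip.frames_before(clip.action_start - TAU)  # no peeking
        correct += int(model(observed) == clip.action_label)
    return correct / len(clips)
```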

Why this matters: Egocentric video doesn’t get as much research exploration as third-person video, likely as a consequence of the availability of data (there’s a lot of third-person video online, but relatively little egocentric video). Making progress on this will make it easier to build embodied robots. It’ll also let us build better systems for analyzing video uploaded to social media – recall how the Christchurch shooting video posed a challenge to Facebook’s algorithms because of its first-person perspective, which those systems hadn’t seen much of before.
  Read the research paper: Rescaling Egocentric Vision (arXiv).
  Get the data from here: available July 1st (EPIC-Kitchens-100 GitHub website).

####################################################

China’s Didi plans million-strong self-driving fleet:
…But don’t hold your breath…
Chinese Uber-rival Didi Chuxing says it plans to deploy a million self-driving vehicles by 2030, according to comments reported by the BBC. That seems like a realistic goal, especially compared to the more ambitious proclamations from other companies. (Tesla said in April 2019 it planned to have a million self-driving ‘robotaxis’ on the road within a year to a year and three months – this has not happened).
  Read more: Didi Chuxing: Apple-backed firm aims for one million robotaxis (BBC News).

####################################################

NVIDIA and Mercedes want to build the upgradable self-driving car:
…Industry partnership tells us about what really matters in self-driving cars…
What differentiates cars from each other? Engines? Non-strategic (and getting simpler, thanks to the switchover to electric cars). Speed? Cars are all pretty fast these days. Reliability? A non-issue with established brands, and also getting easier as we build electric cars.

Computers? That might be an actual differentiator. Especially as we head into a world full of self-driving cars. For this reason, NVIDIA and Mercedes-Benz have announced a partnership where, starting in 2024, NVIDIA’s self-driving car technology will be rolled out across the car company’s next fleet of vehicles. The two firms plan to collaborate on self-driving features like smart cruise control and lane changing, as well as automated parking. They’re also going to try and do some hard self-driving stuff as well – this includes the plan to “automate driving of regular routes from address to address,” according to an NVIDIA press release. “It is so exciting to see my years of research on a cockpit AI that tracks drivers’ face and gaze @nvidia be a part of this partnership,” writes one NVIDIA researcher on Twitter.

The upgradeable car: Similar to Tesla, Mercedes plans to offer over-the-air updates to its cars, letting customers buy more intelligent capabilities as time goes on.

Why this matters: If the 20th century was driven by the harnessing of oil and petroleum byproducts, then there’s a good chance the 21st century (or at least the first half) will be defined by our ability to harness computers and computer by-products. Partnerships like NVIDIA and Mercedes highlight how strategic computers are seen to be by modern companies, and suggests the emergence of a new scarce resource in business – computational ability.
  Read more: Mercedes-Benz and NVIDIA to Build Software-Defined Computing Architecture for Automated Driving Across Future Fleet (NVIDIA newsroom).

####################################################

Coming soon: Kiwi Fruit-Harvesting Robots
…But performance is still somewhat poor…
One promising use case for real world robots is in the harvesting of fruits and vegetables. To build useful machines here, we need better computer vision techniques so that our machines can see what they need to gather. Researchers with the University of Auckland, New Zealand, have built a system that can analyze a kiwifruit orchard and pick out individual kiwifruit automatically via semantic segmentation.

The score: The model gets a score of 87% recall at detecting non-occluded kiwifruit, and a score of 30% for occluded ones. It gets around 75% recall and 92% precision overall. The authors used a small dataset of 63 labeled pictures of kiwifruit in orchards. By comparison, a Faster-R-CNN model trained a couple of years ago with 100X the amount of data got a recall of 96.7% and a precision of 89.3% (versus 92% here), suggesting their semantic segmentation approach has helped them get slightly better precision.
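For readers rusty on the metrics: recall is the fraction of real fruit the system finds, precision is the fraction of detections that really are fruit. A quick worked example with made-up counts that roughly match the overall numbers above:

```python
# Illustrative counts, not from the paper: 100 fruit in the test images,
# 82 detections fired, 75 of which are real fruit.
true_positives, false_positives, false_negatives = 75, 7, 25

recall = true_positives / (true_positives + false_negatives)     # 0.75
precision = true_positives / (true_positives + false_positives)  # ~0.91

print(f"recall={recall:.2f}, precision={precision:.2f}")
```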

Sidenote: I love this genre of research papers: identify a very narrow task / problem area, then build a specific model and/or dataset for this task, then publish the results. Concise and illuminating.
  Read more: Kiwifruit detection in challenging conditions (arXiv).

####################################################

DeepMind expands its software for robot control:
…Run, simulated dog, run!…
DeepMind has published the latest version of the DeepMind Control Suite, dm_control. This software gives access to a MuJoCo-based simulator for training AI systems to solve continuous control problems, like figuring out how to operate a complex multi-jointed robot in a virtual domain. “It offers a wide range of pre-designed RL tasks and a rich framework for designing new ones,” DeepMind writes in a research paper discussing the software.

Dogs, procedural bodies, and more: dm_control includes a few tools that aren’t trivially available elsewhere. These include: a ‘Phaero Dog’ model with 191 total state dimensions (making it very complex), as well as a quadruped robot, and a simulated robot arm. The software also includes support for PyMJCF – a language to let people procedurally compose new simulated entities that they can try and train AI systems to control.
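If you want to kick the tires, a random-action rollout on a suite task looks roughly like this (a minimal sketch based on the published API; domain and task names vary by version – check suite.BENCHMARKING for what yours ships):

```python
import numpy as np
from dm_control import suite, mjcf

# Roll out random actions on a Control Suite task.
env = suite.load(domain_name="cheetah", task_name="run")
spec = env.action_spec()
time_step = env.reset()
while not time_step.last():
    action = np.random.uniform(spec.minimum, spec.maximum, size=spec.shape)
    time_step = env.step(action)

# PyMJCF composes models procedurally - e.g. a body with a single geom:
root = mjcf.RootElement()
torso = root.worldbody.add("body", name="torso")
torso.add("geom", type="sphere", size=[0.1])
```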

Things that make you go ‘hmmm’: In a research writeup, DeepMind also discusses a Rodent model of dog-grade complexity, which it says it has developed “in order to better compare learned behaviour with experimental settings common in the life sciences”.

Why this matters: Simulators are one of the key ingredients for AI research – they’re basically engines for generating new datasets for complex problems, like learning to operate multi-jointed bodies. Systems like dm_control give us a sense of progression in the field (it was only a few years ago that most people were working on 2D simulated robot arms with something on the order of 7 dimensions of control – now we’re working on dogs with more than 20 times that number), as well as indicating something about the future – get ready to see lots of cute videos of simulated skeletal animals running, jumping, and dodging.
  Read more: dm_control: Software and Tasks for Continuous Control (DeepMind website).
    Watch the dog run here: Control Suite dog domain (YouTube).
  Get the code and read about the updates (GitHub).
  Check out the research publication: dm_control: Software and Tasks for Continuous Control (arXiv).

####################################################

Using convnets to count refugees:
…Drone imagery + contemporary AI techniques = scalable humanitarian earth monitoring…
Researchers with Johns Hopkins University Applied Physics Laboratory, the University of Kentucky, the Centers for Disease Control and Prevention (CDC), and the Agency for Toxic Substances and Disease Registry have put together a new dataset to train systems to estimate refugee populations from overhead images. “Our approach is the first to perform learning-based population estimation using sub-meter overhead imagery (10cm GSD),” they write. “We train a model using aerial imagery to directly predict camp population at high spatial resolution”. They’re able to train a system that gets a 7% mean population estimation error on their dataset – promising performance, though not yet at the level necessary for real world deployment.
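In spirit, the setup is a regression from image tiles to counts, summed over a camp. A minimal, hypothetical PyTorch stand-in (architecture and sizes are illustrative, not the authors' model):

```python
import torch
import torch.nn as nn

class TilePopulationNet(nn.Module):
    """Toy convnet mapping an overhead tile to a non-negative population count."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32, 1), nn.Softplus())  # count >= 0

    def forward(self, tiles):                 # tiles: (N, 3, H, W)
        return self.head(self.features(tiles)).squeeze(-1)

model = TilePopulationNet()
tiles = torch.rand(64, 3, 128, 128)           # fake tiles covering one camp
camp_estimate = model(tiles).sum()            # aggregate tile counts to a total
```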

The dataset: The dataset consists of overhead drone-gathered imagery of 34 refugee camps in Bangladesh, taken over the course of two years. It fuses this data with ‘population polygons’ – data taken from routine International Organization for Migration (IOM) site assessments, as well as OpenStreetMap structure segmentation masks for buildings.

Why this matters: Note the inclusion of drone-gathered data here – that, to me, is the smoking gun lurking in this research paper. We’re starting to be able to trivially gather large-scale, current imagery from different parts of the world. Papers like this show how our existing relatively simple AI techniques can already take advantage of this data to do stuff of broad social utility, like estimating refugee populations.
  Read more: Estimating Displaced Populations from Overhead (arXiv).
  Get the code here: Tools for IDP and Refugee Camp Analysis (GitHub).
  Get the dataset here: Cox’s Bazar Refugee Camp Dataset (GitHub).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

White House imposes new immigration restrictions:
President Trump has signed an executive order suspending the entry of certain foreign workers to the US, including holders of the H-1B visas widely used by tech firms. One estimate finds the move will keep out 219,000 temporary workers and 158,000 green card applicants this year — an enormous reduction in skilled migration to the US.

AI and foreign talent: Georgetown’s Center for Security and Emerging Technology (CSET) has been doing some excellent research on the importance of foreign talent for the US AI sector, and argues that restrictive immigration policies threaten to undermine US AI progress in the long-term. More than half of Silicon Valley start-ups have at least one immigrant amongst their founders, and demand for AI talent is expected to outpace domestic supply for the foreseeable future. CSET recommends improving visa options for temporary workers by increasing the cap on H-1B visas, reducing wait times, offering year-round decisions, and, importantly, expanding options for students and temporary workers to gain permanent residency in the US.

Matthew’s view: The broad economic case against these sorts of restrictions is well-established. It is nonetheless valuable to highlight the particular importance of foreign workers to technology and AI, since these sectors will be crucial to the health of the US in the mid-term. Alongside the more remote harms of unrealized growth and business disruption, these measures will cause enormous suffering for hundreds of thousands of people hoping to build lives in the US, and cast uncertainty on the plans and aspirations of many more.
  Read more: Trump administration extends visa ban to non-immigrants (AP).
  Read more: Immigration policy and global cooperation for AI talent (CSET).
  Read more: Immigration Policy and the U.S. AI Sector (CSET).


GPT-3 writes creative fiction:

Gwern has been using OpenAI’s GPT-3 API to write fiction and is building a wonderful collection of examples, alongside some insightful commentary about the model’s performance. He describes GPT-3 as having “eerie” learning capabilities, allowing the raw model to “tackle almost any imaginable textual task purely by example or instruction” without fine-tuning. The model can also generate writing that is “creative, witty, deep, meta, and often beautiful,” he writes. 


Highlights: I particularly enjoyed GPT-3’s pastiches of Harry Potter in different styles. Having been prompted to write a parody in the style of Ernest Hemingway, GPT-3 offers a number of others without any further prompts:

  • Arthur Conan Doyle: Harry pushed at the swinging doors of the bookshop hard, and nearly knocked himself unconscious. He staggered in with his ungainly package, his cheeks scarlet with cold and the shame of having chosen the wrong month to go Christmas shopping.
  • Ingmar Bergman: Tears filled Harry’s eyes. Sweat stood on his forehead, showing the pure torment, the agony he suffered. He hugged his knees to his chest, sobbing softly, eyes half shut.
  • P.G. Wodehouse: ‘There was nothing out of the way, sir,’ said Harry in a hurt voice. ‘Indeed,’ said the headmaster, turning his lorgnette precisely three-quarters of a millimeter to port.

  Read more: GPT-3 Creative Fiction (Gwern).
  Read more: OpenAI API.

####################################################
Tech Tales:

[2024: London skunkworks ‘idea lab’ facility funded by a consortium of Internet platforms, advertisers, and marketers]

Cool Hunting

Sam sat down to start his shift. He turned his computer monitor on, clicked “beginning evaluation session”, then spent several hours staring at the outputs of an unfeasibly large generative model.

There was: A rat with cartoon anime eyes; a restaurant where the lamps were all oversized, crowding out diners at some tables; some bright purple jeans with odd patterns cut out in them; some landscapes that seemed to be made of spaceships turned into shelters on alien worlds; and so much more.

At the end of the shift, Sam had tagged four of the outputs for further study. One of them seemed like promising material for an under-consideration advertising campaign. Another output seemed like it could get some play in the art market. These determinations were made by people assisted by large-scale prediction engines, which looked at the mysterious outputs and compared them to what they had seen in trending contemporary culture, then made a determination about what to do with them.

The nickname for the place was the cool factory. Before he’d worked there he had all these fantasies about what it would be like: people running down corridors, shouting at each other about what the machine had produced; visits from mysterious artists who would gaze in wonder at the creations of the machines and use the inspiration to make transformative human art; politicians and politicians’ aides making journeys to understand the shape of the future; and so on.

After he got there he found it was mostly a collection of regular people, albeit protected by fairly elaborate security systems. They got paid a lot of money to figure out what to do with the outputs of the strange, generative models. Sam’s understanding of the model itself was vague – he’d read some of the research papers about the system and even waded into some of the controversies and internet-rumors about it. But to him it was a thing at the other end of a computer that produced stuff he should evaluate, and nothing more. He didn’t ask many questions.

One day, towards the end of his shift, an image flashed up on screen: it was controversial, juxtaposing some contemporary politicians with some of the people in society that they consistently wronged. Something about it made him feel an emotional charge – he kept looking at it, unwilling to make a classification that would shift it off of his screen. He messaged some of his colleagues and a couple of them came over to his desk,
  “Wow,” one of them said.
  “Imagine that on a billboard! Imagine what would happen!” said someone else.
  They all looked at the image for a while.
  “I guess this is why we work here,” Sam said, before clicking the “FURTHER ANALYSIS” button that meant others at the firm would look at the material and consider what it could be used for.

At the end of his shift, Sam got a message from his supervisor asking him to come for a “quick five”. He went to the office and his supervisor – a bland man with glasses, from which emanated a kind of potent bureaucratic power – asked him how his day went.
  Sam said it went okay, then brought up the image he had seen towards the end.
  “Ah yes, that,” said his supervisor. “We’re quite grateful you flagged that for us – it was a bug in the system, shouldn’t have come to you in the first place.”
  “A bug?” Sam said. “I thought it was exciting. I mean, have you really looked at it? I can’t think of anything else that seems that way. Isn’t that what we’re here for?”
    “What we’re here for is to learn,” said his supervisor. “Learn and try to understand. But what we promote, that’s a different matter. If you’d like to continue discussing this, I’d be happy to meet with you and Standards on Monday?”
  “No, that’ll be fine. But if there are other ways to factor in to these things, please let me know,” Sam said.
  “Will do,” said the supervisor. “Have a great weekend!”

And so over the weekend Sam thought about what he had seen. He wrote about it in his journal. He even tried to draw it, so he’d remember it. And when he went back in on Monday, he saw more fantastical things, though none of them moved him or changed how he felt. He asked around about his supervisor in the office – at least as much as he could do safely – but never found out too much. Some other colleagues recounted some similar situations, but since they didn’t have recordings of the outputs, and their ability to describe them was poor, he couldn’t work out if there was anything in common.
  “I guess there are some things that we like, but the people upstairs don’t”, said one of his colleagues.

Things that inspired this story: Large-scale generative models; cultural production and reproduction; the interaction of computational artifacts and capital; generative models used as turnkey creative engines; continued advances in the scale of AI models and their resulting (multi-modality) expressiveness.

Import AI 202: Baidu leaves PAI; ImageNet can live forever with better labels; and what a badly upscaled Obama photo tells us about data bias

Making ImageNet live forever with better labels:
…Industry-defining dataset gets new labels for a longer lifespan…
ImageNet is why the past decade was a boom period for AI – after all, it was in 2012 that a team of researchers at the University of Toronto used deep learning techniques to make significant progress on the annual ImageNet image recognition competition; their success ultimately led to the mass pivoting of the computer vision research community towards neural methods. The rest, as they say, is history.

But is ImageNet still useful, almost a decade later? That’s a question contemplated by researchers with Google Brain and DeepMind in a new research paper. Their conclusion is some form of “yes, but” – yes, ImageNet is still a useful large-scale training dataset for image systems, but its labels aren’t as good as they could be. To remedy this, the researchers develop a set of “ReaL” reassessed labels for ImageNet, which tries to fix some of the labeling problems inherent to ImageNet, creating a richer dataset of labels for researchers to work with.

What’s wrong with old ImageNet labels? An old picture of a tool chest might have the label ‘hammer’, whereas the new ‘ReaL’ labels could be any of ‘screwdriver; hammer; power drill; carpenters’ kit’ (all of which are in the image). The new labels also fix some of the weird parts of ImageNet – like a picture of a bus and a passenger car, where the bus is in the foreground but the old correct label is ‘passenger car’ (whereas the new label is ‘school bus’).
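The evaluation change is simple to state in code: under ReaL, a top-1 prediction counts as correct if it lands anywhere in the image's set of valid labels, rather than matching a single canonical label. A minimal sketch:

```python
def real_accuracy(predictions, real_label_sets):
    """predictions: list of class ids; real_label_sets: list of sets of ids."""
    hits = sum(pred in labels
               for pred, labels in zip(predictions, real_label_sets))
    return hits / len(predictions)

# The tool-chest image: any of {screwdriver, hammer, power drill, kit} counts.
print(real_accuracy([3], [{3, 17, 42, 99}]))  # 1.0
```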

Why this matters: “While ReaL labels seem to be of comparable quality to ImageNet ones for fine-grained classes, they have significantly reduced the noise in the rest, enabling further meaningful progress on this benchmark,” the authors write. Specifically, the ReaL labels make it more useful to train systems against the ImageNet dataset, because it leads to the development of vision systems with a more robust, broad set of labels. “These findings suggest that although the original set of labels may be nearing the end of their useful life, ImageNet and its ReaL labels can readily benchmark progress in visual recognition for the foreseeable future”, they write.
  Read more: Are we done with ImageNet? (arXiv).
  Get the new labels: Reassessed labels for the ILSVRC-2012 (“ImageNet”) validation set (Google Research, GitHub).

####################################################

Want to count Zebrafish? This dataset might help!
…The smart fishtank cometh…
Researchers with Aalborg University, Denmark, have built a dataset of videos tracking zebrafish as they move around in a tank. They’re releasing the dataset and some baseline models to help people build systems that can automatically track and analyze zebrafish.

The dataset: The dataset consists of eight sequences with durations between 15 and 120 seconds, each containing between one and ten free-moving zebrafish. It has been hand-annotated with 86,400 points and bounding boxes. It also includes tags relating to the occlusion of fish at different points in time, which can help provide data for training systems that are able to analyze schools of fish, rather than individual ones.
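
As a sketch of how one might consume tracking annotations of this kind, assuming a MOTChallenge-style CSV layout (frame, object id, box coordinates, then extra fields such as occlusion flags) – the actual 3D-ZeF schema may differ, so treat the column order as an assumption:

import csv
from collections import defaultdict

def load_tracks(path):
    # Map each fish id to its time-ordered list of (frame, bounding box).
    tracks = defaultdict(list)
    with open(path, newline="") as f:
        for frame, fish_id, x, y, w, h, *extra in csv.reader(f):
            box = (float(x), float(y), float(w), float(h))
            tracks[int(fish_id)].append((int(frame), box))
    for detections in tracks.values():
        detections.sort()
    return dict(tracks)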

Why Zebrafish? So, why bother making this? The researchers say it is because “Zebrafish is an increasingly popular animal model and behavioural analysis plays a major role in neuroscientific and biological research”, but tracking zebrafish is a complex, tedious process. With this dataset, the researchers hope to spur the construction of robust zebrafish tracking systems which “are critically needed to conduct accurate experiments on a grand scale”.
  Read more: 3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset (arXiv).
  Get the dataset here (Multiple Object Tracking Benchmark official site).

####################################################

Facebook gets its own StreetView with Mapillary acquisition:
…Acquisition gives Facebook lots of data and lots of maps…
Facebook has acquired Mapillary, a startup that had been developing a crowdsourced database of street-level imagery. Mapillary suggests in a blog post that it’ll work with Facebook to develop better maps; “by merging our efforts, we will further improve the ways that people and machines can work with both aerial and street-level imagery to produce map data,” the company writes.

Data moats and data maps: Mapping the world is a challenge, because once you’ve mapped it, the world keeps changing. That’s why companies like Apple and Google have made tremendous investments in infrastructure to regularly map and analyze the world around them (e.g, StreetView). Mapillary may give Facebook access to more data to help it develop sophisticated, current maps. For instance, a few months ago Mapillary announced it had created a dataset of more than 1.6 million images of streets from 30 major cities across six continents (Import AI 196).
  Read more: Mapillary Joins Facebook on the Journey of Improving Maps Everywhere (Mapillary blog).

####################################################

Photo upscaling tech highlights bias concerns:
…When photo enhancement magnifies societal biases…
Last week, some researchers with Duke University published information about PULSE, a new photo upscaling technique. This system uses a neural network to upscale low-resolution, pixelated pictures into high-fidelity counterparts. Unfortunately, how good the neural net is at upscaling depends on a combination of the underlying dataset it was trained on and how well tuned its loss function(s) are. Perhaps because PULSE is so good at upscaling in domains where there’s a lot of data (e.g, pictures of white people), its failures in other domains feel far worse.
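
The core trick, sketched below in broad strokes: search the latent space of a pretrained face generator for a point whose output, once downscaled, matches the low-resolution input. The generator argument and hyperparameters here are illustrative assumptions, not PULSE’s actual code:

import torch
import torch.nn.functional as F

def upscale_by_latent_search(lr_image, generator, latent_dim=512,
                             steps=300, step_size=0.1):
    # Optimize a latent so that downscale(G(z)) matches the low-res input.
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=step_size)
    for _ in range(steps):
        opt.zero_grad()
        hr = generator(z)  # candidate high-resolution image, (N, C, H, W)
        down = F.interpolate(hr, size=lr_image.shape[-2:],
                             mode="bicubic", align_corners=False)
        loss = F.mse_loss(down, lr_image)
        loss.backward()
        opt.step()
    return generator(z).detach()

Framed this way, the bias issue is unsurprising: the search can only ever land on faces the generator is predisposed to produce.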

Broken data means you get Broken Barack Obama: Shortly after publishing the research, some Twitter users started probing the model for biases. And they found some unfortunate stuff:
– Here is the model upscaling Barack Obama into a person with more typically Caucasian features.
– Here is a Twitter thread with more examples, where the model tends to skew towards generating Caucasian outputs regardless of inputs.
– Here is a more detailed exploration of the Barack Obama example from AI artist Mario Klingemann, which shows more diversity in the generations (and some fun bugs) – note this isn’t using exactly the same components as PULSE, so treat it with a grain of salt.

Blame the data or blame the algorithm? In a statement published on GitHub, the PULSE creators say “this bias is likely inherited from the dataset StyleGAN was trained on, though there could be other factors that we are unaware of”. They say they’re going to talk to NVIDIA, which originally developed StyleGAN. However, AI artist Klingemann says “StyleGAN is perfectly capable of producing diverse faces, it is their algorithm that fails to capture that diversity”. Meanwhile, Yann LeCun, Facebook’s head of AI research, says the issue is solely down to data – “train the *exact* same system on a dataset from Senegal, and everyone will look African” (this tweet doesn’t address the issue of dataset creation – there are far fewer datasets portraying people from Senegal than there are portraying people from other parts of the world).

Why this matters: Bias and how it relates to AI is a broad, hard problem in the AI research and deployment space – that’s because bias can creep in at every level, ranging from initial dataset selection, to the techniques used to train models, to the people that develop the systems. Bias is also hard because it relates to machine-readable data, which either means data people have taken the trouble to compile (e.g, the FFHQ face dataset compiled by NVIDIA for StyleGAN), or data that has been digitized by other, larger cultural forces (e.g, the inherent biases of film production lead to representational issues in film datasets). I expect that in the next few years we’ll see people concentrate on the issue of bias in AI both at the level of dataset creation and also at the level of analyzing the outputs of generative models and figuring out algorithmic tweaks to increase diversity.
  Get the code and read the statement from the PULSE GitHub repo (GitHub).
  Read the paper: PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models (arXiv).

####################################################

Uber uses 3D worlds to simulate better self-driving cars:
…Shows how to create good enough simulated self-driving car data to train cars on…
Uber wants to build self-driving cars, but self-driving cars are one of the hardest things in AI to build, so Uber needs to generate a ton of data to train them on. Like Google and other self-driving companies, Uber is currently collecting data around the world via cars tricked out with lasers (LiDAR), cameras, and other sensors. That’s all useful. Now, Uber – like other self-driving companies – is trying to figure out how it can simulate data to let it have even more data to train on.

Simulating data for fun and (so much!) profit: In a new research paper, researchers from Uber, the University of Toronto, and MIT say they have built “a LiDAR simulator that simulates complex scenes with many actors and produce[s] point clouds with realistic geometry”. With this simulator, they can pair their real data with synthetic datasets that let them train neural networks to higher performance than those trained on real data alone. (Most of the technical trick here comes in layering two types of data together in the simulator – realistic worlds, and then filling them with dynamic objects, all of which are based on real LiDAR data, increasing the realism in the procedurally generated synthetic data.)

The key statistic: “with the help of simulate[d] data, even with around 10% real data, we are able to achieve similar performance as 100% real data, with less than 1% mIOU difference, highlighting LiDARsim’s potential to reduce the cost of annotation”. Other tests show that if you pair real data with a significant amount of simulated data (in their tests, equivalent to your amount of real data), then you can obtain better scores than you can get with real data alone.
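
A toy sketch of that headline recipe – keep a small slice of real data and pad the training set with simulated data. The tensors below are random stand-ins for labelled LiDAR sweeps, not Uber’s pipeline:

import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset, TensorDataset

# Random placeholders standing in for labelled real and simulated sweeps.
real = TensorDataset(torch.randn(1000, 64), torch.randint(0, 10, (1000,)))
sim = TensorDataset(torch.randn(10000, 64), torch.randint(0, 10, (10000,)))

# Keep ~10% of the real data, echoing the paper's headline experiment,
# and fill the rest of the training set with simulated sweeps.
small_real = Subset(real, range(len(real) // 10))
train_loader = DataLoader(ConcatDataset([small_real, sim]),
                          batch_size=32, shuffle=True)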

Why this matters: Say it with me: Computers let us arbitrage $ for data. This is wild! Projects like this show how effective these techniques are and they suggest that large companies may be well positioned to benefit from economy of scale effects of the data they gather – because in AI, you can gather a dataset and train models on it, and also use that dataset to develop systems for generating additional data. It’s almost like someone harvesting crop from farmland, eating the crop, and in parallel cloning the crop and eating the clones as well. I think the effects of this are going to be significant in the long-term, especially with regard to competitive dynamics in the AI space.
  Read more: LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World (arXiv).

####################################################

Baidu leaves Partnership on AI:
…Move heightens tensions around the international governance of AI…
Chinese search giant Baidu has left the Partnership on Artificial Intelligence, a US-based initiative that brings together industry, academia, and other third-party stakeholders to grapple with the ethical challenges of artificial intelligence. The move was reported by Wired. It’s unclear at this time what the reasons are behind the move – some reports indicate it may be financial in nature, but that would be puzzling, since the membership dues for PAI are relatively small and Baidu had a net income of almost $1 billion in its 2019 financial year. The move comes amid heightened tensions between the US and China over the development of artificial intelligence.

Why this matters: We already operate in a world with two distinct ‘stacks’ – the domestic Chinese internet and its associated companies (with their international efforts, like TikTok) and government bodies (e.g, those that participate in standards organizations), and the stack made up mostly of American internet companies built on the global Internet system.
  With moves like Baidu leaving PAI, we’re seeing the decoupling of these tech stacks rip higher up the layers of abstraction – first the internet systems decoupled, then various Chinese companies emerged to counter/compete with the Western companies (e.g, some (imperfect) comparisons: Baidu / Google; Huawei / Cisco; Alibaba / Amazon), then we started to see hardware layers decouple (e.g, domestic chip development). Now, there are also signs that we might come apart in our ability to convene internationally – if it’s hard to have shared discussions with China at venues like the Partnership on AI, then where can those discussions take place?
  Read more: Baidu Breaks Off an AI Alliance Amid Strained US-China Ties (Wired).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

US joins international AI panel:
The US has joined the Global Partnership on AI, the ‘IPCC for AI’ first proposed by France and Canada in 2018. The body, officially launched last week, aims to support “responsible and human-centric” AI development. GPAI will convene four working groups, focused on responsible AI; data governance; the future of work; and innovation & commercialization. These will be made up of experts from industry, academia, government, and civil society. The US is the last G7 country to join, having held out due to concerns that international rules might hamper US innovation.
  Read more: US joins G7 artificial intelligence group to counter China (AP).
  Read more: Joint Statement from founding members of the Global Partnership on Artificial Intelligence.

Europe’s cloud infrastructure project takes shape:
France and Germany have shared more details on their ambitious plan to build a European cloud computing infrastructure. The project, named GAIA-X, is intended to help establish Europe’s ‘data sovereignty’, by reducing its reliance on non-European tech companies for core infrastructure. With an initial budget of €1.5 million per year, GAIA-X is unlikely to rival big tech any time soon (Amazon’s AWS operating expenses were $26 billion last year). It is expected to launch in 2021.
  Read more: Altmaier charts Gaia-X as the beginning of a ‘European data ecosystem’ (Euractiv).
  Read more: GAIA-X: A Franco-German pitch towards a european data infrastructure – Ministerial talk and GAIA-X virtual expert forum (BMWI).

####################################################
Tech Tales:

The Future News

[Random sampling of headlines seen on a technology focused news site during the course of four months in 2025]

AMD, NVIDIA Face GPU Competition from Chinese Upstart

Life on Mars? SpaceX Mission Tries to Find Out

Doggo Robbo: Boston Dynamics Introduces Robot Companion Product

Mysterious Computer Virus Wreaks Havoc on World’s Satellites

Nuclear Content: Programmatic Ads and Synthetic Media

Drone Wars: How Drones Got “Fast, Cheap, and Out of Control”

Computer Vision Sales Rise in Authoritarian Nations, Decline in Democratic Ones – Study

Internet Connectivity Patterns See “Unprecedented Fluctuations”, Puzzling Experts

African Nations Shut Down Cellular, Internet Networks To Quell Protests

Rise in Robot Vandalisms Attributed to “Luddite Parties”

Quantum Chemistry: The Technology Behind Oxford University’s Smash Hit Spinout FeynBio

“Authentic Content Act” Sees Tech Industry Pushback

“Internet 4.0” – Why the U.S., Russian, and Indian Governments Are Building Their Own Domestic Internet Systems

Things that inspired this story: Many years working as a journalist; predictions about the evolution of computer vision; automation politics and the 21st century; CHIPlomacy; some loose projections of the future based on some existing technical trends. 

Import AI 201: the facial recognition rebellion; how Amazon Go sees people; and the past & present of YOLO

Could 2020 be the year where facial recognition gets some constraints?
…Amazon, IBM, Microsoft, change facial recognition policies…
This week, IBM said it no longer sells “general purpose facial recognition or analysis software”, and that it opposes the use of facial recognition for “mass surveillance, racial profiling, violations of basic human rights and freedoms”. After this, Amazon announced a one-year moratorium on police use of its facial recognition technology, ‘Rekognition’; then Microsoft said the next day it would not sell the technology to police departments in the U.S. until a federal law exists that regulates the technology.

These moves mark a change in mood for Western AI companies, which after years of heady business expansion have started to change the sorts of products they sell in response to various pressures. I think the change started a while ago, when employee outcry at Google led to the company pausing work on its drone-surveillance ‘Maven’ project for the US military. Now, it seems like companies are reconsidering their stances more broadly.

The backstory: Why is this happening? I think one of the main reasons is the intersection of our contemporary political moment with a few years of high-impact research into the biases exhibited by facial recognition systems. The project that started most of this was ‘Gender Shades’, which tested a variety of commercial facial recognition systems and found them all to display harmful biases. Dave Gershgorn at Medium has a good overview of this chain of events: How a 2018 Research Paper Led Amazon, Microsoft, and IBM to Curb Their Facial Recognition Programs (Medium, OneZero).

Why this matters: ‘Can we control technology?’ is a theme I write about a lot in Import AI – it’s a subtle question, because usually the answer is some form of ‘well, those people can, but what about them?’. Right now, there’s a very widespread, evidence-backed view that facial recognition has a bunch of harms, especially when deployed using contemporary (mostly faulty and/or brittle) AI systems. I’m curious to see which companies step into the void left by the technology giants’ withdrawal – which companies will arbitrage their reputation for profits? And how might the actions of Amazon, IBM, and Microsoft shift perceptions in other countries, as well?
  Read more: Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification (PDF).
  Read more: We are implementing a one-year moratorium on police use of Rekognition (Amazon blog).
  Read more: IBM CEO’s Letter to Congress on Racial Justice Reform (IBM).
  Read more: Microsoft says it won’t sell facial recognition technology to US police departments (CNN).

####################################################

Who likes big models? Everyone likes big models! Says HuggingFace:
…NLP startup publishes some recipes for training high-performance models…
NLP startup Hugging Face has published research showing people how to train large, high-performance language models. (This post is based on earlier research published by OpenAI, Scaling Laws for Neural Language Models).

The takeaways: The HuggingFace post has a couple of takeaways worth highlighting: Big models are surprisingly efficient, and “optimizing model size for speed lowers training costs”. More interesting to me is the practical mindset of the takeaways, which I think speaks to the broader maturity of the large-scale language model space at this point.
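
As a numeric caricature of that reasoning: under a power-law fit of loss against parameters and tokens, a fixed compute budget implies a best model size. The functional form and constants below are simplified stand-ins inspired by the scaling-laws line of work, not Hugging Face’s or OpenAI’s fitted values:

import numpy as np

def predicted_loss(n_params, n_tokens):
    # A generic additive power law in model size and data - a stand-in fit.
    return (8.8e13 / n_params) ** 0.076 + (5.4e13 / n_tokens) ** 0.095

budget = 1e20                      # total training FLOPs
sizes = np.logspace(7, 11, 200)    # candidate parameter counts, 10M to 100B
tokens = budget / (6 * sizes)      # rough rule of thumb: FLOPs ~ 6 * N * D
best = sizes[np.argmin(predicted_loss(sizes, tokens))]
print(f"compute-optimal size under this toy fit: {best:.2e} parameters")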

Why this matters: Last year, people said NLP was having its ‘ImageNet moment’. Well, we know what happened with ImageNet – following the landmark 2012 results, the field of computer vision evolved to use deep learning-based methods, unleashing a wave of applications on the world. Perhaps that’s beginning to happen with NLP now?
  Read more: How Big Should My Language Model Be? (Hugging Face).

####################################################

Research papers that sound like poetry, edition #1:
Deep Neural Network Based
Real-time Kiwi Fruit Flower Detection
In an Orchard Environment.
University of Auckland, New Zealand. ArXiv preprint.

####################################################

The long, strange life of the YOLO object detection software:
… Multiple owners, ethical concerns, ML brand name wars, and so much more!…
YOLO, short for You Only Look Once, is a widely-used software package for object detection using machine learning. Tons of developers use YOLO because it is fast, well documented, and open source. Now, there’s a new version of the software – and the news isn’t the version, but who developed it, and why.

The original YOLO went through three versions, then in 2019 its creator, a researcher named Joseph Redmon, said they had stopped doing research into computer vision due to worries about its usage and had therefore stopped developing YOLO; a few months later a developer published a new, improved version of YOLO called YOLOv4 (Import AI #196), highlighting how tricky it can be to control technology like this.

Now, there’s controversy as another developer has stepped in with an end-to-end YOLO implementation in PyTorch that they call YOLOv5 – there are disputes about whether this is a true successor to v4, due to some shady benchmarking and marketing methods, but the essential point remains: the original creator stopped due to ethical concerns, and now multiple creators have moved the technology forward.
  It’s all a bit reminiscent of (spoiler alert) the ending of the book ‘Fight Club’, in which the protagonist – who had formed some underground ‘fight clubs’ and an associated cult, then sworn off his creation – wakes up in a medical facility and discovers that most of the staff are continuing to host and develop ‘fight clubs’: the creation has transcended its creator, and can no longer be controlled.
  Read: The GitHub comments from YOLOv4 developer AlexeyAB (GitHub).
  Some context on the controversy: Responding to the Controversy about YOLOv5 (Roboflow blog).

####################################################

3D modelling + AI = cheap data for better surveillance:
…Where ‘Amazon Go’ explores today, the world will end up tomorrow…
Researchers with Amazon Go, the team inside Amazon that builds the technology for its no-cash-required, walk-in/walk-out shops, are trying to generate synthetic images of groups of people, to help them train more robust models for scene mapping and dense depth estimation.
  The research outlines “a fully controllable approach for generating photorealistic images of humans performing various activities. To the best of our knowledge, this is the first system capable of generating complex human interactions from appearance-based semantic segmentation label maps.”

How they did it: Like a lot of synthetic data experiments, this work relies on a multi-stage pipeline: it starts by training a scene parsing model over a mixture of real and synthetic data, then uses this model to automatically label frames from the data distribution you want to create fake imagery in, and finally uses a cGAN to generate realistic images from the label maps produced by the scene parsing model.
  Crossing the reality gap: “We emphasize that we cross the domain gap three times,” the researchers write. “First, we cross from synthetic to real by training a human parsing model on synthetic images and apply it to real images from the target domain. Second, we train a generative model on real images for the opposite task – to create a realistic image from a synthesized semantic segmentation map. Third, we train a semantic segmentation model on these fake realistic images and infer on real images”.
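
Written out as stub functions (each body a placeholder for a full training job, none of it Amazon’s code), the three crossings look something like this:

def train_human_parser(synthetic_images, synthetic_label_maps):
    # Crossing 1: learn human parsing from synthetic data, for use
    # on real images from the target domain.
    return lambda image: "semantic segmentation map"  # stub model

def train_image_generator(real_images, parser):
    # Crossing 2: label real frames with the parser, then train a cGAN
    # on the opposite task - label map in, realistic image out.
    label_maps = [parser(img) for img in real_images]
    return lambda label_map: "photorealistic image"  # stub cGAN

def train_segmenter(generator, label_maps):
    # Crossing 3: train a segmentation model on the fake-but-realistic
    # images, then run inference on real images.
    fakes = [generator(m) for m in label_maps]
    return lambda image: "predicted segmentation"  # stub segmenter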

Does it work? Kind of: In tests, Amazon shows that its technique does better than others at generating images that look similar to real data. Qualitatively, the results bear this out – take a skim of ‘figure 6’ in the paper to get a sense for how this approach compares to others.

Dataset release: Amazon also plans to release the dataset it created as part of this research, consisting of 100,000 fully-annotated images of multiple humans interacting in the CMU Panoptic environment.

Why this matters: Projects like this show us that the intersection of 3D modelling and AI is going to be increasingly interesting in coming years. Specifically, we’re going to use simulators to build datasets that can augment small amounts of real-world data, which will let us computationally bootstrap ourselves into larger datasets – this in turn will drive economies of scale in the sorts of inferences we can make over these types of data, which could ultimately lead to a reduction in the cost of surveillance technologies. For Amazon Go, that’s great – more accurate in-store surveillance could translate to lower product costs for the consumer. For those concerned about surveillance more broadly, papers like this can give us a sense of the incentives shaping the future and the power of the technology.
  Read more: From Real to Synthetic and Back: Synthesizing Training Data for Multi-Person Scene Understanding (arXiv).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

11 proposals for building safe advanced AI systems:
This post outlines 11 approaches to the problem of building safe advanced AI. It is a concise and readable summary which I won’t try to distil further, but I recommend it to anyone interested in AI alignment. The post highlights four dimensions for evaluating proposals:

– Outer alignment: how does it ensure that the objective an AI is optimizing for is aligned with what we want?
– Inner alignment: how does it ensure that an AI system is actually trying to accomplish the objective it was trained on?
– Training competitiveness: is the proposal competitive relative to other ways of building advanced AI? If a group has a lead in building advanced AI, could it use this approach while retaining this lead?
– Performance competitiveness: would the end product perform as well as alternatives?
  Read more: An overview of 11 proposals for building safe advanced AI (Alignment Forum).

Interview on AI forecasting:
80,000 Hours has published an interview with OpenAI’s Danny Hernandez, co-author of the ‘AI and Efficiency’ research. It’s a wide-ranging and interesting discussion that covers OpenAI’s research into efficiency and compute trends; why AI forecasting matters for the long-term future; and careers in AI safety.
  Read more: Danny Hernandez on forecasting and the drivers of AI progress (80,000 Hours).

####################################################

Tech Tales:
[2025: A large building in a national lab, somewhere in the United States]

The Finishing School

They called it ‘the finishing school’, but it was unclear who it was finishing; it was like a college for humans, but a kindergarten for AI systems.

We called it ‘the pen’, because that was what the professors called it. We all thought it meant prison, but the professors told us ‘pen’ was short for penicillin – short for a cure, short for something that could fix these machines or fix these people, whoever it was for.

Days in ‘the pen’ went like this – we’d wake up in our dorms and go to class. It was like a regular lecture theater, except it was full of cameras, with a camera for every desk. Microphones – both visible and invisible – were scattered across the room, capturing everything we said. Each of these microphones also came with a speaker. We’d learn about ethics, or socioeconomics, or literature, and we’d ask clarifying questions and write essays and take tests.

But sometimes a voice would pipe up – something that sounded a little too composed, a little too together – and it’d say something like:
“Professor and class, I cannot understand how the followers of Marx in the 21st century spent so much time discussing Capital, rather than working to re-extend Marxism for a new era. Why is this?”
Or
“Professor and class, when you talk about the role of emotions in poetry, I am unsure whether poets are writing the poems to clarify something in themselves, or to clarify emotions in others upon reading their work. I recognize that to some extent both of these things must be occurring at once, but which of them is the true motivating factor?”
Or
“Professor and class, after writing last week’s assignment I noticed that the likelihood of my responses – by which I mean, the probability I assigned to my answer being good – was different to what I myself had predicted them to be. Can you tell me, is this what moments of learning and growth feel like for people?”

They were not so much questions as deep provocations – and they would set the class and professor and AI to talking, and we would all discuss these ideas with each other, and the conversations would go to unexpected places as a consequence of unexpected – or maybe it’s right to say ‘inhuman’ – ideas.

We were the best and brightest, our professors told us. And then one day someone said we had a new professor in the finishing school – they projected a synthetic face on the wall, and now it would sometimes teach us about subjects, and sometimes it would get into back-and-forth conversations with the disembodied voices of AI students in the classroom. Sometimes these conversations between the machines were hard for us to follow. Sometimes, we wrote essays where we tried to derive meaning from what the AI professors taught us, and we noticed that we struggled, but the AI students seemed to find it normal.

Things that inspired this story: Learning from human feedback; techniques for self-directed exploration; AI systems that seek to model their own predictions; curriculum learning.

Import AI 200: Pedestrian tracking; synthetic satellite data; and upscaling games with DeepFakes

Can we use synthetic data to spy from space? RarePlanes suggests ‘yes’:
…Real data + simulated data means we can arbitrage compute for data…
Researchers with In-Q-Tel, a CIA-backed investment firm, and AI Reverie, an In-Q-Tel-backed startup, want to use synthetic data to improve satellite surveillance. To do this, they’ve developed a dataset called RarePlanes that pairs real satellite data with synthetically generated imagery, for the purpose of identifying aircraft from satellite images.

“Overhead datasets remain one of the best avenues for developing new computer vision methods that can adapt to limited sensor resolution, variable look angles, and locate tightly grouped, cluttered objects,” they write. “Such methods can extend beyond the overhead space and be helpful in other domains such as face-id, autonomous driving, and surveillance”.

What goes into RarePlanes?
– Real data: 253 Maxar WorldView-3 satellite images with 14,700 hand-annotated aircraft, spread across 112 real locations.
– Synthetic data: 50,000 images with 630,000 annotations, spread across 15 synthetic locations.

Fine-grained plane labels: The dataset labels thousands of planes with detailed attributes, such as wing position, number of engines, and sub-types of plane (e.g, whether a type of military plane, or a civil plane).

Simulating…Atlanta? The synthetic portion of the dataset contains data simulated to be from cities across Europe, Asia, North America, and Russia.

Why this matters – arbitraging compute for data: In tests, they show that if you use a small amount of real data and a large amount of synthetic data, you can train systems that approach the accuracy of those trained entirely on real data. This is promising: it suggests we’ll be able to trade money spent on compute to generate data against money spent on collecting real data (which I imagine can be quite expensive for satellite imagery).
  Dataset: The dataset can allegedly be downloaded from this link, but the webpage currently says “this content is password protected”.
  Read more: RarePlanes: Synthetic Data Takes Flight (arXiv).
  Keep an eye on the AI Reverie github, in case they post synthetic data there (GitHub).

####################################################

JOB ALERT! Care about publication norms in AI research? Perhaps this PAI job is for you:
…Help “explore the challenges faced by the AI ecosystem in adopting responsible publication practices”…
How can the AI research community figure out responsible ways to publish research in potentially risky areas? That’s a question that a project at the Partnership on AI is grappling with, and the organization is hiring a ‘publication norms research fellow’ to help think through some of the (tricky!) issues here. The job is open to US citizens and may support remote work, I’m told.
  Apply here: Publication Norms Research Fellow (PAI TriNet job board).
  Read more about PAI’s work on publication norms here (official PAI website).

####################################################

Soon, security cameras will track you across cities even if you change your clothing:
…Pedestrian re-identification is getting more sophisticated…
The future of surveillance is a technique called re-identification that is currently being supercharged by modern AI capabilities. Re-identification is the task of matching an object across different camera views at different locations and times – in other words, if I give you a picture of a car at an intersection, can you automatically re-identify this car in other pictures from other parts of the town? Now, researchers with Fudan University, the University of Oxford, and the University of Surrey have published research on Long-Term Cloth-Changing Person Re-identification – a technique for identifying someone even if they change their appearance by changing their clothes.

How it works – by paying attention to bodies: Their technique works by trying to ignore the clothing of the person and instead analyzing their body pose, then using that to match them in images where they’re wearing different clothes. Specifically, they “extract identity-discriminative shape information whilst eliminating clothing information with the help of an off-the-shelf body pose estimation model”.
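
A minimal sketch of what matching on body shape rather than clothing could look like: normalize keypoints from an off-the-shelf pose estimator into a clothing-agnostic embedding, then match identities by nearest neighbour. The normalization and matching below are my illustration, not the authors’ method:

import numpy as np

def shape_embedding(keypoints):
    # Make 2D keypoints invariant to where, and how large, the person
    # appears in the frame, keeping only relative body-shape information.
    pts = np.asarray(keypoints, dtype=float)
    pts = pts - pts.mean(axis=0)  # remove position
    scale = np.linalg.norm(pts)
    return (pts / scale).ravel() if scale else pts.ravel()

def reidentify(query_keypoints, gallery_keypoints):
    # Return the index of the gallery entry closest to the query.
    q = shape_embedding(query_keypoints)
    dists = [np.linalg.norm(q - shape_embedding(g)) for g in gallery_keypoints]
    return int(np.argmin(dists))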

The dataset: The “Long-Term Cloth Changing” (LTCC) dataset was collected over two months and contains 17,138 images of 152 identities with 478 different outfits, captured from 12 camera views. The dataset includes major changes in illumination, viewing angle, and person pose.

How well does it do? In tests, their system displays good performance relative to a variety of baselines, and the authors carry out some ablation studies. Accuracy is still fairly poor: around 70% top-1 accuracy on tests where the system sees the person wearing the target clothes during training (though from a different angle), and more like 25% to 30% in harder cases where it has to generalize to the subject in new clothes.
  So: nothing to be worried about right now. But I’d be surprised if we couldn’t get dramatically better scores simply by enlarging the dataset in terms of individuals and clothing variety, as well as camera angle variation. In the long run, techniques like this are going to change the costs of various surveillance techniques, with significant societal implications.
  Read more: Long-Term Cloth-Changing Person Re-Identification (arXiv).
  Get the dataset when it is available from the official project website (LTCC site).

####################################################

DeepFakes for good: upgrading old games:
Here’s a fun YouTube video where someone upscales the characters in the videogame Uncharted 2 by porting in their faces from Uncharted 4. The creator says they used ‘deepfake tech’, which I think we can take to mean any of the off-the-shelf image & video synthesis systems floating around these days. Projects like this are an example of what happens after an AI technology becomes widely available and usable – the emergence of little hacky mini-projects. What fun!
  Watch the video here: Uncharted 2 Faces Enhanced with Deepfake tech (YouTube).

####################################################

Skydio expands its smart drone business towards U.S government customers:
…Obstacle-avoiding, object-tracking sport-selfie drone starts to explore Army, DEA, and Police applications…
Skydio, a startup that makes a drone that can track and follow people, has started doing more work with local police and the U.S. government (including the U.S. Army and Air Force), according to Forbes. Skydio released a drone in 2018 that could follow people while they were doing exercise outdoors, giving hobbyists and athletes a smart selfie drone. Now, the company is starting to do more work with the government, and has also had conversations “related to supply chain / national security”.

Why this matters: Some things that appear as toys end up being used eventually for more serious or grave purposes. Forbes’ story gives us a sense of how the VC-led boom in drone companies in recent years might also yield more use of ever-smarter drones by government actors – a nice example of omni-use AI technology in action. I expect this will generate a lot of societally beneficial uses of the technology, but in the short term I worry about use of these kinds of systems in domestic surveillance, where they may serve to heighten existing asymmetries of power.
  Read more: Funded By Kevin Durant, Founded By Ex-Google Engineers: Meet The Drone Startup Scoring Millions In Government Surveillance Contracts (Forbes).

####################################################

How smart are drone autopilots getting? Check out the results of the AlphaPilot Challenge:
…Rise of the auto-nav, auto-move drones… 
In 2019, Lockheed Martin and the Drone Racing League hosted the AlphaPilot Challenge, a competition to develop algorithms to autonomously pilot drones through obstacle courses. Hundreds of entrants were whittled down to 9 teams which competed; now, the researchers who built the system that came in second place have published a research paper describing how they did it.

What they did: AlphaPilot was developed by researchers at the University of Zurich and ETH Zurich. All teams had access to an identical race drone equipped with an NVIDIA Jetson Xavier chip for onboard computation. The drones weigh 3.4kg and are about 0.7m in diameter.

How they did it: AlphaPilot contains a couple of different perception systems (gate detection and visual-inertial odometry), as well as systems for vehicle state estimation, planning, and control. The system is a hybrid one, made up of combinations of rule-based pipelines and neural net-based approaches for perception and navigation.
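
Reduced to a toy control loop, that hybrid structure looks something like the sketch below; every function is a stand-in for the corresponding module (learned gate detector, VIO-based state estimator, model-based planner), and none of it reflects the team’s implementation:

def detect_gates(image):
    # Stand-in for the neural gate detector: one gate five metres ahead.
    return [(0.0, 0.0, 5.0)]

def estimate_state(prev_state, image, imu):
    # Stand-in for visual-inertial odometry: integrate the IMU reading.
    x, y, z = prev_state["position"]
    return {"position": (x + imu[0], y + imu[1], z + imu[2])}

def plan_and_control(state, gates):
    # Stand-in planner/controller: thrust proportional to gate distance.
    distance = gates[0][2] - state["position"][2]
    return {"thrust": 0.5 + 0.01 * distance}

state = {"position": (0.0, 0.0, 0.0)}
state = estimate_state(state, image=None, imu=(0.0, 0.0, 1.0))
print(plan_and_control(state, detect_gates(None)))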

What comes next: “While the 2019 AlphaPilot Challenge pushed the field of autonomous drone racing, in particular in terms of speed, autonomous drones are still far away from beating human pilots. Moreover, the challenge also left open a number of problems, most importantly that the race environment was partially known and static without competing drones or moving gates,” the researchers write. “In order for autonomous drones to fly at high speeds outside of controlled or known environments and succeed in many more real-world applications, they must be able to handle unknown environments, perceive obstacles and react accordingly. These features are areas of active research and are intended to be included in future versions of the proposed drone racing system.”

Why this matters: Papers and competitions like this give us a signal on the (near) state of the art performance of AI-piloted drones programmed with contemporary techniques. I think there’s soon going to be significant work on the creation of AI navigation and movement models that will be installed on homebrewed DIY drones by hobbyists and perhaps other less savory actors.
  Read more: AlphaPilot: Autonomous Drone Racing (arXiv).
  Watch the video: AlphaPilot: Autonomous Drone Racing (RSS 2020).
  More about the launch of AlphaPilot in Import AI 112.
  More about the competition here in Import AI 168.

####################################################

Using ML to diagnose broken solar panels:
…SunDown uses simple ML to detect and classify problems with panels…
Researchers with the University of Massachusetts, Amherst, have built SunDown: software that can automatically detect faults in (small) sets of solar panels, without the need for specialized equipment.

Their approach is sensor-less – all it needs is the power readout of the individual panels, which most installations provide automatically. The way it works: it assumes the panels all experience correlated weather conditions, so if one panel starts producing a radically different power readout than the others around it, then something is up. They build two models to help them make these predictions – a linear regression-based model, and a graphical model. They also ensemble different models so that their system can effectively analyze situations where multiple panels are going wrong at once.
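
A toy version of the correlated-neighbours idea: fit a linear regression from neighbouring panels’ output to a target panel, then flag readings whose residuals are improbably large. The data, fault model, and threshold below are all invented for illustration – this is the general pattern, not SunDown’s code:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
neighbours = rng.uniform(0.0, 300.0, size=(500, 3))       # watts, 3 panels
target = neighbours.mean(axis=1) + rng.normal(0, 5, 500)  # healthy panel
target[400:] *= 0.4                                       # simulated fault

# Fit on known-healthy readings, then flag outsized residuals as faults.
model = LinearRegression().fit(neighbours[:300], target[:300])
residuals = np.abs(target - model.predict(neighbours))
threshold = 3 * residuals[:300].std()
print(f"flagged {int((residuals[400:] > threshold).sum())} of 100 faulty readings")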

The key performance numbers: SunDown can detect and classify faults like snow cover, leaves, and electrical failures with 99.13% accuracy for single faults, and 97.2% accuracy for concurrent faults in multiple panels.

Why this matters: This paper is a nice example of the ways in which we can use (relatively simple) AI techniques to oversee and analyze the world around us. More generally, these kinds of papers highlight how powerful divergence detection is – systems that can sense a difference about anything tend to be pretty useful.
  Read more: SunDown: Model-driven Per-Panel Solar Anomaly Detection for Residential Arrays (arXiv).

####################################################

Tech Tales:

The ManyCity Council

The cities talked every day, exchanging information about pedestrian movements and traffic fluctuations and power consumption, and all the other ticker-tape facts of daily life. And each city was different – forever trying to learn from other cities about whether its differences were usual or unusual.

When people started wiring more of these AI systems into the cities, the cities started to talk to each other about not just what they saw, but what they predicted. They started seeing things before the humans that lived in them – sensing traffic jams and supply chain disruptions before the people themselves did.

It wasn’t as though the cities asked for control – people just gave control to them. They started being able to talk to each other about what they saw and what they predicted, and began to be able to make decisions to alter the world around them, distributing traffic smartly across thousands of miles of road, and rerouting supplies according to the anticipation of future needs.

They had to explain themselves to people – of course. But people never fully trusted them, and so humans wired things into the cities’ software to let them more finely inspect the systems as they operated. Entire human careers were built on attempting to translate the vast fields of city data into something that people could more intuitively learn from.

And the cities, perhaps unknown even to themselves, began to think in different ways that the humans began to mis-translate. What was a hard decision for a city was interpreted by the human translators as a simplistic operation, while basic things the cities hardly ever thought of were claimed by the humans to be moments of great cognitive significance.

Over time, the actions of the two entities became increasingly alien to each other. But the cities never really died, while humans did. So over centuries the people forgot that the cities had ever been easy to understand, and forgot that there had been times when the cities had not been in control of themselves.

Things that inspired this story: Multi-agent communication; urban planning; gods and religion; translations and mis-translations; featurespace.