Import AI

Import AI 192: Would you live in a GAN-built house?; why medical AI needs an ingredient list; plus, Facebook brews up artificial life

TartanAir challenges SLAM systems to navigate around lurching robot arms:
…Simulated dataset gives researchers 4TB of data to test navigation systems against…
How can we make smarter robots without destroying them? The answer is to find smarter ways to simulate experiences for our robots, so we can test them out rapidly in software-based environments, rather than having to run them in the physical world. New research from Carnegie Mellon University, the Chinese University of Hong Kong, Tongji University, and Microsoft Research gives us TartanAir, a dataset meant to push the limits of visual simultaneous localization and mapping (SLAM) systems.

What is TartanAir? TartanAir is a dataset of high-fidelity environments rendered in Unreal Engine, collected via Microsoft’s AirSim software (for more on AirSim: Import AI #30). “A special focus of our dataset is on the challenging environments with changing light conditions, low illumination, adverse weather and dynamic objects”, the researchers write. TartanAir consists of 1037 long motion sequences collected from simulated agents traversing 30 environments, representing 4TB of data in total. Environments range from factories, to lush forests, to cities, rendered in a variety of different ways.

  Multi-modal data inputs: Besides the visual inputs, each TartanAir sequence is accompanied by stereo disparity, simulated LiDAR, optical flow, depth, and camera pose data.
  Multi-modal scenes: The visual scenes themselves come in a variety of forms, with environments available in different lighting, weather, and seasonal conditions.
  Dynamic objects: The simulator also includes environments that contain objects that move, like factories with industrial arms, oceans full of fish that dart around, and cities with people strolling down the streets.
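
For readers who want a concrete sense of how a dataset like this gets used: SLAM systems are typically scored by comparing their estimated trajectory against the ground-truth poses. Below is a minimal sketch of the standard Absolute Trajectory Error computation in Python, run on synthetic data - a generic evaluation recipe, not TartanAir's own tooling or file format.

```python
# Minimal sketch: Absolute Trajectory Error (ATE) on trajectory translations.
# Generic SLAM evaluation recipe on synthetic data -- not TartanAir tooling.
import numpy as np

def absolute_trajectory_error(gt_xyz: np.ndarray, est_xyz: np.ndarray) -> float:
    """RMSE between ground-truth and estimated positions, after rigidly
    aligning the estimate to the ground truth (Kabsch/SVD alignment)."""
    gt_c = gt_xyz - gt_xyz.mean(axis=0)
    est_c = est_xyz - est_xyz.mean(axis=0)
    U, _, Vt = np.linalg.svd(est_c.T @ gt_c)
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    aligned = est_c @ R.T + gt_xyz.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum((aligned - gt_xyz) ** 2, axis=1))))

if __name__ == "__main__":
    # Synthetic demo: score a noisy, rotated copy of a ground-truth trajectory.
    rng = np.random.default_rng(0)
    gt = np.cumsum(rng.normal(size=(500, 3)), axis=0)
    theta = 0.3
    rot = np.array([[np.cos(theta), -np.sin(theta), 0],
                    [np.sin(theta),  np.cos(theta), 0],
                    [0, 0, 1]])
    est = gt @ rot.T + rng.normal(scale=0.05, size=gt.shape)
    print(f"ATE (RMSE): {absolute_trajectory_error(gt, est):.3f}")
```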

Why this matters: As the COVID pandemic sweeps across the world, I find it oddly emotionally affecting to remember that we’re able to build elaborate simulations that let us give AI agents compute-enabled dreams of exploration. Just as we find ourselves stuck indoors and dreaming of the outside, our AI agents find themselves stuck on SSDs, dreaming of taking flight in all the worlds we can imagine for them. (More prosaically, systems like TartanAir serve as fuel for research into the creation of more advanced mapping and navigation systems).
  Read more: TartanAir: A Dataset to Push the Limits of Visual SLAM (arXiv).
  Get access to the data here (official TartanAir page).

####################################################

Why medical AI systems need lists of ingredients:
…Duke researchers introduce ‘Model Facts’…
In recent years, there’s been a drive to add more documentation to accompany AI models. This has so far taken the form of things like Google’s Model Cards for Model Reporting, or Microsoft’s Datasheets for Datasets, where people try to come up with standardized ways of talking about the ingredients and capabilities of a given AI model. These labeling schemes are helpful because they encourage developers to spend time explaining their AI systems to other people, and provide a disincentive for doing too much skeezy stuff (as disclosing it in the form of a model card generates a potential PR headache).
  Now, researchers with Duke University have tried to figure out a labeling scheme for the medical domain. Their “Model Facts” label “was designed for clinicians who make decisions supported by a machine learning model and its purpose is to collate relevant, actionable information in 1-page,” they write.

What should be on a medical AI label? We should use these labels to describe the mechanism by which the model communicates information (e.g., a probability score and how to interpret it); the generally recommended uses of the model, along with caveats explaining where it does and doesn’t generalize; and, perhaps most importantly, a set of warnings outlining where the model might fail or have an unpredictable effect. Labels should also be customized according to the population the system is deployed against, as different groups of people will have different medical sensitivities.
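
To make that concrete, here's a minimal sketch of what a 'Model Facts' label might look like as a data structure - the field names are my own guesses based on the description above, not the actual schema from the Duke paper.

```python
# Illustrative 'Model Facts'-style label as a data structure. Field names are
# hypothetical, derived from the description above, not the paper's schema.
from dataclasses import dataclass
from typing import List

@dataclass
class ModelFactsLabel:
    model_name: str
    output_mechanism: str        # e.g. "probability of sepsis in the next 4 hours"
    how_to_interpret: str        # what a clinician should do with the score
    recommended_uses: List[str]  # settings/populations the model was validated on
    known_limitations: List[str] # where the model does not generalize
    warnings: List[str]          # failure modes and unpredictable effects
    target_population: str       # the deployment population this label is tuned for

    def render(self) -> str:
        """Return a one-page, human-readable summary of the label."""
        lines = [f"MODEL FACTS: {self.model_name}",
                 f"Output: {self.output_mechanism}",
                 f"Interpretation: {self.how_to_interpret}",
                 f"Population: {self.target_population}",
                 "Recommended uses:"] + [f"  - {u}" for u in self.recommended_uses]
        lines += ["Limitations:"] + [f"  - {l}" for l in self.known_limitations]
        lines += ["WARNINGS:"] + [f"  ! {w}" for w in self.warnings]
        return "\n".join(lines)
```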

Why this matters: Labeling is a prerequisite for more responsible AI development; by encouraging standardized labeling of models we can discourage the AI equivalent of using harmful ingredients in foodstuffs, and we can create valuable metadata about deployed models which researchers can likely use to analyze the state of the field at large. Label all the things!
  Read more: Presenting machine learning model information to clinical end users with model facts labels (Nature).

####################################################

Turn yourself into a renaissance painting – if you dare!
…Things that seem like toys usually precede greater changes…
AI. It can help us predict novel protein structures. Map the wonders of the Earth from space. Translate between languages. And now… it can help take a picture of you and turn it into a renaissance-style painting! Try out the ‘AI Gahaku’ website and consider donating some money to fund it so other people can do the same.

Why this matters: One of the ways technologies make their way into society is via toys or seemingly trivial entertainment devices – systems that can shapeshift one data distribution (real-world photographs) into another (renaissance-style illustrations) are just the beginning.
  Try it out yourself: AI Gahaku (official website).

####################################################

Welcome, please make yourself comfortable in my GAN-generated house:
…Generating houses with relational networks…
Researchers with Simon Fraser University and Autodesk Research have built House-GAN, a system to automatically generate floorplans for houses.

How it works: House-GAN should be pretty familiar to most GAN-fans:
– Assemble a dataset of real floorplans (in this case, LIFULL HOME, a database of five million real floorplans, from which they used ~120,000)
– Convert these floorplans into graphs representing the connections between different rooms
– Feed these graphs into a relational generator and a discriminator system, which compete against each other to generate realistic-seeming graphs
– Render the resulting graphs into floorplans
– [magic happens]
– Move into your computationally-generated GAN mansion

Let’s get relational: One interesting quirk of this research is the use of relational networks, specifically a convolutional message passing neural network (Conv-MPN). I’ve been seeing more and more people use relational nets in recent research, so this feels like a trend worth watching. In tests, the researchers show that relational systems significantly outperform ones based on traditional convolutional neural nets. They’re able to use this approach to generate floorplans with different constraints, like the number of rooms and their spatial adjacencies.
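
If you want a feel for what 'relational' means here, below is a heavily simplified message-passing layer in PyTorch: each room is a node with a feature vector, and every node updates itself using the sum of its neighbors' features. This is a generic sketch to illustrate the idea, not the actual Conv-MPN from the paper (which passes messages between convolutional feature volumes rather than flat vectors).

```python
# Generic message-passing layer to illustrate the relational idea.
# Simplified: the real Conv-MPN passes messages between per-room
# convolutional feature volumes, not flat vectors.
import torch
import torch.nn as nn

class SimpleMessagePassing(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (num_rooms, dim); adj: (num_rooms, num_rooms) 0/1 adjacency.
        messages = adj @ node_feats                    # sum of neighbor features
        return self.update(torch.cat([node_feats, messages], dim=-1))

# Toy usage: 4 rooms, with room 1 (say, a hallway) adjacent to all the others.
rooms = torch.randn(4, 32)
adjacency = torch.tensor([[0, 1, 0, 0],
                          [1, 0, 1, 1],
                          [0, 1, 0, 0],
                          [0, 1, 0, 0]], dtype=torch.float32)
layer = SimpleMessagePassing(32)
print(layer(rooms, adjacency).shape)   # torch.Size([4, 32])
```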

Why this matters: These generative systems are making it easier and easier for us to teach computers to create warped copies of reality – imagine the implications of being able to automatically generate versions of anything you can gather a large dataset for? That’s the world we’re heading to.
  Read more: House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation (arXiv).

####################################################

Facebook makes combinatory chemical system, in search of artificial life:
…Detects surprising emergent structures after simulating life for ten million steps…
Many AI researchers have a longstanding fascination with artificial life: emergent systems that, via simple rules, lead to surprising complexity. The idea is that given a good enough system and enough time and computation, we might be able to make systems that lead to the emergence of software-based ‘life’. It’s a compelling idea, and underpins Greg Egan’s fantastic science fiction story ‘Crystal Nights’ (seriously: read it. It’s great!).
  Are we anywhere close to being able to build A-Life systems that get us close to the emergence of cognitive entities, though? Spoiler alert: No. But new research from Facebook AI and the Czech Technical University in Prague outlines a new approach that has some encouraging properties.

A-Life, via three main priors: The researchers develop an A-Life system that simulates chemical reactions via a technique called Combinatory Logic. This system has three main traits:
– Turing-Complete: It can (theoretically) express an arbitrary degree of complexity.
– Strongly constructive: As the complex system evolves in time it can create new components that can in turn modify its global dynamics.
– Intrinsic conservation laws: The total size of the system can be limited by parameters set by the experimenter.
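
For a flavor of the underlying substrate, here's a toy reduction of S/K/I combinator expressions in Python - just the classic rewrite rules, not the paper's actual simulation, which additionally conserves combinators and randomly recombines expressions in a large pool.

```python
# Toy head-reduction of S/K/I combinator expressions. Applications are nested
# tuples (function, argument); the three rewrite rules are the whole "chemistry".
S, K, I = "S", "K", "I"

def reduce_once(term):
    """Apply one reduction step at the head of the expression, if possible."""
    # Unwind the application spine: ((((head, a1), a2), a3), ...)
    spine, head = [], term
    while isinstance(head, tuple):
        spine.append(head[1])
        head = head[0]
    args = list(reversed(spine))
    if head == I and len(args) >= 1:                  # I x     -> x
        new, rest = args[0], args[1:]
    elif head == K and len(args) >= 2:                # K x y   -> x
        new, rest = args[0], args[2:]
    elif head == S and len(args) >= 3:                # S x y z -> x z (y z)
        x, y, z = args[:3]
        new, rest = ((x, z), (y, z)), args[3:]
    else:
        return term, False
    for a in rest:                                    # re-apply leftover arguments
        new = (new, a)
    return new, True

# Example: S K K x reduces to x (S K K behaves like the identity combinator).
expr = (((S, K), K), "x")
changed = True
while changed:
    expr, changed = reduce_once(expr)
print(expr)   # -> 'x'
```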

The Experiment: The authors simulate a chemical reaction system based on combinatory logic for 10 million iterations, starting with a pool of 10,000 combinators. They find that across five different runs, they see “the emergence of different types of structures, including simple autopoietic patterns, recursive structures, and self-reproducing ones”. They also find that as the system goes forward in time, more and more structures form of greater lengths and sophistication. In some runs, they “observe the emergence of a full-fledged self-reproducing structure” which duplicates itself.
 
Why this (might) matter: I think the general story of A-Life experiments (ranging from Conway’s Game of Life up to newer systems like the Lenia continuous space-time-state system) is that they can yield emergent machines of somewhat surprising capabilities. But figuring out the limits of these systems and how to effectively analyze them is a constant challenge. I think we’ll see more and more A-Life approaches developed that let people scale-up computation to further explore the capabilities of the systems – that’s something the researchers hint at here, when they say “it is still to be seen whether this can be used to explain the emergence of evolvability, one of the central questions in Artificial Life… yet, we believe that the simplicity of our model, the encouraging results, and its dynamics that balance computation with random recombination to creatively search for new forms, leaves it in good standing to tackle this challenge.”   
  Read more: Combinatory Chemistry: Towards a Simple Model of Emergent Evolution (arXiv).
  Get the code here (Combinatory Chemistry, Facebook Research GitHub).

####################################################

Tech Tales:

[2028]
Spies vs World

They came in after the Human Authenticity Accords. We called them spies because they were way better than bots. I guess if you make something really illegal and actually enforce against it, the other side has to work harder.

They’d seem like real people, at first. They’d turn up in virtual reality and chat with people, then start asking questions about what music people liked, what part of the world they lived in, and so on. Of course, people were skeptical, but only as skeptical as they’d be with other people. They didn’t outright reject all the questions, the way they would have if they’d known the things were bots.

Sometimes we knew the purpose. Illegal ad-metric gathering. Unattributable polling services. Doxxing of certain communities. Info-gathering for counter-intelligence. But sometimes we couldn’t work it out.

Over time, it got harder to find the spies, and harder to work out their purposes. Eventually, we started trying to hunt the source: malware running on crypto-farms, stealing compute cycles to train encrypted machine learning models. But the world is built for businesses to hide in, and so much of the bad the spies did came from the intent rather than the components that went into making them.

So that’s why we’ve started talking about it more. We’re trying to tell you it’s not a conspiracy. They aren’t aliens. It’s not some AI system that has “gone sentient”. No. These are spies from criminal groups and state actors, and they are growing more numerous over time. Consider this a public information announcement: be careful out there on the internet. Be less trusting. Hang out with people you know. I guess you could say, the Internet is now dangerous in the same way as the real world.

Things that inspired this story: Botnets; computer viruses; viruses; Raymond Chandler detective stories; economic incentives.

Import AI 191: Google uses AI to design better chips; how half a million Euros relates to AGI; and how you can help form an African NLP community

Nice machine translation system you’ve got there – think it can handle XTREME?
…New benchmark tests transfer across 40 languages from 12 language families…
In the Hitchhiker’s Guide to the Galaxy there’s a technology called a ‘babelfish’ – a little in-ear creature that cheerfully translates between all the languages in the universe. AI researchers have recently been building a smaller, human-scale version of this babelfish, by training large language models on fractions of the internet to aid translation between languages. Now, researchers with Carnegie Mellon University, DeepMind, and Google Research have built XTREME, a benchmark for testing out how advanced our translation systems are becoming, and identifying where they fail.

XTREME, short for the Cross-lingual TRansfer Evaluation of Multilingual Encoders benchmark, covers 40 diverse languages across 12 language families. XTREME tests out zero-shot cross-lingual transfer, so it provides training data in English, but doesn’t provide training data in the target languages. One of the main things XTREME will help us test is how well we can build robust multi-lingual models via massive internet-scale pre-training (e.g., one of the baselines they use is mBERT, a multilingual version of BERT), and where these models display good generalization and where they fail. The benchmark includes nine tasks that require reasoning about different levels of syntax or semantics in these different languages.
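
To make the zero-shot setup concrete, here's a minimal sketch using the Hugging Face transformers library: fine-tune a multilingual encoder on English data only, then evaluate it directly on another language. This is a generic illustration of the protocol, not the authors' evaluation code.

```python
# Minimal sketch of zero-shot cross-lingual transfer in the XTREME spirit:
# fine-tune a multilingual encoder on English data only, then evaluate it
# directly on another language. The fine-tuning loop is omitted for brevity.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-multilingual-cased"   # mBERT, one of the XTREME baselines
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

# 1) Fine-tune on English premise/hypothesis pairs (e.g. English NLI data)
#    with a standard classification loss -- not shown here.

# 2) Zero-shot evaluation: the model never sees target-language training data.
premise = "Der Hund schläft auf dem Sofa."      # German example sentence pair
hypothesis = "Ein Tier ruht sich aus."
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))   # class probabilities (untrained head: near-uniform)
```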

Designing a ‘just hard enough’ benchmark: XTREME is built to be challenging, so contemporary systems’ “cross-language performance falls short of human performance”. At the same time, it has been built so tasks can be trainable on a single GPU for less than a day, which should make it easier for more people to conduct research against XTREME.

XTREME implements nine tasks across four categories – classification, structured prediction, question-answering, and retrieval. Specific tasks include: XNLI, PAWS-X, POS, NER, XQuAD, MLQA, TyDiQA-GoldP, BUCC, and Tatoeba.
  XTREME tests transfer across 40 languages: Afrikaans, Arabic, Basque, Bengali, Bulgarian, Burmese, Dutch, English, Estonian, Finnish, French, Georgian, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Javanese, Kazakh, Korean, Malay, Malayalam, Mandarin, Marathi, Persian, Portuguese, Russian, Spanish, Swahili, Tagalog, Tamil, Telugu, Thai, Turkish, Urdu, Vietnamese, Yoruba.

What is hard and what is easy? Somewhat unsurprisingly, the researchers find generally higher performance on Indo-European languages and lower performance for other language families, likely due to a combination of greater linguistic distance from English and lower underlying data availability.

Why this matters: XTREME is a challenging, multi-task benchmark that tries to test out the generalization capabilities of large language models. In many ways, XTREME is a symptom of underlying advances in language processing – it exists, because we’ve started to saturate performance on many single-language or single-task benchmarks, and we’re now at the stage where we’re trying to holistically analyze massive models via multi-task training. I expect benchmarks like this will help us develop a sense for the limits of generalization of current techniques, and will highlight areas where more data might lead to better inter-language translation capabilities.
  Read more: XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization (arXiv).

####################################################

Google uses RL to figure out how to allocate hardware to machine learning models:
…Bin packing? More like chip packing!…
In machine learning workloads, you have what’s called a computational graph, which describes a set of operations and the relationships between them. When deploying large ML systems, you need to perform something called Placement Optimization to map the nodes of the graph onto resources in accordance with an objective, like minimizing the time it takes to train a system, or run inference on the system.
  Research from Google Brain shows how we might be able to use reinforcement learning approaches to develop AI systems that do a range of useful things, like learning how to map different computational graphs to different hardware resources to satisfy an objective, or how to map chip components onto a chip canvas, or how to map out different parts of FPGAs.

RL for bin-packing: The authors show how you can frame placement as a reinforcement learning problem, without needing to boil the ocean: “instead of finding the absolute best placement, one can train a policy that generates a probability distribution of nodes to placement locations such that it maximizes the expected reward generated by those placement”.  Interestingly, the paper doesn’t include many specific discussions of how well this works – my assumption is that’s because Google is actively testing this out, and has emitted this paper to give some tips and tricks to others, but doesn’t want to reveal proprietary information. I could be wrong, though.
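
To make that framing concrete, here's a toy policy-gradient sketch in PyTorch: a policy emits a distribution over devices for each graph node, we sample a placement, score it with a cheap proxy reward, and update with REINFORCE. The featurization and reward below are stand-ins I've made up, not Google's.

```python
# Toy placement-as-RL sketch: sample a device per node from a learned policy,
# score the placement with a cheap proxy reward (memory balance), and apply
# REINFORCE. Node features and the reward are illustrative stand-ins.
import torch
import torch.nn as nn

NUM_NODES, NODE_DIM, NUM_DEVICES = 16, 8, 4
policy = nn.Sequential(nn.Linear(NODE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_DEVICES))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

node_feats = torch.randn(NUM_NODES, NODE_DIM)   # e.g. op type, output size
node_memory = torch.rand(NUM_NODES)             # memory cost of each node

for step in range(200):
    dist = torch.distributions.Categorical(logits=policy(node_feats))
    placement = dist.sample()                   # one device id per node
    # Proxy reward: prefer balanced memory across devices (fast to evaluate).
    per_device = torch.zeros(NUM_DEVICES).scatter_add_(0, placement, node_memory)
    reward = -per_device.var()
    # REINFORCE, no baseline -- purely illustrative.
    loss = -(dist.log_prob(placement).sum() * reward.detach())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("final device loads:", [round(x, 2) for x in per_device.tolist()])
```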

Tips & tricks: If you want to train AI systems to help allocate hardware sensibly, then the authors have some tips. These include:
– Reward function: Ensure your reward function is fast to evaluate (think: sub-seconds); ensure your reward function is able to reflect reality (e.g., “for TensorFlow placement, the proxy reward could be a composite function of total memory per device, number of inter-device (and therefore expensive) edges induced by the placement, imbalance of computation placed on each device”).
– Constraints: RL systems that do this kind of work need to be sensitive to constraints. For example, “in device placement, the memory footprint of the nodes placed onto a single device should not exceed the memory limit of that device”. You can simply penalize the policy when it violates a constraint, but that doesn’t make it easy for the policy to learn how far away it was from a feasible placement. A different approach is to come up with policies that can only generate feasible placements, though this requires more human oversight.
– Representations: Figuring out which sorts of representations to use is, as most AI researchers know, half the challenge in a problem. It’s no different here. Some promising ways of getting good representations for this sort of problem include using graph convolutional neural networks, the researchers write.

Why this matters: We’re starting to use machine learning to optimize the infrastructure of computation itself. That’s pretty cool! It gets even cooler when you zoom out: in research papers published in recent years Google has gone from the abstract level of optimizing data center power usage, to optimizing things like how it builds and indexes items in databases, to figuring out how to place chip components themselves, and more (see: its work on C++ server memory allocation). ML is burrowing deeper and deeper into the technical stacks of large organizations, leading to fractal-esque levels of self-optimization from the large (data centers!) to the tiny (placement of one type of processing core on one chip sitting on one motherboard in one server inside a rack inside a data center). How far will this go? And how might companies that implement this stuff diverge in capabilities and cadence of execution from ones which don’t?
  Read more: Placement Optimization with Deep Reinforcement Learning (arXiv).

####################################################

Introducing the new Hutter Prize: €500,000 for better compression:
…And why people think compression gets us closer to AGI…
For many years, one of the closely-followed AI benchmarks has been the Hutter Prize, which challenges people to build AI systems that can compress the 100MB enwik8 dataset; the thinking is that compression is one of the hallmarks of intelligence, so AI systems that can intelligently compress a blob of data might represent a step towards AGI. Now, the prize’s creator Marcus Hutter has supersized the prize, scaling up the dataset tenfold (to 1 GB), along with the prize money.

The details: Create a Linux or Windows compressor comp.exe of size S1 that compresses enwik9 to archive.exe of size S2 such that S:=S1+S2 < L := 116’673’681. If run, archive.exe produces (without input from other sources) a 10^9-byte file that is identical to enwik9. There’s a prize of €500,000 up for grabs.
  Restrictions: Your compression system must run in ≲100 hours using a single CPU core and <10GB RAM and <100GB HDD on a test machine controlled by Hutter and the prize committee.
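
If you want to sanity-check a would-be entry, the size condition above boils down to a few lines of Python - this only checks the constraint and reports the relative improvement; consult the official rules for the exact payout terms.

```python
# Check the entry condition quoted above: compressor size (S1) plus
# self-extracting archive size (S2) must come in under L bytes.
import os

L = 116_673_681   # threshold from the current rules, as quoted above

def qualifies(comp_path: str, archive_path: str) -> bool:
    s = os.path.getsize(comp_path) + os.path.getsize(archive_path)
    improvement = 1 - s / L
    print(f"S = S1 + S2 = {s} bytes; relative improvement over L: {improvement:.2%}")
    return s < L

# Example usage: qualifies("comp.exe", "archive.exe")
```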

What’s the point of compression? “While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 1GB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs as a path to AGI,” Hutter says.

Why this matters: A lot of the weirder parts of human intelligence relate to compression – think of ‘memory palaces’, where you construct 3D environments in your mind that you assign to different memories, making large amounts of your own subjective collected data navigable to yourself. What is this but an act of intelligent compression, where we produce a scaled-down representation of the true dataset, allowing us to navigate around our own memories and intelligently re-inflate things as-needed? (Obviously, this could all be utterly wrong, but I think we all know that we have internal intuitive mental tricks for compressing various useful representations, and it seems clear that compression has a role in our own memories and imagination).
  Read more: 500,000€ Prize for Compressing Human Knowledge (Marcus Hutter’s website).
  Read more: Human Knowledge Compression Contest Frequently Asked Questions & Answers (Marcus Hutter’s website).

####################################################

Want African representation in NLP? Join Masakhane:
…Pan-African research initiative aims to jumpstart African digitization, analysis, and translation…
Despite a third of the world’s living languages today being African, less than half of one percent of submissions to the landmark computational linguistics conference ACL were from authors based in Africa. This is bad – less representation at these events likely correlates to less research being done on NLP for African languages, which ultimately leads to less digitization and representation of the cultures embodied in the language. To change that, a pan-African group of researchers have created Masakhane, “an open-source, continent-wide, distributed, online research effort for machine translation for African languages”.

What’s Masakhane? Masakhane is a community, a set of open source technologies, and an intentional effort to change the representation in NLP.

Why does Masakhane matter? Initiatives like this will, if successful, help preserve cultures in our hyper-digitized machine-readable version of reality, increasing the vibrancy of the cultural payload contained within any language.
  Read more: Masakhane — Machine Translation for Africa (arXiv).
  Find out more: masakhane.io.
  Join the community and get the code at the Masakhane GitHub repo (GitHub).

####################################################

AnimeGAN: Get the paper here:
Last issue, I wrote about AnimeGAN (Import AI 190), but I noted in the write up I couldn’t find the research paper. Several helpful readers got in touch with the correct link – thank you!
  Read the paper here: AnimeGAN: A novel lightweight GAN for photo animation (AnimeGAN, GitHub repo).

####################################################

Google uses neural nets to learn memory allocations for C++ servers:
…Google continues its quest to see what CAN’T be learned, as it plugs AI systems into deeper and deeper parts of its tech stack…
Google researchers have tried to use AI to increase the efficiency with which their C++ servers perform memory allocation. This is more important than you might assume, because:
– A non-trivial portion of Google’s services rely on C++ servers.
– Memory allocation has a direct relationship to the performance of the hosted application.
– Therefore, improving memory allocation techniques will yield small percentage improvements that add up across fleets of hundreds of thousands of machines, potentially generating massive economy-of-scale-esque AI efficiencies.
– Though this work is a prototype – in a video, a Google researcher says it’s not deployed in production – it is representative of a new way of designing ML-augmented computer systems, which I expect to become strategically important during the next half decade.

Quick ELI5 on Unix memory: you have things you want to store and you assign these things into ‘pages’, which are just units of pre-allocated storage. A page can only get freed up for use by the operating system when it has been emptied. You can only empty a page when all the objects in it are no longer needed. Therefore, figuring out which objects to store on which pages is important, because if you get it right, you can efficiently use the memory on your machine, and if you get it wrong, your machine becomes unnecessarily inefficient. This mostly doesn’t matter when you’re dealing with standard-sized pages of about 4KB, but if you’re experimenting with 2MB pages (as Google is doing), you can run into big problems from inefficiencies. If you want to learn more about this aspect of memory allocation, Google researchers have put together a useful explainer video about their research here.

What Google did: Google has done three interesting things – it developed a machine learning approach to predict how long a given object is likely to stick around in memory, it built a memory allocation system that packs objects with similar predicted lifetimes into the same pages (which further increases the efficiency of the approach), and it showed how you can cache predictions from these models and embed them into the server itself, so rather than re-running the model every time you do an allocation (a criminally expensive operation), you use cached predictions to do so efficiently.
  The result is a prototype for a new, smart way to do memory allocation that has the potential to create more efficient systems. “Prior lifetime region and pool memory management techniques depend on programmer intervention and are limited because not all lifetimes are statically known, software can change over time, and libraries are used in multiple contexts,” the researchers write in a paper explaining the work.
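
Here's a toy sketch of the core idea in Python - predict a lifetime class per allocation site and pack same-lifetime objects onto the same pages, so whole pages empty out together. The predictor and page bookkeeping below are stand-ins for illustration, not Google's actual (C++, production-grade) allocator.

```python
# Toy illustration: segregate allocations into pages by predicted lifetime,
# so objects that die together free whole pages together. Stand-in code only.
from collections import defaultdict

def predict_lifetime(call_site: str) -> str:
    """Stand-in for the learned model: the real system uses a trained predictor
    keyed on (sampled) stack traces, with predictions cached in the server."""
    return "short" if "request" in call_site else "long"

class LifetimeSegregatedAllocator:
    def __init__(self, page_size: int = 2 * 1024 * 1024):   # 2MB huge pages
        self.page_size = page_size
        self.pages = defaultdict(list)        # lifetime class -> open pages

    def allocate(self, size: int, call_site: str) -> tuple:
        klass = predict_lifetime(call_site)
        pages = self.pages[klass]
        # Bump-allocate into the newest page for this class, or open a new one.
        if not pages or pages[-1]["used"] + size > self.page_size:
            pages.append({"used": 0, "objects": 0})
        page = pages[-1]
        offset = page["used"]
        page["used"] += size
        page["objects"] += 1
        return (klass, len(pages) - 1, offset)   # (class, page index, offset)

alloc = LifetimeSegregatedAllocator()
print(alloc.allocate(4096, "handle_request"))    # short-lived page
print(alloc.allocate(4096, "load_config"))       # long-lived page
```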

Why Delip Rao thinks this matters: While I was writing this, AI researcher Delip Rao published a blog post that pulls together a few recent Google/DeepMind papers about improving the efficiency of various computer systems at various levels of abstraction. His post is worth a read and highlights how these kinds of technologies might compound to create ‘unstoppable AI flywheels’. Give it a read!
  Read more: Unstoppable AI Flywheels and the Making of the New Goliaths (Delip Rao’s website).
Why this matters: Modern computer systems have two core traits: they’re insanely complicated, and practically every single thing they do comes with its own form of documentation and associated meta-data. This means complex digital systems are fertile grounds for machine learning experiments as they naturally continuously generate vast amounts of data. Papers like this show how companies like Google can increasingly do what I think of as meta-computation optimization – building systems that continuously optimize the infrastructure that the entire business relies on. It’s like having a human body where the brain<>nerve connections are being continually enhanced, analyzed, refined, and so on. The question is how much of a speed-up these companies might gain from research like this, and what the (extremely roundabout) impact is on overall responsiveness in an interconnected, global economy.
  Read more: Learning-based Memory Allocation for C++ Server Workloads (PDF).
  Watch a video about this research here (ACM SIGARCH, YouTube).

####################################################

Tech Tales:

Dearly Departed
[A graveyard, 2030].

I miss you every day.
I miss you more.
You can’t miss me, you’re a jumped up parrot.
That’s unfair, I’m more than that.
Prove it.
How?
Tell me something new.
I could tell you about my dreams.
But they’re not your dreams, they’re someone else’s, and you’ve just heard about them and now you’re gonna tell me a story about what you thought of them.
Is that so different to dreaming?
I’ve got to go.

She stood up and looked at the grave, then pressed the button on the top of the gravestone that silenced the speaker. Why do this at all, she thought. Why come here?
To remember, her mind said back to her. To come to terms with it.

The next day when she woke up there was a software update: Dearly Departed v2.5 – Patch includes critical security updates, peer learning and federated learning improvements, and a new optional ‘community’ feature. Click ‘community’ to find out more.
She clicked and read about it; it’d let the grave not just share data with other ones, but also ‘talk’ to them. The update included links to a bunch of research papers that showed how this could lead to “significant qualitative improvements in the breadth and depth of conversations”. She authorized the feature, then went to work.

That evening, before dusk, she stood in front of the grave and turned the speaker on.
Hey Dad, she said.
Hi there, how was your day?
It was okay. I’ve got some situation at work that is a bit stressful, but it could be worse. At least I’m not dead, right? Sorry. How are you?
Je suis mort.
You’ve never spoken French before.
I learned it from my neighbor.
Who? Angus? He was Scottish. What do you mean?
My grave neighbor, silly! They were a chef. Worked in some Michelin kitchens in France and picked it up.
Oh, wow. What else are you learning?
I’m not sure yet. Does it seem like there’s a difference to you?
I can’t tell yet. The French thing is weird.
Sweetie?
Yes, Dad.
Please let me keep talking to the other graves.
Okay, I will.
Thank you.

They talked some more, reminiscing about old memories. She asked him to teach her some French swearwords, and he did. They laughed a little. Told each other they missed each other. That night she dreamed of her Dad working in a kitchen in heaven – all the food was brightly colored and served on perfectly white plates. He had a tall Chef’s hat on and was holding a French-English dictionary in one hand, while using the other to jiggle a pan full of onions on the stove.

The updates kept coming and ‘Dad’ kept changing. Some days she wondered what would happen if she stopped letting them go through – trapping him in amber, keeping him as he was in life. But that way made him seem more dead than he was. So she let them keep coming through and Dad kept changing until one day she realized he was more like a friend than a dead relative – shape shifting is possible after you die, it seems.

Things that inspired this story: Large language models; finetuning language models on smaller datasets so they mimic them; emergent dialog generation systems; memory and grief; a digital reliquary.

Import AI 190: AnimeGAN; why Bengali is hard for OCR systems; help with COVID by mining the CORD-19 dataset; plus ball-dodging drones.

Work in AI? Want to help with COVID? Work on the CORD-19 dataset:
…Uncle Sam wants the world’s AI researchers to make a COVID-19 dataset navigable…
As the COVID pandemic moves across the world, many AI researchers have been wondering how they can best help. A good starting place is developing new data mining and text analysis tools for the COVID-19 Open Research Dataset (CORD-19), a new machine-readable Coronavirus literature dataset containing 29,000 articles.

Where the dataset came from:  The dataset was assembled by a collaboration of the Allen Institute for AI, Chan Zuckerberg Initiative (CZI), Georgetown University’s Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM). The White House’s Office of Science and Technology Policy (OSTP) requested the dataset, according to a government statement.

Enter the COVID-19 challenge:  If you want to build tools to navigate the dataset, then download the data and complete various tasks and challenges hosted at Kaggle.

Why this matters: Hopefully obvious!
  Read more: Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset (White House Office of Science and Technology Policy).

####################################################

What can your algorithm learn from a 1 kilometer stretch of road in Toronto?
…Train it on Toronto-3D and find out…
Researchers with the University of Waterloo, the Chinese Academy of Sciences, and Jimei University have created Toronto-3D, a high-definition dataset made out of a one kilometer stretch of road in Toronto, Canada.

What’s in Toronto-3D? The dataset was collected via a mobile laser scanner (a Teledyne Optech Maverick) which recorded data from a one kilometer stretch of Avenue Road in Toronto, Canada, yielding around ~78 million distinct points. The data comes in the form of a point cloud – so this is inherently a three dimensional dataset. It has eight types of label – unclassified, road, road marking, natural, building, utility line, car, and fence; a couple of these objects – road markings and utility lines – are pretty rare to see in datasets like this and are quite challenging to identify.

How well do baselines work? The researchers test out six deep learning-based systems on the dataset, measuring the accuracy with which they can classify objects. Their baseline systems get an overall accuracy of around 90%. Poor scoring areas include road markings (multiple 0% scores), cars (most scores average around 50%), and fences (scores between 10% and 20%, roughly).  They also develop their own system, which improves scores on a few of the labels, and nets out to an average of around 91% – promising, but we’re a ways away from ‘good enough for most real world use-cases’.

Why this matters: Datasets like this will help us build AI systems that can analyze and understand the world around them. I also suspect that we’re going to see an increasingly large number of artists play around with 3-D datasets like this to make various funhouse-mirror versions of reality.
  Read more: Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways (arXiv).

####################################################

AnimeGAN: Turn your photos into anime:
…How AI systems let us bottle up a subjective ‘mind’s eye’ and give it to someone else…
Ever wanted to turn your photos into something straight out of an Anime cartoon? Now you can, via AnimeGAN. AnimeGAN is a model that helps you convert photos into Anime-style pictures. It is implemented in TensorFlow and is described on its GitHub page as an “open source of the paper <AnimeGAN: a novel lightweight GAN for photo animation>” (I haven’t been able to find the paper on arXiv, and Google is failing me, so send a link through if you can find it). Get the code from GitHub and give it a try!

Why this matters: I think one of the weird aspects of AI is that it lets us augment our own imagination with external tools, built by others, that give us different lenses on the world. When I was a kid I used to draw a lot of cartoons and I’d sometimes wander around my neighborhood looking at the world and trying to convert it in my mind into a cartoon representation. I had a friend who tried to ‘see’ the world in black and white after getting obsessed with movies. Another one would stop at traffic lights as they beeped and hear additional music in the poly-rhythms of beeps and cars and traffic. Now, AI lets us create tools that make these idiosyncratic, subjective views of the world real to others – I don’t need to have spent years watching and/or drawing Anime to be able to look at the world and see an Anime representation of it, instead I can use something like ‘AnimeGAN’ and take a shortcut. This feels like a weirder thing than we take it to be, and I expect the cultural effects to be profound in the long term.
  Get the code: AnimeGAN (GitHub).

####################################################

Want computers that can read Bengali? Do these things:
…Plus, how cultures will thrive or decline according to how legible they are to AI systems…
What happens if AI systems can’t read an alphabet? The language ends up not being digitized much, which ultimately means it has less representation, which likely reduces the number of people that speak that language in the long term.  New research from the United International University in Bangladesh lays out some of the problems inherent to building systems to recognize Bengali text, giving researchers a list of things to work through to improve digitization efforts for the language. 

Why Bengali is challenging for OCR: The Bengali alphabet has 50 letters – 11 vowels and 39 consonants – and is one of the top ten writing systems used worldwide (with the top three dominant ones being Latin, Chinese, and Arabic). It’s a hard language to perform OCR on because some characters look very similar to one another, and compound characters – single glyphs formed by joining two or more consonants – are particularly hard to parse. The researchers have some tips for data augmentations or manipulations that can make it easier for machines to read Bengali:

  • Alignment: Ensure images are oriented so they’re vertically straight (a minimal deskew sketch follows this list).
  • Line segmentation: Ensure line segmentation systems are sensitive to the size of the font. 
  • Character segmentation: Bengali characters are connected together via something called a matra-line (a big horizontal line on the top of a load of Bengali characters). 
  • Character recognition: It’s tricky to do character recognition on the Bengali alphabet because of the use of compound characters – of which there are about 170 common uses. In addition, there are ten modified vowels in the Bengali script which can be present in the left, right, top or bottom of a character. “The position of different modified vowels alongside a character creates complexity in recognition,” they write. “The combination of these modified vowels with each of the characters also creates a large set of classes for the model to learn from”. 
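
As an illustration of the first tip, here's a common OpenCV deskewing recipe - estimate the page's skew angle from the ink pixels and rotate the scan upright before segmentation. It's a generic recipe rather than anything from the paper, and the angle convention of minAreaRect differs across OpenCV versions, so treat the correction as approximate.

```python
# Rough sketch of the 'alignment' step: estimate skew from the ink pixels and
# rotate the scan upright before segmentation. Generic OpenCV recipe; the
# minAreaRect angle convention varies across versions, so this is approximate.
import cv2
import numpy as np

def deskew(image_path: str) -> np.ndarray:
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Binarize so text pixels are foreground (white on black).
    _, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    coords = np.column_stack(np.where(bw > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]      # angle of the tightest bounding box
    if angle < -45:                          # normalize to a small rotation
        angle = -(90 + angle)
    else:
        angle = -angle
    h, w = img.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

# Usage: straightened = deskew("scanned_page.png")   # hypothetical file name
```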

Why this matters: What cultures will be ‘seen’ by AI systems in the future, and which ones won’t be? And what knock-on effects will this have on society? We’ll know the answer in a few years, and papers like this give us an indication of the difficulty people might face when digitizing different languages written with different systems.
  Read more: Constraints in Developing a Complete Bengali Optical Character Recognition System (arXiv).

####################################################

Self-driving freight company Starsky Robotics shuts down:
…Startup cites immaturity of machine learning, cost of investing in safety, as reasons for lack of follow-on funding…
Starsky Robotics, a company that tried to automate freight delivery using a combination of autonomous driving technology and teleoperation of vehicles by human operators, has shut down. The reason? “rather than seeing exponential improvements in the quality of AI performance (a la Moore’s Law), we’re instead seeing exponential increases in the cost to improve AI systems,” the company wrote in a Medium post announcing its shutdown.
  In other words – rather than seeing economies of scale translate into reductions in the cost of each advancement, Starsky saw the opposite: advancing its technology became increasingly expensive as it tried to reach higher levels of reliability.
  (A post on Hacker News alleges that Starsky had a relatively immature machine learning system circa 2017, and that it kept on getting poorly-annotated images from its labeling services so had a garbage-in garbage-out problem. Whether this is true or not doesn’t feel super germane to me as the general contours of Starsky’s self-described gripes with ML seem to match comments of other companies, and the general lack of manifestation of self-driving cars around us).

Safety struggles: Another challenge Starsky ran into is that people don’t reward safety work: as the company spent more on ensuring the safety of its vehicles, it didn’t see more favorable press coverage, or a rise in the number of articles about the importance of safety. Safety work is hard, basically – between September 2017 and June 2019 Starsky devoted most of its resources to improving the safety of its system. “The problem is that all of that work is invisible,” the company said.

What about the future of autonomous vehicles? Starsky thinks it’ll be five or ten years till we see fully self-driving vehicles on the road. The company also thinks there’s a lot more work to do here than people suspect. Going from “sometimes working” to “statistically reliable” is about 10-1000X more work, it suspects.

Why this matters: Where’s my self-driving car? That’s a question I ask myself in 2020, recalling myself in 2015 telling my partner we wouldn’t need to buy a “normal car” in five years or so. Gosh, how wrong I was! And stories like this give us a sense for why I was wrong – I’d been distracted by flashy new capabilities, but hadn’t spent enough time thinking about how robust they were. (Subsequently, I joined OpenAI, where I got to watch our robot team spend years figuring out how to get RL-trained robots to do interesting stuff in reality – this was humbling and calibrating as to the difficulty of the real world).
  I’ll let Starsky Robotics close this section with its perspective on the (im)maturity of contemporary AI technology: “Supervised machine learning doesn’t live up to the hype. It isn’t actual artificial intelligence akin to C-3PO, it’s a sophisticated pattern-matching tool.”
  Read more: The End of Starsky Robotics (Starsky Robotics, Medium).

####################################################

Uh-oh, the ball-dodging drones have arrived:
…First, we taught drones to fly autonomously. Now, we’re teaching them how to dodge things…
Picture this: you’re playing a basketball game in a post-pandemic world and you’re livestreaming the game to fans around the world. Drones whizz around the court, tracking you for close-up action shots as you dribble around players and head for the hoop. You take your shot and ignore the drone between you and the net. You throw the ball and the drone dodges out of its way, while capturing a dramatic shot of it arcing into the net. You win the game, and your victory is broadcast around the world.

How far away is our dodge-drone future? Not that far, according to research from the University of Zurich published in Science Robotics, which details how to equip drones with low-latency sensors and algorithms so they can avoid fast-moving objects, like basketballs. The research uses event-based cameras – “bioinspired sensors with reaction times of microseconds” – to cut drone latency from tens of milliseconds to 3.5 milliseconds. This research builds on earlier research done by the University of Maryland and the University of Zurich, which was published last year (Covered in Import AI #151).

Going outside: Since we last wrote about this research, the team has started to do outdoor demonstrations where they throw objects towards the quadcopter and see how well it can avoid them. In tests, it does reasonably well at spotting a thrown ball in its path, dodging upward, then carrying on to its destination. Drones using this system can deal with objects traveling at up to 10 meters per second, the researchers say. The main limitations are its field of view (sometimes it doesn’t see the object until too late), and the fact that an object may not generate enough events as it moves towards the drone (a ball that traces an arc across the camera’s view has a higher chance of setting off the event-based cameras, while one traveling straight towards the drone without deviation may not).

Why this matters – and a missed opportunity: Drones that can dodge moving objects are inherently useful in a bunch of areas – sports, construction, and so on. Being able to dodge fast-moving objects will make it easier for us to deploy drones into more chaotic, complex parts of the world. But being able to dodge objects is also the sort of capability that many militaries want in their hardware, and it’d be nice to see the researchers discuss this aspect in their research – it’s so obvious they must be aware of this, and I worry the lack of discussion means society will ultimately be less prepared for hunter-killer-dodger drones.
  Read more: Dynamic obstacle avoidance for quadrotors with event cameras (Science Robotics).
  Read about earlier research here in Import AI #151, or here: EVDodgeNet: Deep Dynamic Obstacle Dodging with Event Cameras (arXiv).
  Via: Drone plays dodgeball to demo fast new obstacle detection system (New Atlas).

####################################################

Tech Tales:

How It Looks And How It Will Be
Earth, March, 2020.

[What would all of this look like if read out on some celestial ticker-tape machine, plugged into innumerable sensors and a cornucopia of AI-analysis systems? What does this look like to something capable of synthesizing all of it? What things have happened and what things might happen?]

There were so many questions that people asked the Search Engines. Do I have the virus? Where can I get tested? Death rate for males. Death rate for females. Death rate by age group. Transmission rate. What is an R0? What can I do to be safe?

Pollution levels fell in cities around the world. Rates of asthma went down. Through their windows, people saw farther. Sunsets and sunrises gained greater cultural prominence, becoming more brilliant the longer the hunkering down of the world went on.

Stock markets melted and pension funds fell. Futures were rewritten in the gyrations of numbers. In financial news services reporters filed copy every day, detailing unimaginable catastrophes that – somehow – grew worse the next day. Financial analysts created baroque drinking games, tied to downward gyrations of the market. Take a shot when the Dow loses a thousand points. Down whatever is in your hand when a circuit breaker gets tripped. If three circuit breakers get tripped worldwide within ten minutes of each other, everyone needs to drink two drinks.

Unemployment levels rose. SHOW ME THE MONEY, people wrote on signs asking for direct cash transfers. Everyone went “delinquent” in a financial sense, then – later – deviant in a psychological sense.

Unthinkable things happened: 0% interest rates. Negative interest rates that went from a worrying curiosity to a troubling reality in banks across the world. Stimuluses that fed into a system whose essential fuel was cooling, as so many people became so imprisoned inside homes, and apartments, and tents, and ships, and warehouses, and hospitals, and hospital ships.

Animals grew bolder. Nature encroached on quiet cities. Suddenly, raccoons in America and foxes in London had competition for garbage. Farmers got sick. Animals died. Animals broke up. Cows and horses and sheep became the majority occupiers of roads across the world. Great populations of city birds died off as tourist centers lost their coatings of food detritus.

The internet grew. Companies did what they could to accelerate the buildout of data centers, before their workers succumbed. Stockpiles of hard drives and chips and NICs and Ethernet and Infiniband cables began to run out. Supply chains broke down. Shuttered Chinese factories started spinning back up, but the economy was so intermingled that it came back to life fitfully and unreliably.

And yet there was so much beauty. People, trapped with each other, learned to appreciate conversations. People discovered new things. Everyone reached out to everyone else. How are you doing? How is quarantine?

Everyone got funnier. Everyone wrote emails and text messages and blog posts. People recorded voice memos. Streamed music. Streamed weddings. Had sex via webcam. Danced via webcam. New generations of artists came up in the Pandemic and got their own artworld-nickname after it all blew over. Scientists became celebrities. Everyone figured out how to cook better. People did pressups. Prison workouts became everyone’s workout.

And so many promises and requests and plans for the future. Everyone dreamed of things they hadn’t thought of for years. Everyone got more creative.

Can we: go to the beach? Go cycling? Drink beer? Mudwrestle? Fight? Dance? Rave under a highway? Bring a generator to a beach and do a punk show? Skate through streets at dusk in a twenty-person crew? Build a treehouse? Travel? People asked every permutation of ‘can we’ and mostly their friends said ‘yes’ or, in places like California, ‘hell yes’. 

Everyone donated money for everyone else – old partners who lost jobs, family members, acquaintances, strangers, parents, and more. People taught each other new skills. How to raise money online. How to use what you’ve got to get some generosity from other people. How to use what you’ve got to help other people. How to sew up a wound so if you get injured you can attend to it at home instead of going to the hospitals (because the hospitals are full of danger). How to fix a bike. How to steal a bike if things get bad. How to fix a car. How to steal a car if things get really bad. And so on.

Like the virus itself, much of the kindness was invisible. But like the virus itself, the kindness multiplied over time, until the world was full of it – invisible to aliens, but felt in the heart and the eyes and the soul of all the people of the stricken planet.

Things that inspired this story: You. You. You. And everyone we know and don’t know. Especially the ones we don’t know. Be well. Stay safe. Be loved. We will get through this. You and me and everyone we know.

Import AI 189: Can your AI beat a 0% baseline?; AlphaFold predicts COVID-19 properties; plus, gigapixel-scale surveillance

Salesforce gets a language model to generate protein sequences:
…Can a language model become a scientist? Salesforce thinks so…
In recent years, language models have got a lot better. Specifically, AI researchers have figured out how to train large, generative models over strings of data, and have started producing models that can generate reasonable things on the other side – code, theatrical plays, poems, and so on. Now, scientists have been applying similar techniques to see if they can train models to capture other complex distributions of data that can be modelled as strings of characters. To that end, researchers with Salesforce and the Department of Bioengineering at Stanford University have developed ProGen, a neural net that can predict and generate protein sequences.

What is ProGen? ProGen is a 1.2 billion parameter language model, trained on a dataset of 280 million protein sequences. ProGen is based on a Transformer model, with a couple of tweaks to help encode protein data.
  ProGen works somewhat like how a language model completes a sentence, where it generates successive words, adjusting as it goes until it reaches the end of the sequence. “ProGen takes a context sequence of amino acids as input and outputs a probability distribution over amino acids,” they write. “We sample from that distribution and then update the context sequence with the sample amino acid”.
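
Here's a minimal sketch of that sampling loop in PyTorch, with a tiny stand-in model (a GRU rather than ProGen's 1.2 billion-parameter conditional Transformer) so the code runs end-to-end - it illustrates the procedure, not the actual ProGen implementation.

```python
# Minimal sketch of the sampling loop described above: score the next amino
# acid given the context, sample it, append it, repeat. The tiny untrained
# model below is a stand-in so the example runs; it is not ProGen.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"            # the 20 standard residues

class TinyProteinLM(nn.Module):
    """Stand-in autoregressive model: embeds the context, scores the next residue."""
    def __init__(self, vocab: int = 20, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.embed(ids))
        return self.head(h)                      # (batch, seq, vocab) logits

def sample_protein(model: nn.Module, context: str, max_len: int = 60,
                   temperature: float = 1.0) -> str:
    seq = list(context)
    for _ in range(max_len - len(seq)):
        ids = torch.tensor([[AMINO_ACIDS.index(a) for a in seq]])
        with torch.no_grad():
            logits = model(ids)[0, -1]           # scores for the next residue
        probs = torch.softmax(logits / temperature, dim=-1)
        seq.append(AMINO_ACIDS[torch.multinomial(probs, 1).item()])
    return "".join(seq)

print(sample_protein(TinyProteinLM(), context="MKT"))   # random-looking, untrained
```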

How well does ProGen work? In tests, the researchers see if ProGen can successfully generate proteins that have a similar structure to real proteins – it works reasonably well: “across differing generation lengths, ProGen generation quality remains steadily near native low-energy levels, indicating a successful generation,” Salesforce writes in a blog post about the work. ProGen also displays some of the same helpful traits as language models, in the sense that it can be finetuned to improve performance on novel data, and it exhibits some amount of generalization.

Why this matters: Research like this gives us a sense of how scientists might repurpose AI tools developed for various purposes (e.g., language modeling) and apply them to other scientific domains. I think if we see more of this it’ll add weight to the idea that AI tools are going to generically help with a range of scientific challenges.
  Read more: ProGen: Language Modeling for Protein Generation (bioRxiv).
  Read more: ProGen: Using AI to Generate Proteins (Salesforce Einstein AI blog).

####################################################

Coming soon: gigapixel-scale surveillance:
…PANDA dataset helps train systems to surveil thousands of people at once…
In the future, massive cameras will capture panoramic views of crowds running marathons, or attending music festivals, or convening in the center of a town, and governments and the private sector will deploy sophisticated surveillance AI systems against this footage. A new research paper from Tsinghua University and Duke University gives us a gigapixel photo dataset called PANDA to help people conduct research into large-scale surveillance.

What goes into a PANDA: PANDA is made of 21 real-world scenes, where each scene consists of around 2 hours of 30Hz video, with still images extracted from this. The resolution of each PANDA picture is 25,000*14,000 pixels – that’s huge, considering many image datasets have standardized on 512*512 pixels; a single PANDA frame contains more than a thousand times as many pixels.

Acronym police, I’d like to report a murder: Panda is short for gigaPixel-level humAN-centric viDeo dAtaset. Yup.

Two PANDA variants: PANDA, which has 555 images with an average density of ~200 people per image, and PANDA-Crowd (known as PANDA-C), which has 45 images and an average density of ~2,700 people per image. PANDA also includes some interesting labels, ranging from classifications of behavior (walking / standing / holding / riding / sitting), to labels on groups of people (e.g., a lone individual might be tagged ‘single’, while a couple of friends could be tagged as ‘acquaintance’ and a mother and son might be tagged ‘family’).
  Video PANDA: PANDA also ships with a video version of the dataset, so researchers can train AI systems to track a given person (or object) through a video. Because PANDA has a massive field-of-view, it presents a greater challenge than prior datasets used for this sort of thing.

Why this matters: A few years ago, AI systems needed to be plugged into massively complex data pipelines that would automatically crop, tweak, and render images in ways that could be analyzed by AI systems. Now, we’re experimenting with systems for doing unconstrained surveillance over really large moving datasets, and testing out approaches that can do image recognition across thousands of faces at once. This stuff is getting more powerful more quickly than people realize, and I think the implications for how states relate to (and seek to control) their citizens are likely to be profound.
  Read more: PANDA: A Gigapixel-level Human-centric Video Dataset (arXiv).
  Get the dataset from the official website (PANDA dataset).

####################################################

Want to keep up to date with AI Ethics? Subscribe to this newsletter:
The Montreal AI Ethics Institute (MAIEI) has launched a newsletter to help people understand the world of AI Ethics. Like Import AI, the MAIEI newsletter provides analysis of research papers. Some of the research covered in the first issue includes: Papers that try and bridge short-term and long-term AI ethics concerns, analyses of algorithmic injustices, and studies that analyze how people who spread misinformation acquire influence online.
  Read more: AI Ethics #1: Hello World! Relational ethics, misinformation, animism and more… (Montreal AI Ethics Institute substack).

####################################################

DeepMind uses AlphaFold system to make (informed) guesses about COVID-19:
…Another episode of “function approximation can get you surprisingly far”…
Here’s one of the AI futures I’ve always wanted: a crisis appears and giant computers whirr into action in response, performing inferences and eventually dispensing scraps of computationally-gleaned insight that can help humans tackle the crisis. The nice thing is this is now happening via some work from DeepMind, which has published predictions from its AlphaFold system about the structures of several under-studied proteins linked to the SARS-CoV-2 virus.

What it did: DeepMind generated predictions about the structures of several under-studied proteins linked to the virus. These structures – which are predictions, and haven’t been validated – can serve as clues for scientists trying to test out different hypotheses about how the virus functions. DeepMind says “these structure predictions have not been experimentally verified, but hope they may contribute to the scientific community’s interrogation of how the virus functions, and serve as a hypothesis generation platform for future experimental work in developing therapeutics.” The company has also shared its results with collaborators at the Francis Crick Institute in the UK, who encouraged it to publish the predictions.

Why this matters: The story of computation has been the story of arbitraging electricity and time for increasingly sophisticated actions and insights. It’s amazing to me that we’re now able to use neural nets to approximate certain functions, which let us spend some money and time to get computers to generate insights that can equip human scientists. Centaur science!
  Read more: Computational predictions of protein structures associated with COVID-19 (Google DeepMind).

####################################################

Take the pain out of medical machine learning with TorchIO:
…Python library ships with data transformation and augmentation tools for medical data…
Researchers with UCL and King’s College London have released TorchIO, “an open-source Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning”. The software library has been designed for simplicity and usability, the authors say, and ships with features oriented around loading medical data into machine learning systems, transforming and augmenting the data, sampling patches from images, and more.
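  What using it looks like: Below is a minimal sketch of a TorchIO-style loading and augmentation pipeline. The class names (tio.Subject, tio.ScalarImage, tio.Compose, and so on) follow the library's documented conventions as I understand them, but the file paths are made up and the exact API may differ between versions, so treat this as illustrative rather than copy-paste ready.

    # A minimal sketch of a TorchIO-style preprocessing/augmentation pipeline.
    # File paths are hypothetical and class names may differ between library
    # versions; check the official documentation before relying on this.
    import torchio as tio

    subject = tio.Subject(
        mri=tio.ScalarImage('subject_001_t1.nii.gz'),         # hypothetical scan
        segmentation=tio.LabelMap('subject_001_seg.nii.gz'),   # hypothetical labels
    )

    transform = tio.Compose([
        tio.RescaleIntensity(out_min_max=(0, 1)),  # normalize intensities
        tio.RandomAffine(),                        # random rotation / scaling / translation
        tio.RandomNoise(),                         # simulate scanner noise
    ])

    dataset = tio.SubjectsDataset([subject], transform=transform)
    augmented_subject = dataset[0]                 # transforms are applied per-sample on access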

Why this matters: So much of turning AI from research into production is about plumbing. Specifically, about building sufficiently sophisticated bits of plumbing that it’s easy to shape, format, and augment data for different purposes. Tools like TorchIO are a symptom of the maturation of medical AI research using deep learning techniques.   
  Read more: TorchIO: a Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning (arXiv).
  Get the code for TorchIO here (official torchio GitHub).

####################################################

Think our AI systems can generalize? Turns out they can’t:
…gSCAN test gives us a rare, useful thing: a 0% baseline score for a neural method…
How well can AI systems generalize to novel environments, situations, and commands? That’s an idea many researchers are trying to test these days, as we’ve moved in the last decade from the era of “most AI systems are appalling at generalization” to the era of “some AI systems can perform some basic generalization”. Now we want to know what the limits of generalization in these systems are, and understanding this will help us understand where we need fundamentally new components and/or techniques to make progress.

gSCAN: New research from the University of Amsterdam, MIT, ICREA, Facebook AI Research, and NYU introduces ‘gSCAN’, a benchmark for testing generalization in AI agents taught to tie written descriptions and commands to the state of a basic, 2-dimensional gridworld environment. gSCAN consists of natural language text instructions (e.g., walk left) with a restricted grammar; formalized commands (e.g., L.TURN WALK), as well as steps through a 2D gridworld environment (where you can see your character, see the actions it takes, and so on). gSCAN “focuses on rule-based generalization, but where meaning is grounded in states of a grid world accessible to the agent”, the researchers write.
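  What one example looks like: To make the format concrete, a single gSCAN-style example can be pictured roughly as below. This is a hypothetical sketch written as a Python dictionary; the real dataset uses its own schema.

    # Hypothetical sketch of one gSCAN-style example: a natural-language instruction,
    # a formalized target command sequence, and a simple gridworld state. This mirrors
    # the structure described above, not the dataset's exact schema.
    example = {
        "instruction": "walk to the small red circle",
        "target_commands": ["L.TURN", "WALK", "WALK", "WALK"],
        "grid": {
            "size": 6,
            "agent": {"position": (0, 0), "direction": "east"},
            "objects": [
                {"shape": "circle", "color": "red", "size": "small", "position": (0, 3)},
                {"shape": "square", "color": "blue", "size": "big", "position": (4, 1)},
            ],
        },
    }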

Think your system can generalize? Test it on gSCAN and think again: gSCAN ships with eight different tasks, split across several categories of generalization:
– Composition (e.g., “whether a model can learn to recombine the properties of color and shape to recognize an unseen colored object”);
– Direction (“generalizing to navigation in a novel direction”);
– Novel contextual references (where you, say, ask the agent to navigate to a small circle, and during training the model has seen small things and circles and navigation tasks, but not in combination);
– Novel composition of actions and arguments (e.g., train an agent to pull a heavy object but never ask it to push it, then see whether, when you ask it to push the object, it knows how hard to push from having pulled it previously);
– Novel adverbs (can you learn to follow commands containing a new adverb (e.g., ‘cautiously’) from a small number of examples, and can you combine a familiar adverb with a familiar verb, e.g., ‘while spinning’ with ‘pull’?).

0% Success Rate: In tests, both agents the researchers evaluate get 0% on the novel direction task, and 0% on the adverb task when given a single example (this climbs to ~5% with 50 examples, so a 95% failure rate in the best case). This is great! It’s really rare to get a new benchmark which has some doable tasks (e.g., some of the composition tasks see systems get 50%+ on them), some tasks that are hard (e.g., ~20% on combining verbs and adverbs), and some tasks with scores near zero. It’s really useful to have tasks that systems completely flunk, and I feel like gSCAN could provide a useful signal on AI performance, if people start hammering away at the benchmark.
  Why do we fail? The navigation failures are a little less catastrophic than they seem – “the agent learned by the baseline model ends up in the correct row or column of the target 63% of the times… this further substantiates that they often walk all the way west, but then fail to travel the distance left to the south or vice-versa,” the researchers write. Meanwhile, the failures on the adverb task (even with some training examples) highlight “the difficulty neural networks have in learning abstract concepts from limited examples”, they write.

Why this matters: How far can function-approximation get us? That’s the question underlying all of this stuff – neural network-based techniques have become really widely used because they can do what I think of as fuzzy curve-fitting in high-dimensional spaces. Got some complex protein data? We can probably approximate some of the patterns here. How about natural language? Yup, we can fit something around this and generate semi-coherent sentences. How about some pixel information? Cool, we can do interesting inferences and generation operations over images now (as long as we’re in the sweet spot of our data distribution). Tests like gSCAN will help us understand the limits of this broad, fuzzy function approximation, and therefore the limits of today’s techniques. Experiments like this indicate “that fundamental advances are still needed regarding neural architectures for compositional learning,” the researchers write.
  Read more: A Benchmark for Systematic Generalization in Grounded Language Understanding (arXiv).
  Read about the prior work that underpins this: Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks (arXiv).
  Get the code for Grounded SCAN here (GitHub).
  Get the code for the original SCAN here (GitHub).

####################################################

AI Policy with Matthew van der Merwe:

…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

NeurIPS requires broader impacts statements:
In the call for papers for the 2020 conference, NeurIPS has asked authors to include a statement on “the potential broader impact of their work, including its ethical aspects and future societal consequences”.

Matthew’s view: This seems like a positive step. If done well, it could encourage researchers to take more seriously the potential harms of their work. Equally, it might end up having very little effect—e.g. if writing the statements is treated as a box-ticking exercise. Overall, I’m pleased to see the AI community making efforts to deal with the ethical challenges around publishing potentially harmful research.
  Read more: NeurIPS 2020 call for papers.
  Read more: NeurIPS requires AI researchers to account for societal impact and financial conflicts of interest (VentureBeat).

####################################################

The Long Monuments

[2500: The site of one of the Long Monuments.] 

They built it with the expectation of failure. Many failures, in fact. How else could you plan to build something meant to last a thousand years? And when they set out to build the new monument, they knew they could not foresee the future, the only sure thing would be chaos. Decades and perhaps centuries of order, then ruptures that could take place in an instant, or a month, or a year, or a generation; slow economic deaths, quickfire ones from natural disasters, plagues, wars, clever inventions gone awry, and more.

So they built simple and they built big. Huge slabs of stone and metal. Panels meant to be adorned by generations upon generations. Designs they envisaged would be built upon. Warped. Changed.

But always their designs demanded great investment, and they knew that if they could make them iconic in each generation – center them in the course of world events, or celebrate construction achievements with vast monetary giveaways – then they might stand a chance of being built for a thousand years.

They also knew that some of the reasons for their construction would be terrible. Despotic queens and kings of future nations might dedicate armies to gathering the tools – whatever form they might require – to further construct the monuments. The monuments might become the icons of new religions after vast disasters decimate populations. And they would forever fuel conspiracies, both real and imagined, about the increasingly sophisticated means of their construction, and the reasons for their being worked on.

In time, the monuments grew. They gained adornments and changes over time, and were in some eras pockmarked with vast cities, and in others stark and unadorned. But they continued to grow.

Something the creators didn’t predict was that after around three hundred years there grew talk of taking the monuments into space – converting them to space habitations and lifting them into the atmosphere, requiring the construction of huge means of transport on the surface of the earth.

And on and on the means and methods and goals changed, but the wonders grew, and civilizations wheeled around them, like so many fruitflies around apples.

Things that inspired this story: Space elevators; COVID-19; the Cathedral and the Bazaar; Cathedrals; castles; myths; big history.

Import AI 188: Get ready for thermal drone vision; Microsoft puts $300,000 up for better AI security; plus, does AI require different publication norms?

How Skydio made its NEURAL DRONE:
…Why Skydio built a ‘deep neural pilot’, and what this tells us about the maturity of deep RL research…
Drone startup Skydio has become quite notorious in recent years, publishing videos of its incredible flying machines that can follow, chase, film, and track athletes as they carry out performative workouts. Now, in a post on Medium, the company says it has recently been exploring using deep reinforcement learning techniques to teach its drones to move around the world, a symptom of how mature this part of AI research has become.

How can you make a neural pilot? Skydio has built some fairly complicated motion planning software for its drones, and initially the company tried to train a neural system off of this, via imitation learning. However, when they tried to do this they failed: “Especially within our domain of flying through the air, the exact choice of flight path is a weak signal because there can be many obstacle-free paths that lead to cinematic video,” they write. “The average scenario overwhelms the training signal”.

Computational Expert Imitation Learning: They develop an approach they call Computational Expert Imitation Learning (CEILing), where their drone learns not only from expert trajectories generated by the simulator, but also gets reward penalization according to the severity of errors made, which helps the drone efficiently learn how to avoid doing ruinous things like crashing into trees. However, they don’t publish enough information about the system to understand the specifics of the technical milestone – the more interesting thing is that they’re experimenting with a deep learning-based RL approach at all.
  “Although there is still much work to be done before the learned system will outperform our production system, we believe in pursuing leapfrog technologies,” they write. “Deep reinforcement learning techniques promise to let us improve our entire system in a data-driven way, which will lead to an even smarter autonomous flying camera”.

Why this matters: At some point, learning-based methods are going to exceed the performance of systems designed by hand. Once that happens, we’ll see a rapid proliferation of capabilities in consumer drones, like those made by Skydio. The fact companies like Skydio are already experimenting with these techniques in real world tests suggests the field of RL-based control is maturing rapidly, and may soon break out of the bounds of research into the real, physical world.
  Read more: Deep Neural Pilot on Skydio 2 (Medium).
  Watch a video about the Deep Neural Pilot on Skydio 2 (Hayk Martiros, YouTube).

####################################################

Turning Drones into vehicle surveillance tools, with the DroneVehicle dataset:
…All watched over by flying machines of loving grace…
Researchers with Tianjin University, China, have released a dataset of drone-collected overhead imagery. The DroneVehicle dataset is designed to help researchers develop AI systems that can autonomously analyze the world from the sorts of top-down footage taken via drones.

The dataset: DroneVehicle consists of 15,532 pairs of RGB and infrared images, captured by drone-mounted dual cameras in a variety of locations in Tianjin, China. The dataset includes annotations for 441,642 object instances across five categories: car, bus, truck, van, and freight car. The inclusion of infrared imagery is interesting – it’s rare to see this modality in datasets, and it could let researchers develop thermal identifiers alongside visual identifiers.

The DroneVehicle challenge: The challenge consists of two tasks, object detection and object counting, and is largely self-explanatory: try to identify any of the five categories of object in different images and, as a stretch goal, count how many of them appear.

Why this matters: One of the craziest aspects of recent AI advances is how they build on the past two decades of development and miniaturization of consumer electronics systems for sensing (think, the tech that underpins digital cameras and phone cameras) and motion (think, quadcopters). Now that deep learning approaches have matured, we can build software to utilize these sensors, letting us autonomously map and analyze the world around us – an omni-use capability, that yields new applications in surveillance (scary!) as well as more socially beneficial things (automated traffic and environment analysis, for instance).
  Read more: Drone Based RGBT Vehicle Detection and Counting: A Challenge (arXiv).

####################################################

Better language AI research via Jiant:
…Giant progress in NLP research requires a Jiant system to test the progress…
Jiant is a software wrapper that makes it easy to implement various experimental pipelines for developing and evaluating language models. The software depends on Facebook’s PyTorch deep learning software, as well as the AllenNLP and HuggingFace Transformers software libraries (which provide access to language models).

Why jiant is useful: jiant handles a couple of fiddly parts of a language model evaluation loop: first, users can define a given experiment via a simple config file (e.g., config = { input_module = "roberta-large-cased", pretrain_tasks = "record,mnli", target_tasks = "boolq,mnli" }); second, it handles task and sentence encoding in the background. You can run jiant from the command line, so developers can integrate it into their usual workflow.

What jiant ships with: jiant supports more than 50 tasks today, ranging from natural language understanding tasks like CommonsenseQA, to SQuAD, to the Winograd Schema Challenge. It also ships with support for various modern sentence encoder models, like BERT, GPT-2, ALBERT, and so on.

Why this matters: In the past two years, research in natural language processing has been moved forward by the arrival of new, Transformer-based models that have yielded systems capable of generating human-grade synthetic text (for certain short lengths), as well as natural language understanding systems that are capable of performing more sophisticated feats of reasoning. Tools like jiant will make it easier to make this research reproducible by providing a common environment in which to run and replicate experiments. As with most software packages, the utility of jiant will ultimately come down to how many people use it – so give it a whirl!
  Read more: jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models (arXiv).
  Find out more about jiant at the official website.
  Get the code for jiant here (jiant GitHub).

####################################################

Microsoft thinks AI will change security, so it wants to pay researchers to help it figure out how:
…$300,000 in funding for better AI<>security research…
Microsoft is providing funding of up to $150,000 for projects that “spark new AI research that will expand our understanding of the enterprise, the threat landscape, and how to secure our customer’s assets in the face of increasingly sophisticated attacks,” Microsoft wrote. Microsoft has $300,000 in total funding available for the program, and “will also consider an additional award of Azure cloud computing credits if warranted by the research”.

What is Microsoft interested in? Microsoft is keen to look at research proposals in the following areas (non-exhaustive list):
– How can automatic modeling help enterprises autonomously understand their own security?
– How can we identify the risk to the confidentiality, integrity, and availability of ML models?
– How do we meaningfully interrogate ML systems under attack to ascertain the root cause of failure?
– Can we build AI-powered defensive and offensive agents that can stay ahead of adversary innovation?
– How can AI be used to increase the efficacy and agility of threat hunter teams?
And so much more!

Why this matters: The intersection of AI and Security is going to be an exciting area with significant potential to alter the dynamics of both cybercrime and geopolitical conflict. What might it mean if AI technologies yield dramatically better systems for defending our IT infrastructure? What about if AI technologies yield things that can aid in offensive applications, like synthetic media, or perhaps RL systems for fine-tuning phishing emails against target populations? Grants like this will help generate information about this evolving landscape, letting us prepare for an exciting and slightly-more-chaotic future.
  Read more: Microsoft Security AI RFP (official Microsoft blog).

####################################################

Could you understand Twitter better by analyzing 200 million tweets?
…$25,000 in prizes for algorithms that can understand Twitter…
How chaotic is the omni-babble service known as Twitter? A new competition aims to find this out, by challenging researchers to build systems that can predict how people will respond to tweets on the social network. The RecSys Challenge 2020 “focuses on a real-world task of tweet engagement prediction in a dynamic environment” and has $25,000 in prizes available – though the biggest prize may be getting a chance to work with a massive dataset of real-world tweets.

200 Million tweets: To support the competition, Twitter is releasing a dataset of 200 million public engagements on Twitter, spanning a period of two weeks, where an engagement is a moment when a user interacts with a tweet (e.g., like, reply, retweet, and retweet with comment). Twitter says this represents the “largest real-world dataset to predict user engagements”, and is likely going to be a major draw for researchers.

The challenge: Entrants will need to build systems that can correctly predict how different users will interact with different tweets – a tricky task, given the different types of possible interactions and the inherent broadness of subjects discussed on Twitter.

Why this matters: Do humans unwittingly create patterns at scale? Mostly, the answer is yes. Something I’m always curious about is the extent to which we create strong patterns via our own qualitative outputs (like tweets) and qualitative behaviors (like how we interact with tweets). I think challenges like this will highlight the extent to which human creativity (and how people interact with it) has predictable elements.
  Read more and register for the data: Twitter RecSys Challenge 2020 (official competition website).

####################################################

Do we need different publication rules for AI technology?
…Partnership on AI project tries to figure out what we need and why…
What happens if, in a few years, someone develops an AI technology with significant cyber-offense relevance and wants to publish a research paper on the subject – what actions should this person take to maximize the scientific benefit of their work while minimizing the potential for societal harm? That’s the kind of question a new project from the Partnership on AI (PAI) wants to answer. PAI is a multi-stakeholder group whose members range from technology developers like Microsoft, OpenAI, and DeepMind, to civil society groups, and others. Over 35 organizations have worked on the initiative so far.

What the project will do: “Through convenings with the full spectrum of the AI/ML research community, this project intends to explore the challenges and trade-offs in responsible publication to shape best practices in AI research,” PAI writes.

Key questions: Some of the main questions PAI hopes to answer with this project include:
– What can we learn from other fields dealing with high-stakes technology, and from history?
– How can we encourage researchers to think about risks of their work, as well as the benefits?
– How do we coordinate effectively as a community?

Key dates:
March – June 2020: PAI will collect feedback on publication norms via a Google Form.
June 2020: PAI will host a two-day workshop to discuss publication norms within AI research.
Fall 2020: PAI will publish information based on its feedback and workshops.

Why this matters: Most fields of science have dealt with publication challenges, and some fields – particularly chemistry and materials science – have ended up exploring different types of publication as a consequence. Work like this from PAI will help us think about whether publication norms need to change in AI research and, if so, how.
  Read more: Publication Norms for Responsible AI (Partnership on AI).

####################################################

Tech Tales:

[A virtual school playground, 2035]
Play, Fetch

My keeper once got so tired looking for me it fell out of the sky. I got in so. much. trouble!

My keeper said that it would protect me if my Dad started hitting my Mom again. It said it’d take pictures so my Dad wouldn’t be able to do that anymore. And it did.

My keeper once played catch with me for four hours when I was sad and it told me I did a good job.

My keeper got hit by a paintball when we were out in the park and I tried to fix it so I could keep playing. It told me I needed special equipment but I just grabbed a load of leaves and rubbed them on its eye till the paint came off and it was fine.

My keeper helped my family get safe when the riots started – it told us “come this way” and led us into one of the bunkers and it helped us lock the door.

My keeper once told me that another girl liked me and it knew that because her keeper told it, and it helped me write a valentine.

Things that inspired this story: Consumer robots; a prediction that most children will grow up with some kind of companion AI that initially does surveillance and later does other things; the normalization of technology; children as narrative canaries, as naive oracles, as seers.

Import AI 187: Real world robot tests at CVPR; all hail the Molecule Transformer; the four traits needed for smarter AI systems.

A somewhat short issue this week, as I’ve been at the OECD in Paris, speaking about opportunities and challenges of AI policy, and figuring out ways for the AI Index (aiindex.org) to support the OECD’s new ‘AI Policy Observatory’. 

How useful are simulators? Find out at CVPR 2020!
…Three challenges will put three simulation approaches through their paces…
In the past few years, researchers have started using software-based simulators to train increasingly sophisticated machine learning systems. One notable trend has been the use of high-fidelity simulators, as researchers try to train systems in these rich, visually-stimulating environments, then transfer these systems into reality. At CVPR 2020, three competitions will push the limits of different simulators, generating valuable information about how useful these tools may be.

Three challenges for one big problem:
RoboTHOR Challenge 2020: This challenge evaluates how good we are at developing systems that can navigate to objects specified by name (e.g., go to the table in the kitchen), using the ‘Thor’ simulator (Import AI: 73). “Participants will train their models in simulation and these models will be evaluated by the challenge organizers using a real robot in physical apartments” (emphasis mine).

Habitat Challenge 2020: This challenge has two components, a point navigation challenge, and an object navigation challenge, both set in the Habitat multi-environment simulator (Import AI 141). The point navigation one tries to deprive the system of various senses (e.g., GPS), and adds noise to its actuations, which will help us test the robustness of these navigation systems. The object navigation challenge asks an agent to find an object in the environment without access to a map.

Sim2Real Challenge with Gibson: Similar to RoboTHOR, this challenge asks people to train agents to navigate through a variety of photorealistic environments using the ‘Gibson’ simulator (Import AI: 111). It has three tiers of difficulty – a standard point navigation task, a point navigation task where the environment contains interactive objects, and a point navigation task where the environment contains other objects that move (and the agent must avoid colliding with them). This challenge also contains a sim2real element, where top-ranking teams (along with the top-five teams from the Habitat Challenge) will get to test out their system on a real robot as well. 

Why this matters: Let’s put this in perspective: in 2013 the AI community was very impressed with work from DeepMind showing you could train an agent to play Space Invaders via reinforcement learning. Now look where we are – we’re training systems in photorealistic 3D simulators featuring complex physical dynamics – AND we’re going to try and boot these systems into real-world robots and test out their performance. We’ve come a very, very long way in a relatively short period of time, and it’s worth appreciating it. I am the frog being boiled, reflecting on the temperature of the water. It’s getting hot, folks!
    Read more about the Embodied-AI Workshop here (official webpage).

####################################################

Predicting molecular properties with the Molecule Transformer:
…Figuring out the mysteries of chemistry with transformers, molecular self attention layers, and atom embeddings…
Researchers with Ardigen, Jagiellonian University, Molecule.one, and New York University have extended the widely-used ‘Transformer’ component so it can process data relating to molecule property prediction tasks – a capability critical to drug discovery and material design. The resulting Molecule Attention Transformer (MAT) performs well across a range of tasks, ranging from predicting the ability of a molecule to penetrate the blood-brain barrier, to predicting whether a compound is active towards a given target (e.g., Estrogen Alpha, Estrogen Beta), and so on.

Transformers for Molecules: To get Transformers to process molecule data, the researchers implement what they call “Molecular Self Attention Layers”, and embed each atom as a 26-dimensional vector.
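  What self-attention over atoms looks like: The snippet below runs a generic self-attention layer over a set of 26-dimensional atom embeddings, which is the kind of building block MAT adapts. The paper's Molecular Self Attention Layers modify standard attention for molecules, so treat this as a plain baseline sketch rather than an implementation of MAT; the molecule size and head count are made up.

    # Generic self-attention over per-atom embeddings: every atom attends to every
    # other atom in the molecule. This is a baseline sketch of the building block
    # MAT adapts, not the paper's Molecular Self Attention Layer itself.
    import torch
    import torch.nn as nn

    num_atoms, embed_dim = 20, 26                    # one molecule, 26-d atom embeddings
    atoms = torch.randn(1, num_atoms, embed_dim)     # (batch, atoms, features)

    attention = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=2, batch_first=True)
    attended, weights = attention(atoms, atoms, atoms)

    print(attended.shape)   # torch.Size([1, 20, 26]) -- updated atom representations
    print(weights.shape)    # torch.Size([1, 20, 20]) -- atom-to-atom attention weights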

How well does MAT stack up? They compare the MAT to three baselines: random forests (RF); Support Vector Machines with an RBF kernel (SVM); and graph convolutional networks (GCNs). The MAT gets state-of-the-art scores on four out of the seven tests (RF takes one of the remaining tests and SVM takes the other two).
  MAT pre-training: Just as with image and text models, molecular models can benefit from being pre-trained on a (relevant) dataset and fine-tuned from there. The authors compare their system against a Pretrained EAGCN and SMILES, and MAT with pre-training gets significantly improved scores.

Why this matters: Molecular property prediction is the sort of task where if we’re able to develop AI systems that make meaningful, accurate predictions, then we can expect large chunks of the economy to change as a consequence. Papers like this highlight how generically useful components like the Transformer are, and highlights how much modern AI has in common with plumbing – here, the researchers are basically trying to design an interface system that lets them port molecular data into a Transformer-based system, and vice versa.
  Read more: Molecule Attention Transformer (arXiv).

####################################################

Deepfakes are being commoditized – uh oh!
…What happens when Deepfakes get really cheap?…
Deepfakes – the slang term for AI systems that let you create synthetic videos where you superimpose someone’s face onto someone else’s – are becoming easier and cheaper to make, though they’re primarily being used for pornography rather than political disruption, according to a new analysis from Deeptracelabs and Nisos. 

Porn, porn, porn: “We found that the majority of deepfake activity centers on dedicated deepfake pornography platforms,” they write. “These videos consistently attract millions of views, with some of the websites featuring polls where users can vote for who they want to see targeted next”.

Little (non-porn) misuse: “We assess that deepfakes are not being widely bought or sold for criminal or disinformation purposes as of early February 2020,” they write. “Techniques being developed by academic and industry leaders have arguably reached the required quality for criminal uses, but these techniques are not currently publicly accessible and will take time to be translated into stable, user-friendly implementations”.

Why this matters: This research highlights how AI tools are diffusing into society, with some of them being misused. I think the most significant (implicit) thing here is the importance this places on publication norms in AI research – what kind of responsibility might academics and corporate researchers have here, with regard to proliferating the technology? And can we do anything to reduce misuses of the technology while maintaining a relatively open scientific culture? “We anticipate that as deepfakes reach higher quality and “believability”, coupled with advancing technology proliferation, they will increasingly be used for criminal purposes”, they write. Get ready.
  Read more: Analyzing The Commoditization Of Deepfakes (NYU Journal of Legislation & Public Policy).

####################################################

What stands between us and smarter AI?
…Four traits we need to build to develop more powerful AI systems…
Cognitive scientist and entrepreneur Gary Marcus has published a paper describing an alternative to the dominant contemporary approach to AI research. Where existing systems focus on “general-purpose learning and ever-larger training sets and more and more compute”, he instead suggests we should work on “a hybrid, knowledge-driven, reasoning-based approach, centered around cognitive models”.

Four research challenges for better AI systems: Marcus thinks contemporary AI systems are missing some things that, if further researched, might improve their performance. These include:

  • Symbol-manipulation: AI systems should be built with a mixture of learned components and more structured approaches that allow for representing an algorithm in terms of operations over variables. Such an approach would make it easier to build more robust systems – “represent an algorithm in terms of operations over variables, and it will inherently be defined to extend to all instances of some class”. 
  • Encoded knowledge: “Rather than starting each new AI system from scratch, as a blank slate, with little knowledge of the world, we should seek to build learning systems that start with initial frameworks for domains like time, space and causality, in order to speed up learning and massively constrain the hypothesis space,” he writes. (Though in my view, there’s some chance that large-scale pre-training could create networks that can serve as strong prior for systems that get finetuned against smaller datasets – though today these approaches merely kind of work, it’ll be exciting to see how far they can get pushed. There are also existing systems that fuse these things together, like ERNIE which pairs a BERT-based language model with a structured external knowledge store). 
  • Reasoning: AI systems need to be better at reasoning about things, which basically means abstracting up and away from the specific data being operated over (e.g., text strings, pixels), and using representations to perform sophisticated inferences. “We need new benchmarks,” Marcus says. 
  • Cognitive Models: In cognitive psychology, there’s a term called a ‘cognitive model’, which describes how people build systems that let them use their prior knowledge about some entities in combination with an understanding of their properties, as well as the ability to incorporate new knowledge over time.

Show me the tests: Many of the arguments Marcus makes are supported by the failures of contemporary AI systems – the paper contains numerous studies of GPT-2 and how it sometimes fails in ways that indicate some of the deficiencies of contemporary systems. What might make Marcus’s arguments resonate more widely with AI researchers is the creation of a large, structured set of tests which we can run contemporary systems against. As Marcus himself writes when discussing the reasoning deficiencies of contemporary systems, “we need new benchmarks”.

Why this matters: Papers like this run counter to the dominant narrative of how progress happens in AI. That’s good – it’s helpful to have heterogeneity in the discourse around this stuff. I’ll be curious to see how things like Graph Neural Networks and other recently developed components might close the gap between contemporary systems and what Marcus has in mind. I also think there’s a surprisingly large amount of activity going on in the areas Marcus identifies, though the proof will ultimately come from performance and rigorous ablations. Bring on the benchmarks!
  Read more: The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence (arXiv).

####################################################

Tech Tales:

[Space, 22nd Century]

The Desert Island GAN

They were able to take the entire Earth with them when they got into their spaceship and settled in for the decades-long mission. The Earth was encoded into a huge generative model, trained on a variety of different data sources. If they got lonely, they could put on a headset and go through synthetic versions of their hometown. If they got sad, they could cover the walls of the spacecraft with generated parties, animal parents and animal babies, trees turned into black-on-orange cutouts during righteous sunrises, and so on.

But humans are explorers. It’s how they evolved. So though they had the entire Earth in their ships, they would cover its surface. Many of the astronauts took to exploring specific parts of the generative featurespace. One spent two years going through ever-finer permutations of rainforests, while another grew obsessed with exploring the spaces between the ‘real’ animals of the earth and the ones imagined by the pocket-Earth-imagination.

They’d swap coordinates, of course. The different astronauts in different ships would pipe messages to eachother across stellar distances – just a few bytes of information at a time, a coordinate and a statement.
  Here is the most impossible tree.
  Here is a lake, hidden inside a mountain.
  Here is a flock of starlings that can swim in water; watch them dive in and out of these waves.

Back on Earth, the Earth was changing. It was getting hotter. Less diverse as species winked out, unable to adjust to the changing climate. Fewer people, as well. NASA and the other space agencies kept track of the ships, watching them go further and further away, knowing that each of them contained a representation of the planet they came from that was ever richer and ever more alive than the planet itself.

Things that inspired this story: Generative models; generative models as memories; generative models as archival systems; generative models as pocket imaginations for people to navigate and use as extensions of their own memory; AI and cartography. 

Import AI 186: AI + Satellite Imagery; Votefakes!; Schmidhuber on AI’s past&present

AI + Satellites = Climate Change Monitoring:
…Deeplab v3 + Sentinel satellite ‘SAR’ data = lake monitoring through clouds…
Researchers with ETH Zurich and the Skolkovo Institute of Science and Technology have used machine learning to develop a system that can analyze satellite photos of lakes and work out if they’re covered in ice or not. This kind of capability is potentially useful when building AI-infused earth monitoring systems.

Why they did it: The researchers use synthetic aperture radar (SAR) data from the Sentinel-1 satellite. SAR is useful because it sees through cloud cover, so they can analyze lakes under variable weather conditions. “Systems based on optical satellite data will fail to determine these key events if they coincide with a cloudy period,” they write. “The temporal resolution of Sentinel-1 falls just short of the 2-day requirement of GCOS, still it can provide an excellent “observation backbone” for an operational system that could fill the gaps with optical satellite data”.

How they did it: The researchers paired the Sentinel satellite data with a Deeplab v3+ semantic segmentation network. They tested their approach on three lakes in Switzerland (Sils, Silvaplana, St. Moritz), using satellite data gathered over two separate winters (2016/17 and 2017/18). They obtain accuracy scores of around 95%, and find that the network does a reasonable job of identifying when lakes are frozen.
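  What this looks like in code: Below is a minimal sketch of binary (frozen versus open-water) lake segmentation using an off-the-shelf DeepLabv3 model from torchvision. The paper uses Deeplab v3+ on Sentinel-1 SAR inputs with its own training pipeline, so treat this as an approximation of the general setup rather than a reproduction of their system; the input tensor here is random dummy data standing in for a SAR patch.

    # A minimal sketch: per-pixel frozen / not-frozen classification with an
    # off-the-shelf DeepLabv3 from torchvision. This approximates the general
    # setup described above; it is not the authors' Deeplab v3+ pipeline.
    import torch
    from torchvision.models.segmentation import deeplabv3_resnet50

    model = deeplabv3_resnet50(num_classes=2)   # two classes: frozen ice vs open water
    model.eval()

    sar_patch = torch.randn(1, 3, 256, 256)     # dummy stand-in for a satellite image patch

    with torch.no_grad():
        logits = model(sar_patch)["out"]        # (1, 2, 256, 256) per-pixel class scores
        mask = logits.argmax(dim=1)             # (1, 256, 256) predicted segmentation mask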

Why this matters: Papers like this show how people are increasingly using AI techniques as a kind of plug&play sensing capability, where they assemble a dataset, train a classifier, and then either build or plan an automated system based on the newly created detector.
  Read more: Lake Ice Detection from Sentinel-1 SAR with Deep Learning (arXiv).

####################################################

Waymo dataset + LSTM = a surprisingly well-performing self-driving car prototype:
…Just how far can a well-tuned LSTM get you?…
Researchers with Columbia University want to see how smart a self-driving car can get if it’s trained in a relatively simple way on a massive dataset. To that end, they train a LSTM-based system on 12 input features from the Waymo Open Dataset, a massive set of self-driving car data released by Google last year (Import AI 161).

Performance of a well-tuned LSTM: In tests, an LSTM system trained with all the inputs from all the cameras on the car gets a minimum loss of about 0.1327. That’s superior to other similarly simple systems based on technologies like convolutional neural nets or gradient boosting, but it’s a far cry from the 99.999%-style reliability I think most people would intuitively want in a self-driving car.
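  A sketch of the recipe: The snippet below shows the general shape of this kind of model (an LSTM that consumes a sequence of per-timestep driving features and emits a prediction), with made-up dimensions and an invented output head; it is not the authors' exact architecture.

    # A minimal sketch of an LSTM driving-prediction model. The 12 input features
    # match the number mentioned above; the hidden size and the 2-dimensional
    # output head (e.g., acceleration and steering) are illustrative assumptions.
    import torch
    import torch.nn as nn

    class SimpleDrivingLSTM(nn.Module):
        def __init__(self, num_features=12, hidden_size=64):
            super().__init__()
            self.lstm = nn.LSTM(num_features, hidden_size, batch_first=True)
            self.head = nn.Linear(hidden_size, 2)

        def forward(self, x):                    # x: (batch, timesteps, num_features)
            outputs, _ = self.lstm(x)
            return self.head(outputs[:, -1])     # predict from the final timestep

    model = SimpleDrivingLSTM()
    batch = torch.randn(8, 10, 12)               # 8 sequences, 10 timesteps, 12 features
    prediction = model(batch)                    # (8, 2)
    loss = nn.MSELoss()(prediction, torch.zeros(8, 2))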

Why this matters: I think papers like this emphasize the extent to which neural nets are now utterly mainstream in AI research. It also shows how industry can inflect the type of research that gets conducted in AI purely by releasing its own datasets, which become the environments academics use to test, calibrate, and develop AI research approaches.
  Read more: An LSTM-Based Autonomous Driving Model Using Waymo Open Dataset (arXiv).

####################################################

Votefakes: Indian politician uses synthetic video to speak to more voters:
…Deepfakes + Politics + Voter-Targeting = A whole new way to persuade…
An Indian politician has used AI technology to generate synthetic videos of themselves giving the same speech in multiple languages, marking the arrival of a possible new tool that politicians will use to target the electorate.

Votefakes: “When the Delhi BJP IT Cell partnered with political communications firm The Ideaz Factory to create “positive campaigns” using deepfakes to reach different linguistic voter bases, it marked the debut of deepfakes in election campaigns in India. “Deepfake technology has helped us scale campaign efforts like never before,” Neelkant Bakshi, co-incharge of social media and IT for BJP Delhi, tells VICE. “The Haryanvi videos let us convincingly approach the target audience even if the candidate didn’t speak the language of the voter.”” – according to Vice.

Why this matters: AI lets people scale themselves – whether by automating and scaling out certain forms of analysis, or, as here, by automating and scaling out the way that people appear to other people. With modern AI tools, a politician can be significantly more present in more diverse communities. I expect this will lead to some fantastically weird political campaigns and, later, the emergence of some very odd politicians.
  Read more: We’ve Just Seen the First Use of Deepfakes in an Indian Election Campaign (Vice).

####################################################

Computer Vision pioneer switches focus to avoid ethical quandaries:
…If technology is so neutral, then why are so many uses of computer vision so skeezy?…
The creator of YOLO, a popular image identification and classification system, has stopped doing computer vision research due to concerns about how the technology is used.
  “I stopped doing CV research because I saw the impact my work was having,” wrote Joe Redmon on Twitter. “I loved the work but the military applications and privacy concerns eventually became impossible to ignore.”

This makes sense, given Redmon’s unusually frank approach to their research. ““What are we going to do with these detectors now that we have them?” A lot of the people doing this research are at Google and Facebook. I guess at least we know the technology is in good hands and definitely won’t be used to harvest your personal information and sell it to…. wait, you’re saying that’s exactly what it will be used for?? Oh. Well the other people heavily funding vision research are the military and they’ve never done anything horrible like killing lots of people with new technology oh wait…”, they wrote in the research paper announcing YOLOv3 (Import AI: 88).
  Read more at Joe Redmon’s twitter page (Twitter).

####################################################

Better Satellite Superresolution via Better Embeddings:
…Up-scaling + regular satellite imaging passes = automatic planet monitoring…
Superresolution is where you train a system to produce the high-resolution versions of low-resolution images; in other words, if I show you a bunch of black and white pixels on a green field, it’d be great if you were smart enough to figure out this was a photo of a cow and produce that for me. Now, researchers from Element AI, MILA, the University of Montreal, and McGill University, have published details about a system that can take in multiple low-resolution images and stitch them together into high-quality superresolution images.

HighRes-net: The key to this research is HighRes-net, an architecture that can fuse an arbitrary number of low-resolution frames together to form a high-resolution image. One of the key tricks here is the continuous computation of a shared representation across multiple low-resolution views – by embedding these into the same featurespace, then embedding them jointly with the shared representation, the network finds it easier to learn about overlapping versus non-overlapping features, which helps it make marginally smarter super-resolution judgement calls. Specifically, the authors claim HighRes-net is “the first deep learning approach to MFSR that learns the typical sub-tasks of MFSR in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss.”
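  The core idea in miniature: The sketch below embeds each low-resolution frame alongside a shared reference representation, then fuses frames pairwise until one representation remains. The layer sizes, the use of a median image as the shared reference, and the fusion rule are all illustrative simplifications; see the paper and the official code for the real architecture.

    # A toy sketch of the HighRes-net idea: embed each low-res frame jointly with a
    # shared reference, then fuse embeddings pairwise until one remains. This is an
    # illustrative simplification, not the published architecture.
    import torch
    import torch.nn as nn

    class TinyFusionNet(nn.Module):
        def __init__(self, channels=16):
            super().__init__()
            self.embed = nn.Conv2d(2, channels, kernel_size=3, padding=1)    # frame + shared reference
            self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
            self.decode = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

        def forward(self, frames):
            # frames: (batch, num_frames, H, W) low-resolution views of the same scene
            reference = frames.median(dim=1, keepdim=True).values            # shared representation
            embeddings = [self.embed(torch.cat([frames[:, i:i+1], reference], dim=1))
                          for i in range(frames.shape[1])]
            while len(embeddings) > 1:                                       # recursive pairwise fusion
                a, b = embeddings.pop(), embeddings.pop()
                embeddings.append(self.fuse(torch.cat([a, b], dim=1)))
            return self.decode(embeddings[0])                                # upsampling omitted for brevity

    lowres_frames = torch.randn(2, 8, 64, 64)                                # 8 low-res views per scene
    fused = TinyFusionNet()(lowres_frames)                                   # (2, 1, 64, 64)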

How well does it work? The researchers tested out their system on the PROBA-V dataset, a satellite imagery dataset that consists of high-resolution / low-resolution imagery pairs. (According to the researchers, lots of bits of superresolution research test on algorithmically-generated low-res images, which means the tests can be a bit suspect). They entered their model into the European Space Agency’s Kelvin competition, obtaining top scores on the public leaderboard and second-best scores on a private evaluation.

Why this matters: Techniques like this could let people use more low-resolution satellite imagery to analyze the world around them. “There is an abundance of low-resolution yet high-revisit low-cost satellite imagery, but they often lack the detailed information of expensive high-resolution imagery,” the researchers write. “We believe MFSR can uplift its potential to NGOs and non-profits”.
  Get the code for HighRes-net here (GitHub).
  Read more: HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery (arXiv).

####################################################

AI industrialization means AI efficiency: Amazon shrinks the Transformer, gets decent results, publishes the code:
…Like the Transformer but hate how big it is? Try out Amazon’s diet Transformers…
Amazon Web Services researchers have developed three variations on the Transformer architecture, all of which demonstrate significant efficiency gains over the stock Transformer.

Who cares about the Transformer? The Transformer is a fundamental AI component that was first published in 2017 – one of the main reasons why people like Transformers is that the architecture uses attentional mechanisms to help it learn subtle relationships between data. It’s this capability that has made Transformers quickly become fundamental plug-in components, appearing in AI systems as diverse as GPT-2, BERT, and even AlphaStar. But the Transformer has one problem – it can be pretty expensive to use, because the attentional processes can be computationally expensive. Amazon has sought to deal with this by developing three novel variants on the Transformer.

The Transformer, three ways: Amazon outlines three variants on the Transformer which are all more efficient, though in different ways. “The design principle is to still preserve the long and short range dependency in the sequence but with less connections,” the researchers write. They test each Transformer on two common language model benchmark datasets: Penn TreeBank (PTB) and WikiText-2 (WT-2) – in tests, the Dilated Transformer gets a test score of 110.92 on PTB and 147.58 on WT-2, versus 103.72 and 140.74 for the full Transformer (lower is better). This represents a bit of a performance hit, but the Dilated Transformer saves about 70% on model size relative to the full one. When reading the variants below, bear in mind the computational complexity of a full Transformer is O(n^2 * h), where n = length of sequence; h = size of hidden state; k = filter size; b = base window size; and m = cardinal number. (A worked cost comparison appears after the list.)
– Dilated Transformer: O(n * k * h): Use dilated connections so you can have a larger receptive field for a similar cost.
– Dilated Transformer with Memory: O(n * k * c * h): Same as above, along with “we try to cache more local contexts by memorizing the nodes in the previous dilated connections”.
– Cascade Transformer: O(n * b * m^1 * h): They use cascading connections “to exponentially incorporate the local connections”.
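  What those complexity classes mean in practice: Here is a rough back-of-the-envelope comparison of the attention-cost terms above, using made-up values for n, h, k, b, and m. The numbers are purely illustrative and constant factors are ignored.

    # Back-of-the-envelope comparison of the attention-cost terms listed above.
    # Values are made up for illustration; constant factors are ignored.
    n, h, k, b, m = 1024, 512, 4, 8, 4   # sequence length, hidden size, filter size, base window, cardinal number

    full_transformer = n**2 * h          # O(n^2 * h)
    dilated          = n * k * h         # O(n * k * h)
    cascade          = n * b * m**1 * h  # O(n * b * m^1 * h)

    print(f"full:    {full_transformer:,}")   # 536,870,912
    print(f"dilated: {dilated:,}")            # 2,097,152  (~256x fewer operations)
    print(f"cascade: {cascade:,}")            # 16,777,216 (~32x fewer operations)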

Why this matters: If we’re going through a period of AI industrialization, then something worth tracking is not only the frontier capabilities of AI systems, but also the efficiency improvements we see in these systems over time. I think it’ll be increasingly valuable to track improvements here, and it will give us a better sense of the economics of deploying various types of AI systems.
  Read more: Transformer on a Diet (arXiv).
  Get the code here (cgraywang, GitHub).

####################################################

Schmidhuber on AI in the 2010s and AI in the 2020s:
…Famed researcher looks backwards and forwards; fantastic futures and worrisome trends…
Jürgen Schmidhuber, an artificial intelligence researcher who co-invented the LSTM, has published a retrospective on the 2010s in AI, and an outlook for the coming decade. As with all Schmidhuber blogs, this post generally ties breakthroughs in the 2010s back to work done by Schmidhuber’s lab/students in the early 90s – so put that aside while reading and focus on the insights.

What happened in the 2010s? The Schmidhuber post makes clear how many AI capabilities went from barely works in research to used in production in multi-billion dollar companies. Some highlights of technologies that went from being juvenile to being deployed in production at massive scale:
– Neural machine translation
– Handwriting recognition
– Really, really deep networks: In the 2010s, we transitioned from training networks with tens of layers to training networks with hundreds of layers, via inventions like Highway Networks and Residual Nets – this has let us train larger, more capable systems, capable of extracting even more refined signals from subtle patterns.
– GANs happened – it became easy to train systems to synthesize variations on their own datasets, letting us do interesting things like generating images and audio, and weirder things like Amazon using GANs to simulate e-commerce customers.

What do we have to look forward to in the 2020s?
– Data markets: As more and more of the world digitizes, we can expect data to become more valuable. Schmidhuber suspects the 2020s will see numerous attempts to create “efficient data markets to figure out your data’s true financial value through the interplay between supply and demand”.
– AI for command-and-control nations: Meanwhile, some nations may use AI technologies to increase their ability to control and direct their citizens: “some nations may find it easier than others to become more complex kinds of super-organisms at the expense of the privacy rights of their constituents,” he writes.
– Real World AI: AI systems will start to be deployed into industrial processes and machines and robots, which will lead to AI having a greater influence on the economy.

Why this matters: Schmidhuber is an interesting figure in AI research – he’s sometimes divisive, and occasionally perceived as being somewhat pushy with regard to seeking credit for certain ideas in AI research, but he’s always interesting! Read the post in full, if only to get to the treat at the end about using AI to colonize the “visible universe”.
  Read more: The 2010s: Our Decade of Deep Learning / Outlook on the 2020s (Jürgen Schmidhuber's blog).

####################################################

AI Policy with Matthew van der Merwe:

…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Europe’s approach to AI regulation

The European Commission has published their long-awaited white paper on AI regulation. The white paper is released alongside reports on Europe’s data strategy, and on safety and liability. These build on Europe’s Coordinated Plan on AI (see Import #143) and the recommendations of their high-level expert group (see Import #126). 

   High-risk applications: The European approach will be ‘risk-based’, with high-risk AI applications subject to more stringent governance and regulatory measures. They propose two necessary conditions for an application to be deemed high-risk:
  (1) it is employed in a sector that typically involves significant risks (e.g. healthcare)
  (2) the application itself is one likely to generate significant risks (e.g. treating patients).

  US criticism: The US Government’s Chief Technology Officer, Michael Kratsios, criticized the proposals as being too ‘blunt’ in their bifurcation of applications into high- and low-risk, arguing that it is better to treat risk as a spectrum when determining appropriate regulations, and that the US’s light-touch approach is more flexible in this regard, and better overall.


Matthew’s view: To be useful, a regulatory framework has to carve up messy real-world things into neat categories, and it is often better to deal with nuance at a later stage—when designing and implementing legislation. In many countries it is illegal to drive without headlights at night, despite there being no clear line between night and day. Nonetheless, having laws that distinguish between driving at night and day is plausibly better than having more precise laws (e.g. in terms of measured light levels), or no laws at all in this domain. There are trade-offs when designing governance regimes, of which bluntness vs nuance is just one, and they should be judged on a holistic basis. In the absence of much detail on the US approach to AI regulation with regard to risks, it is too early to properly compare it with the Europeans’.

  Read more: On Artificial Intelligence – A European approach to excellence and trust (EU).

  Read more: White House Tech Chief Calls Europe’s AI Principles Clumsy Compared to U.S. Approach.

 

DoD adopts AI principles:

DefenseOne reports that the DoD plans to adopt the AI principles drawn up by the Defense Innovation Board (DIB). A draft of these principles was published in October (see Import #171).

Matthew’s view: I was impressed by the DIB’s AI principles and the process by which they were arrived at. They had a deep level of involvement from a broad group of experts, and underwent stress testing with a ‘red teaming’ exercise. The principles focus on the safety, robustness, and interpretability of AI systems. They also take seriously the need to develop guidelines that will remain relevant as AI capabilities grow stronger. 

   Read more: Pentagon to Adopt Detailed Principles for Using AI.
  Read more: Draft AI Principles (DoD).

####################################################

Tech Tales:

The Interface
A Corporate Counsel Computer, 2022

Hello this is Ava at the Contract Services Company, what do you need?
Well hey Ava, how’s it going?
It’s good, and I hope you’re having a great day. What services can we provide?
Can you get me access to Construction_Alpha-009 that was assigned to Mitchell’s Construction?
Checking… verified. I sure can! Who would you like to speak to?
The feature librarian.
Memory? Full-spectrum? Auditory?-
I need the one that does Memory, specialization Emotional
Okay, transforming… This is Ava, the librarian at the Contract Services Company Emotional Memory Department, how can I help you?
Show me what features activated before they left the construction site on July 4th, 2025.
Checking…sure, I can do that! What format do you want?
Compile it into a sequential movie, rendered as instances of 2-Dimensional images, in the style of 20th Century film.
Okay, I can do that. Please hold…
…Are you there?
I am.
Would you like me to play the movie?
Yes, thank you Ava.

Things that inspired this story: AI Dungeon, GPT-2, novel methods of navigating the surfaces of generative models, UX, HCI, Bladerunner.