Import AI 189: Can your AI beat a 0% baseline?; AlphaFold predicts COVID-19 properties; plus, gigapixel-scale surveillance

by Jack Clark

Salesforce gets a language model to generate protein sequences:
…Can a language model become a scientist? Salesforce thinks so…
In recent years, language models have got a lot better. Specifically, AI researchers have figured out how to train large, generative models over strings of data, and have started producing models that can generate reasonable things on the other side – code, theatrical plays, poems, and so on. Now, scientists have been applying similar techniques to see whether they can train models to fit other, complex distributions of data that can be modelled as strings of characters. To that end, researchers with Salesforce and the Department of Bioengineering at Stanford University have developed ProGen, a neural net that can predict and generate protein sequences.

What is ProGen? ProGen is a 1.2 billion parameter language model, trained on a dataset of 280 million protein sequences. ProGen is based on a Transformer model, with a couple of tweaks to help encode protein data.
  ProGen works somewhat like how a language model completes a sentence, where it generates successive words, adjusting as it goes until it reaches the end of the sequence. “ProGen takes a context sequence of amino acids as input and outputs a probability distribution over amino acids,” they write. “We sample from that distribution and then update the context sequence with the sample amino acid”.
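  To make that generation loop concrete, here's a minimal sketch of autoregressive sampling over amino acids – this is not Salesforce's code, and the model interface and vocabulary handling are assumptions for illustration:

```python
import torch

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids

def generate_protein(model, context_ids, max_length=200, temperature=1.0):
    """Autoregressively extend a protein sequence, one amino acid at a time.

    `model` is a hypothetical stand-in for ProGen: it maps a sequence of token
    ids to logits over the amino-acid vocabulary.
    """
    sequence = list(context_ids)
    while len(sequence) < max_length:
        logits = model(torch.tensor(sequence).unsqueeze(0))       # (1, seq_len, vocab)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)  # distribution over next amino acid
        next_id = torch.multinomial(probs, num_samples=1).item()    # sample rather than take the argmax
        sequence.append(next_id)                                     # update the context with the sample
    # Assumes token ids index directly into the 20-letter alphabet.
    return "".join(AMINO_ACIDS[i] for i in sequence)
```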

How well does ProGen work? In tests, the researchers see if ProGen can successfully generate proteins that have a similar structure to real proteins – it works reasonably well: “across differing generation lengths, ProGen generation quality remains steadily near native low-energy levels, indicating a successful generation,” Salesforce writes in a blog post about the work. ProGen also displays some of the same helpful traits as language models, in the sense that it can be finetuned to improve performance on novel data, and it exhibits some amount of generalization.

Why this matters: Research like this gives us a sense of how scientists might repurpose AI tools developed for various purposes (e.g., language modeling) and apply them to other scientific domains. I think if we see more of this it’ll add weight to the idea that AI tools are going to generically help with a range of scientific challenges.
  Read more: ProGen: Language Modeling for Protein Generation (bioRxiv).
  Read more: ProGen: Using AI to Generate Proteins (Salesforce Einstein AI blog).

####################################################

Coming soon: gigapixel-scale surveillance:
…PANDA dataset helps train systems to surveil thousands of people at once…
In the future, massive cameras will capture panoramic views of crowds running marathons, or attending music festivals, or convening in the center of a town, and governments and the private sector will deploy sophisticated surveillance AI systems against this footage. A new research paper from Tsinghua University and Duke University gives us a gigapixel photo dataset called PANDA to help people conduct research into large-scale surveillance.

What goes into a PANDA: PANDA is made of 21 real-world scenes, where each scene consists of around 2 hours of 30Hz video, with still images extracted from this. The resolution of each PANDA picture is 25,000*14,000 pixels – that’s huge, considering many image datasets have standardized on 512*512 pixels, which is more than a thousand times smaller.
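  The back-of-the-envelope arithmetic:

```python
panda_pixels = 25_000 * 14_000        # ~350 megapixels per PANDA frame
typical_pixels = 512 * 512            # ~0.26 megapixels for a standard training image
print(panda_pixels / typical_pixels)  # ~1335, i.e. each frame is over 1,300x larger
```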

Acronym police, I’d like to report a murder: Panda is short for gigaPixel-level humAN-centric viDeo dAtaset. Yup.

Two PANDA variants: PANDA, which has 555 images with an average density of ~200 people per image, and PANDA-Crowd (known as PANDA-C), which has 45 images and an average density of ~2,700 people per image. PANDA also includes some interesting labels, ranging from classifications of behavior (walking / standing / holding / riding / sitting), to labels on groups of people (e.g., a lone individual might be tagged ‘single’, while a couple of friends could be tagged as ‘acquaintance’ and a mother and son might be tagged ‘family’) – see the illustrative sketch below.
  Video PANDA: PANDA also ships with a video version of the dataset, so researchers can train AI systems to track a given person (or object) across a video. Because PANDA has a massive field-of-view, it presents a greater challenge than prior datasets used for this sort of thing.
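  To make the label taxonomy concrete, a single annotated person might look something like the record below – an illustrative sketch only, with field names that are assumptions rather than PANDA's actual schema:

```python
# Hypothetical annotation record -- field names are illustrative, not PANDA's real schema.
annotation = {
    "frame_id": 1042,
    "person_id": 17,
    "bbox_full_body": [18230, 9120, 18410, 9680],  # pixel coords in a 25,000x14,000 frame
    "behavior": "walking",                         # walking / standing / holding / riding / sitting
    "group": "family",                             # e.g. single / acquaintance / family
}
```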

Why this matters: A few years ago, AI systems needed to be plugged into massively complex data pipelines that would automatically crop, tweak, and render images in ways that could be analyzed by AI systems. Now, we’re experimenting with systems for doing unconstrained surveillance over really large moving datasets, and testing out approaches that can do image recognition across thousands of faces at once. This stuff is getting more powerful more quickly than people realize, and I think the implications for how states relate to (and seek to control) their citizens are likely to be profound.
  Read more: PANDA: A Gigapixel-level Human-centric Video Dataset (arXiv).
  Get the dataset from the official website (PANDA dataset).

####################################################

Want to keep up to date with AI Ethics? Subscribe to this newsletter:
The Montreal AI Ethics Institute (MAIEI) has launched a newsletter to help people understand the world of AI Ethics. Like Import AI, the MAIEI newsletter provides analysis of research papers. Some of the research covered in the first issue includes: papers that try to bridge short-term and long-term AI ethics concerns, analyses of algorithmic injustices, and studies that analyze how people who spread misinformation acquire influence online.
  Read more: AI Ethics #1: Hello World! Relational ethics, misinformation, animism and more… (Montreal AI Ethics Institute substack).

####################################################

DeepMind uses AlphaFold system to make (informed) guesses about COVID-19:
…Another episode of “function approximation can get you surprisingly far”…
Here’s one of the AI futures I’ve always wanted: a crisis appears and giant computers whirr into action in response, performing inferences and eventually dispensing scraps of computationally-gleaned insight that can help humans tackle the crisis. The nice thing is this is now happening via some work from DeepMind, which has published predictions from its AlphaFold system about the structures of several under-studied proteins linked to the SARS-CoV-2 virus.

What it did: DeepMind generated predictions about the structures of several proteins associated with the virus. These structures – which are predictions, and haven’t been validated – can serve as clues for scientists trying to test out different hypotheses about how the virus functions. DeepMind says these “structure predictions have not been experimentally verified”, but that it hopes “they may contribute to the scientific community’s interrogation of how the virus functions, and serve as a hypothesis generation platform for future experimental work in developing therapeutics.” The company has also shared its results with collaborators at the Francis Crick Institute in the UK, who encouraged it to publish the predictions.

Why this matters: The story of computation has been the story of arbitraging electricity and time for increasingly sophisticated actions and insights. It’s amazing to me that we’re now able to use neural nets to approximate certain functions, which let us spend some money and time to get computers to generate insights that can equip human scientists. Centaur science!
  Read more: Computational predictions of protein structures associated with COVID-19 (Google DeepMind).

####################################################

Take the pain out of medical machine learning with TorchIO:
…Python library ships with data transformation and augmentation tools for medical data…
Researchers with UCL and King’s College London have released TorchIO, “an open-source Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning”. The software library has been designed for simplicity and usability, the authors say, and ships with features oriented around loading medical data into machine learning systems, transforming and augmenting the data, sampling from patches of images, and more.
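  As a rough sketch of what that plumbing looks like in practice – this assumes a recent TorchIO release (names may differ across versions) and uses placeholder file paths:

```python
import torchio as tio  # assumes a recent TorchIO release; check the docs for exact names

# Each subject bundles one or more medical images (e.g. an MRI scan and its segmentation).
subject = tio.Subject(
    mri=tio.ScalarImage("subject_001_t1.nii.gz"),  # placeholder path
    seg=tio.LabelMap("subject_001_seg.nii.gz"),    # placeholder path
)

# Compose preprocessing and augmentation transforms, then wrap subjects in a dataset.
transform = tio.Compose([
    tio.RescaleIntensity(out_min_max=(0, 1)),
    tio.RandomAffine(),
    tio.RandomNoise(),
])
dataset = tio.SubjectsDataset([subject], transform=transform)
```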

Why this matters: So much of turning AI from research into production is about plumbing. Specifically, about building sufficiently sophisticated bits of plumbing that it’s easy to shape, format, and augment data for different purposes. Tools like TorchIO are a symptom of the maturation of medical AI research using deep learning techniques.   
  Read more: TorchIO: a Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning (arXiv).
  Get the code for TorchIO here (official torchio GitHub).

####################################################

Think our AI systems can generalize? Turns out they can’t:
…gSCAN test gives us a rare, useful thing: a 0% baseline score for a neural method…
How well can AI systems generalize to novel environments, situations, and commands? That’s a question many researchers are trying to answer these days, as we’ve moved in the last decade from the era of “most AI systems are appalling at generalization” to the era of “some AI systems can perform some basic generalization”. Now we want to know the limits of generalization in these systems; understanding this will help us figure out where we need fundamentally new components and/or techniques to make progress.

gSCAN: New research from the University of Amsterdam, MIT, ICREA, Facebook AI Research, and NYU introduces ‘gSCAN’, a benchmark for testing generalization in AI agents taught to tie written descriptions and commands to the state of a basic, 2-dimensional gridworld environment. gSCAN consists of natural language text instructions (e.g., walk left) with a restricted grammar; formalized commands (e.g., L.TURN WALK), as well as steps through a 2D gridworld environment (where you can see your character, see the actions it takes, and so on). gSCAN “focuses on rule-based generalization, but where meaning is grounded in states of a grid world accessible to the agent”, the researchers write.
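  To make the setup concrete, here's a toy example of the kind of instruction-to-action mapping an agent has to learn – illustrative only, not gSCAN's actual file format or command vocabulary:

```python
# Illustrative only -- not gSCAN's actual data format.
example = {
    "command": "walk to the small red circle",   # natural-language instruction
    "grid": {
        "size": 6,
        "agent": {"position": (0, 0), "direction": "east"},
        "objects": [
            {"shape": "circle", "color": "red", "size": "small", "position": (2, 3)},
            {"shape": "square", "color": "blue", "size": "big", "position": (4, 1)},
        ],
    },
    "target_actions": ["walk", "walk", "turn left", "walk", "walk", "walk"],  # formalized commands
}
```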

Think your system can generalize? Test it on gSCAN and think again: gSCAN ships with eight different tasks, split across tasks that require composition (e.g., “whether a model can learn to recombine the properties of color and shape to recognize an unseen colored object”); direction (“generalizing to navigation in a novel direction”); novel contextual references (where you, say, ask the agent to navigate to a small circle, and during training the model has seen small things and circles and navigation tasks, but not in combination); novel composition of actions and arguments (e.g., train an agent to pull a heavy object but never to push it, then see whether, when asked to push the object, it knows how hard to push based on having pulled it previously); and novel adverbs (can you learn to follow commands from a new adverb (e.g., ‘cautiously’) based on a small number of examples, and can you combine a familiar adverb and a familiar verb, e.g., combining ‘while spinning’ with ‘pull’).

0% Success Rate: In tests, both of the baseline agents the researchers evaluate get 0% on the novel direction task, and 0% on the adverb task when given a single example (this climbs to ~5% with 50 examples, so a 95% failure rate in the best case). This is great! It’s really rare to get a new benchmark that has some doable tasks (e.g., some of the composition tasks see systems score 50%+), some hard tasks (e.g., ~20% on combining verbs and adverbs), and a couple that systems flunk entirely. It’s really useful to have tasks that systems completely fail at, and I feel like gSCAN could provide a useful signal on AI performance, if people start hammering away at the benchmark.
  Why do we fail? The navigation failures are a little less catastrophic than they seem – “the agent learned by the baseline model ends up in the correct row or column of the target 63% of the times… this further substantiates that they often walk all the way west, but then fail to travel the distance left to the south or vice-versa,” the researchers write. Meanwhile, the failures on the adverb task (even with some training examples) highlight “the difficulty neural networks have in learning abstract concepts from limited examples”, they write.

Why this matters: How far can function-approximation get us? That’s the question underlying all of this stuff – neural network-based techniques have become really widely used because they can do what I think of as fuzzy curve-fitting in high-dimensional spaces. Got some complex protein data? We can probably approximate some of the patterns here. How about natural language? Yup, we can fit something around this and generate semi-coherent sentences. How about some pixel information? Cool, we can do interesting inferences and generation operations over images now (as long as we’re in the sweet spot of our data distribution). Tests like gSCAN will help us understand the limits of this broad, fuzzy function approximation, and therefore the limits of today’s techniques. Experiments like this indicate “that fundamental advances are still needed regarding neural architectures for compositional learning,” the researchers write.
  Read more: A Benchmark for Systematic Generalization in Grounded Language Understanding (arXiv).
  Read about the prior work that underpins this: Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks (arXiv).
  Get the code for Grounded SCAN here (GitHub).
  Get the code for the original SCAN here (GitHub)

####################################################

AI Policy with Matthew van der Merwe:

…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

NeurIPS requires broader impacts statements:
In the call for papers for the 2020 conference, NeurIPS has asked authors to include a statement on “the potential broader impact of their work, including its ethical aspects and future societal consequences”.

Matthew’s view: This seems like a positive step. If done well, it could encourage researchers to take more seriously the potential harms of their work. Equally, it might end up having very little effect—e.g. if writing the statements is treated as a box-ticking exercise. Overall, I’m pleased to see the AI community making efforts to deal with the ethical challenges around publishing potentially harmful research.
  Read more: NeurIPS 2020 call for papers.
  Read more: NeurIPS requires AI researchers to account for societal impact and financial conflicts of interest (VentureBeat).

####################################################

The Long Monuments

[2500: The site of one of the Long Monuments.] 

They built it with the expectation of failure. Many failures, in fact. How else could you plan to build something meant to last a thousand years? And when they set out to build the new monument, they knew they could not foresee the future; the only sure thing would be chaos. Decades and perhaps centuries of order, then ruptures that could take place in an instant, or a month, or a year, or a generation; slow economic deaths, quickfire ones from natural disasters, plagues, wars, clever inventions gone awry, and more.

So they built simple and they built big. Huge slabs of stone and metal. Panels meant to be adorned by generations upon generations. Designs they envisaged would be built upon. Warped. Changed.

But always their designs demanded great investment, and they knew that if they could make them iconic in each generation – center them in the course of world events, or celebrate construction achievements with vast monetary giveaways – then they might stand a chance of being built for a thousand years.

They also knew that some of the reasons for their construction would be terrible. Despotic queens and kings of future nations might dedicate armies to gathering the tools – whatever form they might require – to further construct the monuments. The monuments might become the icons of new religions after vast disasters decimate populations. And they would forever fuel conspiracies – both real and imagined – about the increasingly sophisticated means of their construction, and the reasons for their being worked on.

In time, the monuments grew. They gained adornments and changes, and were in some eras pockmarked with vast cities, and in others stark and unadorned. But they continued to grow.

Something the creators didn’t predict was that after around 300 years there grew talk of taking the monuments into space – converting them to space habitations and lifting them into the atmosphere, requiring the construction of huge means of transport on the surface of the earth.

And on and on the means and methods and goals changed, but the wonders grew, and civilizations wheeled around them, like so many fruitflies around apples.

Things that inspired this story: Space elevators; COVID-19; the Cathedral and the Bazaar; Cathedrals; castles; myths; big history.