Import AI 199: Drone cinematographer; spotting toxic content with 4chan word embeddings; plus, a million text annotations help cars see

by Jack Clark

Get ready for the drone swarm cinematographer(s):
…But be prepared to wait awhile; we’re in the Wright Brothers era…
Today, people use drones to help film tricky things in a variety of cinematic settings. These drones are typically human-piloted, though we’re seeing the beginnings of mobile drones that can autonomously follow people for sports purposes (e.g., Skydio). How might cinema change as people begin to use drones to film more and more complex shots? That’s an idea inherent to new research from the University of Seville, which outlines “a multi-UAV approach for autonomous cinematography planning, aimed at filming outdoor events such as cycling or boat races”.

The proposed system gives a human director software that they can use to lay out specific shots – e.g., drones flying to certain locations, or following people across a landscape – then the software figures out how to coordinate multiple drones to pull off the shot. This is a complex problem, since drones typically have short battery lives and multiple machines need to share the same airspace safely. The researchers use a graph-based solution to the problem that can find optimal solutions for single drones and approximate solutions for multi-drone scenarios. “We focus on high-level planning. This means how to distribute filming tasks among the team members,” they write.
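For intuition only, here is a toy sketch of the kind of high-level task-distribution problem involved: greedily assign filming tasks to drones with limited battery. This is not the paper’s graph-based planner, and every name and number below is made up.

```python
# Toy sketch: greedily assign filming tasks to battery-limited drones.
# NOT the paper's graph-based planner; all names and numbers are invented.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    start: float     # seconds into the shoot when filming must begin
    duration: float  # seconds of filming required

@dataclass
class Drone:
    name: str
    battery: float          # seconds of flight time remaining
    busy_until: float = 0.0

def assign(tasks, drones):
    plan = []
    for task in sorted(tasks, key=lambda t: t.start):
        # Candidate drones: free when the task starts, with enough battery left.
        free = [d for d in drones
                if d.busy_until <= task.start and d.battery >= task.duration]
        if not free:
            plan.append((task.name, None))  # task can't be covered by this fleet
            continue
        drone = max(free, key=lambda d: d.battery)  # pick the drone with most battery left
        drone.busy_until = task.start + task.duration
        drone.battery -= task.duration
        plan.append((task.name, drone.name))
    return plan

tasks = [Task("follow cyclist", 0, 60), Task("orbit boat", 30, 45)]
drones = [Drone("uav_1", battery=300), Drone("uav_2", battery=120)]
print(assign(tasks, drones))  # [('follow cyclist', 'uav_1'), ('orbit boat', 'uav_2')]
```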

They run the drones through a couple of basic in-the-wild experiments, involving collectively filming a single object from multiple angles, as well as filming a cyclist and relaying the shot from one drone to the other. The latter experiment has an 8-second gap, as the drones need to create space for each other for safety reasons, which means there isn’t a perfect overlap during the filming handover.

Why this matters: This research is very early – as the video shows – but drones are a burgeoning consumer product, and this research is backed up by an EU-wide project named ‘MULTIDRONE’, which is pouring money into increasing drone capabilities in this area.
  Read more: Autonomous Planning for Multiple Aerial Cinematographers (arXiv).
    Video: Multi-drone cinematographers are coming, but they’re a long way off (YouTube).

####################################################

Want to give your machines a sense of fashion? Try MMFashion:
…Free software includes pre-trained models for specific fashion-analysis tasks…
Researchers with the Chinese University of Hong Kong have released a new version of MMFashion, an open source toolbox for using AI to analyze images for clothing and other fashion-related attributes.

MMFashion v0.4: The software is implemented in PyTorch and ships with pre-trained models for specific fashion-related tasks. The latest version of the software has the following capabilities (a hypothetical code sketch follows this list):
Fashion Attribute Prediction – predicts attributes of clothing, e.g., print, t-shirt, etc.
Fashion Recognition and Retrieval – determines if two images belong to the same clothing line.
Fashion Landmark Detection – detects necklines, hemlines, cuffs, etc.
Fashion Parsing and Segmentation – detects and segments clothing / fashion objects.
Fashion Compatibility and Recommendation – recommends items.
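MMFashion’s actual API isn’t covered above, so here is a purely illustrative PyTorch sketch of the attribute-prediction task framed as multi-label classification; the attribute list, backbone choice, and function name are all hypothetical, and this is not MMFashion’s interface.

```python
# Illustrative multi-label attribute prediction; NOT MMFashion's API.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

ATTRIBUTES = ["print", "t-shirt", "floral", "denim"]  # made-up label set

# Backbone with one output per attribute; in practice you'd load trained weights.
backbone = models.resnet18()
backbone.fc = nn.Linear(backbone.fc.in_features, len(ATTRIBUTES))
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def predict_attributes(image_path, threshold=0.5):
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.sigmoid(backbone(image)).squeeze(0)  # independent sigmoid per attribute
    return [a for a, p in zip(ATTRIBUTES, probs.tolist()) if p > threshold]

# predict_attributes("shirt.jpg")  # placeholder path; random output until trained
```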

Model Zoo: You can see the list of models MMFashion currently ships with here, along with their performance on baseline tasks.

Why this matters: I think we’re on the verge of being able to build large-scale ‘culture detectors’ – systems that automatically analyze a given set of people for various traits, like the clothing they’re wearing, or their individual tastes (and how they change over time). Software like MMFashion feels like a very early step towards these systems, and I can imagine retailers increasingly using AI techniques both to understand what clothes people are wearing and to figure out how to recommend visually similar clothes to them.
  Get the code here (mmfashion Github).
  Read more: MMFashion: An Open-Source Toolbox for Visual Fashion Analysis (arXiv).

####################################################

Spotting toxic content with 4chan and 8chan embeddings:
…Bottling up websites with word embeddings…
Word embeddings are kind of amazing – they’re a way you can develop a semantic fingerprint of a corpus of text, letting you understand how different words relate to one another in it. So it might seem like a strange idea to use word embeddings to bottle up the offensive shitposting on 4chan’s ‘/pol/’ board – a message board notorious for its unregulated, frequently offensive speech, and its association with acts of violent extremism (e.g., the Christchurch shooting). Yet that’s what a team of researchers from AI startup Textgain have done. The idea, they say, is that people can use the word embeddings to help them build datasets of potentially offensive words, or to detect such content directly (e.g., by deploying the embeddings in toxicity filters of some kind).

The dataset: To build the embedding model, the researchers gathered around 30 million posts from the /pol/ boards of 4chan and 8chan, with 90% of the corpus coming from 4chan and 10% from 8chan. The underlying dataset is available on request, they write.
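As a rough sketch of the general approach (not Textgain’s actual pipeline), you could train word2vec-style embeddings on such a corpus with gensim and then expand a seed list of offensive terms via nearest-neighbour lookup; the corpus file below is a placeholder, and the seed term is one of the examples discussed in the paper.

```python
# Rough sketch: train embeddings on a corpus of posts, then expand a seed
# lexicon by nearest-neighbour lookup. Not Textgain's pipeline; the corpus
# path is a placeholder.
from gensim.models import Word2Vec

# Assume one tokenisable post per line in a local dump of /pol/ posts.
with open("pol_posts.txt", encoding="utf-8") as f:
    sentences = [line.lower().split() for line in f]

model = Word2Vec(sentences, vector_size=200, window=5, min_count=10, workers=4)

# Words used in similar contexts to a known offensive term land nearby in the
# embedding space, which is how a seed lexicon gets expanded.
seed_terms = ["cuck"]  # example term from the paper
for term in seed_terms:
    if term in model.wv:
        print(term, "->", [w for w, _ in model.wv.most_similar(term, topn=10)])
```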

Things that make you go ‘eugh’: The (short) research paper is worth a read for understanding how the thing works in practice. Be warned, though: the examples include testing out toxicity detection with the n-word and ‘cuck’. Still, it gives us a sense of how this technology can be put to work.
  Read more: 4chan & 8chan embeddings (arXiv).
  Get the embeddings in binary and raw format from here (textgain official website).

####################################################

Want to make your own weird robot texts? Try out this free ‘aitextgen’ software:
…Plus, finetune GPT-2 in your browser via a Google colab…
AI developer Max Woolf has spent months building free software to make it easy for people to mess around with generating text via GPT-2 language models. This week, he updated the open source software to make it faster and easier to set up. And best of all, he has released a Colab notebook that handles all the fiddly parts of training and finetuning simple GPT-2 text models: try it out now and brew up your own custom language model!
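Based on the aitextgen documentation at the time of writing, the basic generate-and-finetune loop looks roughly like the snippet below; treat it as a sketch, since argument names may change between releases and the corpus path is a placeholder.

```python
# Rough sketch of generating and finetuning text with aitextgen; based on its
# docs at the time, so argument names may differ in current releases.
from aitextgen import aitextgen

# Load the default small GPT-2 model (weights are downloaded on first run).
ai = aitextgen()
ai.generate(n=3, prompt="Import AI is", max_length=60)

# Finetune on your own plain-text file (placeholder path), then sample again.
ai.train("my_corpus.txt", num_steps=2000, generate_every=500)
ai.generate(prompt="Import AI is")
```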

Why this matters: Easy tools encourage experimentation, and experimentation (sometimes) yields invention.  
  Get the code (aitextgen, GitHub).
  Want to train it in your browser? Use a Google Colab here (Google Colab).
  Read the docs here (aitextgen docs website).

####################################################

Want self-driving cars that can read signs? The RoadText-1K dataset might help:
…Bringing us (incrementally) closer to the era of robots that can see and read…
Self-driving cars need to be able to read; a new dataset from the International Institute of Information Technology in Hyderabad, India, and the Autonomous University of Barcelona, might teach them how.

RoadText-1K: The RoadText-1K dataset consists of 1000 videos that are around 10 seconds long each. Each video is from the BDD100K dataset, which is made up of video taken from the driver’s perspective as cars travel around the US. BDD is from the Berkeley Deep Drive project, which sees car companies and the eponymous university collaborate on open research for self-driving cars.
  Each frame in each video in RoadText-1K has been annotated with bounding boxes around the objects containing text, giving researchers a dataset full of numberplates, street signs, road signs, and more. In total, the dataset contains 1,280,613 instances of text across 300,000 frames.
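The paper’s exact annotation schema isn’t reproduced here, but a per-frame record for this kind of dataset plausibly looks something like the sketch below; every field name is hypothetical.

```python
# Hypothetical per-frame text annotation for a driving video; field names are
# illustrative, not RoadText-1K's actual schema.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TextBox:
    track_id: int                    # same ID across frames for the same sign/plate
    bbox: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    transcription: str               # recognised text, e.g. a plate number or sign

@dataclass
class FrameAnnotation:
    video_id: str
    frame_index: int
    boxes: List[TextBox]

frame = FrameAnnotation(
    video_id="bdd_clip_0001",
    frame_index=42,
    boxes=[TextBox(track_id=7, bbox=(410, 220, 520, 260), transcription="SPEED LIMIT 35")],
)
```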

Why this matters: Slowly and steadily, we’re making the world around us legible to computer vision. Much of this work is going on in private companies (e.g., imagine the size of the annotated text datasets that are in-house at places like Tesla and Waymo), but we’re starting to see public datasets as well. Eventually, I expect we’ll develop robust self-driving car vision networks that can be fine-tuned for specific contexts or regions, and I think this will yield a rise in experimentation with odd forms of robotics.
  Read more: RoadText-1K: Text Detection & Recognition Dataset for Driving Videos (arXiv).
  Get the dataset here (official dataset website, IIIT Hyderabad).

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Is there a Moore’s Law equivalent for AI algorithms?
In 2018, OpenAI research showed that the amount of compute used in state-of-the-art AI experiments had been increasing by more than a hundred thousand times over the prior five-year period. Now they have looked at trends in algorithmic efficiency – the amount of compute required to achieve a given capability. They find that in the past 7 years the compute required to achieve AlexNet-level performance in image classification has decreased by a factor of 44x – a halving time of ~16 months. Improvements in other domains have been faster, over shorter timescales, though there are fewer data points – in Go, AlphaZero took 8x less compute to reach AlphaGo Zero-level performance 12 months later; in translation, the Transformer took 61x less training compute to surpass seq2seq, 3 years later.
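A quick back-of-envelope consistency check on those headline numbers: a 44x efficiency gain over 7 years corresponds to a halving time of 84 / log2(44) ≈ 15–16 months.

```python
# Back-of-envelope check: convert a 44x gain over 7 years into a halving time.
import math

factor, months = 44, 7 * 12
print(round(months / math.log2(factor), 1))  # ~15.4 months, i.e. roughly 16
```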

AI progress: A simple three-factor model of AI progress takes hardware (compute), software (algorithms), and data as inputs. This research suggests the last few years of AI development have been characterised by substantial algorithmic progress, alongside the strong growth in compute usage. We don’t know how well this trend generalises across tasks, or how long it might continue. More research is needed on these questions, on trends in data efficiency, and on other aspects of algorithmic efficiency – e.g. training and inference efficiency.

Other trends: This can be combined with what we know about other trends to shed more light on recent progress – improvements in compute/$ have been ~20% per year in recent years, but since we can do ~70% more with a given bundle of compute each year, the ‘real’ improvement in compute per dollar has been ~100% per year. Similarly, if we adjust the growth in compute used in state-of-the-art experiments for these efficiency gains, the ‘real’ growth in effective compute has been even steeper than initially thought.
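Spelling out that back-of-envelope arithmetic: a ~16-month halving time implies roughly 2^(12/16) ≈ 1.7x more capability per unit of compute each year, and combining that with ~20% yearly improvement in compute per dollar gives 1.2 × 1.7 ≈ 2x, i.e. ~100% per year in effective compute per dollar.

```python
# Combine hardware (compute/$) and algorithmic efficiency trends into a single
# 'effective compute per dollar' growth rate.
hardware_per_year = 1.20               # ~20% more compute per dollar each year
algorithmic_per_year = 2 ** (12 / 16)  # ~16-month halving time => ~1.68x per year
print(round(hardware_per_year * algorithmic_per_year - 1, 2))  # ~1.0, i.e. ~100%/year
```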

Why it matters: Better understanding and monitoring the drivers of AI progress should help us forecast how AI might develop. This is critical if we want to formulate policy aimed at ensuring advanced AI is beneficial to humanity. With this in mind, OpenAI will be publicly tracking algorithmic efficiency.
  Read more: AI and Efficiency (OpenAI).
  Read more: AI and Compute (OpenAI).

####################################################

Tech Tales:

Moonbase Alpha
Earth, 2028

He woke up on the floor of the workshop, then stood and walked over in the dark to the lightswitch, careful of the city scattered around on the floor. He picked up his phone from the charger on the outlet and checked his unread messages and missed calls, responding to none of them. Then he turned the light on, gazed at his city, and went to work. 

He used a mixture of physical materials and software-augments that he projected onto surfaces and rendered into 3D with holograms and lasers and other more obscure machines. Hours passed, but seemed like minutes to him, caught up in what to a child would seem a fantasy – to be in charge of an entire city – to construct it, plan it, and see it rise up in front of you. Alive because of your mind. 

Eventually, he sent a message: “We should try to talk again”.
“Yes”, she replied. 

-*-*-*-*-

He knew the city so well that when he closed his eyes he could imagine it, running his mind over its various shapes, edges, and protrusions. He could imagine it better than anything else in his life at this point. Thinking about it felt more natural than thinking about people.

-*-*-*-*-

How’s it going? she said.
What do you think? he said. It’s almost finished.
I think it’s beautiful and terrible, she said. And you know why.
I know, he said.
Enjoy your dinner, she said. Then she put down the tray and left the room.

He ate his dinner, while staring at the city on the moon. His city, at least, if he wanted it to be.

It was designed for 5,000 people. It had underground caverns. Science domes. Refineries. Autonomous solar panel production plants. And tunnels – so many tunnels, snaking between great halls and narrowing en route to the launch pads, where all of humanity would blast off into the solar system and, perhaps, beyond.

Lunar 1 was its name. And “Lunar One,” he’d whisper, when he was working in the facility, late in the evening, alone.

Isn’t it enough to just build it? she said.
That’s not how it works, he said. You have to be there, or there’ll be someone else.
But won’t it be done? she said. You’ve designed it.
I’m more like a gardener, he said. It’ll grow out there and I’ll need to tend it.
But what about me?
You’ll get there too. And it will be so beautiful.
When?
He couldn’t say “five years”. Didn’t want that conversation. So he said nothing. And she left.

-*-*-*-*-

The night before he was due to take off he sat by the computer in his hotel room, refreshing his email and other message applications. Barely reading the sendoffs. Looking for something from her. And there was nothing.

That night he dreamed of a life spent on the moon. Watching his city grow over the course of five years, then staying there – in the dream, he did none of the therapies or gravity-physio. Just let himself get hollow and brittle. So he stayed up there. And in the dream the city grew beyond his imagination, coating the horizon, and he lived there alone until he died. 

And upon his death he woke up. It was 5am on launch day. The rockets would fire in 10 hours.

Things that inspired this story: Virtual reality; procedural city simulator programs; the merits and demerits of burnout; dedication.