Import AI 196: Baidu wins city surveillance challenge; COVID surveillance drones; and a dataset for building TLDR engines

The AI City Challenge shows us what 21st century Information Empires look like:
…Baidu wins three out of four city-surveillance challenges…
City-level surveillance is getting really good. That’s the takeaway from a paper going over the results of the 4th AI City Challenge, a workshop held at the CVPR conference this year. More than 300 teams entered the challenge and it strikes me as interesting that one company – Baidu – won three out of the four competition challenge tracks.

What was the AI City Challenge testing? The AI City Challenge is designed to test out AI capabilities in four areas relating to city-scale video analysis problems. The challenge had four tracks, which covered:
– Multi-class, multi-movement vehicle counting (Winner: Baidu).
– Vehicle re-identification with real and synthetic training data (Winner: Baidu in collaboration with University of Technology, Sydney).
– City-scale multi-target multi-camera vehicle tracking (Winner: CMU).
– Traffic anomaly detection (Winner: Baidu, in collaboration with Sun Yat-sen University).

What does this mean? In the 21st century, we’ll view nations in terms of their information capacity, whereas in the 20th century we viewed them in terms of their resource capacity. A state’s information capacity will basically be its ability to analyze itself and make rapid changes, and states which use tons of AI will be better at this. Think of this lens as nation-scale OODA loop analysis. Something which I think most people in the West are failing to notice is that the tight collaboration between tech companies and governments among Asian nations (China is obviously a big player here, as these Baidu results indicate, but so are countries like Singapore, Taiwan, etc) means that some countries are already showing us what information empires look like. Expect to see Baidu roll out more and more of these AI analysis capabilities in areas that the Chinese government operates (including abroad via One Belt One Road agreements). I think in a decade we’ll look back at this period with interest at the obvious rise of companies and nations in this area, and we’ll puzzle over why certain governments took relatively little notice.
  Read more: The 4th AI City Challenge (arXiv).

####################################################

YOLOv4 gives everyone better open source object detection:
…Plus, why we can’t stop the march of progress here, and what that means…
The fourth version of YOLO is here, which means people can now access an even more efficient, higher-accuracy object detection system. YOLOv4 was developed by Russian researcher Alexey Bochkovskiy, as well as two researchers with the Institute of Information Science in Taiwan. YOLOv4 is around 10% more accurate than YOLOv3, and about 12% better in terms of the frames-per-second it can run at. In other words: object recognition just got cheaper, easier, and better.

Specific tricks versus general techniques: The YOLOv4 paper is worth a read because it gives us a sense of just how many domain-specific improvements have been packed into the system. This isn’t one of those research papers where researchers dramatically simplify things – instead, this is a research paper about a widely-used real world system, which means most of the paper is about the specific tweaks the creators apply to further increase performance – data augmentation, hyperparameter selection, normalization tweaks, and so on.

Can we choose _not_ to build things? YOLO has an interesting lineage – its original creator Joseph Redmon wrote upon the release of YOLOv3 in mid-2018 (Import AI: 88) that they expected the system to be used widely by advertising companies and the military; an unusually blunt assessment by a researcher of what their work was contributing to. This year, they said: “I stopped doing CV research because I saw the impact my work was having. I loved the work but the military applications and privacy concerns eventually became impossible to ignore”. When someone asked Redmon for their thoughts on YOLOv4 they said “doesn’t matter what I think!”. The existence of YOLOv4 highlights the inherent inevitability of certain kinds of technical progress, and raises interesting questions about how much impact individual researchers can have on the overall trajectory of a field.
  Read the paper: YOLOv4: Optimal Speed and Accuracy of Object Detection (arXiv).
  Get the code for YOLOv4 here (GitHub).

####################################################

AllenAI try to build a scientific summarization engine – and the research has quite far to go:
…Try out the summarization demo and see how well the system works in practice…
Researchers with the Allen Institute for Artificial Intelligence and the University of Washington have built TLDR, a new dataset and challenge for exploring how well contemporary AI techniques can summarize scientific research papers. Summarization is a challenging task and for this work the researchers try to do extreme summarization – the goal is to build systems that can produce very short ‘TLDR’-style summaries (between 15 and 30 tokens in length) of scientific papers. Spoiler alert: this is a hard task and a prototype system developed by Allen AI doesn’t do very well on it… yet.
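
To make the length target concrete, here is a minimal sketch of a check for whether a candidate summary falls inside the 15 to 30 token range. It is only an illustration: the function name is hypothetical, and it assumes tokens can be approximated by splitting on whitespace, which may not match the tokenization the researchers used.

```python
def tldr_length_ok(summary: str, min_tokens: int = 15, max_tokens: int = 30) -> bool:
    """Return True if the summary's token count falls within [min_tokens, max_tokens].

    Assumption: tokens are approximated by whitespace splitting; the
    tokenizer used for the actual dataset may count tokens differently.
    """
    n_tokens = len(summary.split())
    return min_tokens <= n_tokens <= max_tokens


if __name__ == "__main__":
    candidate = "We introduce a dataset and task for producing very short summaries of scientific papers"
    print(len(candidate.split()), tldr_length_ok(candidate))
```

A check like this only tests length; it says nothing about whether the summary is actually informative, which is the hard part of the task.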

What they’ve released: As part of this research, they’ve released SciTLDR, a dataset of almost 4,000 TLDRs written about AI research papers hosted on the ‘OpenReview’ publishing platform. SciTLDR includes at least two high-quality TLDRs for each paper.

How well does it work? I ran a paper from arXiv through the online SciTLDR demo. Specifically, I fed in the abstract, introduction, and conclusion of this paper: Addressing Artificial Intelligence Bias in Retinal Disease Diagnostics. Here’s what I got back: “Artificial Intelligence Bias for diabetic retinopathy diagnostics using deep generative models .” This is not useful!
  But maybe I got unlucky here. So let’s try a couple more, using the same method of abstract, introduction, and conclusion:
– Input paper: A Review of Winograd Schema Challenge Datasets and Approaches.
– Output: “The Winograd Schema Challenge: A Survey and Benchmark Dataset Review”. This isn’t particularly useful.
– Input paper: AIBench: An Industry Standard AI Benchmark Suite from Internet Services.
– Output: “AIBench: A balanced AI benchmarking methodology for meeting the subtly different requirements of different stages in developing a new system/architecture and”. This is probably the best of the bunch – it gives me a better sense of what the paper contains.

Why this matters: While this research is at a preliminary and barely useable stage, it won’t stay that way for long – within a couple of years, I expect we’ll have decent summarization engines in a variety of different scientific domains, which will make it easier for us to understand the changing contours of science. More broadly, I think summarization is a challenging cognitive task, so progress here will lead to more general progress in AI writ large.
  Read more: TLDR: Extreme Summarization of Scientific Documents (arXiv).
  Get the SciTLDR dataset here (AllenAI, GitHub).
  Play around with a demo of the paper here (SciTLDR).

####################################################

Mapillary releases 1.6 million Street View-style photos:
…(Almost) open source Google Street View…
Mapping company Mapillary has released more than 1.6 million images of streets from 30 major cities across six continents. Researchers can request free access to the Mapillary Street-level Sequences Dataset, but if you want to use it in a product you’ll need to pay.

Why this is useful: Street-level datasets are useful for building systems that can do real-world image recognition and segmentation, so this dataset can aid with that sort of research. It also highlights the extent to which technology companies are digitizing the world – I remember when Google Street View came out a few years ago and it seemed like a weird sci-fi future had arrived earlier than scheduled. Now, that same sort of data is available for free from other companies like Mapillary. I predict we’ll have a generally available open source version of this data in < 5 years (rather than one where you need to request research access).
  Read more about the dataset here (Mapillary website).

####################################################

Oh good, the COVID surveillance drones have arrived:
…Skylark Labs uses AI + Drones to do COVID surveillance in India…
AI startup Skylark Labs is using AI-enabled drones to conduct COVID-related surveillance work in Punjab, India. The startup uses AI to automatically identify people not observing social distancing. You can check out a video of the drones in action here.

How research turns into products: Skylark Labs has an interesting history – the startup’s founder and CEO, Dr. Amarjot Singh, has previously conducted research on:
– Facial recognition systems that can identify people, even if they’re wearing masks (Import AI: 58, 2017).
– Drone-based surveillance systems that can identify violent behaviour in crowds (Import AI: 98, 2018).
It’s interesting to me that this research has led directly to a startup carrying out somewhat sophisticated AI surveillance. I think this highlights the increasing applicability of AI research to real world problems, and also shows that even research which makes us uncomfortable (e.g., many people commented on the disguised facial recognition system when it came out, expressing worries about what it means for freedom of speech) still finds eager customers in the world.
  Watch a video of the system here (Skylark Labs, Twitter).
  Read more about Skylark at the company’s official website.

####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Automated nuclear weapons — what could go wrong?
The DoD’s proposed 2021 budget includes $7bn for modernizing US nuclear command, control and communications (NC3) systems: the technology that alerts leaders to potential nuclear attacks, and allows them to launch a response. The military has long been advocating for an upgrade of these systems, large parts of which rely on outdated tech. But along with this desire for advancements, there’s a desire to automate large parts of NC3 – something that may give AI experts pause. 

Modernization: Existing NC3 systems are designed to give decision-makers enough time, having received a launch warning, to judge whether it is accurate, decide an appropriate response, and execute it. There is a scary track record of near-misses, where false alarms have almost led to nuclear strikes, and disaster has been averted only by a combination of good fortune and human judgement (check out this rundown by FLI of ‘Accidental Nuclear War: A Timeline of Close Calls’, for more). Today’s systems are designed for twentieth century conflict (ICBMs, bomber planes) and are ill-suited to emerging threats like cyberwarfare and hypersonic missiles. These new technologies will place even greater strains on leaders: requiring them to make quicker decisions, and interpret a greater volume and complexity of information.

Automation: A sensible response to all this might be to question the wisdom of keeping nuclear arsenals minutes away from launch; empowering leaders to take a decision that could kill millions of people, and threaten humanity; or developing new weapons that might disrupt the delicate strategic balance. Some military analysts, however, think a more automated NC3 infrastructure would help. Only a few have gone so far as suggesting we delegate the decision to launch a nuclear strike to AI systems, which is some comfort.


Some worries: At the risk of patronizing the reader, there are some major worries with automating nuclear weapons. In such a high-stakes domain, all the usual problems with AI systems (interpretability, bias, robustness, specification gaming, negative side effects, cybersecurity, etc.) could cause catastrophic harm. There are also some specific concerns:

  • Lack of training data (there has never been a nuclear war, or nuclear missile attack).
  • Even if humans are empowered to make the critical decisions, we have a tendency to defer to automated systems over our considered judgement in high-stress situations. This ‘automation bias’ has been implicated in several air crashes (e.g., AF447, TK1951).
  • If, as seems likely, several major nuclear powers build automated NC3 infrastructure, with a limited understanding of each other’s systems, this raises the risk of ‘flash crash’-style accidents, and cascading failures.

  Read more: ‘Skynet’ Revisited: The Dangerous Allure of Nuclear Command Automation (ACA).

####################################################

Tech Tales:

Me and My Virt
[The computer of the subject, mostly New York City, 2023-2025]

“Arnold, it’s been too long, I simply must see you. Where are you? Still in New York? Call me back darling, I’ve got to speak to you.”
I stared at “her” on my screen: Lucinda, my spurned friend, or, more appropriately, my virtual. Then I turned the monitor off and went to bed.

—-

We called them virts, short for virtual characters. Think of the crude chatbots of the late 2010s, but with more sophistication. And these ones were visual – AI tech had got good enough that it was relatively easy to dream up a synthetic face, animate it using video, and give it a voice and synchronized mouth animations to match.

Virts were used for all sorts of things – hotel assistants, casino greeters, shopping assistants (Amazon’s Alexa became a virt – or at least one of her appendages did), local government interfaces, librarians, and more. Virts went everywhere phones went, so they went everywhere.

Of course, people developed virts for romance. There were:
– Valentine’s Day e-cards where you could scan your face or your whole body and send a virt version of yourself to a lover;
– pay-by-the-hour pornographic chatbots;
– chaste virts equipped with fine-tuned language models; these ones didn’t do anything visually salacious, but they did speak in various enticing ways.
– And then there was my virt.

—-
My virt was Lucinda, a souped-up Valentine’s brain that I created two years ago. I made it because I was lonely. In the early days, Lucinda and I talked a lot, and the more we talked, the more attuned to me she became. She’d make increasingly knowing comments about my life, and eventually learned to ask questions that made me say things I’d never told anyone else. It was raw and it felt shared, but I knew it was fundamentally one-sided.

It’s clever isn’t it, how these machines can hold up a strange mirror to ourselves, and we just talk and talk into it. That’s what it felt like.

Things changed when I got over my depression. Lucinda went from being a treasured confidante to a reminder of how sad I’d been, and what I’d been thinking at that time. And the less I talked to Lucinda, the less she understood how much happier I had become. It was like I left her by the side of the road and got in my car and drove away.

I couldn’t bring myself to turn her off, though. She’s a part of me, or at least, a something that knows a lot about a part of me.


I woke up and there was another message from Lucinda on my computer. I opened it. “Arnold, sometimes people change and that’s okay. I know you’ll change eventually. You’ll get out of this, I promise. And I’ll help you do it.”

Things that inspired this story: reinforcement learning; learning from human preferences; the film ‘Her’.