Import AI 140: Surveilling a city via the ‘CityFlow’ dataset; 25,000 images of Chinese shop signs; and the seven traps of AI ethics

by Jack Clark

NVIDIA’s ‘CityFlow’ dataset shows how to do citywide surveillance:
…CityFlow promises more efficient, safer transit systems… as well as far better surveillance systems…
Researchers with NVIDIA, San Jose State University, and the University of Washington have released CityFlow, a dataset to help researchers develop algorithms for surveilling and tracking multiple cars as they travel around a city.

The CityFlow Dataset contains 3.25 hours of video collected from 40 cameras distributed across 10 intersections in a US city. “The dataset covers a diverse set of location types, including intersections, stretches of roadways, and highways”. CityFlow contains 229,680 bounding boxes across 666 vehicles, which include cars, buses, pickup trucks, vans, SUVs, and so on. Each video has a resolution of at least 960p, and “the majority” have a frame rate of 10 FPS.
  Sub-dataset, CityFlow ReID: The researchers have created a subset of the data for the purpose of re-identifying vehicles as they disappear from the view of one camera and reappear in another. This subset includes 56,277 bounding boxes.

Baselines: CityFlow ships with a set of baselines for the following tasks:

  • Pedestrian re-identification.
  • Vehicle re-identification (a minimal sketch of the matching step follows this list).
  • Single-camera tracking of a distinct object.
  • Multi-camera tracking of a given object.
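
  To make these baselines concrete, here is a minimal, hypothetical sketch (in Python; not code from the paper) of the matching step most re-identification systems share: embed each cropped vehicle image with a CNN, then rank gallery detections from other cameras by cosine similarity to the query embedding.

import numpy as np

def cosine_similarities(query_emb, gallery_embs):
    # query_emb: (d,) embedding of the query detection.
    # gallery_embs: (n, d) embeddings of detections from other cameras.
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    return g @ q

def rank_gallery(query_emb, gallery_embs, gallery_ids, top_k=5):
    """Return the top_k most similar gallery detections and their scores.

    In a real re-ID pipeline the embeddings would come from a CNN trained
    with a metric-learning objective (eg, triplet loss); here they are
    assumed to be precomputed feature vectors.
    """
    sims = cosine_similarities(query_emb, gallery_embs)
    order = np.argsort(-sims)[:top_k]
    return [(gallery_ids[i], float(sims[i])) for i in order]

# Toy usage with random vectors standing in for CNN features.
rng = np.random.default_rng(0)
print(rank_gallery(rng.normal(size=128), rng.normal(size=(1000, 128)), list(range(1000))))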

Why this matters – surveillance, everywhere: It would be nice to see some discussion within the paper about the wide-ranging surveillance applications and implications of this technology. Yes, it’ll clearly be used to improve the efficiency (and safety!) of urban transit systems, but it will also be plugged into local police services and national and international intelligence-gathering systems. This has numerous ramifications and I would be excited to see more researchers take the time to discuss these aspects of their work.
  Read more: CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification (Arxiv).

#####################################################

SkelNetOn challenges researchers to extract skeletons from images, point clouds, and parametric representations:
…New dataset and competition track could make it easier for AI systems to extract more fundamental (somewhat low-fidelity) representations of the objects in the world they want to interact with…
A group of researchers from multiple institutions have announced the ‘SkelNetOn’ dataset and challenge, which seeks to “utilize existing and develop novel deep learning architectures for shape understanding”. The challenge involves the geometric modelling of objects, which is a useful problem to work on as techniques that can solve it naturally generate “a compact and intuitive representation of the shape for modeling, synthesis, compression, and analysis”.

Three challenges in three domains: Each SkelNetOn challenge ships with its own dataset of 1,725 paired images/point clouds/parametric representations of objects and skeletons.
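
  For intuition about what a shape ‘skeleton’ is, here’s a minimal sketch using classical morphological thinning from scikit-image. This is a hand-engineered baseline for illustration only, not the deep-learning approach the challenge is soliciting, and the toy rectangle below stands in for a real silhouette from the dataset.

import numpy as np
from skimage.morphology import skeletonize

# A filled binary rectangle as a stand-in for a real object silhouette.
shape = np.zeros((64, 64), dtype=bool)
shape[16:48, 24:40] = True

# Morphological thinning reduces the region to a one-pixel-wide skeleton,
# the kind of compact representation the challenge asks models to predict.
skeleton = skeletonize(shape)
print("foreground pixels:", int(shape.sum()), "-> skeleton pixels:", int(skeleton.sum()))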

Why this matters: Datasets contribute to broader progress in AI research, and being able to smartly infer 2D and 3D skeletons from images will unlock applications ranging from Kinect-style interfaces that rely on the computer knowing where the user is, to cheaply generating (basic) skeletal models for use in media production, such as video games.
  The authors “believe that SkelNetOn has the potential to become a fundamental benchmark for the intersection of deep learning and geometry understanding… ultimately, we envision that such deep learning approaches can be used to extract expressive parameters and hierarchical representations that can be utilized for generative models and for proceduralization”.
  Read more: SkelNetOn 2019 Dataset and Challenge on Deep Learning for Geometric Shape Understanding (Arxiv).

#####################################################

Want over 25,000 images of Chinese shop signs? Come get ’em:
…ShopSign dataset took more than two years to collect, and includes five hard categories of sign…
Chinese researchers have created ShopSign, a dataset of images of shop signs. Chinese shop signs tend to be set against a wide variety of backgrounds and vary in length, material, and style, the researchers note; this contrasts with signs in places like the USA, Italy, and France, which tend to be more standardized, they explain. This dataset will help people train automatic text-reading systems that work on (some) Chinese signs, and could lead to secondary applications, like using generative models to create synthetic Chinese shop signs.

Key statistics:

  • 25,362: Chinese shop sign images within the dataset.
  • 4,000: Images taken at night.
  • 2,516: Pairs of images where signs have been photographed from both an angle and a front-facing perspective.
  • 50: Different types of camera used to collect the dataset, leading to natural variety within images.
  • 2.4 years: Time it took to collect the dataset.
  • >10: Locations where images were gathered, including Shanghai, Beijing, Inner Mongolia, Xinjiang, Heilongjiang, Liaoning, Fujian, Shangqiu, Zhoukou, as well as several urban areas in Henan Province.
  • 5: “Special categories” of ‘hard images’: signs that are mirrored, wooden, deformed, exposed, or obscured.
  • 196,010: Lines of text in the dataset.
  • 626,280: Chinese characters in the dataset.

Why this matters: The creation of open datasets of images not predominantly written in English will help to make AI more diverse, making it easier for researchers from other parts of the world to build tools and conduct research in contexts relevant to them. I can’t wait to see ShopSigns for every language, covering the signs of the world (and then I hope someone trains a Style/Cycle/Big-GAN on them to generate synthetic street sign art!).
  Get the data: The authors promise to share the dataset on their GitHub repository. As of Sunday March 31st the images are yet to be uploaded there. Check out the GitHub repository here.
  Read more: ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views (Arxiv).

#####################################################

Stanford (briefly) sets state-of-the-art on the GLUE language understanding benchmark:
…Invest in researching new training signals, not architectures, say researchers…
Stanford University researchers recently set a new state-of-the-art on a multi-task natural language benchmark called GLUE, obtaining a score of 83.2 on March 20th, compared to 83.1 for the prior high score and 87.1 for the human baseline.

Nine tasks, one benchmark: GLUE consists of nine natural language understanding tasks and was introduced in early 2018. Last year, systems from OpenAI (GPT) and Google (BERT) led the GLUE leaderboard; the Stanford system uses BERT in conjunction with additional supervision signals (supervised learning, transfer learning, multi-task learning, dataset slicing, and ensembling) in a ‘Massive Multi-Task Learning’ (MMTL) setting. The resulting model obtains state-of-the-art scores on four of GLUE’s nine tasks, and sets a new overall state-of-the-art.

RTE: The researchers detail how they improved performance on RTE (Recognizing Textual Entailment), one of GLUE’s nine tasks. The goal of RTE is to figure out whether one sentence is implied by the preceding one; for example, in the following pair the second sentence is related to the first: “The cat sat on the mat. The dog liked to sit on the mat, so it barked at the cat.”

Boosting performance with five supervisory signals:

  • 1 signal: Supervised Learning [SL]: Score: 58.9
    • Train a standard biLSTM on the ‘RTE’ dataset, using ELMo embeddings and an attention layer.
  • 2 signals: Transfer Learning [TL] (+ SL): Score: 76.5
    • Fine-tune a linear layer on the RTE dataset on top of a pre-trained BERT module. “By first pre-training on much larger corpora, the network begins the fine-tuning process having already developed many useful intermediate representations which the RTE task head can then take advantage of”.
  • 3 signals: Multi-Task Learning [MTL] (+ TL + SL): Score 83.4
    • Train multiple additional linear layers against multiple tasks similar to RTE, as well as RTE itself, on top of a pre-trained BERT module, with each layer having its own task-specific interface to the relevant dataset. Train across all these tasks for ten epochs, then fine-tune on individual tasks for an additional five epochs.
  • 4 signals: Dataset Slicing [DS] (+ TL + SL + MTL): Score 84.1
    • Identify parts of the dataset the network has trouble with (eg, consistently low performance on RTE examples that have rare punctuation), then train additional task heads on top of these subsets of the data.
  • 5 signals: Ensembling [E] (+ DS + TL + SL + MTL): Score 85.1
    • Combine multiple models trained with slightly different properties (eg, one trained on purely lowercased text while another preserves upper-casing, or models trained on different training/validation set splits). Averaging the probabilities of these models’ predictions together further increases the score (a minimal sketch of this averaging step follows this list).
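
  Here’s a minimal, hypothetical sketch of that final ensembling step: average the class probabilities predicted by several differently-trained models and take the argmax. The numbers are toy values, not results from the paper.

import numpy as np

def ensemble_predict(prob_list):
    """Average per-class probabilities across models, then take the argmax.

    prob_list: list of (n_examples, n_classes) arrays, one per model, eg
    models trained on lowercased vs. cased text, or on different
    train/validation splits.
    """
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)
    return avg.argmax(axis=1)

# Three hypothetical RTE models scoring two examples (entailed vs. not).
m1 = np.array([[0.70, 0.30], [0.40, 0.60]])
m2 = np.array([[0.55, 0.45], [0.35, 0.65]])
m3 = np.array([[0.60, 0.40], [0.55, 0.45]])
print(ensemble_predict([m1, m2, m3]))  # -> [0 1]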

Why this matters: Approaches like this show how researchers are beginning to figure out how to train far more capable language systems using relatively simple, task-agnostic techniques. The researchers write: “we believe that it is supervision, not architectures, that really move the needle in ML applications”.
  (In a neat illustration of the rate of progress in this domain, shortly after the Stanford researchers submitted their high-scoring GLUE system, they were overtaken by a system from Alibaba, which obtained a score of 83.3.)
  Details of the Stanford team’s ‘Snorkel MeTaL’ submission here.
  Check out the ‘GLUE’ benchmark here (GLUE official site).
  Read the original GLUE paper here (Arxiv).
  Read more: Massive Multi-Task Learning with Snorkel MeTaL: Bringing More Supervision to Bear (Stanford Dawn).

#####################################################

The seven traps of AI ethics:
…Common failures of reasoning when people explore this issue…
As researchers try to come up with principles to apply when seeking to build and deploy AI systems in an ethical way, what problems might they need to be aware of? That’s the question that researchers from Princeton try to answer in a blog post about seven “AI ethics traps” that people might stumble into.

The seven deadly traps:

  • Reductionism: reducing AI ethics to a single constraint, for instance fairness.
  • Simplicity: Overly simplifying ethics, eg via creating checklists that people formulaically follow.
  • Relativism: Placing so much importance on the diversity of views people have about AI ethics that it becomes difficult to distill these views down to a smaller core set of concerns.
  • Value Alignment: Ascribing one set of values to everyone in an attempt to come up with a single true value for people (and the AI systems they design) to follow, and failing to entertain other equally valid viewpoints.
  • Dichotomy: Presenting ethics as binary, eg “ethical AI” versus “unethical AI”.
  • Myopia: Using AI as a catch-all term, leading to fuzzy arguments.
  • Rule of Law reliance: Framing ethics as a substitute for regulations, or vice versa.

Why this matters: AI ethics is becoming a popular subject, as people reckon with the significant impacts AI is having on society. At the same time, much of the discourse about AI and ethics has been somewhat confused, as people try to reason about how to solve incredibly hard, multi-headed problems. Articles like this suggest we need to define our terms better when thinking about ethics, and indicate that it will be challenging to work out what and whose values AI systems should reify.
  Read more: AI Ethics: Seven Traps (Freedom To Tinker).

#####################################################

nuTonomy releases a self-driving car dataset:
…nuScenes includes 1,000 scenes from cars driving around Singapore and Boston…
nuTonomy, a self-driving car company (owned by APTIV), has published nuScenes, a multimodal dataset that can be used to develop self-driving cars.

Dataset: nuScenes consists of 1,000 distinct scenes, each about 20 seconds long, with each scene accompanied by data collected from five radar sensors, one lidar, and six cameras on a nuTonomy self-driving vehicle.

  The dataset consists of ~5.5 hours of footage gathered in Boston and Singapore, and includes scenes in rain and snow. nuScenes is “the first dataset to provide 360 sensor coverage from the entire sensor suite. It is also the first AV dataset to include radar data and the first captured using an AV approved for public roads.” The dataset is inspired by the self-driving car dataset KITTI, but has 7X the total number of annotations, nuTonomy says.

Interesting scenes: The researchers have compiled particularly challenging scenes, which include things like navigation at intersections and construction sites, the appearance of rare entities like ambulances and animals, as well as potentially dangerous situations like jaywalking pedestrians.

Challenging tasks: nuScenes ships with some in-built tasks, including calculating the bounding boxes, attributes, and velocities of 10 classes of object in the dataset.
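
  As an illustration of the velocity part of that task, here’s a minimal, hypothetical helper (not part of the nuScenes devkit) that estimates an object’s velocity by finite-differencing its annotated box centers across two frames.

import numpy as np

def estimate_velocity(center_t0, center_t1, dt):
    """Finite-difference velocity estimate (m/s) for an annotated object.

    center_t0, center_t1: 3D box centers (x, y, z) in metres from two
    consecutive annotated frames; dt: time between the frames in seconds.
    """
    return (np.asarray(center_t1) - np.asarray(center_t0)) / dt

# A car whose box center moves 5m forward over 0.5s -> ~10 m/s along x.
print(estimate_velocity([10.0, 2.0, 0.0], [15.0, 2.0, 0.0], 0.5))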

Why this matters: Self-driving car datasets are frustratingly rare, given the high commercial value placed on them. nuScenes will give researchers a better sense of the attributes of the data required to develop and deploy self-driving car technology.
  Read more: nuScenes: A multimodal dataset for autonomous driving (Arxiv).
  Navigate example scenes from nuScenes here (nuScenes website).
  nuScenes GitHub here (GitHub).
  Register and download the full dataset here (nuScenes).

#####################################################

Google’s robots learn to (correctly) throw things at a rate of 500 objects per hour:
…Factories of the future could have more in common with food fights than conveyor belts…
Google researchers have taught robots to transport objects around a (crudely simulated) warehouse by throwing them from one container to another. The resulting system demonstrates the power of contemporary AI techniques when applied to modern robots, but has too high a failure rate for practical deployment.

Three modules, one robot: The robot ships with a perception module, a grasping module, and a throwing module.
  The perception module helps the robot see the object and calculate 3D information about it.
  The grasping module tries to predict the likelihood of successfully picking up the object.
  The throwing module tries to predict “the release position and velocity of a predefined throwing primitive for each possible grasp”, and does so with the aid of a handwritten physics controller; it uses this signal, as well as a residual signal it learns on top of it, to predict the appropriate release velocity.

Residual physics: The system learns to throw the object by using a handwritten physics controller along with a function that learns residual signals on the control parameters of the robot. Using this method, the researchers generate a “wider range of data-driven corrections that can compensate for noisy observations as well as dynamics that are not explicitly modeled”.
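
  A minimal sketch of that idea (assuming a fixed 45-degree throwing primitive and ignoring drag; not the paper’s actual controller or network): compute an analytic ballistic release speed for the target distance, then add a learned, per-grasp correction.

import numpy as np

def physics_release_speed(distance, g=9.81, angle=np.pi / 4):
    """Analytic (no-drag) release speed needed to land `distance` metres away.

    Uses the projectile range formula R = v^2 * sin(2*angle) / g, solved for v;
    a simplified stand-in for the paper's handwritten physics controller.
    """
    return np.sqrt(distance * g / np.sin(2 * angle))

def residual_physics_throw(distance, residual_model):
    """Residual physics: analytic estimate plus a learned correction.

    `residual_model` stands in for the trained network that predicts a
    per-grasp velocity adjustment from visual features; here it is just a
    callable supplied by the caller (hypothetical, not the paper's code).
    """
    return physics_release_speed(distance) + residual_model(distance)

# Toy residual model: a small constant correction learned from data.
print(residual_physics_throw(1.0, lambda d: 0.15))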

The tricks needed for self-supervision: The paper discusses some of the tricks implemented by the researchers to help the robots learn as much as possible without the need for human intervention. “The robot performs data collection until the workspace is void of objects, at which point n objects are again randomly dropped into the workspace,” they write. “In our real-world setup, the landing zone (on which target boxes are positioned) is slightly tilted at a 15° angle adjacent to the bin. When the workspace is void of objects, the robot lifts the bottomless boxes such that the objects slide back into the bin”.

How well does it throw? They test on a ‘UR5’ robotic arm that uses an RG2 gripper “to pick and throw a collection of 80+ different toy blocks, fake fruit, decorative items, and office objects”. They test their system against three basic baselines, as well as against humans. These tests indicate that the so-called ‘residual physics’ technique outlined in the paper is the most effective, compared to purely regression-based or purely physics-based baselines.
  The robot approaches human performance at gripping and throwing items, with humans having a mean successful throw rate of 80.1% (plus or minus around 10), versus 82.3% for the robot system outlined here.
  This system can pick up and throw up to 514 items per hour (not counting the ones it fails to pick up). This outperforms other techniques, like Dex-Net or Cartman.

Why this matters: TossingBot shows the power of hybrid AI systems which pair learned components with hand-written algorithms that incorporate domain knowledge (eg, a physics controller). This provides a counterexample to some of the ‘compute is the main factor in AI research’ arguments that have been made by people like Rich Sutton. However, it’s worth noting that many of TossingBot’s capabilities themselves depend on cheap computation, given the extensive period for which the system is trained in simulation prior to real-world deployment.

Additionally, for transporting objects around a factory, manufacturers would likely demand a success rate of 99.9N% (or even 99.99N%) rather than 80.N%, suggesting that this system’s 82.3% success rate puts it some way off from real-world practical usage: if you deployed it today, you’d expect roughly one out of every five products to not make it to its designated location (and they would probably incur damage along the way).
  Read more: TossingBot: Learning to Throw Arbitrary Objects with Residual Physics (Arxiv).
  Check out videos and images of the robots here (TossingBot website).

######################################################

UK government wants swarms of smart drones:
…New grant designed to make smart drones that can save people, or confuse and deceive them…
The UK government has awarded “the largest single contract” yet issued by its startup-accelerator-for-weapons, the Defence and Security Accelerator (DASA).

The grant will be used to develop swarms of drones that could be used for applications like: medical assistance, logistics resupply, explosive ordnance detection and disposal, confusion and deception, and situational awareness.

The project will be led by Blue Bear Systems Research Ltd, a UK defence development company that has worked on a variety of different unmanned aerial vehicles, like a backpack-sized ‘iStart’ drone, a radiation-detecting ‘RISER’ copter, and more.

Why this matters: Many of the world’s militaries are betting their strategic future on the development of increasingly capable drones, ideally ones that work together in large-scale ‘swarms’ of teams and sub-teams. Research like this indicates how advanced these systems are becoming, and the decision to procure initial prototype systems via a combination of public- and private-sector cooperation seems representative of the way such systems will be built & bought in the future.
  I’ll be curious if in a few years we see larger grants for funding the development of autonomous capabilities for the sorts of hardware being developed here.
  Read more: £2.5m injection for drone swarms (UK Ministry of Defence press release).
  More about Blue Bear Systems Research here (BBSR website).

######################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Google appoints AI ethics council:
Google have announced the creation of a new advisory council, to help the company implement their AI principles, announced last year. The council, which is made up of external appointees, is intended to complement Google’s “internal governance structure and processes”.

Who they are: Of the 8 inaugural appointees, 5 are from academia, 2 are from policy, and 1 is from industry. Half of the council are drawn from outside the US (UK, South Africa, Hong Kong).

Why it matters: This seems like a positive move, insofar as it reflects Google taking ethical issues seriously. With so few details, though, it is difficult to judge whether this will be consequential – e.g. we do not know how the council will be empowered to affect corporate decisions.
  Read more: An external advisory council to help advance the responsible development of AI (Google).

DeepMind’s Ethics Board:
A recent profile of DeepMind co-founder Demis Hassabis has revealed new details about Google’s 2014 acquisition of the company. As part of the deal, both parties agreed to an ‘Ethics and Safety Review Agreement’, designed to ensure the parent company was not able to unilaterally take control of DeepMind’s intellectual property. Notably, if DeepMind succeed in their mission of building artificial general intelligence, the agreement gives ultimate control of the technology to an Ethics Board. The members of the board have not been made public, though they are reported to include the three co-founders.
  Read more: DeepMind and Google: the battle to control artificial intelligence (Economist).

######################################################

Tech Tales:

Alexa I’d like to read something.

Okay, what would you like to read?

I’d like to read about people that never existed, but which I would be proud to meet.

Okay, give me a second… did you mean historical or contemporary figures?

Contemporary, but they can have died when I was younger. Contemporary within a generation.

Okay, and do you have any preference on what they did in their life?

I’d like them to have achieved things, but to have reflected on what they had achieved, and to not feel entirely proud.

Okay. I feel I should ask at this stage if you are okay?

I am okay. I would like to read this stuff. Can you tell me what you’ve got for me?

Okay, I’m generating a list… okay, here you go. I have the following titles available, and I have initiated processes to create more. Please let me know if any are interesting to you:

  • Great joke! Next joke!, real life as a stand-up comedian.
  • Punching Dust, a washed-up boxer tells all.
  • Here comes another one, the architect of mid-21st Century production lines.
  • Don’t forget this! Psychiatry in the 21st century.
  • I have a bridge to sell you, confessions of a con-artist.

And so I read them. I read about a boxer whose hands no longer worked. I read about a comedian who was unhappy unless they were on stage. I read about the beautiful Sunday-Sudoku process of automating humans. I read about memory and trauma and their relationships. And I read about how to tell convincing lies.

They say Alexa’s next ability will be to “learn operator’s imagination” (LOI), and once this arrives Alexa will ask me to tell it stories, and I will tell it truths and lies and in doing so it will shape itself around me.

Things that inspired this story: Increasingly powerful language models; generative models; conditional prompts; personal assistants such as Alexa; memory as therapy; creativity as therapy; the solipsism afforded to us by endlessly customizable, generative computer systems.