Import AI: #96: Seeing heartbeats with DeepPhys, better synthetic images via SAGAN, and spotting pedestrians via a trans-European dataset

by Jack Clark

Satellite imagery competition challenges systems to outline buildings, segment roads, and analyze land use patterns:
…DeepGlobe competition and associated datasets designed to speed progress on strategic domain…
Researchers with Facebook, DigitalGlobe, CosmiQ Works, Wageningen University, and the MIT Media Lab have revealed DeepGlobe 2018, a satellite imagery competition with three tasks and associated datasets. DeepGlobe is intended to yield improvements in the automated analysis of satellite images for disaster response, planning, and object detection. The three tracks and their linked datasets are: road extraction (8,570 images), building detection (24,586 ‘scenes’, each equivalent to a 650×650 image), and land cover classification (1,146 satellite images).
Results: The researchers introduce baseline performance numbers for each task. For road extraction they used a modified version of DeepLab with a ResNet18 backbone and Focal Loss, obtaining an Intersection over Union (IoU) score of 0.545. For building detection they used the top-scoring solutions from a competition held on the same dataset in 2017, which obtain IoU scores as high as 0.88 on cities like Las Vegas and as low as 0.54 on Khartoum. For land cover classification they implemented a DeepLab system with a ResNet18 backbone and atrous spatial pyramid pooling (ASPP), obtaining an IoU score of 0.43.
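For readers unfamiliar with the metric, here is a minimal sketch of how an IoU score can be computed for a pair of binary segmentation masks. It is an illustration only, not the DeepGlobe evaluation code: the function name iou, the toy masks, and the convention for two empty masks are all assumptions made for this example.

```python
def iou(pred, truth):
    """Intersection over Union for two same-shaped binary masks.

    Masks are nested lists of 0/1 values (a hypothetical format chosen
    for this sketch; real pipelines typically use arrays).
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1  # pixel marked in both masks
            if p or t:
                union += 1         # pixel marked in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy example: 2 pixels overlap out of 4 pixels marked in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(iou(pred, truth))  # 0.5
```

A score of 1.0 means the predicted outline matches the ground truth exactly, so a baseline of 0.545 on roads leaves plenty of headroom.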
Why it matters: AI will increase the automated analysis capabilities people and nations can wield over their satellite imagery repositories. Progress in this domain directly influences geopolitics by giving rise to new techniques that different nations can use in conjunction with satellite data to watch and react to the world.
Read more: DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images (Arxiv).

Trans-Europe Express!: Researchers release diverse ‘EuroCity’ dataset:
…31 cities across 12 countries yields a diverse dataset containing people in a huge variety of contexts…
Researchers with the Environment Perception Group at carmaker Daimler AG and the Intelligent Vehicles Group at TU Delft have released EuroCity, a large-scale dataset for object and pedestrian detection within urban scenes. EuroCity comprises 45,000 distinct images containing more than a hundred thousand pedestrians in weather settings ranging from dry to wet. The dataset is one of the largest and most diverse yet released for detecting people in urban scenes and will be of particular interest to self-driving car developers.
  Data: The researchers collected the data via a two megapixel camera installed on the dashboard of their car, which they drove through 31 cities in 12 European countries. The dataset’s diversity may help with generalization; results indicate that pre-training on the dataset substantially improved performance when transferring to solve tasks within the more widely-used CityPersons and KITTI datasets.
  Annotations: Pedestrians and vehicle riders are annotated in the dataset. Riders are also annotated with sub-labels that describe their vehicle, such as bicycle, buggy, motorbike, scooter, tricycle, and wheelchair. The researchers also annotate confounding images, like posters that depict people, or images that catch reflections of people in windows, and additional phenomena like lens flares, motion blurs, raindrops, and so on. Annotations were performed by hand.
  Baselines: Four approaches – R-CNN, R-FCN, SSD, and YOLOv3 – are tested on the dataset to create baseline performance figures. Different variants of R-CNN perform best on all three tasks, followed by the performance of YOLOv3. “Processing rates for the R-FCN, Faster R-CNN, SSD and YOLOv3 on non-upscaled test images were 1.2 fps, 1.7 fps, 2.4 fps and 3.8 fps, respectively, on a Intel(R) Core i7-5960X CPU 3.00 GHz processor and a NVidia GeForce GTX TITAN X with 12.2 GB memory”.
  Why it matters: Datasets tend to motivate work on problems contained within them. Given the breadth and scale of EuroCity, it’s likely its release will improve the state-of-the-art when it comes to pedestrian detection in busy or partially occluded scenes. It also hints at a future where hundreds of thousands of cars with dash cams are used to grow and augment continent-scale datasets.
  Read more: The EuroCity Persons Dataset: A Novel Benchmark for Object Detection (Arxiv).

“I can guess your heart rate!” (with DeepPhys):
…Trained system predicts your heart beat from pixel inputs alone…
MIT and Microsoft researchers have built DeepPhys, a network that can crudely predict a person’s heart rate and breathing rate from RGB or infrared videos. They developed the network by building a couple of specific classification models based on domain knowledge about how to detect and analyze skin appearance and changes over time to better infer underlying biological phenomena.
Results: The researchers test their system on four datasets, three recorded under controlled and uncontrolled lighting conditions, and the fourth involving infrared. Their approach outperforms other systems on a variety of evaluation criteria. Additionally, further tests showed that training the system on diverse data inputs can lead to better performance. “The performance improvements were especially good for the tasks with increasing range and angular velocities of head rotation,” they write. “We attribute this improvement to the end-to-end nature of the model which is able to learn an improved mapping between the video color and motion information”.
Why it matters: Systems like this bring us closer to a world where the majority of cameras around us are performing a multitude of different analysis tasks, including ones we may not suspect are possible, like predicting our heart rate from images taken from security camera feeds.
Read more: DeepPhys: Video-Based Physiological Measurement Using Convolutional Attention Networks (Arxiv).

Funny dogs no more! Google and Rutgers introduce ‘SAGAN’:
…Want to be a better artist? Look inside yourself…
One of the classic problems with GAN-generated images is the number of dog legs. What I mean by that is that though these systems have become adept in recent years at generating synthetic imagery in a bunch of different domains, they’ve remained stubbornly bad at modeling aspects of images that require a holistic understanding of the whole – like getting the number of legs right on a dog’s body, or figuring out the correct physical dimensions of a cat’s tail and paw relationship, and so on. “While the state-of-the-art ImageNet GAN model excels at synthesizing image classes with few structural constraints (e.g. ocean, sky and landscape classes, which are distinguished more by texture than by geometry), it fails to capture geometric or structural patterns that occur consistently in some classes (for example, dogs are often drawn with realistic fur texture but without clearly defined separate feet),” the researchers explain.
Attention to the rescue: The researchers, which include GAN-inventor Ian Goodfellow, get around this issue by implementing what they call a Self-Attention Generative Adversarial Network (SAGAN). A SAGAN works by pairing a self-attention mechanism with the traditional machinery of GAN. “The self-attention module is complementary to convolutions and helps with modeling long range, multi-level dependencies across image regions. Armed with self-attention, the generator can draw images in which fine details at every location are carefully coordinated with fine details in distant portions of the image,” they write.
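To make the idea concrete, here is a deliberately stripped-down sketch of self-attention over image locations. It is not the SAGAN implementation: the learned projections that a real model would apply to form queries, keys, and values are replaced with the raw features, and the names self_attention, softmax, dot, and gamma are assumptions made for this example. The point is only to show how every location’s output can mix in information from every other location, however distant.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(features, gamma=1.0):
    """Apply simplified self-attention to a list of feature vectors.

    features: one vector per image location (the image flattened to N
    locations). Assumption for this sketch: queries, keys, and values
    are the raw features rather than learned projections.
    gamma: hypothetical scale on the attended signal before it is added
    back to the input; gamma=0 returns the input unchanged.
    """
    out = []
    for query in features:
        # How strongly this location attends to every location.
        weights = softmax([dot(query, key) for key in features])
        # Weighted sum of all locations' features.
        attended = [
            sum(w * value[d] for w, value in zip(weights, features))
            for d in range(len(query))
        ]
        # Residual connection: original features plus attended signal.
        out.append([gamma * a + x for a, x in zip(attended, query)])
    return out


# Two locations with orthogonal features; each output now carries
# some information from the other location.
feats = [[1.0, 0.0], [0.0, 1.0]]
print(self_attention(feats))
```

Because the attention weights are computed between all pairs of locations, the cost grows quadratically with the number of locations, which is one reason sketches like this are usually applied to feature maps rather than raw pixels.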
Results: The resulting systems dramatically outperform other approaches when assessed by the Inception score (which measures the KL divergence between the conditional class distribution and the marginal class distribution, where a higher score indicates better quality), with SAGAN obtaining a score of 52.52 compared to 36.8 for the prior best published result. It attains similarly impressive scores when assessed via Frechet Inception Distance (FID).
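As a rough illustration of what the Inception score measures, the sketch below computes the exponentiated average KL divergence between each image’s predicted class distribution and the marginal class distribution across all images. It assumes you already have per-image class probabilities from some classifier; the function name inception_score and the toy inputs are made up for this example, and real evaluations typically use many thousands of generated images.

```python
import math


def inception_score(cond_probs):
    """Exponentiated mean KL(p(y|x) || p(y)) over a set of images.

    cond_probs: one row per generated image, each row a probability
    distribution over classes (hypothetical input format for this sketch).
    """
    n = len(cond_probs)
    k = len(cond_probs[0])
    # Marginal class distribution p(y): average over all images.
    marginal = [sum(row[j] for row in cond_probs) / n for j in range(k)]
    total_kl = 0.0
    for row in cond_probs:
        # Terms with p == 0 contribute nothing to the KL divergence.
        total_kl += sum(
            p * math.log(p / m) for p, m in zip(row, marginal) if p > 0
        )
    return math.exp(total_kl / n)


# Confident and diverse predictions score high (close to the class count).
print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
# Uncertain predictions score at the minimum of 1.
print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
```

The score rewards generators whose individual images are classified confidently while the overall set of images covers many classes.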
Why it matters: Attention is a simple idea that has come to dominate a huge amount of AI research lately. SAGAN provides further evidence for the generality and applicability of the technique. The work also suggests that progress in automated image synthesis is going to continue, and I worry that society isn’t quite prepared for what having all these cheap, convincing digital fakes means.
Read more: Self-Attention Generative Adversarial Networks (Arxiv).

Big Empiricism: Google carries out major ImageNet transfer learning experiment:
…Well-performing ImageNet models can aid transfer learning, but not as much as people had intuited…
New research from Google comprehensively tests the idea that models which attain higher scores on the widely-used ‘ImageNet’ dataset will tend to have good properties when used as inputs for transfer learning and domain adaptation. The research evaluates 13 ImageNet-trained classification models on 12 image classification tasks in three settings: as fixed feature extractors, as aids for fine-tuning, and using networks trained from random initialization.
Results: ImageNet performance is at best weakly predictive of good out-of-the-box performance on other tasks, though confidence increases with fine-tuning.
  Why it matters: Experiments like these enlarge our understanding of transfer learning within AI, which is a crucial problem that needs to be dealt with to build more capable systems. “Is the general enterprise of learning widely-useful features doomed to suffer the same fate as feature engineering in computer vision?” the authors wonder. “It is not entirely surprising that features learned on one dataset benefit from some other amount of adaptation (i.e., fine-tuning) when applied to another. It is, however, surprising that features learned from a large dataset cannot always be profitably adapted to much smaller datasets.”
  Big Empiricism: I’d categorize this type of research as a buzzword (you’ve been warned!) I’ve been mentally using called ‘Big Empiricism’. This sort of research tends to work by taking an existing technique and scaling it up to unprecedented levels to test its performance in large domains, or by taking a well-received idea and testing it via large costly experiments with multiple permutations. Other examples of work here include papers like ‘Regularized Evolution for Image Classifier Architecture Search’ (ImportAI #81) or the original Neural Architecture Search paper, Evolution Strategies, and Exploring the Limits of Language Modeling (among many other worthy examples).
  I do think this gives credence to the argument that AI science is bifurcating into two distinct tracks, with many organizations participating in basic AI research and a few (typically wealthy) ones exploring questions that require access to very large and expensive computers.
  Read more: Do Better ImageNet Models Transfer Better? (Arxiv).

##########
AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

Australia’s Chief Scientist proposes ethics ‘stamp’ for AI:
In a speech this week, Dr Alan Finkel, the Chief Scientist of Australia, proposed a mark which would identify products with AI components. Finkel says his idea, the “Turing Stamp”, would require companies to meet an independently audited set of standards, in the model of ‘Fair Trade’ or ‘Organic’ stamps on food.
Why it matters: If this worked it could represent a mechanism for incentivizing ethical AI development without top-down regulation by introducing ethics into competition between firms. Oren Etzioni, the CEO of the Allen Institute for Artificial Intelligence, has similarly argued that digital chatbots should have to identify themselves when talking to humans. However, distinguishing clearly between “AI” and “computer” would likely prove to be a challenge for regulators, so I’d worry about a rapid proliferation of labels.
Read more: Government should lead AI certification: Finkel (Government News).

Lessons from the Alvey Programme:
…What happened the last time the UK government dumped money into AI, and what parts of that were most helpful?…
In the appendices of the recent House of Lords report on AI is a discussion of a historic government attempt to stimulate the UK’s AI industry, the ‘Alvey Programme’ in the ‘AI spring’ of the 1980s. In 1983, the government committed £1bn (in today’s money) to AI R&D via an industry-academia partnership very similar to those being put forward today, in the UK and elsewhere.
  What worked: The programme successfully created a community of researchers in AI in the UK, and yielded a prototype for academia-industry collaboration that remains the main model of contemporary government AI R&D programs. Some of the research streams, like the focus on object-oriented programming, would have a lasting impact.
What went wrong: The programme was not deemed a success at the time, and was halted after five years. The goal of translating academic progress into commercial capabilities was not realized, as companies were frequently unwilling, or unable, to make significant investments in these technologies. The authors also point out the lack of communication amongst researchers, due to the absence of a single ‘research centre’ for the programme.
  Read more: Lords AI Report (Appendix 4)
   Read more: Lessons from the Alvey Programme for Creating an Innovation Ecosystem for Artificial Intelligence.

Brookings polls 1,500 US adults regarding AI:
…Cautious optimism, concerns about job displacement and privacy, and support for regulation…
The Brookings thinktank recently conducted a survey of ~1,500 US adults on AI attitudes.
Cautious optimism: 34% expect AI to make day-to-day life easier, vs 13% expect it to make life harder.
  Concerns about jobs, privacy, and the future of humanity:
– 38% expect AI to reduce jobs, vs. 12% expect job creation.
– 49% expect AI to reduce personal privacy, vs 5% expect AI to increase it, and 12% for no impact.
– 32% think AI represents a threat to humanity vs. 24% for no threat.
  Significant support for regulation: 42% support government regulation of AI, vs 17% opposing it.
  US perceived as world-leader, but China expected to catch up: 21% thought US was the leading country in AI, closely followed by Japan (19%), and China (15%). When asked the same question about 10 years from now, 21% thought US would still be leading, narrowly beating China (20%).
Why it matters: Understanding and managing public attitudes towards AI is an important part of AI policy, but polling on this has been limited so far, with previous work focused more on tracking trends in news coverage, or qualitative methods. More regular polling worldwide to see differences in attitudes over time, and between countries, would be a positive (and cheap!) endeavor.
  Read more: Brookings survey finds worries over AI impact on jobs and personal privacy, concern U.S. will fall behind China.

US cities use Amazon facial recognition software, ACLU objects:
…ACLU has used public records requests to reveal how Amazon’s ‘Rekognition’ facial recognition software is being used by US law enforcement agencies in three states…
  What Rekognition can do: The ACLU reveals that Amazon has been renting its AI-powered ‘Rekognition’ software to several US law enforcement agencies. Rekognition is able to identify, track and analyze faces in real time, and recognize up to 100 people per image. It represents a new type of AI-software-as-a-service being developed by companies like Amazon and competes with similar cloud-based image recognition engines from Google, Microsoft, and others.
Why the ACLU is so concerned: The ACLU says the software can be “readily used to violate civil liberties and civil rights”, and envisages scenarios where police can monitor who attends protests, ICE can continuously monitor new immigrants, and cities could routinely track their residents whether they are suspected of criminal activity or not.
  Why it matters: As the US public sector tries to harness AI capabilities it’s going to be forced to enter into more and more procurement relationships with powerful AI companies, many of which have implicit ideological stances that differ from those of some of their customers (see also: Google and Project Maven).
  Read more: Amazon Teams Up With Law Enforcement to Deploy Dangerous New Facial Recognition Technology (ACLU)
  Read more: Amazon is selling facial recognition to law enforcement — for a fistful of dollars (Washington Post)

Tech Tales:

On The Surprising Re-Emergence of Board Game Designers as Cultural Arbiters and Controllers

Say you’ve got hundreds of different computers and you’re ordering them to do lots of different things and you want to be able to nudge them occasionally and figure out what they’re doing — what do you do? It’s a hard problem. Lots of the early methods ended up structuring AI organizations as combinations of companies and computation fleets. That period didn’t last long. As the software took over companies changed: human work became less about the specific design of specific details – as the marketing slogan from one of the big tech companies goes, ‘from tooth brushes to silent electric engines, our $auto_bot can do it all. Buy today, earn today!’, instead it’s about figuring out how to let humans easily interface with these machines.

Enter the board game designers. About a decade ago some of the companies discovered internally that they could recast AI-teaching problems into interactive games that people could play in virtual reality, or in cut-down scenarios on their phones. Staff started to ‘work’ by playing games that interfaced with gargantuan learning engines. But it worked. And soon it led to products, all of which relied on this conceit of having the human work by playing a game.

Board games were designed to solve AI-relevant problems, such as:
– Marshaling fleets of anti-poacher drones to survey a large wildlife park
– Optimizing delivery times in a given area while satisfying certain human happiness measures (sometimes known as: brand maximizing) and being able to lightly direct spare vehicles to perform promotional ‘robot intervention’ stunts.
– Evolving a contextual-input orchestra via a hundred musical robots and more than a thousand input streams from various webcams, microphones, pressure-sensitive pads on walkways, etc.
– etc.

Eventually just about every problem got a board game variant. The AI systems these games controlled became ever-smarter, so the games became more complicated as well. And in this march to complexity the purpose of the board games changed twice. First, we built games of games – abstract entities that took training to operate which would let one person skillfully conduct thousands of AI systems at once. For a while, this drove society. But as we ran more games and grew more expert in their construction the AIs became smarter as well.

One day the purpose of the games began to change: instead of providing interfaces through which we could change the AIs, the games became interfaces through which the AIs could learn from us. These board games now work like simulations, where we play them and the robot indicates what it is planning to do and we give votes about how we feel about what it is doing, and then sometimes it adopts that behavior, or sometimes it does something else.

Obviously, these sorts of board games are less fun. Something about becoming a pawn on the board instead of the player sitting behind it makes people unwilling to play these games. So the games have got better: now they’re designed to hook us and entrance us, and the machines are learning to experiment on us in this way as well.

So that’s how we’ve ended up with The Suck, our omnipresent nickname for ‘UN-backed AI Interface Cluster 1 Class: ALL_PEOPLE’. The Suck is a board game designed by machines running on casino-mad impulses. During its construction the machines eagerly exhumed ancient propaganda methods from Edward Bernays to tabloids to arcade game machines to casinos to mobile apps to long-since-regulated Social Media Architectures. The machines and the UN officials building The Suck did their job well and now most of the planet spends a few hours a day playing it. We can do anything else we want. No one is forcing us to do it. But, what else are you going to do? It’s fun!

Few of us have a clear sense of what the machines are up to, these days. Shuttles go up and come down. New things are built. The atmospheres are being cleared. Human UN staff occasionally talk to the world to give updates on The Partnership, which is how we refer to this relationship we’ve got with the machines. Most people are pretty cheerful but some see malice in what is most likely just a banal burst of progress: Now is the greatest time to be a board game player in history, and also the most dangerous time, because these board games are after our minds – the_truth_is_out_there forum posting, captured t-9 days from message posting.

Things that inspired this story: The Glass Bead Game, 4x strategy games, interfaces between simulated AIs and human overseers, learning from human preferences.
