Import AI: #78: Google gives away free K80 GPUs; Chinese researchers benchmark thermal imagery pedestrian trackers; and AI triumphs against dermatologists in diagnosis competition

by Jack Clark

AI beats panel of 42 dermatologists at spotting symptoms of a particular skin disorder:
…R-CNN + large amounts of data beats hundreds of years of combined medical schooling…
Scientists have gathered together a large medical-grade dataset of photos of fingernails and toenails and used it to train a neural network to distinguish symptoms of onychomycosis better than a panel of experts. The approach relies on Faster R-CNN (GitHub), an object detection system originally developed by Microsoft Research (Arxiv), as well as convolutional neural networks that implement a ResNet-152 model (also developed by Microsoft Research). It’s another datapoint suggesting that, at least in the perceptual domain, given enough data and compute we can design systems that match or exceed humans’ capabilities at narrowly specified tasks.
  Data janitorial work: The researchers also contribute a dataset of ~50,000 nail photographs for use in further research. The paper includes details on how they shaped and cleaned their data to obtain this dataset – a process that involved the researchers training an object localization system to automatically crop their images to feature just nails rather than other, misclassified objects (apparently the network would initially mistake teeth or warts for fingers).
  Results: They comprehensively test the resulting networks against humans with a variety of different skill levels, ranging from nurses to clinicians to professors with a dermatology specialism. In all cases the AI-based networks matched or exceeded large groups of human experts on medical classification tasks. “Only one dermatologist performed better than the ensemble model trained with the A1 dataset, and only once in three experiments,” they write.
  The future: One of the promises of AI for medical use-cases is that it can dramatically lower the cost of initial analysis of a given set of symptoms. This experiment backs up that view, and in addition to gathering the dataset and developing the AI techniques, the scientists have also developed a web- and smartphone-based platform to collect and classify further medical data. “The results from this study suggest that the CNNs developed in this study and the smartphone platform we developed may be useful in a telemedicine environment where the access to dermatologists is unavailable,” they write.
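  For the curious, here is a minimal sketch of what a two-stage pipeline like the one described above could look like, using PyTorch/torchvision stand-ins: a Faster R-CNN detector localizes and crops nails, then a ResNet-152 classifier scores each crop. The pretrained weights, class count, and thresholds are illustrative placeholders, not the authors’ released code.

```python
import torch
import torchvision
from torchvision import transforms

# Stage 1: detector to localize nails. COCO-pretrained weights are a stand-in;
# the authors fine-tuned their own region-based network on nail images.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()

# Stage 2: ResNet-152 classifier with its final layer resized to two classes
# (onychomycosis vs. not).
classifier = torchvision.models.resnet152(pretrained=True)
classifier.fc = torch.nn.Linear(classifier.fc.in_features, 2)
classifier.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def diagnose(image, box_threshold=0.8):
    """image: float CHW tensor in [0, 1]. Returns per-nail disease probabilities."""
    with torch.no_grad():
        detections = detector([image])[0]
        probs = []
        for box, score in zip(detections["boxes"], detections["scores"]):
            if score < box_threshold:
                continue
            x0, y0, x1, y1 = box.int().tolist()
            crop = image[:, y0:y1, x0:x1]  # crop down to just the nail
            logits = classifier(preprocess(crop).unsqueeze(0))
            probs.append(torch.softmax(logits, dim=1)[0, 1].item())
        return probs
```

  Per the paper, the detector also earns its keep offline: it is what let the researchers automatically crop the ~50,000 training photographs during dataset construction.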
–   Read more: Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network (PLOS One).

US defense establishment to invest in AI, robots:
…New National Defense Strategy memo mentions AI…
The US’s new National Defense Strategy calls for the government to “invest broadly in military application of autonomy, artificial intelligence, and machine learning, including rapid application of commercial breakthroughs”.
  The summary also hints at the troubling dual-use nature of AI and other technologies: “The security environment is also affected by rapid technological advancements and the changing character of war. The drive to develop new technologies is relentless, expanding to more actors with lower barriers of entry, and moving at accelerating speed. New technologies include advanced computing, “big data” analytics, artificial intelligence, autonomy, robotics, directed energy, hypersonics, and biotechnology—the very technologies that ensure we will be able to fight and win the wars of the future.”
–   Read more: Summary of the 2018 National Defense Strategy of The United States of America (PDF).

A movable feast of language modeling techniques from Fast.ai:
…Calibration, calibration, calibration…
Researchers with Fast.ai and Aylien Ltd have published details on Fine-tuned Language Models (FitLaM), a set of transfer learning methods for optimizing language models for given domains. This paper has a similar flavor to DeepMind’s recent ‘Rainbow’ algorithm: in both cases researchers integrate a bunch of recent innovations in their field (language modeling and reinforcement learning, respectively) to create an ‘everything-and-the-kitchen-sink’-style model, which attains good task performance.
  Results: FitLaM models attain state-of-the-art scores on five distinct text classification tasks, reducing errors by between 18 and 24 percent on the majority of the datasets.
  How it works: FitLaM models consist of an RNN with one or more task-specific linear layers, along with a fine-tuning technique that updates the higher layers of the network more than the lower ones, aiding preservation of the information gleaned from general-domain language modeling. Along with this, the authors develop a bunch of different techniques to further facilitate transfer, detailed exhaustively in the paper.
  Transfer learning: To aid transfer learning, the researchers pre-train a language model on a large text corpus – in this case Wikitext, which consists of ~28,000 pre-processed Wikipedia articles. Other techniques they use include ‘gradual unfreezing’ of neural network layers during re-training (see the sketch below), cosine annealing for fine-tuning, and reverse annealing as well.
  Tested domains: Sentiment analysis (two separate datasets), question classification, topic classification (two datasets).
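  As a rough illustration of the gradual unfreezing idea, here is a hedged PyTorch sketch with illustrative learning rates and a made-up schedule rather than the paper’s exact recipe: top layers are unfrozen first and updated most aggressively, while lower layers thaw later and move more gently.

```python
import torch

def build_optimizer(layers, base_lr=1e-3, decay=2.6):
    """One parameter group per layer, with smaller learning rates for lower
    layers so general-domain knowledge is disturbed less during fine-tuning."""
    groups = [{"params": layer.parameters(), "lr": base_lr / (decay ** depth)}
              for depth, layer in enumerate(reversed(layers))]  # top layer first
    return torch.optim.SGD(groups)

def gradually_unfreeze(layers, epoch):
    """Epoch 0 trains only the top layer; each subsequent epoch unfreezes
    one more layer, working down toward the embedding layer."""
    for depth, layer in enumerate(reversed(layers)):
        trainable = depth <= epoch
        for p in layer.parameters():
            p.requires_grad = trainable
```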
–   Read more: Fine-tuned language models for text classification (Arxiv).

Google adds free GPUs to its online coding service:
…Free GPUs with very few strings attached…
Google says Colaboratory, its live coding mashup that works like a cross between a Jupyter Notebook and a Google Doc, now comes with free GPUs. Users can write a few code snippets, detailed here, and get access to two vCPUs with 13GB of RAM and, the icing on the cake, an NVIDIA K80 GPU, according to a comment from an account linked to Michael Piatek at Google.
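If you want to confirm your notebook actually got a GPU, something along these lines works (using the TensorFlow 1.x-era API that Colaboratory ships with; this may not be the exact snippet Google documents):

```python
# Run inside a Colaboratory notebook with the GPU runtime enabled.
import tensorflow as tf

device_name = tf.test.gpu_device_name()
print(device_name if device_name else "No GPU attached")  # e.g. '/device:GPU:0'
```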
–    Access Colaboratory here.

First came ResNets, then DenseNets, now… SparseNets?
…Researchers chain networks together in weird ways to attain state-of-the-art results…
Neural networks can in one way be viewed as machines that operate over distinct datasets and figure out the transformations that tie them together. Researchers have developed approaches (Residual Networks and DenseNets) that are able to pick up on successively finer-grained features that distinguish different visual phenomena, while ensuring that as much information as possible can propagate from one layer of a network to another.
  Now, researchers with Simon Fraser University have tried to take the best traits from ResNets and DenseNets and synthesize them into SparseNets, a way of structuring networks that “aggregates features from previous layers: each layer only takes features from layers that have exponential offsets away from it… Experimental results on the CIFAR-10 and CIFAR-100 datasets demonstrate that SparseNets are able to achieve comparable performance to current state-of-the art models with significantly fewer parameters,” they write.
  Thrifty networks: So, what’s the motivation for structuring networks in such a way? It’s that if you can expand the size of the network without adding too many parameters, then you know you’ll ultimately be able to exploit this efficiency to build even larger networks in the future. Experiments with SparseNet show that networks built like this can attain accuracies similar to those obtained by ResNets and DenseNets on a far, far smaller parameter budget.
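  Based on the quoted description, the connectivity rule is simple enough to sketch in a few lines of Python; treating concatenation as the fusion op, as in DenseNets, is my assumption rather than something spelled out above.

```python
def sparse_predecessors(i):
    """Earlier layers feeding layer i: those at offsets 1, 2, 4, 8, ... behind it."""
    preds, offset = [], 1
    while i - offset >= 0:
        preds.append(i - offset)
        offset *= 2
    return preds

# Layer 7 aggregates (e.g. concatenates) features from layers 6, 5, and 3
# (offsets 1, 2, 4); a DenseNet layer would instead connect to all of layers 0-6.
assert sparse_predecessors(7) == [6, 5, 3]
```

  The number of incoming connections thus grows logarithmically with depth rather than linearly, which is where the parameter savings come from.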
–   Read more: Sparsely Connected Convolutional Networks (Arxiv).

Bootstrapping data quality with neural networks:
…Chinese researchers try to scale data generation via AI…
Researchers with Soochow University, Alibaba Group, Shenzhen Gowild Robotics Co. Ltd, and Heilongjiang University have developed a system that improves the performance of Chinese named entity recognition (NER) techniques by gathering low-quality crowdsourced annotations and improving their quality via adversarial training. NER is a specific skill that systems use to spot the key parts of sentences and how they link to a larger knowledge store about the world. Better NER approaches tend to translate quickly into improved consumer-facing or surveillance-oriented AI systems, like personal assistants, or databases for analyzing large amounts of speech.
  The technique: The researchers use crowd annotators to label specific datasets, such as dialog and e-commerce data, and use a variety of neural network-based systems to analyze the commonalities and differences between the NER labels applied by each individual to their specific sample of text. The resulting system is able to perform classification at higher accuracies than other systems trained on the same data, beating or matching other baselines created by the researchers.
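  The paper’s exact architecture isn’t spelled out above, but the general adversarial pattern it gestures at is well known: a discriminator tries to guess which annotator produced a label sequence, while the shared encoder, via a gradient-reversal layer, learns features that carry NER signal but no annotator identity. Here is a hedged PyTorch sketch with made-up dimensions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips gradients on the backward pass,
    so the encoder learns to *fool* the worker discriminator."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

encoder = nn.LSTM(input_size=100, hidden_size=128, batch_first=True)
ner_head = nn.Linear(128, 9)      # e.g. BIO tags over a handful of entity types
worker_head = nn.Linear(128, 50)  # one class per crowd annotator (illustrative)

def loss(embeddings, tags, worker_ids):
    """embeddings: (batch, seq, 100); tags: (batch, seq); worker_ids: (batch,)."""
    feats, _ = encoder(embeddings)
    ner_loss = F.cross_entropy(ner_head(feats).flatten(0, 1), tags.flatten())
    pooled = GradReverse.apply(feats.mean(dim=1))  # sentence-level features
    adv_loss = F.cross_entropy(worker_head(pooled), worker_ids)
    return ner_loss + adv_loss  # encoder gains NER signal, sheds worker signal
```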
–  Read more: Adversarial Learning for Chinese NER from Crowd Annotations (Arxiv).

Chinese researchers gather pedestrian tracking dataset and evaluate nine trackers on it:
…Oy, you there! Yes, you, the iridescent person with the really warm hands!…
Chinese researchers have selected 60 videos shot in thermal infrared, compiled them, and turned them into a dataset for evaluating thermal infrared (TIR) pedestrian tracking technologies.
  Dataset: The 60 thermal sequences contain footage from a variety of devices (surveillance cameras, hand-held cameras, vehicle-mounted cameras, drones) across a mixture of differently scaled scenes, camera-positions, and video perspectives.
  Trackers evaluated: The researchers evaluate nine distinct pedestrian trackers that implement different methods, ranging from support vector machines, to correlation and regression filters, to deep learning approaches (systems: HDT and MCFTS). SRDCF – a spatially regularized discriminative correlation filter (PDF) – is the clear winner, attaining the most reliably high scores across a bunch of different tests.
  Surprisingly strong deep learning performance: Both neural network approaches (HDT and MCFTS) enjoy fairly consistent, high rankings as well. “We suggest that the deep feature based trackers have potential to achieve better performance if there are enough thermal images for training,” they write.
  Expensive: Deep learning approaches still seem fairly expensive, with the DL systems (HDT and MCFTS) running at 10.60 and 4.73 frames per second respectively when deployed on an Intel PC with an NVIDIA 1080 GPU and 32GB of RAM. SRDCF, by comparison, gets 12.29 FPS.
–   Read more: PTB-TIR: A Thermal Infrared Pedestrian Tracking Benchmark (Arxiv).

OpenAI Bits&Pieces:

Scaling Kubernetes to 2,500 Nodes:
An account of some of the problems we ran into and workarounds we devised as we scaled up our large AI infrastructure.
–   Read more: Scaling Kubernetes to 2,500 Nodes (OpenAI blog).

Tech Tales.

Earth, 2045:

Canary Wharf, London:

So I’m bent over trying to fit myself through a ventilation fan when I see them: two crates, sealed, tape still on them. I approach cautiously. The still air of the data center feels close, tomb-like. My suit is coated in dust from squeezing my way through the long-dormant fan. I put my hand on top of one of the boxes and close my eyes, imagining the inside of the crate and trying to will the things I am hunting for into existence. I take a deep breath and open the box.

There they are. Not as many as I’d hoped, but some. Each chip gets its own housing in a spongy, odorless, moisture-wicking, anti-static material. I peer in and see the familiar brand names: InnerEye, AccuVision, Mine+, Seeder. The manifest for the box lists a few others which are missing from the container, but I don’t fret. These will be enough.

You see, we knew Moore’s Law was ending, and we didn’t do much about it. Kind of like climate change. We just stared at the problem – again, similar; the dreadful consequence of energy distribution and dissipation over time – and built bigger fabs and crafted bigger chips and told people it was fine. But it wasn’t fine. In the background we were all driven to massive parallelism, and this worked for a while – we built vast data centers around the world, all of us modeling ourselves on the early Google insight that The Datacenter is the Machine. Our advances were so impressive and consistent that people didn’t pay attention to the spiralling energy bills, or the diminishing returns we were getting from going big.

Then the wars happened. Some of them purely economic, others physical – ‘kinetic’, in the terms used by certain military types. Fabs were destroyed. It’s not like we went back to the stone age, but we had to move back up the nanometre process curve for chip manufacturing all the way to 10nm – decades of progress, hiccuped away in fireballs. Now, sub-10nm node chips are all spoken for, whether by government buyers, AI companies, or the family offices of the world’s billionaires, who are all, as ever, obsessed with simulating a future that has not yet arrived and acting accordingly.

So that’s why people like me exist. We don’t go and buy new chips, we just go and find old ones. Because for certain things there’s no substitute for speed. They’re talking now about registering all chips so as to be able to spot ‘illegal AI activity’. So that’s creating even more demand for me and my services. I don’t much like to think about what happens to these chips after I hand them over – though it doesn’t take much thought to realize that the situations where you’re willing to pay this much money are life and death situations. Now whether these are for machines that guard or machines that hunt is another question.

Technologies that inspired this story: Moore’s Law, fab construction costs, the implicit geopolitics of compute.