Import AI 149: China’s AI principles call for international collaboration; what it takes to fit a neural net onto a microcontroller; and solving Sudoku with a hybrid AI system

by Jack Clark

China publishes its own set of AI principles – and they emphasize international collaboration:
Principles for education, impacts of AI, cooperation, and AGI…
A coalition of influential Chinese groups has published a set of ethical standards for AI research, called the Beijing AI Principles. These principles are meant to govern how developers research AI, how they use it, and how society should manage AI. The principles heavily emphasize international cooperation at a time of rising tension between nations over the strategic implications of rapidly advancing digital technologies.

The principles were revealed last week by a coalition that included the Beijing Academy of Artificial Intelligence (BAAI), Tsinghua University, and a league of companies including Baidu, Alibaba, and Tencent. “The Beijing Principles reflect our position, vision and our willingness to create a dialogue with the international society,” said the director of BAAI, Zeng Yi, according to Xinhua. “Only through coordination on a global scale can we build AI that is beneficial to both humanity and nature”.

Highlights of the Beijing AI principles: Some of the notable principles include establishing open systems “to avoid data/platform monopolies”, that people should receive education and training “to help them adapt to the impact of AI development in psychological, emotional and technical aspects”, and that people should approach the technology with an emphasis on long-term planning, including an acknowledgement that research on “the potential risks of Augmented Intelligence, Artificial General Intelligence (AGI) and Superintelligence should be encouraged”.

Why this matters: Principles are one of the ways that large policy institutions develop norms to govern technology, so Beijing’s AI principles should be seen as a prism via which the Chinese government will seek to regulate aspects of AI. These principles will sit alongside multi-national principles like those developed by the OECD, as well as those developed by individual entities (eg: Google, OpenAI). The United States government has yet to outline the principles with which it will approach the development and deployment of AI technology, though it has participated in and supported the creation of the OECD AI principles.
  Read more: Beijing AI Principles (Official Site).
  Read more: Beijing publishes AI ethical standards, calls for int’l cooperation (Xinhua).

#####################################################

Faster, smaller, cheaper, better! Google trains SOTA-exceeding ‘EfficientNets’:
…What’s better than scaling up by width? Depth? Resolution? How about all three in harmony?…
Google has developed a way to scale up neural networks more efficiently and has used this technique to find a new family of neural network models called EfficientNets. EfficientNets outperform existing state-of-the-art image recognition systems, while being up to ten times as efficient (in terms of memory footprint).

How EfficientNets work: Compound Scaling: Typically, when scaling up a neural network, people fool around with things like width (how wide the layers in the network are), depth (how many layers are stacked on top of each other), and resolution (at what resolution inputs are processed). For this project, Google performed a large-scale study of the ways in which it could scale networks and discovered an effective approach it calls ‘compound scaling’, based on the idea that “in order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling”. EfficientNets are trained using a compound scaling method that scales width, depth, and resolution in an optimal way.
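To make the compound-scaling idea concrete, here is a minimal Python sketch; the scaling coefficients and baseline sizes below are illustrative placeholders (the paper finds its constants via a small grid search), not Google’s exact numbers:

```python
# Sketch of EfficientNet-style compound scaling: rather than tuning depth,
# width, or resolution in isolation, all three grow together as a function
# of a single compound coefficient phi.

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # illustrative depth/width/resolution multipliers

def compound_scale(base_depth, base_width, base_resolution, phi):
    """Return (depth, width, resolution) for a network scaled up by phi."""
    depth = round(base_depth * ALPHA ** phi)            # more layers
    width = round(base_width * BETA ** phi)             # wider layers
    resolution = round(base_resolution * GAMMA ** phi)  # larger inputs
    return depth, width, resolution

# Scaling a made-up baseline network up by two "notches":
print(compound_scale(base_depth=18, base_width=64, base_resolution=224, phi=2))
# -> (26, 77, 296)
```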

Results: Faster, cheaper, lighter, better! Google shows that it can train existing networks (eg, ResNet, MobileNet) with good performance properties by scaling them up using its compound scaling technique. The company also develops new EfficientNet models on the ImageNet dataset – widely considered to be a gold-standard for evaluating new systems – setting a new state-of-the-art for image classification (both top-1 and top-5 accuracy), while doing so with around 10X fewer parameters than other systems.

Why this matters: As part of the industrialization of AI, we’re seeing organizations dump resources into learning how to train large-scale networks more efficiently, while preserving the performance of resource-hungry ones. To me, this is analogous to going from the expensive prototype phase of production of an invention, to the beginnings of mass production.
  Read more: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (Arxiv).
  Read more: EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling (Google AI Blog).

#####################################################

Pairing deep learning systems with symbolic systems, for SAT solving:
…Using neural nets for logical reasoning gets a bit easier…
Researchers with Carnegie Mellon University and the University of Southern California have paired deep learning systems with symbolic AI by creating SATNet, a differentiable MAXSAT solver that can be knitted into larger deep learning systems. This means it is now easier to integrate logical structures into systems that use deep learning components.

Sudoku results: The SATNet model is compared against a basic ConvNet model, as well as a model fed with a binary mask which indicates which bits need to be learned. SATNet outperforms these systems, scoring 98.3% on the original Sudoku test set when given the numeric inputs. More impressively, it obtains a score of 63.2% on ‘visual Sudoku’ (traditional convnet: 0%), where the printed digits are replaced with handwritten MNIST digits. Specifically, they use a convnet to parse the digits in the Sudoku image, then pass its predictions into the SATNet layer, which solves the rest of the puzzle.
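To show the shape of such a hybrid system, here is a hedged PyTorch-style sketch. The convnet and the 81-cell board layout follow the visual Sudoku setup described above, while `solver_layer` is a placeholder standing in for the paper’s differentiable MAXSAT relaxation (SATNet’s actual API may differ):

```python
import torch
import torch.nn as nn

class DigitConvNet(nn.Module):
    """Small convnet mapping each 28x28 cell image to a distribution
    over 'empty' plus the digits 1-9."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 7 * 7, 10)

    def forward(self, cells):                  # cells: (batch * 81, 1, 28, 28)
        return self.head(self.features(cells).flatten(1)).softmax(-1)

class HybridSudokuSolver(nn.Module):
    """Perception (convnet) feeding a differentiable reasoning layer;
    `solver_layer` is a stand-in for SATNet's differentiable MAXSAT layer."""
    def __init__(self, solver_layer):
        super().__init__()
        self.perception = DigitConvNet()
        self.solver = solver_layer

    def forward(self, board_images):           # (batch, 81, 1, 28, 28)
        b = board_images.shape[0]
        cell_probs = self.perception(board_images.flatten(0, 1)).view(b, 81, 10)
        # The solver layer fills in the unknown cells; gradients flow back
        # through it into the convnet, so the whole pipeline can be trained
        # end to end from (puzzle image, solution) pairs.
        return self.solver(cell_probs)
```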

Why this matters: Hybrid AI systems which fuse the general utility-class capabilities of deep learning components with more specific systems seem like a way to bridge deep learning and symbolic AI, and to make such symbolic components easy to add into larger systems. “Our hope is that by wrapping a powerful yet generic primitive such as MAXSAT solving within a differentiable framework, our solver can enable “implicit” logical reasoning to occur where needed within larger frameworks, even if the precise structure of the domain is unknown and must be learned from data”.
  Read more: SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver (Arxiv).

#####################################################

Squeezing neural nets onto microcontrollers via neural architecture search:
…Get ready for billions of things to gain deep learning-based sense&respond capacity…
Researchers with ARM ML Research and Princeton University want to make it easier for people to deploy advanced artificial intelligence capabilities onto microcontrollers (MCUs) – something that has been difficult to do so far because today’s neural network techniques are too computationally expensive and memory-intensive to be easily deployed onto MCUs.

MCUs and why they matter: Microcontrollers are the sorts of ultra-tiny lumps of computation embedded in things like fridges, microwaves, very small drones, small cameras, and other electronic widgets. To put this in perspective, in the developed world a typical person will have around four distinct desktop-class chips (eg, their phone, a laptop, etc), while having somewhere on the order of three dozen MCUs; a typical mid-range car might pack as many as 30 MCUs inside itself.

MCUs shipped in 2019 (projection): 50 billion
GPUs shipped in 2018: 100 million

“The severe memory constraints for inference on MCUs have pushed research away from CNNs and toward simpler classifiers based on decision trees and nearest neighbors”, the researchers write. Therefore, it’s intrinsically valuable to figure out how to train neural networks so they can fit into the small computational budget of an MCU (2KB of RAM, versus 1GB for a Raspberry Pi or 11GB for an NVIDIA 1080Ti GPU). To do this, the ARM and Princeton researchers have used multi-objective neural architecture search to jointly train networks that can fit inside the tight computational specifications of an average MCU.

Sparse Architecture Search (SpArSe): Their technique combines neural architecture search with network pruning, letting them jointly train a network against multiple objectives while continuously zeroing out some of its parameters during training. This both makes the (computationally expensive) NAS procedure easier to perform and yields better networks once training is finished. “Pruning enables SpArSe to quickly evaluate many sub-networks of a given network, thereby expanding the scope of the overall search. While previous NAS approaches have automated the discovery of performant models with reduced parameterizations, we are the first to simultaneously consider performance, parameter memory constraints, and inference-time working memory constraints”. SpArSe considers regular, depthwise, separable, and downsampled convolutions, and uses a Multi-Objective Bayesian Optimizer (MOBO!) for training.
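The selection logic at the heart of a multi-objective search can be sketched in a few lines of Python; the candidate names, numbers, and objectives below are invented for illustration and this is not SpArSe’s implementation (which couples pruning and network morphisms with a multi-objective Bayesian optimizer):

```python
# Toy multi-objective selection: score each candidate architecture on
# accuracy, parameter memory, and peak working memory, then keep the
# Pareto-optimal set rather than a single "best" model.

def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly
    better on at least one (maximize acc, minimize both memory numbers)."""
    no_worse = (a["acc"] >= b["acc"]
                and a["params_kb"] <= b["params_kb"]
                and a["work_kb"] <= b["work_kb"])
    strictly_better = (a["acc"] > b["acc"]
                       or a["params_kb"] < b["params_kb"]
                       or a["work_kb"] < b["work_kb"])
    return no_worse and strictly_better

def pareto_front(candidates):
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

# Pretend we just evaluated three pruned architectures against an MCU budget:
candidates = [
    {"name": "net_a", "acc": 0.84, "params_kb": 90,  "work_kb": 60},
    {"name": "net_b", "acc": 0.82, "params_kb": 40,  "work_kb": 30},
    {"name": "net_c", "acc": 0.80, "params_kb": 120, "work_kb": 80},  # dominated by both
]
print([c["name"] for c in pareto_front(candidates)])  # ['net_a', 'net_b']
```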

Results: powerful performance in a small package: The researchers test their approach by training networks on the MNIST, CIFAR-10, CUReT, and Chars4k datasets. Their system obtains higher accuracy with fewer parameters than other methods, typically out-performing them by a wide margin.

Why this matters: Techniques like neural architecture search are part of the broader industrialization of AI, as they make it dramatically easier for people to develop and evaluate new network types, essentially letting people trade off a $/computation cost against the $/AI-researcher-brain cost of having people come up with newer architectures. Though accuracies remain somewhat below where we’d need them for commercial deployment (the highest score that SpArSe obtains is around 84% accuracy on image categorization for CIFAR-10, for instance), techniques like this suggest we’ll soon deploy crude sensing and analytical capabilities onto potentially billions to trillions of devices across the planet.
  Read more: SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers (Arxiv).

#####################################################

Judging synthetic imagery with the Classification Accuracy Score (CAS):
…Generative models are progressing, but how do we measure that? DeepMind has a suggestion…
How do we know an output from a generative model is, for lack of a better word, good? Mostly, we work this out by studying the output and making a qualitative judgement, eg, we’ll look at a hundred images generated by a big generative model and make a judgement call based on how reasonable the generations seem, or we’ll listen to the musical outputs of a model and rate it according to how well such outputs conform to our own sense of appropriate rhythm, tone, harmony, and so on. The problem with these evaluative schemes is that they’re highly qualitative and don’t give us good ways to quantitatively analyze the outputs of such models.

Now, researchers from DeepMind have come up with a new evaluation technique and task, which they call Classification Accuracy Score (CAS), to better assess the capabilities of generative models. CAS works by testing “the gap in performance between networks trained on real and synthetic data”, and in particular is designed to surface pathologies in the generative model being used.

CAS works like this: “for any generative model… we learn an inference network using only samples from the conditional generative model, and measure the performance of the inference network on a downstream task”. The intuition here is that “if the model captures the data distribution, performance on any downstream task should be similar whether using the original or model data”.
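Here is a toy, end-to-end illustration of that recipe using scikit-learn’s digits dataset, with a per-class Gaussian acting as a crude stand-in for a real conditional generative model (this is not DeepMind’s code, just the shape of the procedure):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# CAS recipe: (1) draw labelled samples from a conditional generative model,
# (2) train a classifier on those samples only, (3) score it on the real test set.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def sample_from_toy_model(n_per_class):
    """Per-class Gaussian 'generative model', a stand-in for BigGAN/VQ-VAE."""
    samples, labels = [], []
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        mu, sigma = Xc.mean(axis=0), Xc.std(axis=0) + 1e-6
        samples.append(np.random.normal(mu, sigma, size=(n_per_class, X.shape[1])))
        labels.append(np.full(n_per_class, c))
    return np.vstack(samples), np.concatenate(labels)

X_synth, y_synth = sample_from_toy_model(n_per_class=200)

acc_real = LogisticRegression(max_iter=2000).fit(X_train, y_train).score(X_test, y_test)
cas = LogisticRegression(max_iter=2000).fit(X_synth, y_synth).score(X_test, y_test)
print(f"accuracy trained on real data: {acc_real:.3f}")
print(f"Classification Accuracy Score: {cas:.3f}")  # the gap reveals what the model misses
```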

$$$: Researchers can expect to pay a few tens of dollars to evaluate any given system using the benchmark. “At the time of writing, one can compute the metric in 10 hours for roughly $15, or in 45 minutes for roughly $85 using TPUs”, they write.

Putting models under the microscope with CAS: The researchers use CAS to evaluate three generative models, BigGAN, Hierarchical Autoregressive Models (HAM), and a high-resolution Vector-Quantized Variational Autoencoder (high-res VQ-VAE). The evaluation surfaces a couple of notable things: both the Hierarchical Autoregressive system and the high-res VQ-VAE significantly outperform BigGAN, despite BigGAN generating some of the qualitatively most intriguing samples; the metric also helps identify which models are better at learning a broad set of distributions over the data, rather than over-fitting to a relatively small set of classes; and high CAS scores don’t correlate with FID or Inception Score, highlighting the significant difference in how these metrics work.

CAS, versus other measures: There are other tools available to assess the outputs of generative models, including metrics like Inception Score (IS) and Frechet Inception Distance (FID). These techniques try to give a quantitative measure of the quality of the generations of the model, but have certain drawbacks: “Inception Score does not penalize a lack of intra-class diversity, and certain out-of-distribution samples produce Inception Scores three times higher than that of the data. Frechet Inception Distance, on the other hand, suffers from a high degree of bias”.
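For reference, FID reduces the real and generated samples to Gaussian statistics of Inception features and computes the Fréchet distance between the two Gaussians; given precomputed means and covariances, the distance itself is only a few lines (a generic implementation, not tied to any particular library):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet (2-Wasserstein) distance between two Gaussians: the core of FID.
    In practice mu/sigma are the mean and covariance of Inception-pool features
    computed over real and generated images respectively."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```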

Why this matters: One of the challenges of working in artificial intelligence is working out what progress represents a real improvement, and what progress may in fact be illusory. Key to this is the development of more advanced measurement and assessment techniques. Approaches like CAS show us:
a) how surprisingly difficult it is to evaluate the capabilities of increasingly powerful generative models
b) how (unintentionally) misleading metrics can be about the true underlying performance of a system
c) how, as we develop more advanced systems, we’ll likely need to develop more sophisticated assessment schemes.

All of this feels like a further sign of the advancing sophistication and deployment of AI systems – I’m wondering at what point AI evaluation becomes its own full-fledged sub-field of research.
  Read more: Classification Accuracy Score for Conditional Generative Models (Arxiv).

#####################################################

Drones learn in simulators, fly in reality:
…Domain randomization + drones = robust flight policies that cross the reality gap…
Researchers with the University of Zurich and the Intelligent Systems Lab at Intel have developed techniques to train drones to fly purely in simulation, then transfer to reality. This kind of ‘sim2real’ behavior is highly desirable for AI researchers, because it means systems can be rapidly developed and iterated on in software simulators, then executed and validated in the real world. Here, the researchers train the perception component of a drone exclusively in simulation, then transfer it to reality.

How it works: domain randomization: This project relies on domain randomization, a technique some AI researchers use to generate additional training data. For this work, the researchers use a software-based simulator to generate various permutations of the environments that they want the drone to fly in, randomizing things like the visual properties of a scene, the shape of a gate for the drone to fly through, the background of the scene, and so on. They then generate globally optimal trajectories through these simulated courses, and the simulated drones are trained via imitation learning to mimic these trajectories. Early in training the drones are absolutely terrible at this, crashing frequently and generally bugging out. The authors solve the data collection task here by, charmingly, carrying the quadrotor through the track – they refer to this as “handheld mode”.
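As a rough sketch, a domain-randomization loop looks something like the following; the scene parameters, ranges, and callback functions here are invented for illustration and are not taken from the paper:

```python
import random

def sample_randomized_scene():
    """Draw a fresh set of visual/geometric properties for one simulated episode,
    so the perception network never sees the 'same' track twice."""
    return {
        "gate_shape":      random.choice(["square", "circle", "irregular"]),
        "gate_width_m":    random.uniform(1.0, 2.0),
        "background":      random.choice(["hangar", "forest", "office", "noise"]),
        "light_intensity": random.uniform(0.3, 1.5),
        "texture_seed":    random.randrange(10_000),
    }

def train_perception(num_episodes, simulate_and_label, update_model):
    """Imitation-learning outer loop: render a randomized scene, get supervised
    targets from the simulator's globally optimal trajectory, update the model."""
    for _ in range(num_episodes):
        scene = sample_randomized_scene()
        observations, targets = simulate_and_label(scene)
        update_model(observations, targets)

if __name__ == "__main__":
    print(sample_randomized_scene())  # one randomized episode configuration
```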

Testing: In tests, the researchers show that “more comprehensive randomization increases the robustness of the learned policy to unseen scenarios at different speeds”. They also show that network capacity has a big impact on performance, so running extremely small (and therefore computationally cheap) networks comes with an accuracy loss. They show that they can train networks which generalize to track layouts different from the ones they’ve been exposed to, as well as to radically different real-world lighting conditions (which have frequently been a confounding factor for research in the past). In real-world tests, the method is able to perform on par with professional human pilots at successfully navigating through the various hoops in the track, though it takes substantially longer (best human lap time: around 5 seconds; best drone lap time: between 12 and 16 seconds). They also show that systems trained with a mixture of simulated and real data can outperform systems trained purely with real-world data alone.

Conclusion: Work like this gives us a sense of how rapidly drone systems are advancing in complexity and capability, and highlights how it’s going to be increasingly simple for people to use software-based tools to train drones either entirely or significantly in simulation, then transfer them to reality. This will likely speed up the pace of AI R&D progress in the drone sector (by making it less critical to test on real-world hardware), and makes drones a likely destination for certain hard robotics benchmarks in the future.
  Read more: Deep Drone Racing: From Simulation to Reality with Domain Randomization (Arxiv).
  Watch a video of the work here (official YouTube video).

#####################################################

Tech Tales:

Racing Brains

“Hey, Mark! This car thinks it’s a drone!” he shouted, right before the car accelerated up the ramp and became airborne: it stayed in the air for maybe three seconds, and we watched its wheels turn pointlessly in the air; the car’s drone-brain thought it was reasonable to try and control it mid-air, and it wouldn’t learn that reality thought otherwise.

It landed, kicking up a cloud of dust around it, and skidded into a tight turn, then slipped out of view as it made its way down the course. A few people clapped. Others leaned in and joked with each other. Some money changed hands. Then we all turned to look at the next vehicle coming down the course – this one was an electric motorbike: lightweight, fast, sounding like a hornet as it came down the track. It took the ramp at speed, then landed and started weaving from side to side, describing a snake-like sinusoidal pattern in the dust of the track.

“What was in that thing?” I said to my friend.
“Racing eel – explains the weaving, right?”
“Right”

We looked at the dust and listened to the sounds of the machines in the distance. Then we all turned our heads and looked to the left to see another machine come over the horizon, and guess at where its brain came from.

Things that inspired this story: Imitation learning; sim2real; sim2x; robots; robots as entertainment; distortion and contortion as an aesthetic and an art form.