Import AI: #67: Inspecting AI with RNNVis; Facebook invents counter-intuitive language translation method; and what fractals have to do with neural architecture search

by Jack Clark

Welcome to Import AI, subscribe here.

All hail the AI inspectors: New ‘RNNVis’ software makes it easier to interpret the inner workings of recurrent nets.
…Figuring out why a particular neural network is classifying something in a certain way is a challenge. Engineers are trying to make that easier with new software to help visualize the hidden states of recurrent neural networks for text classification, giving people some graphical software to use to analyze neural network data.
…The RNNVis software helps people interpret the hidden states of RNNs by providing information about the distribution of hidden states, letting them explore hidden states at the sequence level, examine statistics of these states, and compare the learning outcomes of different models. For example, it can be used to analyze two subtly different sentences that receive slightly different sentiment classifications, inspecting which words and tones the network is activating on to help figure out why the classifications differ.
…Read more here: Understanding Hidden Memories of Recurrent Neural Networks.

Facebook learns to translate between languages without access to shared texts:
…It-shouldn’t-work-but-it-does result shows how to translate by mapping different texts into a shared latent representational space…
…New research from Facebook shows how to translate from one language into another without the availability of a shared jointly-translated text. The process works by converting sentences from different languages into noisier representations of themselves, then learning to decode the noisy versions. An adversarial training technique is used to constrain the problem so that it reduces its inaccuracies over time by closing the distance between the noisy and clean conversions – this lets iterative training yield better performance.
…”The principle of our approach is to start from a simple unsupervised word-by-word translation model, and to iteratively improve this model based on a reconstruction loss, and using a discriminator to align latent distributions of both the source and the target languages,” they write.
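…A rough sketch of the "noisier representations" step: the snippet below corrupts a sentence by randomly dropping words and lightly shuffling the rest, which is the kind of input a denoising model would then learn to reconstruct. The specific noise types, the `add_noise` name, and the `drop_prob` and `max_shift` values are assumptions for illustration, not details taken from the summary above.

```python
import random

def add_noise(tokens, drop_prob=0.1, max_shift=3, rng=None):
    """Return a corrupted copy of `tokens`: drop some words, then shuffle locally."""
    rng = rng or random.Random()
    if not tokens:
        return []
    # Word dropout: remove each token independently with probability drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept:
        # Keep at least one word so the sentence is never empty.
        kept = [rng.choice(tokens)]
    # Local shuffle: sort positions by "index + random offset in [0, max_shift + 1)".
    # A token can then end up at most max_shift places from where it started.
    keys = [i + rng.uniform(0, max_shift + 1) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]

sentence = "a picture of a crowded street in a city .".split()
print(" ".join(add_noise(sentence, rng=random.Random(0))))
```

A denoising model trained on (noisy, clean) pairs like this cannot simply copy its input, which is one plausible reason the corruption step matters.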
…Results: The team compares its approach against supervised baselines. The third iteration of its fully unsupervised model attains a BLEU score of 32.76 on English to French and 22.74 on English to German on the Multi30k-Task1 evaluation. This compares to supervised baselines of 56.83 and 35.16, respectively. For the WMT evaluation it gets 15.05 on English to French and 9.64 on English to German, compared to 27.97 and 21.33 for supervised.
…Data: It seems to intuitively make sense that the approach is very data intensive. The Facebook researchers use “the full training set of 36 million pairs” from the ‘WMT’14 English-French’ dataset, as well as 1.8 million sentences each from the ‘WMT’16 English-German’ dataset. They also use the Multi30k-Task1 dataset, which has 30,000 images with English, French, and German annotations that are translations of each other. “We disregard the images and only consider the parallel annotations, with the provided training, validation and test sets, composed of 29,000, 1,000 and 1,000 pairs of sentences respectively.”
…The power of iteration: To get an idea of the power of this iterative, bootstrapping approach we can study the translation of a sentence from French into English.
…Source: une photo d’ une rue bondee en ville
…Iteration 0: a photo a street crowded in city .
…Iteration 1: a picture of a street crowded in a city .
…Iteration 2: a picture of a crowded city street .
…Iteration 3: a picture of a crowded street in a city .
…Reference: a view of a crowded city street .
…Read more here: Unsupervised Machine Translation Using Monolingual Corpora Only.

Sponsored: No matter what IA stage your organization may be at, the Intelligent Automation – New Orleans conference, taking place December 6 – 8, has a variety of sessions that will enable your team to prepare for, and/or improve upon, their current IA initiatives.
View Agenda.
…Expert speakers from organizations including: AbbVie, AIG, Citi, Citizens Bank, FINRA, Gap Inc., JPMorgan Chase, Lindt & Sprungli, LinkedIn, NASA, Sony, SWBC, Sysco, TXU Energy
…Exclusive 20% Discount: Use Code IA_IMPORTAI

Chinese facial recognition company raises $460 million:
…Megvii Inc, also known as Face++, has raised money from a Chinese state venture fund, as well as others. It sells facial identification services to a variety of companies, including Ant Financial and Lenovo.
…China scale: Megvii “uses facial scans held in a Ministry of Public Security database drawn from legal identification files on about 1.3 billion Chinese” citizens, Bloomberg reports.
…Read more: China, Russia Put Millions in This Startup to Recognize Your Face.

What actual data scientists use at actual jobs (hint: it’s not a mammoth RL system with auxiliary losses).
…Data science competition platform Kaggle has surveyed 16,000 people to produce a report shedding light on the backgrounds, skills, salaries, and so on, of its global community of data wranglers.
…A particularly enlightening response is the breakdown by Kaggle of what its data science users claim to use in their day to day life. Coming in at #1 is logistic regression, followed by decision trees (#2), random forests (#3), neural networks (#4), and Bayesian techniques (#5). In other words, tried-and-tested technologies beat out some of the new shiny ones, despite their research success.
…(One interesting quirk: Among people working in the military and security industries, neural networks are used slightly more frequently than logistic regression.)
…Programming languages: Python is the most popular programming tool across all employed data scientists, followed by R, SQL, Jupyter notebooks, and TensorFlow. (No other modern deep learning framework gets a mention besides Spark/MLlib.)
Read more here: The State of Data Science & Machine Learning.

What does AI really learn? DeepMind scientists probe language agents to find out:
…As people build more complicated AI systems that contain more moving parts, it gets increasingly difficult to figure out what exactly the AI agents are learning. DeepMind researchers are trying to tackle this problem by using their 3D ‘DeepMind Lab’ environment to perform controlled experiments on basic AI agents that are being taught to link English words to visual objects.
…The researchers note a couple of interesting traits in their agents which match some of the traits displayed by humans, namely:
…”Shape / colour biases: If the agent is exposed to more shape words than colour words during training, it can develop a human-like propensity to assume that new words refer to the shape rather than the colour of objects. However, when the training distribution is balanced between shape and colour terms, it develops an analogous bias in favour of colours.”
…”The problem of learning negation: The agent learns to execute negated instructions, but if trained on small amounts of data it tends to represent negation in an ad hoc way that does not generalise.”
…”Curriculum effects for vocabulary growth: The agent learns words more quickly if the range of words to which it is exposed is limited at first and expanded gradually as its vocabulary develops.”
…”Semantic processing and representation differences: The agent learns words of different semantic classes at different speeds and represents them with features that require different degrees of visual processing depth (or abstraction) to compute.”
…They theorize that the fact agents learn to specialize their recognition and labeling mechanisms with biases like shapes versus colors helps them learn words more rapidly. Eventually they expect to train agents on real world data. “This might involve curated sets of images, videos and naturally-occurring text etc, and, ultimately, experiments on robots trained to communicate about perceptible surroundings with human interlocutors.”
…Read more: Understanding Grounded Language Learning Agents.

Mandelbrot’s Revenge: New architecture search technique creates self-repeating, multi-scale, fractal-like structures.
DeepMind research attains state-of-the-art results on CIFAR-10 (3.63% error), competitive ImageNet validation set results (5.2% error)…
…A new AI development technique means that the way we could design neural network architectures of the future has a lot in common with the self-repeating mathematical structures found in fractals – that’s the implication of a new paper from DeepMind which shows how to design neural networks out of motifs that recur at multiple scales, small and large. The approach works by using neural architecture search (NAS) techniques to get AI systems to discover the primitive building blocks that they can use to build large-scale neural network architectures. As the network is evolved it tries to build larger components of itself out of these automatically discovered building blocks. Changes to the network are propagated down to the level of the individual building blocks, letting it continually adjust at all scales during training.
Efficiency: The approach is surprisingly efficient, letting DeepMind design architectures using less power and less wallclock time than many other techniques while remaining competitive with far more expensive networks.
…”As far as the architecture search time is concerned, it takes 1 hour to compute the fitness of one architecture on a single P100 GPU (which involves 4 rounds of training and evaluation). Using 200 GPUs, it thus takes 1 hour to perform random search over 200 architectures and 1.5 days to do the evolutionary search with 7000 steps. This is significantly faster than 11 days using 250 GPUs reported by (Real et al., 2017) and 4 days using 450 GPUs reported by (Zoph et al., 2017),” the researchers write.
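…To make the arithmetic in that quote concrete, here is a small back-of-the-envelope calculation. It assumes perfect parallelism across GPUs and one fitness evaluation per evolutionary step (both simplifications), and the function names are made up for illustration. GPU-hours are only a rough yardstick, since the other papers may have used different hardware.

```python
HOURS_PER_FITNESS_EVAL = 1   # one architecture on one P100 GPU, per the quote
GPUS = 200
EVOLUTION_STEPS = 7000

def wallclock_hours(num_evals, gpus, hours_per_eval=HOURS_PER_FITNESS_EVAL):
    """Wallclock time if evaluations are spread evenly over the GPUs."""
    return num_evals * hours_per_eval / gpus

def gpu_hours(days, gpus):
    """Total GPU-hours consumed by a search that ran for `days` on `gpus` GPUs."""
    return days * 24 * gpus

random_search_hours = wallclock_hours(200, GPUS)          # 200 architectures
evolution_hours = wallclock_hours(EVOLUTION_STEPS, GPUS)  # 7000 steps

searches = {
    "this paper (1.5 days, 200 GPUs)": gpu_hours(1.5, 200),
    "Real et al., 2017 (11 days, 250 GPUs)": gpu_hours(11, 250),
    "Zoph et al., 2017 (4 days, 450 GPUs)": gpu_hours(4, 450),
}

print(f"random search: {random_search_hours:.1f} hours")
print(f"evolution: {evolution_hours:.1f} hours = {evolution_hours / 24:.2f} days")
for name, hours in searches.items():
    print(f"{name}: {hours:,.0f} GPU-hours")
```

Under these assumptions the evolutionary search works out to 35 hours, which matches the quoted "1.5 days" after rounding.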
Drawbacks: The approach has some drawbacks, namely that the variance of these networks is relatively high, so you’re going to need to go through multiple runs of development to come up with top-scoring architectures. This means that the efficiency gains may be reduced slightly if you’re unlucky enough to get a bad random seed when designing the network. “When training CIFAR models, we have observed standard deviation of up to 0.2% using the exact same setup. The solution we adopted was to compute the fitness as the average accuracy over 4 training-evaluation runs,” they write. It’s also not quite as efficient as other techniques when it comes to the number of parameters in the network: “Our ImageNet model has 64M parameters, which is comparable to Inception-ResNet-v2 (55.8M) but larger than NASNet-A (22.6M).”
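…Why average over 4 runs? If the runs are independent and equally noisy (an assumption, since the paper quote does not say so), averaging n runs shrinks the standard deviation of the fitness estimate by a factor of the square root of n. The sketch below shows that calculation plus a hypothetical `averaged_fitness` helper; `train_and_evaluate` stands in for whatever training routine is actually used.

```python
import math
import statistics

def std_of_mean(single_run_std, num_runs):
    """Standard deviation of the mean of `num_runs` independent, equally noisy runs."""
    return single_run_std / math.sqrt(num_runs)

def averaged_fitness(train_and_evaluate, num_runs=4):
    """Call a (hypothetical) train-and-evaluate function several times and average."""
    return statistics.mean(train_and_evaluate() for _ in range(num_runs))

# With the quoted 0.2% run-to-run standard deviation and 4 runs per architecture:
print(std_of_mean(0.2, 4))
```

So four runs would roughly halve the noise in each fitness score, at four times the compute per architecture.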
Read more here: Hierarchical Representations for Efficient Architecture Search.

Google AI researchers continue to work diligently to automate themselves:
Neural architecture search applied at unprecedentedly large scales, evolving architectures that yield state-of-the-art results with far less architecture specification required by humans…
Datasets used: Imagenet for image recognition and MS COCO for object detection.
Optimization techniques: To reduce the search space Google first applied neural architecture search techniques to a model trained on CIFAR-10, then used the best learned architecture from that experiment as the seed for the Imagenet/COCO models.
…The results: “On ImageNet image classification, NASNet achieves a prediction accuracy of 82.7% on the validation set, surpassing all previous Inception models that we built [2, 3, 4]. Additionally, NASNet performs 1.2% better than all previous published results and is on par with the best unpublished result reported on [5].” They were able to get state-of-the-art on the COCO object detection task by combining features learned from the ImageNet model with the Faster-RCNN technique.
…Most intriguing: You can resize the released model, named NASNet, to create pre-trained models suitable to run on devices with far fewer compute resources (eg, smartphones, perhaps embedded devices).
Code: Pre-trained NASNet models for image recognition and classification are available in the GitHub Slim and Object Detection TensorFlow repositories, Google says.
…Read more: AutoML for large scale image classification and object detection.

Is that a bird? Is that a plane? It doesn’t matter – your neural network thinks it’s an air freshener.
…MIT student-run research team ‘LabSix‘ takes adversarial examples into the real world.
…Adversarial examples are images – and now 3D objects – that confuse neural network-based classifiers, causing them to misclassify the objects they see in front of them. Adversarial examples work on 2D images – both digital and printed out ones – and on real-world 2D(ish) entities like stop signs. They’re robust to rotations and other variations. People worry about them because until we fix them we a) don’t clearly understand how our neural network-based image classifiers work and b) are going to have to deploy AI into a world where we know it’s possible to corrupt AI classifiers using techniques imperceptible to humans.
…Now, researchers have shown that you can make three dimensional adversarial examples, illustrating just how broad the vulnerability is. Their technique let them create a sculpture of a turtle that is classified at every angle as a gun, and a baseball consistently misclassified as an espresso.
…”We do this using a new algorithm for reliably producing adversarial examples that cause targeted misclassification under transformations like blur, rotation, zoom, or translation, and we use it to generate both 2D printouts and 3D models that fool a standard neural network at any angle,” they write.
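…One way to get a feel for "targeted misclassification under transformations" is to optimize a perturbation against many randomly transformed copies of the input at once, rather than against a single view. The toy below does this with NumPy against a random linear classifier. It is a sketch of the general idea only, not the LabSix algorithm: the classifier `W`, the contrast/brightness transformation family, the signed-gradient update, and all parameter values are assumptions chosen to keep the example small.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sample_transforms(rng, n):
    """Hypothetical transformation family: contrast scale a and brightness shift b."""
    a = rng.uniform(0.8, 1.2, size=(n, 1))
    b = rng.uniform(-0.1, 0.1, size=(n, 1))
    return a, b

def target_prob(W, x, target, a, b):
    """Mean probability of `target` over the transformed copies a * x + b."""
    return softmax((a * x + b) @ W.T)[:, target].mean()

def robust_targeted_attack(W, x, target, eps=0.3, step=0.01, iters=100, n=32, seed=0):
    """Find delta with max |delta_i| <= eps that raises the average log p(target)."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x)
    onehot = np.eye(W.shape[0])[target]
    for _ in range(iters):
        a, b = sample_transforms(rng, n)             # fresh transformations each step
        probs = softmax((a * (x + delta) + b) @ W.T)  # shape (n, classes)
        # Gradient of mean log p(target) w.r.t. delta, through t(x) = a * x + b.
        grad = (a * ((onehot - probs) @ W)).mean(axis=0)
        # Signed-gradient ascent step, then project back into the eps-box.
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 256))         # toy linear classifier, not a real network
x = rng.uniform(0.0, 1.0, size=256)   # toy "image"
target = int(np.argmin(W @ x))        # aim for the class the clean input scores lowest
delta = robust_targeted_attack(W, x, target)

a, b = sample_transforms(np.random.default_rng(1), 1000)  # held-out transformations
before = target_prob(W, x, target, a, b)
after = target_prob(W, x + delta, target, a, b)
print(f"mean target probability: before={before:.3f}, after={after:.3f}")
```

Because the perturbation is scored on transformations it never saw during optimization, a rise in the "after" number suggests it generalizes across the transformation family rather than fitting one particular view.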
Read more here: Physical Objects That Fool Neural Nets.

Uber releases its own programming language:
Any company with sufficiently large ambitions in AI will end up releasing its own open source programming language – let’s call this tendency ‘TensorFlowing’ – to try to shape how the AI tooling landscape evolves and to create developer communities to eventually hire talent from. That’s been the strategy behind Microsoft’s CNTK, Amazon’s MXNet, Facebook’s PyTorch and Caffe2, Google’s TensorFlow and Keras, and so on.
…Now Uber is releasing Pyro, a probabilistic programming language, based on the PyTorch library.
…”In Pyro, both the generative models and the inference guides can include deep neural networks as components. The resulting deep probabilistic models have shown great promise in recent work, especially for unsupervised and semi-supervised machine learning problems,” Uber writes.
Read more here: Uber AI Labs Open Sources Pyro, a Deep Probabilistic Programming Language.

OpenAI Bits & Pieces:

AI Safety and AI Policy at the Center for a New American Security (CNAS):
…You can watch OpenAI’s Dario Amodei talk about issues in AI safety at a conference held in Washington DC last week, from about 1:10:10 in the ‘Part 1’ video on this page.
…You can watch me talk about some policy issues in AI relating to dual use and measurement from about 11:00 into the ‘Part 3’ video.
Check out the videos here.

Tech Tales:

AvianLife(™) is the top grossing software developed by Integrated Worlds Incorporated (IWI), an AI-slash-world-simulator-startup based in Humboldt, nestled among the cannabis farms and redwood trees in Northern California. Revenues from AvianLife outpace those from other IWI products five-fold, so in the four years the product has been in existence its development team has grown and absorbed many of the other teams at IWI.

But AvianLife(™) has a problem: it doesn’t contain realistic leaves. The trees in the simulator are adorned with leaflike entities, and they change color according to the march of simulated time, and even fall to the simulated ground on their proper schedules (with just enough randomness to feel unpredictable). But they don’t bend – that’s a problem, because AvianLife(™) recently gained an update containing a rare species of bird, known to use leaves and twigs and other things to assemble its nest out of.

Soon after the bird update is released AvianLife is inundated with bug complaints from users, as simulated birds struggle to build correct nests out of the rigid leaves, sometimes stacking them in lattices that stretch meters into the sky, or shaping other stacks of the flat ovals into near-perfect cubes. One bird figures out how to stack the leaves edge-first, creating walls around itself that are invisible from the side due to the two-dimensionality of the leaves. The fiendishly powerful AI algorithms of the AIs explore the new possibilities made possible by the buggy leaves. Some of their quirkier decisions are applauded by players, mistaking bugs for the creativity of their programmers.

A new software update solves the problem by marginally reducing the intelligence of the bird-loving birds, and by increasing the thickness of the leaves to avoid the two-dimensional problems. Next, they’ll add in support for non-rigid physics and give the simulated birds the simulated leaves they deserve.

Technologies that inspired this story: Physics simulators, ecosystems, flaws.