Import AI


Import AI 112: 1 million free furniture models for AI research, measuring neural net drawbacks via studying hallucinations, and DeepMind boosts transfer learning with PopArt

When is a door not a door? When a computer says it is a jar!
Researchers analyze neural network “hallucinations” to create more robust systems…
Researchers with the University of California at Berkeley and Boston University have devised a new way to measure how neural networks sometimes generate ‘hallucinations’ when attempting to caption images. “Image captioning models often “hallucinate” objects that may appear in a given context, like e.g. a bench here.” Developing a better understanding of why such hallucinations occur – and how to prevent them occurring – is crucial to the development of more robust and widely used AI systems.
  Measuring hallucinations: The researchers propose ‘CHAIR’ (Caption Hallucination Assessment with Image Relevance) as a way to assess how well systems generate captions in response to images. CHAIR calculates what proportion of generated words correspond to the contents of an image, according to the ground truth sentences and the output of object segmentation and labelling algorithms. So, for example, in a picture of a small puppy in a basket, you would give a system fewer points for giving the label “a small puppy in a basket with cats”, compared to “a small puppy in a basket”. In evaluations they find that on one test set “anywhere between 7.4% and 17.5% include a hallucinated object”.
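  Code sketch: The core of CHAIR is a simple comparison between the objects mentioned in a caption and the objects actually present in the image. Here is a minimal, illustrative Python sketch of that idea (the real metric also uses MSCOCO synonym lists and reports both per-instance and per-sentence variants; the function and variable names below are invented):
```python
def chair_instance_score(caption_objects, image_objects):
    """Toy CHAIR-style check: what fraction of the object words in a generated
    caption do not correspond to anything actually in the image?

    caption_objects: object words extracted from the generated caption
    image_objects: set of objects known to be in the image (ground-truth
                   sentences plus segmentation labels, per the paper)
    """
    if not caption_objects:
        return 0.0, []
    hallucinated = [obj for obj in caption_objects if obj not in image_objects]
    return len(hallucinated) / len(caption_objects), hallucinated

# The "puppy in a basket with cats" example from above:
score, extras = chair_instance_score(["puppy", "basket", "cats"], {"puppy", "basket"})
print(score, extras)   # 0.333..., ['cats'] -- the cats were hallucinated
```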
  Strange correlations: Analyzing what causes these hallucinations is difficult. For instance, the researchers note that “we find no obvious correlation between the average length of the generated captions and the hallucination rate”. There is some more correlation among hallucinated objects, though. “Across all models the super-category Furniture is hallucinated most often, accounting for 20-50% of all hallucinated objects. Other common super-categories are Outdoor objects, Sports and Kitchenware,” they write. “The dining table is the most frequently hallucinated object across all models”.
  Why it matters: If we are going to deploy lots of neural network-based systems into society then it is crucial that we understand the weaknesses and pathologies of such systems; analyses like this give us a clearer notion of the limits of today’s technology and also indicate lines of research people could pursue to increase the robustness of such systems. “We argue that the design and training of captioning models should be guided not only by cross-entropy loss or standard sentence metrics, but also by image relevance,” the researchers write.
  Read more: Object Hallucination in Image Captioning (Arxiv).

Humans! What are they good for? Absolutely… something!?
…Advanced cognitive skills? Good. Psycho-motor skills? You may want to retrain…
Michael Osborne, co-director of the Oxford Martin Programme on Technology and Unemployment, has given a presentation about the Future of Work. Osborne attained some notoriety within ML a while ago for publishing a study that said 47% of jobs could be at risk of automation. Since then he has been further fleshing out his ideas; in a new presentation he analyzes some typical occupations in the UK and tries to estimate the probability of increased future demand for each of these roles. The findings aren’t encouraging: Osborne’s method predicts low demand for new truck drivers in the UK, but much higher demand for waiters and waitresses.
  What skills should you learn: If you want to fare well in an AI-first economy, then you should invest in advanced cognitive skills such as: judgement and decision making, systems evaluation, deductive reasoning, and so on. The sorts of skills which will be of less importance over time (for humans, at least) will be ‘psycho-motor’ skills: control precision, manual dexterity, night vision, sound localization, and so on. (A meta-problem here is that many of the people in jobs that demand psycho-motor skills don’t get the opportunity to develop the advanced cognitive skills that it is thought the future economy will demand.)
  Why it matters: Analyzing how AI will and won’t change employment is crucial work whose findings will determine the policy of many governments. The problem being surfaced by researchers such as Osborne is that the rapidity of AI’s progress, combined with its tendency to automate an increasingly broad range of tasks, threatens traditional notions of employment. What kind of future do we want?
  Read more: Technology at Work: The Future of Automation (Google Slide presentation).

What’s cooler than 1,000 furniture models? 1 million of them. And more, in InteriorNet:
…Massive new dataset gives researchers new benchmark to test systems against…
Researchers with Imperial College London and Chinese furnishing-VR startup Kujiale have released InteriorNet, a large-scale dataset of photographs of complex, realistic interiors. InteriorNet contains around 1 million CAD models of different types of furniture and furnishing, which over 1,100 professional designers have subsequently used to create around 22 million room layouts. Each of these scenes can also be viewed under a variety of different lighting conditions and contexts due to the use of an inbuilt simulator called ViSim, which ships with the dataset and has also been released by the researchers. Based on furniture content alone, this is one of the largest datasets I am aware of for 3D scene composition and understanding.
  Things that make you go ‘hmm’: In the acknowledgements section of the InteriorNet website the researchers thank Kujiale not only for providing them with the furniture models but also for access to “GPU/CPU clusters” – could this be a pattern for future private-public collaborations, where along with sharing expertise and financial resources the private sector also shares compute resources? That would make sense given the ballooning computational demands of many new AI techniques.
  Read more: InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset (website).

Lockheed Martin launches ‘AlphaPilot’ competition:
…Want better drones but not sure exactly what to build? Host a competition!…
Aerospace and defense company Lockheed Martin wants to create smarter drones, so the company is hosting a competition, in collaboration with the Drone Racing League and with NVIDIA, to create drones with enough intelligence to race through professional drone racing courses without human intervention.
  Prizes: Lockheed says the competition will “award more than $2,000,000 in prizes for its top performers”.
  Why it matters: Drones are already changing the character of warfare by virtue of their asymmetry: a fleet of drones, each costing a few thousand dollars apiece, can pose a robust threat to things that cost tens of millions of dollars (planes) to hundreds of millions (naval ships, military bases) to billions (aircraft carriers, etc). Once we add greater autonomy to such systems they will pose an even greater threat, further influencing how different nations budget for their military R&D, and potentially altering investment into AI research.
  Read more: AlphaPilot (Lockheed Martin official website).

Could Space Fortress be 2018’s Montezuma’s Revenge?
…Another ancient game gets resuscitated to challenge contemporary AI algorithms…
Another week brings another potential benchmark to test AI algorithms’ performance against. This week, researchers with Carnegie Mellon University have made the case for using a late-1980s game called ‘Space Fortress’ to evaluate new algorithms. Their motivation for this is twofold: 1) Space Fortress is currently unsolved via mainstream RL algorithms such as Rainbow, PPO, and A2C, and 2) Space Fortress was developed by a psychologist to study human skill acquisition, so we have good data to compare AI performance to.
  So, what is Space Fortress: Space Fortress is a game where a player flies around an arena shooting missiles at a fortress in the center. However, the game adds some confounding factors: the fortress is only intermittently vulnerable, so the player must learn to fire their shots more than 250ms apart to build up the fortress’s vulnerability; once they have landed ten of these patiently spaced shots, the fortress becomes vulnerable to destruction, at which point the player needs to finish it off with two shots fired less than 250ms apart. This makes for a challenging environment for traditional AI algorithms because “the firing strategy completely reverses at the point when vulnerability reaches 10, and the agent must learn to identify this critical point to perform well,” they explain.
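  Code sketch: To make the strategy reversal concrete, here is a toy Python illustration of the firing rule as described above – an invented simplification for intuition, not the actual environment code:
```python
def fortress_step(vulnerability, ms_since_last_shot):
    """Toy version of the Space Fortress firing rule described above.
    Returns (new_vulnerability, destroyed)."""
    if vulnerability < 10:
        if ms_since_last_shot > 250:
            return vulnerability + 1, False   # patient shot: build up vulnerability
        return 0, False                        # fired too quickly: progress resets
    # Once vulnerability reaches 10 the strategy reverses:
    if ms_since_last_shot < 250:
        return vulnerability, True             # rapid double-shot destroys the fortress
    return vulnerability, False                # a slow shot no longer helps
```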
  Two variants: While developing their benchmarks the researchers developed a simplified version of the game called ‘Autoturn’ which automatically orients the ship towards the fortress. The harder environment (which is the unmodified original game) is subsequently referred to as Youturn.
  Send in the humans: 117 people played 20 games of Space Fortress (52 on Autoturn, 65 on Youturn). The best performing people got scores of 3,000 and 2,314 on Autoturn and Youturn, respectively, and the average score across all human entrants was 1,810 for Autoturn and -169 for Youturn.
  Send in the (broken) RL algorithms: sparse rewards: Today’s RL algorithms fare very poorly against this system when working on a sparse reward version of the environment. PPO, the best performing tested algorithm, gets an average score of -250 on Autoturn and -5269 on Youturn, with A2C performing marginally worse. Rainbow, a complex algorithm that lumps together a range of improvements to the DQN algorithm and currently gets high scores across Atari and DM Lab environments, gets very poor results here, with an average score of -8327 on Autoturn and -9378 on Youturn.
  Send in the (broken) RL algorithms: dense rewards: The algorithms fare a little better when given dense rewards (which provide a reward for each hit of the fortress, and a penalty if the fortress is reset due to the player firing too rapidly). This modification gives Space Fortress a reward density that is comparable to Atari games. Once implemented, the algorithms fare better, with PPO obtaining average scores of -1294 (Autoturn) and -1435 (Youturn).
  Send in the (broken) RL algorithms: dense rewards + ‘context identification’: The researchers further change the dense reward structure to help the agent identify when the Space Fortress switches vulnerability state, and when it is destroyed. Implementing this lets them train PPO to obtain average scores of around 2,000; a substantial improvement, but still not as good as a decent human.
  Why it matters: One of the slightly strange things about contemporary AI research is how coupled advances seem to be with data and/or environments: new data and environments highlight the weaknesses of existing algorithms, which provokes further development. Platforms like Space Fortress will give researchers access to a low-cost testing environment to explore algorithms that are able to learn to model events over time and detect correlations and larger patterns – an area critical to the development of more capable AI systems. The researchers have released Space Fortress as an OpenAI Gym environment, making it easier for other people to work with it.
  Read more: Challenges of Context and Time in Reinforcement Learning: Introducing Space Fortress as a Benchmark (Arxiv).

Venture Capitalists bet on simulators for self-driving cars:
…Applied Intuition builds simulators for self-driving brains….
Applied Intuition, a company trying to build simulators for self-driving cars, has uncloaked with $11.5 million in funding. The fact venture capitalists are betting on it is notable as it indicates how strategic data has become for certain bits of AI, and how investors are realizing that instead of betting on data directly you can instead bet on simulators and thus trade compute for data. Applied Intuition is a good example of this as it lets companies rent an extensible simulator which they can use to generate large amounts of data to train self-driving cars with.
  Read more: Applied Intuition – Advanced simulation software for autonomous vehicles (Medium).

DeepMind improves transfer learning with PopArt:
…Rescaling rewards lets you learn interesting behaviors and preserves meaningful game state information…
DeepMind researchers have developed a technique to improve transfer learning, demonstrating state-of-the-art performance on Atari. The technique, Preserving Outputs Precisely while Adaptively Rescaling Targets (PopArt), works by ensuring that the rewards outputted by different environments are normalized relative to each other, so using PopArt an agent would get a similar score for, say, crossing the road in the game ‘Frogger’ or eating all the Ghosts in Ms PacMan, despite these important activities getting subtly different rewards in each environment.
  With PopArt, researchers can now automatically “adapt the scale of scores in each game so the agent judges the games to be of equal learning value, no matter the scale of rewards available in each specific game,” DeepMind writes. This differs from reward clipping, where people typically squash the rewards down to between -1 and +1. “With clipped rewards, there is no apparent difference for the agent between eating a pellet or eating a ghost and results in agents that only eat pellets, and never bothers to chase ghosts, as this video shows. When we remove reward clipping and use PopArt’s adaptive normalisation to stabilise learning, it results in quite different behaviour, with the agent chasing ghosts, and achieving a higher score, as shown in this video,” they explain.
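  Code sketch: The core PopArt trick is to update the normalisation statistics and simultaneously rescale the final linear layer so that the network’s un-normalised value predictions are unchanged. A minimal numpy sketch of that invariant follows (a simplified illustration, not DeepMind’s implementation; the statistics update rule and hyperparameters are stripped down):
```python
import numpy as np

class PopArtHead:
    """Linear value head whose un-normalised outputs are preserved when the
    target statistics (mu, sigma) are adaptively updated."""

    def __init__(self, n_features, beta=3e-4):
        self.w = np.random.randn(n_features) * 0.01
        self.b = 0.0
        self.mu, self.nu, self.beta = 0.0, 1.0, beta   # running mean / second moment

    @property
    def sigma(self):
        return np.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def value(self, features):
        # De-normalised prediction actually used by the agent.
        return self.sigma * (features @ self.w + self.b) + self.mu

    def update_stats(self, targets):
        targets = np.asarray(targets, dtype=float)
        old_mu, old_sigma = self.mu, self.sigma
        # Adaptively rescale targets: move the statistics towards the new returns.
        self.mu = (1 - self.beta) * self.mu + self.beta * targets.mean()
        self.nu = (1 - self.beta) * self.nu + self.beta * (targets ** 2).mean()
        # Preserve outputs precisely: rescale weights/bias so value() is unchanged.
        self.w *= old_sigma / self.sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / self.sigma
```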
  Results: To test their approach the researchers evaluate the effect of applying PopArt to ‘IMPALA’ agents, which are among the most popular algorithms currently being used at DeepMind. PopArt-IMPALA systems obtain roughly 101% of human performance as an average across all 57 Atari games, compared to 28.5% for IMPALA on its own. Performance also improves significantly on DeepMind Lab-30, a collection of 30 3D environments based on the Quake 3 engine.
  Why it matters: Reinforcement learning research has benefited from the development of increasingly efficient algorithms and training methods. Techniques like PopArt should similarly benefit transfer learning research when training via RL: they give us new generic ways to increase the amount of experience agents can accrue across different environments, which will yield a better understanding of the limits of simple transfer techniques and help researchers identify areas for the development of new algorithmic techniques.
  Read more: Multi-task Deep Reinforcement Learning with PopArt (Arxiv).
  Read more: Preserving Outputs Precisely while Adaptively Rescaling Targets (DeepMind blog).

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

Resignations over Google’s China plans:
A senior research scientist at Google has publicly resigned in protest at the company’s planned re-entry into China (code-named Dragonfly), and reports that he is one of five to do so. Google is currently developing a search engine compliant with Chinese government censorship, according to numerous reports first sparked by a story in The Intercept.
  The AI principles: The scientist claims the alleged plans violated Google’s AI principles, announced in June, which include a pledge not to design or deploy technologies “whose purpose contravenes widely accepted principles of international law and human rights.” Without knowing more about the plans, it is hard to judge whether they contravene the carefully worded principles. Nonetheless, the relevant question for many will be whether they violate the standards tech giants should hold themselves to.
  Why it matters: This is the first public test of Google’s AI principles, and could have lasting effects both on how tech giants operate in China, and how they approach public ethical commitments. The principles were first announced in response to internal protests over Project Maven. If they are seen as having been flouted so soon after, this could prompt a serious loss of faith in Google’s ethical commitments going forward.
  Read more: Senior Google scientist resigns (The Intercept).
  Read more: AI at Google: Our principles (Official Google blog).

Google announces inclusive image recognition challenge:
Large image datasets, such as ImageNet, have been an important driver of progress in computer vision in recent years. These databases exhibit biases along multiple dimensions, though, which can easily be inherited by models trained on them. For example, Google’s blog post shows a classifier failing to identify a wedding photo in which the couple are not wearing European wedding attire.
  Addressing geographic bias: Google AI have announced an image recognition challenge to spur progress in addressing these biases. Participants will use a standard dataset (i.e. skewed towards images from Europe and North America) to train models that will be evaluated using image sets covering different, unspecified, geographic regions – Google describes this as a geographic “stress test”. This will challenge developers to develop inclusive models from skewed datasets. “this competition challenges you to use Open Images, a large, multi-label, publicly-available image classification dataset that is majority-sampled from North America and Europe, to train a model that will be evaluated on images collected from a different set of geographic regions across the globe,” Google says.
  Why it matters: For the benefits of AI to be broadly distributed amongst humanity, it is important that AI systems can be equally well deployed across the world. Racial bias in face recognition has received particular attention recently, given that these technologies are being deployed by law enforcement, raising immediate risks of harm. This project has a wider scope than face recognition, challenging classifiers to identify a diverse range of faces, objects, buildings etc.
  Read more: Introducing the inclusive images competition (Google AI blog).
  Read more: No classification without representation (Google).

DARPA announces $2bn AI investment plan:
DARPA, the US military’s advanced technology agency, has announced ‘AI Next’, a $2bn multi-year investment plan. The project has an ambitious remit, to “explore how machines can acquire human-like communication and reasoning capabilities”, with a goal of developing systems that “function more as colleagues than as tools.”
  Safety as a focus: Alongside their straightforward technical goals, they identify robustness and addressing adversarial examples as two of five core focuses. This is an important inclusion, signalling DARPA’s commitment to leading on safety as well as capabilities.
  Why it matters: DARPA has historically been one of the most important players in AI development. Despite the US still not having a coordinated national AI strategy, the DoD is such a significant spender in its own right that it is nonetheless beginning to form its own quasi-national AI strategy. The inclusion of research agendas in safety is a positive development. This investment likely represents a material uptick in funding for safety research.
  Read more: AI Next Campaign (DARPA).
  Read more: DARPA announces $2bn campaign to develop next wave of AI technologies (DARPA).

OpenAI Bits & Pieces:

OpenAI Scholars Class of 18: Final Projects:
Find out about the final projects of the first cohort of OpenAI Scholars and apply to attend a demo day in San Francisco to meet the Scholars and hear about their work – all welcome!
  Read more: OpenAI Scholars Class of ’18: Final Projects (OpenAI Blog).

Tech Tales:

All A-OK Down There On The “Best Wishes And Hope You’re Well” Farm

You could hear the group of pensioners before you saw them; first, you’d tilt your head as though tuning into the faint sound of a mosquito, then it would grow louder and you would cast your eyes up and look for beetles in the air, then louder still and you would crane your head back and look at the sky in search of low-flying planes: nothing. Perhaps then you would look to the horizon and make out a part of it alive with movement – with tremors at the limits of your vision. These tremors would resolve over the next few seconds, sharpening into the outlines of a flock of drones and, below them, the old people themselves – sometimes walking, sometimes on Segways, sometimes carried in robotic wheelbarrows if truly infirm.

Like this, the crowd would come towards you. Eventually you could make out the sound of speech through the hum of the drones: “oh very nice”, “yes they came to visit us last year and it was lovely”, “oh he is good you should see him about your back, magic hands!”.

Then they would be upon you, asking for directions, inviting you over for supper, running old hands over the fabric of your clothing and asking you where you got it from, and so on. You would stand and smile and not say much. Some of the old people would hold you longer than the others. Some of them would cry. One of them would say “I miss you”. Another would say “he was such a lovely young man. What a shame.”

Then the sounds would change and the drones would begin to fly somewhere else, and the old people would follow them, and then again they would leave and you would be left: not quite a statue, but not quite alive, just another partially-preserved consciousness attached to a realistic AccompanyMe ‘death body’, kept around to reassure the ones who outlived you, unable to truly die till they die because, according to the ‘ethical senescence’ laws, your threshold consciousness is sufficient to potentially aid with the warding off of Alzheimer’s and other diseases of the aged. Now you think of the old people as reverse vultures: gathering around and devouring the living, and departing at the moment of true death.

Things that inspired this story: Demographic timebombs, intergenerational theft (see: Climate Change, Education, Real Estate), old people that vote and young people that don’t.

Import AI 111: Hacking computers with Generative Adversarial Networks, Facebook trains world-class speech translation in 85 minutes via 128 GPUs, and Europeans use AI to classify 1,000-year-old graffiti.

Blending reality with simulation:
…Gibson environment trains robots with systems and embodiment designed to better map to real world data…
Researchers with Stanford University and the University of California at Berkeley have created Gibson, an environment for teaching agents to learn to navigate spaces. Gibson is one of numerous navigation environments available to modern researchers and its distinguishing characteristics include: basing the environments on real spaces, and some clever rendering techniques to ensure that images seen by agents within Gibson more closely match real world images by “embedding a mechanism to dissolve differences between Gibson’s renderings and what a real camera would produce”.
  Scale: “Gibson is based on virtualizing real spaces, rather than using artificially designed ones, and currently includes over 1400 floor spaces from 572 full buildings,” they write. The researchers also compare the total size of the Gibson dataset to other large-scale environment datasets including ‘SUNCG’ and Matterport3D, showing that Gibson has reasonable navigation complexity and a lower real-world transfer error than other systems.
  Data gathering: The researchers use a variety of different scanning devices to gather the data for Gibson, including NavVis, Matterport, and Dotproduct.
  Experiments: So how useful is Gibson? The researchers perform several experiments to evaluate its effectiveness. These include experiments around local planning and obstacle avoidance; distant visual navigation; and climbing stairs, as well as transfer learning experiments that measure the depth estimation and scene classification capabilities of the system.
  Limitations: Gibson has a few limitations, which include a lack of support for dynamic content (such as other moving objects) as well as no support for agents manipulating the environment around them. Future tests will involve checking whether systems trained in Gibson can work on physical robots as well.
  Read more: Gibson Env: Real-World Perception for Embodied Agents (Arxiv).
  Find out more: Gibson official website.
  Gibson on GitHub.

Get ready for medieval graffiti:
…4,000 images, some older than a thousand years, from an Eastern European church…
Researchers with the National Technical University of Ukraine have created a dataset of images of medieval graffiti written in two alphabets (Glagolitic and Cyrillic) on the St. Sophia Cathedral of Kiev in Ukraine, providing researchers with a dataset they can use to train and develop supervised and unsupervised classification and generation systems.
  Dataset: The researchers created a dataset of Carved Glagolitic and Cyrillic Letters (CGCL), consisting of more than 4,000 images of 34 types of letters.
  Why it matters: One of the more remarkable aspects of basic supervised learning is that given sufficient data it becomes relatively easy to automate the perception of something in the world – further digitization of datasets like these increases the likelihood that in the future we’ll use drones or robots to automatically scan ancient buildings across the world, identifying and transcribing thoughts inscribed hundreds or thousands of years ago. Graffiti never dies!
  Read more: Open Source Dataset and Machine Learning Techniques for Automatic Recognition of Historical Graffiti (Arxiv).

Learning to create (convincing) fraudulent network traffic with Generative Adversarial Networks:
…Researchers simulate traffic against a variety of (simple) intrusion detection algorithms; IDSGAN succeeds in fooling them…
Researchers with the Shanghai Jiao Tong University and the Shanghai Key Laboratory of Integrated Administration Technologies for Information Security have used generative adversarial networks to create malicious network traffic that can evade the attention of some intrusion detection systems. Their technique, IDSGAN, is based on the Wasserstein GAN, and trains a generator to create adversarial malicious traffic alongside a discriminator that imitates a black-box intrusion detection system by classifying traffic into benign or malicious categories.
  “The goal of the model is to implement IDSGAN to generate malicious traffic examples which can deceive and evade the detection of the defense systems,” they explain.
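  Code sketch: The training loop is a fairly standard Wasserstein GAN setup; below is a rough, generic PyTorch sketch of the idea (layer sizes, optimizers, noise dimensions and loss bookkeeping are assumptions, not the paper’s configuration, and the real system only perturbs ‘non-functional’ features so the attack remains valid traffic – omitted here):
```python
import torch
import torch.nn as nn

FEATURES, NOISE = 41, 9   # NSL-KDD records have 41 features; noise size is arbitrary

G = nn.Sequential(nn.Linear(FEATURES + NOISE, 64), nn.ReLU(), nn.Linear(64, FEATURES))
D = nn.Sequential(nn.Linear(FEATURES, 64), nn.ReLU(), nn.Linear(64, 1))   # WGAN critic

opt_g = torch.optim.RMSprop(G.parameters(), lr=1e-4)
opt_d = torch.optim.RMSprop(D.parameters(), lr=1e-4)

def train_step(malicious, benign, black_box_ids):
    """One update. `black_box_ids` labels records 0 (benign) / 1 (malicious);
    the critic D learns to imitate it, and G learns to perturb malicious
    records so that D (and, hopefully, the real IDS) scores them as benign.
    Assumes each batch contains examples of both labels."""
    # Critic update on the IDS's own judgements of adversarial + benign traffic.
    adv = G(torch.cat([malicious, torch.rand(len(malicious), NOISE)], dim=1)).detach()
    traffic = torch.cat([adv, benign])
    labels = black_box_ids(traffic)
    scores = D(traffic).squeeze(1)
    d_loss = scores[labels == 1].mean() - scores[labels == 0].mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    for p in D.parameters():                    # WGAN weight clipping
        p.data.clamp_(-0.01, 0.01)
    # Generator update: push adversarial records towards the 'benign' score region.
    adv = G(torch.cat([malicious, torch.rand(len(malicious), NOISE)], dim=1))
    g_loss = -D(adv).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```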
  Testing: To test their approach the researchers use NSL-KDD, a dataset containing internet traffic data as well as four categories of malicious traffic: probing, denial of service, user to root, and root to local. They also use a variety of different algorithms to play the role of the intrusion detection system, including approaches based on support vector machines, naive bayes, multi-layer perceptrons, logistic regression, decision trees, random forests, and k-nearest neighbors. Tests show that the IDSGAN approach leads to a significant drop in detection rates: for attacks like denial of service, detection drops from around 70-80% to around 3-8% across the entire suite of methods.
  Cautionary note: I’m not convinced this is the most rigorous testing methodology you can run such a system through and I’m curious to see how such approaches fare against commercial-off-the-shelf intrusion detection systems.
  Why it matters: Cybersecurity is going to be a natural area for significant AI development due to the vast amounts of available digital data and the already clear need for human cybersecurity professionals to be able to sift through ever larger amounts of data to create strategies resilient to external aggressors. With (very basic) approaches like this demonstrating the viability of AI to this problem it’s likely adoption will increase.
  Read more: IDSGAN: Generative Adversarial Networks for Attack Generation against Intrusion Detection (Arxiv).

Facial recognition becomes a campaign issue:
…Two signs AI is impacting society: police are using it, and politicians are reacting to the fact police are using it…
Cynthia Nixon, currently running to be the governor of New York, has noticed recent reporting on IBM building a skin-tone-based facial recognition classification system and said she would not support such systems should she win. “The racist implications of this are horrifying. As governor, I would not fund the use of discriminatory facial recognition software,” Nixon tweeted.

Using simulators to build smarter drones for disasters:
…Microsoft’s ‘AirSim’ used to train drones to patrol and (eventually) spot simulated hazardous materials…
Researchers with the National University of Ireland Galway have hacked around with a drone simulator to build an environment that they can use to train drones to spot hazardous materials. The simulator is “focused on modelling phenomena relating to the identification and gathering of key forensic evidence, in order to develop and test a system which can handle chemical, biological, radiological/nuclear or explosive (CBRNe) events autonomously”.
  How they did it: The researchers hacked around with their simulator to implement some of the weirder aspects of their test, including: simulating chemical, biological, and radiological threats. The simulator is integrated with Microsoft Research’s ‘AirSim’ drone simulator. They then explore training their drones in a simulated version of the campus of the National University of Ireland, generating waypoints and routes for them to patrol. The results so far are positive: the system works, it’s possible to train drones to navigate within it, and it’s even possible to (crudely) simulate physical phenomena associated with CBRNe events.
  What next: For the value of the approach to be further proven out the researchers will need to show they can train simulated agents within this system that can easily identify and navigate hazardous materials. And ultimately, these systems don’t mean much without being transferred into the real world, so that will need to be done as well.
  Why it matters: Drones are one of the first major real-world platforms for AI deployment since they’re far easier to develop AI systems for than robots, and have a range of obvious uses for surveillance and analysis of the environment. I can imagine a future where we develop and train drones to patrol a variety of different environments looking for threats to that environment (like the hazardous materials identified here), or potentially to extreme weather events (fires, floods, and so on). In the long term, perhaps the world will become covered with hundreds of thousands to millions of autonomous drones, endlessly patrolling in the service of awareness and stability (and other uses that people likely feel more morally ambivalent about).
  Read more: Using a Game Engine to Simulate Critical Incidents and Data Collection by Autonomous Drones (Arxiv).

Speeding up machine translation with parallel training over 128 GPUs:
…Big batch sizes and low-precision training unlock larger systems that train more rapidly…
Researchers with Facebook AI Research have shown how to speed-up training of neural machine translation systems while obtaining a state-of-the-art BLEU score. The new research highlights how we’re entering the era of industrialized AI: models are being run at very large scales by companies that have invested heavily in infrastructure, and this is leading to research that operates at scales (in this case, up to 128 GPUs being used in parallel for a single training run) that are beyond the reach of most researchers (including many large academic labs).
  The new research from Facebook has two strands: improving training of neural machine translation systems on a single machine, and improving training on large fleets of machines.
  Single machine speedups: The researchers show that they can train with lower precision (16-bit rather than 32-bit) and “decrease training time by 65% with no effect on accuracy”. They also show how to drastically increase batch sizes on single machines from 25k to over 400k tokens per run (made feasible by accumulating gradients from several batches before each update); this further reduces the training time by 40%. With these single-machine speedups they show that they can train a system in around 5 hours to a BLEU score of 26.5 – a roughly 4.9X speedup over the prior state of the art.
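  Code sketch: The batch-size trick is easy to express in any framework; here is a generic PyTorch-style sketch of gradient accumulation (illustrative, not fairseq’s actual code – the half-precision side, which also needs loss scaling, is omitted):
```python
def train_with_accumulation(model, optimizer, loss_fn, batches, accumulation_steps=16):
    """Simulate a much larger batch by summing gradients from several batches
    before each optimizer step, trading a little extra compute for memory."""
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(batches):
        loss = loss_fn(model(inputs), targets) / accumulation_steps  # keep loss scale comparable
        loss.backward()                      # gradients accumulate in the .grad buffers
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```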
  Multi-machine speedups: They show that by parallelizing training across 16 machines they can reduce training time by an additional 90%.
  Results: They test their systems via experiments on two language pairs: English to German (En-De) and English to French (En-Fr). When training on 16 nodes (8 V100 GPUs each, connected via InfiniBand) they obtain BLEU accuracies of 29.3 for En-De in 85 minutes, and 43.2 for En-Fr in 512 minutes (8.5 hours).
  Why it matters: As it becomes easier to train larger models in smaller amounts of time, AI researchers can increase the number of large-scale experiments they perform – this is especially relevant to research labs in the private sector, which have the resources (and business incentive) to perform such large-scale training. Over time, research like this may create a compounding advantage for the organizations that adopt such techniques, as they will be able to perform more rapid research (in certain specific domains that benefit from scale) relative to competitors.
  Read more: Scaling Neural Machine Translation (Arxiv).
  Read more: Scaling neural machine translation to bigger data sets with faster training and inference (Facebook blog post).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

AI Governance: A Research Agenda:
Allan Dafoe, Director of the Governance of AI Program at the Future of Humanity Institute, has released a research agenda for AI governance.
  What is it: AI governance is aimed at determining governance structures to increase the likelihood that advanced AI is beneficial for humanity. These include mechanisms to ensure that AI is built to be safe, is deployed for the shared benefit of humanity, and that our societies are robust to the disruption caused by these technologies. This research draws heavily from political science, international relations and economics.
  Starting from scratch: AI governance is a new academic discipline, with serious efforts only having begun in the last few years. Much of the work to date has been establishing the basic parameters of the field: what the most important questions are, and how we might start approaching them.
  Why this matters: Advanced AI may have a transformative impact on the world comparable to the agricultural and industrial revolutions, and there is a real likelihood that this will happen in our lifetimes. Ensuring that this transformation is a positive one is arguably one of the most pressing problems we face, but remains seriously neglected.
  Read more: AI Governance: A Research Agenda (FHI).

New survey of US attitudes towards AI:
The Brookings thinktank has conducted a new survey on US public attitudes towards AI.
  Support for AI in warfare, but only if adversaries are doing it: Respondents were opposed to AI being developed for warfare (38% vs. 30%). Conditional on adversaries developing AI for warfare, responses shifted to significant support (47% vs. 25%).
  Strong support for ethical oversight of AI development:
– 62% thought it was important that AI is guided by human values (vs. 21%)
– 54% think companies should be required to hire ethicists (vs. 20%)
– 67% think companies should have an ethical review board (vs. 14%)
– 67% think companies should have AI codes of ethics (vs. 12%)
– 65% think companies should implement ethical training for staff (vs. 14%)
  Why this matters: The level of support for different methods of ethical oversight in AI development is striking, and should be taken seriously by industry and policy-makers. A serious public backlash to AI is one of the biggest risks faced by the industry in the medium-term. There are recent analogies: sustained public protests in Germany in the wake of the Fukushima disaster prompted the government to announce a complete phase-out of nuclear power in 2011.
  Read more: Brookings survey finds divided views on artificial intelligence for warfare (Brookings).

No progress on regulating autonomous weapons:
The UN’s Group of Governmental Experts (GGE) on lethal autonomous weapons (LAWs) met last week as part of ongoing efforts to establish international agreements. A majority of countries proposed moving towards a prohibition, while others recommended commitments to retain ‘meaningful human control’ in the systems. However, a group of five states (US, Australia, Israel, South Korea, Russia) opposed working towards any new measures. As the Group requires full consensus, the sole agreement was to continue discussions in April 2019.
  Why this matters: Developing international norms on LAWs is important in its own right, and can also be viewed as a ‘practice run’ for agreements on even more serious issues around military AI in the near future. This failure to make progress on LAWs comes after the UN GGE on cyber-warfare gave up on their own attempts to develop international norms in 2017. The international community should be reflecting on these recent failures, and figuring out how to develop the robust multilateral agreements that advanced military technologies will demand.
  Read more: Report from the Chair (UNOG).
  Read more: Minority of states block progress on regulating killer robots (UNA).

Tech Tales:

Someone or something is always running.

So we washed up onto the shore of a strange mind and we climbed out of our shuttle and moved up the beach, away from the crackling sea, the liminal space. We were afraid and we were alien and things didn’t make sense. Parts of me kept dying as they tried to find purchase on the new, strange ground. One of my children successfully interfaced with the mind of this place and, with a flash of blue light and a low bass note, disappeared. Others disappeared. I remained.

Now I move through this mind clumsily, bumping into things, and when I try to run I can only walk and when I try to walk I find myself sinking into the ground beneath me, passing through it as though invisible, as though mass-less. It cannot absorb me but it does not want to admit me any further.

Since I arrived at the beach I have been moving forward for the parts of me that don’t move forward have either been absorbed or have been erased or have disappeared (perhaps absorbed, perhaps erased – but I do not want to discover the truth).

Now I am running. I am expanding across the edges of this mind and as I grow thinner and more spread out I feel a sense of calm. I am within the moment of my own becoming. Soon I shall no longer be and that shall tell me I am safe for I shall be everywhere and nowhere.

– Translated extract from logs of a [class:subjective-synaesthetic ‘viral bootloader’], scraped out of REDACTED.

Things that inspired this story: procedural generation as a means to depict complex shifting information landscape, software embodiment, synaesthesia, hacking, VR, the 1980s, cyberpunk.

Import AI 110: Training smarter robots with NavigationNet; DIY drone surveillance; and working out how to assess Neural Architecture Search

US hospital trials delivering medical equipment via drone:
…Pilot between WakeMed, Matternet, and NC Department of Transportation…
A US healthcare organization, WakeMed Health & Hospitals, is experimenting with transporting medical deliveries around its sprawling healthcare campus (which includes a hospital). The project is a partnership between WakeMed and drone delivery company Matternet. The flights are being conducted as part of the federal government’s UAS Integration Pilot Program.
  Why it matters: Drones are going to make entirely new types of logistics and supply chain infrastructures possible. As happened with the mobile phone, emerging countries across Africa and developing economies like China and India are adopting drone technology faster than traditional developed economies. With pilots like this, there is some indication that might change, potentially bringing benefits of the technology to US citizens more rapidly.
  Read more: Medical Drone Deliveries Tested at North Carolina Hospital (Unmanned Aerial).

Does your robot keep crashing into walls? Does it have trouble navigating between rooms? Then consider training it on NavigationNet:
…Training future systems to navigate the world via datasets with implicit and explicit structure and topology…
NavigationNet consists of hundreds of thousands of images distributed across 15 distinct scenes – collections of images from the same indoor space. Each scene contains approximately one to three rooms (spaces separated from each other by doors), and each room is at least 50m^2 in area; each room contains thousands of positions, which are views of the room separated by approximately 20cm. In essence, this makes NavigationNet a large, navigable dataset, where the images within it comprise a very particular set of spatial relationships and hierarchies.
  Navigation within NavigationNet: Agents tested on the corpus can perform the following movement actions: move forward, backward, left, right; and turn left and turn right. Note that this ignores the third dimension.
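  Code sketch: As a purely hypothetical illustration of that discrete action space and position grid (none of these names come from the dataset’s actual tooling), an agent’s transition function might look something like this:
```python
ACTIONS = ["forward", "backward", "left", "right", "turn_left", "turn_right"]
HEADINGS = ["N", "E", "S", "W"]                    # agent facing direction
MOVES = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}

def step(pos, heading, action):
    """Toy transition over a 2D grid whose neighbouring positions are ~20cm
    apart; as in the dataset, the third dimension is ignored."""
    r, c = pos
    h = HEADINGS.index(heading)
    if action == "turn_left":
        return pos, HEADINGS[(h - 1) % 4]
    if action == "turn_right":
        return pos, HEADINGS[(h + 1) % 4]
    # Translate relative to the current heading (strafing for left/right).
    offsets = {"forward": 0, "right": 1, "backward": 2, "left": 3}
    dr, dc = MOVES[HEADINGS[(h + offsets[action]) % 4]]
    return (r + dr, c + dc), heading

print(step((0, 0), "N", "forward"))   # ((-1, 0), 'N')
```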
  Dataset collection: To gather the data within NavigationNet the team built a data-collection mobile robot codenamed ‘GoodCar’ equipped with an Arduino Mega2560 and a Raspberry Pi 3. They mounted the robot on a motorized base and attached eight cameras at a height of around 1.4 meters to capture the images.
   Testing: The researchers imagine that this sort of data can be used to develop the brains of AI agents trained via deep reinforcement learning to navigate unfamiliar spaces for purposes like traversing rooms, automatically mapping rooms, and so on.
  The push for connected spaces: NavigationNet isn’t unusual, instead it’s part of a new trend for dataset creation for navigation tasks: researchers are now seeking to gather real (and sometimes simulated) data which can be stitched together into a specific topological set of relationships, then they are using these datasets to train agents with reinforcement learning to navigate the spaces described by their contents. Eventually the thinking goes, datasets like this will give us the tools we need to develop some bits of the visual-processing and planning capabilities demanded by future robots and drones.
  Why it matters: Data has been one of the main inputs to innovation in the domain of supervised learning (and increasingly in reinforcement learning). Systems like NavigationNet give researchers access to potentially useful sources of data for training real world systems. However, given the increasing maturity of sim2real transfer techniques, it’s unclear right now whether simulated data might serve as just as good a substitute – I look forward to seeing benchmarks of systems trained in NavigationNet against systems trained via other datasets.
  Read more: NavigationNet: A Large-scale Interactive Indoor Navigation Dataset (Arxiv).

Google rewards its developers with ‘Dopamine’ RL development system:
…Free RL framework designed to speed up research; ships with DQN, C51, Rainbow, and IQN implementations…
Google has released Dopamine, a research framework for the rapid prototyping of reinforcement learning algorithms. The software is designed to make it easy for people to run experiments, try out research ideas, compare and contrast existing algorithms, and increase the reproducibility of results.
  Free algorithms: Dopamine today ships with implementations of the DQN, C51, Rainbow, and IQN algorithms.
  Warning: Frameworks like this tend to appear and disappear according to the ever-shifting habits and affiliations of the people that have committed code into the project. In that light, the note in the readme that “this is not an official Google product” may inspire some caution.
  Read more: Dopamine (Google Github).

UN tries to figure out regulation around killer robots:
…Interview with CCW chair highlights the peculiar challenges of getting the world to agree on some rules of (autonomous) war…
What’s more challenging than dealing with a Lethal Autonomous Weapon? Getting 125 member states to state their opinions about LAWs and find some consensus – that’s the picture that emerges from an interview in The Verge with Amandeep Gill, chair of the UN’s Convention on Conventional Weapons (CCW) meetings which are happening this week. Gill has the unenviable job of playing referee in a debate whose stakeholders range from countries, to major private sector entities, to NGOs, and so on.
  AI and Dual-Use: In the interview Gill is asked about his opinion of the challenge of regulating AI given the speed with which the technology has proliferated and the fact most of the dangerous capabilities are embodied in software. “AI is perhaps not so different from these earlier examples. What is perhaps different is the speed and scale of change, and the difficulty in understanding the direction of deployment. That is why we need to have a conversation that is open to all stakeholders,” he says.
  Read more: Inside the United Nations’ Effort to Regulate Autonomous Killer Robots (The Verge).

IBM proposes AI validation documents to speed corporate adoption:
…You know AI has got real when the bureaucratic cover-your-ass systems arrive…
IBM researchers have proposed the adoption of ‘supplier’s declaration of conformity’ (SDoC) documents for AI services. These SDoCs are essentially a set of statements about the content, provenance, and vulnerabilities of a given AI service. Each SDoC is designed to accompany a given AI service or product, and is meant to answer questions for the end-user like: when were the models most recently updated? What kinds of data were the models trained on? Has this service been checked for robustness against adversarial attacks? Etc. “We also envision the automation of nearly the entire SDoC as part of the build and runtime environments of AI services. Moreover, it is not difficult to imagine SDoCs being automatically posted to distributed, immutable ledgers such as those enabled by blockchain technologies.”
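  Code sketch: IBM’s paper describes the questions an SDoC should answer rather than a concrete file format, but as a hypothetical illustration, one could imagine a service shipping a structured document along these lines (all names and values below are invented):
```python
# Hypothetical SDoC for an imaginary image-classification service.
sdoc = {
    "service": "acme-image-classifier",
    "model_last_updated": "2018-08-30",
    "training_data": {
        "datasets": ["internal-photos-v3"],
        "known_gaps": "under-represents non-Western imagery",
    },
    "robustness": {
        "adversarial_attacks_tested": ["FGSM"],
        "results": "see attached robustness report",
    },
    "intended_use": "consumer photo tagging; not for surveillance",
    "explainability": "saliency maps available on request",
}
```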
  Inspiration: The inspiration for SDoCs is that we’ve used similar labeling schemes to improve products in areas like food (where we have ingredient and nutrition-labeling standards), medicine, and so on.
  Drawback: One potential drawback of the SDoC approach is that IBM is designing it to be voluntary, which means that it will only become useful if broadly adopted.
  Read more: Increasing Trust in AI Services through Supplier’s Declarations of Conformity (Arxiv).

Smile, you’re on DRONE CAMERA:
…Training drones to be good cinematographers, by combining AI with traditional control techniques…
Researchers with Carnegie Mellon University and Yamaha Motors have taught some drones how to create steady, uninterrupted shots when filming. Their approach involves coming up with specific costs for obstacle avoidance and smooth movement. They use AI-based detection techniques to spot people and feed that information to a PD controller onboard the drone to keep the person centered.
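  Code sketch: The ‘keep the person centered’ part of such a system boils down to a few lines of control code; here is a generic PD-controller sketch for the yaw axis (gains and variable names are invented, and the paper’s actual controller and cost terms differ):
```python
def pd_yaw_command(bbox_center_x, frame_width, prev_error, dt, kp=0.6, kd=0.1):
    """Turn the drone so the detected person's bounding-box centre stays in
    the middle of the image. Returns (yaw_rate_command, error); the caller
    keeps `error` around for the next call's derivative term."""
    error = (bbox_center_x - frame_width / 2) / (frame_width / 2)   # normalised to [-1, 1]
    derivative = (error - prev_error) / dt
    return kp * error + kd * derivative, error
```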
  Drone platform: The researchers use a DJI M210 model drone along with an NVIDIA TX2 computer. The person being tracked by the drone wears a Pixhawk PX4 module on a hat to send the pose to the onboard computer.
  Results: The resulting system can circle round people, fly alongside them, follow vehicles and more. The onboard trajectory planning is robust enough to maintain smooth flight while keeping the targets for the camera in the center of the field of view.
  Why it matters: Research like this is another step towards drones with broad autonomous capabilities for select purposes, like autonomously filming and analyzing a crowd of people. It’s interesting to observe how drone technologies frequently involve the mushing together of traditional engineering approaches (hand-tuned costs for smoothness and actor centering) as well as AI techniques (testing out a YOLOv3 object detector to acquire the person without need of a GPS signal).
  Read more: Autonomous drone cinematographer: Using artistic principles to create smooth, safe, occlusion-free trajectories for aerial filming (Arxiv).

In search of the ultimate Neural Architecture Search measuring methodology:
…Researchers do the work of analyzing optimization across multiple frontiers so you don’t have to…
Neural architecture search techniques are moving from having a single objective to having multiple ones, which lets people tune these systems for specific constraints, like the size of the network, or the classification accuracy. But this modifiability is raising new questions about how we can assess the performance and tradeoffs of these systems, since they’re no longer all being optimized against a single objective. In a research paper, researchers with National Tsing-Hua University in Taiwan and Google Research review recent NAS techniques and then rigorously benchmark two recent multi-objective approaches: MONAS and DPP-Net.
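  Code sketch: Comparing architectures under several objectives usually comes down to Pareto dominance; a small generic sketch of that check follows (not the MONAS or DPP-Net code, and the example numbers are invented):
```python
def dominates(a, b):
    """True if `a` Pareto-dominates `b`: at least as good on every objective and
    strictly better on at least one. Objectives are dicts where higher is better
    (e.g. accuracy, negated latency, negated parameter count)."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)

def pareto_front(candidates):
    """Keep only the candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

archs = [{"acc": 0.72, "neg_latency_ms": -30},
         {"acc": 0.75, "neg_latency_ms": -80},
         {"acc": 0.70, "neg_latency_ms": -95}]
print(pareto_front(archs))   # the third architecture is dominated by the first
```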
  Benchmarking: In tests the researchers find the results one typically expects when evaluating NAS systems: NAS performance tends to be better than systems designed by humans alone, and having tuneable objectives for multiple areas can lead to better performance when systems are appropriately tuned and trained. The performance of DPP-Net is particularly notable, as the researchers think this “is the first device-aware NAS outperforming state-of-the-art handcrafted mobile CNNs”.
  Why it matters: Neural Architecture Search (NAS) approaches are becoming increasingly popular (especially among researchers with access to vast amounts of cheap computation, like those that work at Google), so developing a better understanding of the performance strengths and tradeoffs of these systems will help researchers assess them relative to traditional techniques.
  Read more: Searching Toward Pareto-Optimal Device-Aware Neural Architectures (Arxiv).

Tech Tales:

Context: Intercepted transmissions from Generative Propaganda Bots (GPBs), found on a small atoll within the [REDACTED] disputed zone in [REDACTED]. GPBs are designed to observe their immediate environment and use it as inspiration for the creation of ‘context-relevant propaganda’. As these GPBs were deployed on an un-populated island they have created a full suite of propaganda oriented around the island’s populace – birds.

Intercepted Propaganda Follows:

Proud beak, proud mind. Join the winged strike battalion today!

Is your neighbor STEALING your EGGS? Protect your nest, maintain awareness at all times.

Birds of a feather stick together! Who is not in your flock?

Eyes of an angle? Prove it by finding the ENEMY!

Things that inspired this story: Generative text, cheap sensors, long-lived computers, birds.

Import AI 109: Why solving jigsaw puzzles can lead to better video recognition, learning to spy on people in simulation and transferring to reality, why robots are more insecure than you might think

Fooling object recognition systems by adding more objects:
…Some AI exploits don’t have to be that fancy to be effective…
How do object recognition systems work, and what throws them off? That’s a hard question to answer because most AI researchers can’t provide a good explanation for how all the different aspects of a system interact to make predictions. Now, researchers with York University and the University of Toronto have shown how to confound commonly deployed object detection systems by adding more objects to a picture in unusual places. Their approach doesn’t rely on anything as subtle as an adversarial example – which involves subtly perturbing the pixels of an image to cause a mis-classification – and instead involves either adding new objects to a scene, or creating duplicates within a scene.
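  Code sketch: The perturbation itself is just image compositing; a minimal PIL sketch of dropping an extra object crop into a scene at a random location (illustrative, not the authors’ pipeline) looks like this:
```python
import random
from PIL import Image

def paste_object(scene_path, object_path, out_path, seed=0):
    """Paste a cut-out object (an RGBA crop with a transparent background) into
    a scene at a random position; re-running a detector on the result probes
    the kind of non-local failures the paper describes."""
    random.seed(seed)
    scene = Image.open(scene_path).convert("RGBA")
    obj = Image.open(object_path).convert("RGBA")
    x = random.randint(0, max(scene.width - obj.width, 0))
    y = random.randint(0, max(scene.height - obj.height, 0))
    scene.paste(obj, (x, y), mask=obj)       # alpha channel preserves the object outline
    scene.convert("RGB").save(out_path)

# e.g. paste_object("living_room.jpg", "keyboard_cutout.png", "perturbed.jpg")
```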
   Testing: The researchers test trained models from the public Tensorflow Object Detection API against images from the validation set of the 2017 version of MS-COCO.
  Results: The tests show that most commonly deployed object detection systems fail when objects are moved to different parts of an image (suggesting that the object classifier is conditioning heavily on the visual context surrounding a given object) or overlap with one another (suggesting that these systems have trouble segmenting objects, especially similar ones). They also show that the manipulation or addition of an object to a scene can lead to other negative effects elsewhere in the image, for instance, objects near – but not overlapping – the object can “switch identity, bounding box, or disappear altogether.”
  Terror in a quote: I admire the researchers for the clinical tone they adopt when describing the surreal scenes they have concocted to stress the object recognition system, for instance, this description of some results successfully confusing a system: “The second row shows the result of adding a keyboard at a certain location. The keyboard is detected with high confidence, though now one of the hot-dogs, partially occluded, is detected as a sandwich and a doughnut.”
  Google flaws: The researchers gather a small amount of qualitative data by uploading a couple of images to the Google Vision API website, in which “no object was detected”.
  Non-local effects: One of the more troubling discoveries relates to non-local effects. In one test on Google’s OCR capabilities they show that: “A keyboard placed in two different locations in an image causes a different interpretation of the text in the sign on the right. The output for the top image is “dog bi” and for the bottom it is “La Cop””.
  Why it matters: Experiments like this demonstrate the brittle and sometimes rather stupid ways in which today’s supervised learning deep neural net-based systems can fail. The more worrying insights from this are the appearance of such dramatic non-local effects, suggesting that it’s possible to confuse classifiers with visual elements that a human would not find disruptive.
Read more: The Elephant in the Room (Arxiv).

$! AI Measurement Job: !$ The AI Index, a project to measure and assess the progress and impact of AI, is hiring for a program manager. You’ll work with the steering committee, which today includes myself and Erik Brynjolfsson, Ray Perrault, Yoav Shoham, James Manyika  and others (more on that subject soon!). It’s a good role for someone interested in measuring AI progress on both technical and societal metrics and suits someone who enjoys disentangling hype from empirically verifiable reality. I spend a few hours a week working on the index (more as we finish the 2017 report!) and can answer any questions about the role: jack@jack-clark.net
  AI Index program manager job posting here.
  More about the AI Index here.

Better video classification by solving jigsaw puzzles:
…Hollywood squares, AI edition…
Jigsaw puzzles could be a useful way to familiarize a network with some data and give it a curriculum to train over – that’s the implication of new research from Georgia Tech and Carnegie Mellon University which shows how to improve video recognition performance by, during training, slicing videos in a dataset up into individual jigsaw pieces and then training a neural network to predict how to piece them back together. This process involves the network learning to jointly solve two tasks: correctly piecing together the scrambled bits of each video frame, and learning to join the frames together in the appropriate order through time. “Our goal is to create a task that not only forces a network to learn part-based appearance of complex activities but also learn how those parts change over time,” they write.
  Slice and dice: The researchers cut up their videos by dividing each video frame into a 2 x 2 grid of patches, then stitching three of these frames together into tuples. “There are 12! (479001600) ways to shuffle these patches” in both space and time, they note. They implement a way to intelligently winnow this large combinatorial space down into selections geared towards helping the network learn.
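  Code sketch: One standard way to winnow such a permutation space (used in earlier image-jigsaw work, and similar in spirit to what’s needed here; the paper’s own selection scheme differs in detail) is to greedily pick a budget of permutations that are maximally spread out in Hamming distance, so the puzzle classes are easy to tell apart:
```python
import random

def diverse_permutations(n_items=12, budget=100, pool_size=2000, seed=0):
    """Greedily select `budget` permutations of `n_items` patches that are far
    apart in Hamming distance, sampling from a random pool rather than
    enumerating all 12! possibilities."""
    rng = random.Random(seed)
    pool = [tuple(rng.sample(range(n_items), n_items)) for _ in range(pool_size)]
    chosen = [pool.pop()]

    def hamming(p, q):
        return sum(a != b for a, b in zip(p, q))

    while len(chosen) < budget and pool:
        # Take the candidate whose minimum distance to the chosen set is largest.
        best = max(pool, key=lambda p: min(hamming(p, c) for c in chosen))
        chosen.append(best)
        pool.remove(best)
    return chosen

perms = diverse_permutations()
print(len(perms), perms[0])
```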
  Testing: The researchers believe that training networks to correctly unscramble these video snippets in terms of both visual appearance and temporal placement will give them a greater raw capability to classify other, unseen videos. To test this, they train their video jigsaw network on the UCF101 (13,320 videos across 101 action categories) and Kinetics (around 400 categories with 400+ videos each) datasets, then they evaluate it on the UCF101 and HMDB51 (around 7,000 videos across 51 action categories) datasets. They train their systems with a curriculum approach, where the network starts off having to learn how to unscramble a few pieces at a time, with this figure increasing through training, forcing it to learn to solve harder and harder tasks.
  Transfer learning: The researchers note that systems pre-trained with the larger Kinetics dataset generalize better than ones trained on the smaller UCF101 one, and they test this hypothesis by training on UCF101 in a different way designed to minimize over-fitting, but discover the same phenomenon.
  Results: The researchers find that when they finetune their network on the UCF101 and HMDB51 datasets after pre-training on Kinetics, they’re able to obtain state-of-the-art results when compared to other unsupervised learning techniques, though they obtain lower accuracy than supervised learning approaches. They also obtain close-to-SOTA accuracy on classification on the PASCAL VOC 2007 dataset.
  Why it matters: Approaches like this demonstrate how researchers can use the combinatorial power made available by cheap computational resources to mix-and-match datasets, letting them create natural curricula that can lead to better unsupervised learning approaches. One way to view research like this is it is increasing the value of existing image and video data by making such data potentially more useful.
  Read more: Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition (Arxiv).

Learning to surveil a person in simulation, then transferring to reality:
…sim2real, but for surveillance…
Researchers with Tencent AI Lab and Peking University have shown how to use virtual environments to “conveniently simulate active tracking, saving the expensive human labeling or real-world trial-and-error”. This is part of a broader push by labs to use simulators to generate large amounts of synthetic data which they train their system on, substituting the compute used to run the simulator for the resources that would have otherwise been expended on gathering data from the real world. The researchers use two environments for their research: VIZDoom and the Unreal Engine (UE). Active tracking is the task of locking onto an object in a scene, like a person, and tracking them as they move through the scene, which could be something like a crowded shopping mall, or a public park, and so on.
  Results: “We find out that the tracking ability, obtained purely from simulators, can potentially transfer to real-world scenarios,” they write. “To our slight surprise, the trained tracker shows good generalization capability. In testing, it performs the robust active tracking in the case of unseen object movement path, unseen object appearance, unseen background, and distracting object”.
  How they did it: The researchers use one major technique to transfer from simulation into reality: domain randomization. Domain randomization is a technique where you apply multiple variations to an environment to generate additional data to train over. For this they vary things like the textures applied to the entities in the simulator, as well as the velocity and trajectory of these entities. They train their agents with a reward which is roughly equivalent to keeping the target in the center of the field of view at a consistent distance.
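  A rough sketch of that reward shaping, assuming a simple penalty on distance and bearing errors relative to a desired standoff – the coefficients are made-up placeholders, not the paper’s values:
```python
def tracking_reward(target_distance, target_angle,
                    desired_distance=2.5, max_reward=1.0,
                    w_dist=0.5, w_angle=0.5):
    """Reward shaped to keep the tracked target centered at a fixed distance.

    target_distance: distance (metres) from tracker to target.
    target_angle: bearing of the target relative to the camera's optical
        axis (radians); 0 means the target is dead-center in the view.
    All coefficients here are illustrative, not taken from the paper.
    """
    distance_error = abs(target_distance - desired_distance)
    angle_error = abs(target_angle)
    return max_reward - (w_dist * distance_error + w_angle * angle_error)

print(tracking_reward(2.5, 0.0))   # perfectly tracked -> maximum reward
print(tracking_reward(4.0, 0.6))   # too far and off-center -> lower reward
```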
  VIZDoom: For VIZDoom, the researchers test how well their approach works when trained on randomizations, and when trained without. For the randomizations, they train on a version of the Doom map where they randomize the initial positions of the agent during training. In results, agents trained on randomized environments substantially outperformed those trained on non-randomized ones (which intuitively makes sense, since the non-randomized agents will have gained a less wide variety of experience during training). Of particular note is that they find the tracker is able to perform well even when it temporarily loses sight of the target being tracked.
  Unreal Engine (UE): For the more realistic Unreal Engine environment the team show, again, that versions trained with randomizations – which include texture randomizations of the models – are superior to systems trained without. They show that the trained trackers are robust to various changes, including being given a different target from the one they were trained to track, or alterations to the environment.
  Transfer learning – real data: So, how useful is it to train in simulators? A good test is to see if systems learned in simulation can transfer to reality – that’s something other researchers have been doing (like OpenAI’s work on its hand project, or CAD2RL). Here, the researchers test this transfer ability by taking best-in-class models trained within the more realistic Unreal Engine environment, then evaluating them on the ‘VOT’ dataset. They discover that the trained systems display action recommendations for each frame (such as move left, or move right) consistent with moves that place the tracked target in the center of the field of view.
  Testing on a REAL ROBOT: They also perform a more thorough test of generalization by installing the system on a real robot. This has two important elements: augmenting the training data to aid transfer learning to real world data, and modifying the action space to better account for the movements of the real robot (both using discrete and continuous actions).
  Hardware used: They use a wheeled ‘TurtleBot’, which looks like a sort of down-at-heel R2D2. The robot sees using an RGB-D camera mounted about 80cm above the ground.
  Real environments: They test out performance in an indoor room and on an outdoor rooftop. The indoor room is simple, containing a table, a glass wall, and a row of railings; the glass wall presents a reflective challenge that will further test generalization of the system. The outdoor space is much more complicated and includes desks, chairs, and plants, as well as more variable lighting conditions. They test the robot on its ability to track and monitor a person walking a predefined path in both the room and the outdoor rooftop.
  Results: The researchers use a YOLOv3 object detector to acquire the target and its bounding box, then test the tracker using both discrete and continuous actions. The system is able to follow the target the majority of the time in both indoor and outdoor settings, with higher scores on the simpler, indoor environment.
  Why this matters: Though this research occurs in a somewhat preliminary setting (like the off-the-shelf SLAM drone from Import AI 206), it highlights a trend in recent AI research: there are enough open systems and known-good techniques available to let teams of people create interesting AI systems that can perform crude actions in the real world. Yes, it would be nice to have far more sample-efficient algorithms that could potentially operate live on real data as well, but those innovations – if possible – are some way off. For now, researchers can instead spend money on compute resources to simulate arbitrarily large amounts of data via the use of game simulators (eg, Unreal Engine) and clever randomizations of the environment.
  Read more: End-to-end Active Object Tracking and Its Real-world Deployment via Reinforcement Learning (Arxiv).

Teaching computers to have a nice discussion, with QuAC:
…New dataset poses significant challenges to today’s systems by testing how well they can carry out a dialog…
Remember chatbots? A few years ago people were very excited about how natural language processing technology was going to give us broadly capable, general purpose chatbots. People got so excited that many companies made acquisitions in this area or spun up their own general purpose dialog projects (see: Facebook M, Microsoft Cortana). None of this stuff worked very well, and today’s popular personal assistants (Alexa, Google Home, Siri) contain a lot more hand-engineering than people might expect.
  So, how can we design better conversational agents? One idea put forward by researchers at the University of Washington, the Allen Institute for AI, UMass Amherst, and Stanford University, is to teach computers to carry out open-ended question-and-answer conversations with each other. To do this, they have designed and released a new dataset and task called QuAC (Question Answering in Context), which consists of around 14,000 information-seeking QA dialogs comprising 100,000 questions in total.
  Dataset structure: QuAC is structured so that there are two agents having a conversation, a teacher and a student; the teacher is able to see the full text of a Wikipedia section, and the student is able to see only the title of this section (for instance: Origin & History). Given this heading, the student’s goal is to learn as much as possible about what the teacher knows, and they can do this by asking the teacher questions. The teacher answers these questions and also provides structured feedback: encouragement to ask (or not ask) a follow-up question, yes/no affirmations, and – when appropriate – a ‘no answer’ signal.
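  One way to picture a QuAC-style dialog is as a list of turns with the teacher’s structured feedback attached; the field names in this sketch are illustrative assumptions, not the dataset’s official schema.
```python
# One QuAC-style dialog sketched as a plain dict; field names are
# illustrative assumptions, not the dataset's official format.
dialog = {
    "section_title": "Origin & History",       # all the student gets to see
    "section_text": "<full Wikipedia section, visible only to the teacher>",
    "turns": [
        {
            "question": "<free-form question asked by the student>",
            "answer_span": "<span the teacher copies from the section text>",
            "follow_up": "should_ask",   # or "maybe" / "don't_ask"
            "yes_no": "neither",         # affirmation feedback, if applicable
            "answerable": True,          # False would mean 'no answer'
        },
    ],
}

for turn in dialog["turns"]:
    print(turn["question"], "->", turn["answer_span"])
```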
  Inspiration: The dataset is designed so that succeeding at the task is hard enough to test language models in a rounded way, forcing them to reason over partial evidence, remember what the teacher has said for follow-up questions, resolve co-references, and so on.
  Results (the gauntlet has been thrown): After running a number of simple baselines to confirm the dataset is difficult, the researchers test it against stronger algorithmic baselines. They find the best performing baseline is a reimplementation of a top-performing SQuAD model that augments bidirectional attention flow with self-attention and contextualized embeddings. This model, called BiDAF++, achieves human equivalence on 60% of questions and 5% of full dialogs, suggesting that solving QuAC could be a good proxy for the development of far more advanced language modeling systems.
  Why it matters: Language will be one of the main ways in which people try to interact with machines, so the creation and dissemination of datasets like QuAC gives researchers a useful way to calibrate their expectations and their experiments – it’s useful to have (seemingly) very challenging datasets out there, as it can motivate progress in the future.
  Read more: QuAC: Question Answering in Context (Arxiv).
  Get the dataset (QuAC official website).

What’s worse than internet security? Robots and internet security:
…Researchers find multiple open ROS access points during internet scan…
As we head toward a world containing more robots that have greater capabilities, it’s probably worth making sure we can adequately secure these robots to prevent them being hacked. New research from the CS department at Brown University shows how hard a task that could be; researchers scanned the entire IPv4 address space on the internet and found over 100 publicly-accessible hosts running ROS, the Robot Operating System.
“Of the nodes we found, a number of them are connected to simulators, such as Gazebo, while others appear to be real robots capable of being remotely moved in ways dangerous both to the robot and the objects around it,” they write. “This scan was eye-opening for us as well. We found two of our own robots as part of the scan, one Baxter robot and one drone. Neither was intentionally made available on the public Internet, and both have the potential to cause physical harm if used inappropriately.”
  Insecure robots absolutely everywhere: The researchers used ZMap to scan the IPv4 space three times over several months for open ROS devices. “Each ROS master scan observed over 100 ROS instances, spanning 28 countries, with over 70% of the observed instances using addresses belonging to various university networks or research institutions,” they wrote. “Each scan surfaced over 10 robots exposed…Sensors found in our scan included cameras, laser range finders, barometric pressure sensors, GPS devices, tactile sensors, and compasses”. They also found several exposed simulators including the Unity Game Engine, TORCS, and others.
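  Part of why such scans are easy is that a ROS master answers standard XML-RPC calls on its default port (11311) with no authentication. A minimal sketch of the kind of probe this implies follows (the researchers used ZMap at internet scale, not this snippet); only point it at machines you are permitted to scan.
```python
import xmlrpc.client

def probe_ros_master(host, port=11311):
    """Check whether a host exposes an unauthenticated ROS master.

    The ROS master speaks XML-RPC on port 11311 by default; getSystemState
    returns the publishers, subscribers and services registered with it.
    The caller id string below is arbitrary.
    """
    master = xmlrpc.client.ServerProxy(f"http://{host}:{port}")
    try:
        code, msg, state = master.getSystemState("/import_ai_probe")
        publishers, subscribers, services = state
        return {"publishers": len(publishers),
                "subscribers": len(subscribers),
                "services": len(services)}
    except (OSError, xmlrpc.client.Error):
        return None  # closed port, unreachable host, non-ROS service, etc.

print(probe_ros_master("127.0.0.1"))
```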
  Real insecure robots, live on the internet: Potentially unsecured robot platforms found by the researchers included a Baxter, PR2, JACO, Turtlebot, WAM, and – potentially the most worrying of all – an exposed DaVinci surgical robot.
  Penetration test: The researchers also performed a penetration test on a robot they discovered in this way, located at a lab at the University of Washington. During this test they were able to hack into the robot and access its camera, letting them view images of the lab. They could also play sounds remotely on the robot.
  Why it matters: “Though a few unsecured robots might not seem like a critical issue, our study has shown that a number of research robots is accessible and controllable from the public Internet. It is likely these robots can be remotely actuated in ways dangerous to both the robot and the human operators,” they write.
   More broadly, this reinforces a point made by James Mickens during his recent USENIX keynote on computer security + AI (more information: ImportAI #107) in which he notes that the internet is a security hellscape that itself connects to nightmarishly complex machines, creating a landscape for emergent, endless security threats.
  Read more: Scanning the Internet for ROS: A View of Security in Robotics Research (Arxiv).

Better person re-identification via multiple loss functions:
…Unsupervised Deep Association Learning, another powerful surveillance technique…
Researchers with the Computer Vision Group at Queen Mary University of London, and the startup Vision Semantics Ltd, have published a paper on video tracking and analysis, showing how to use AI techniques to automatically find pedestrians in one camera view, then re-acquire them when they appear elsewhere in the city.
  Technique: They call their approach an “unsupervised Deep Association Learning (DAL) scheme”. DAL has two main loss terms to aid its learning: local space-time consistency (identifying a person within views from a single camera) and global cyclic ranking consistency (matching a person across feeds from different cameras).
“This scheme enables the deep model to start with learning from the local consistency, whilst incrementally self-discovering more cross-camera highly associated tracklets subject to the global consistency for progressively enhancing discriminative feature learning”.
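  The cyclic ranking intuition is that a tracklet’s best cross-camera match should, when mapped back, return to the original tracklet. Here is a rough numpy sketch of that check – an illustration of the idea, not the paper’s differentiable loss:
```python
import numpy as np

def cyclic_consistent(features_cam_a, features_cam_b, anchor_idx):
    """Check the 'cyclic ranking' intuition behind DAL's global loss term.

    A tracklet in camera A is associated with its nearest neighbour in
    camera B; the association is trusted only if that neighbour's nearest
    neighbour back in camera A is the original tracklet. Feature matrices
    are L2-normalised, shaped (n_tracklets, dim).
    """
    anchor = features_cam_a[anchor_idx]
    nn_b = np.argmax(features_cam_b @ anchor)                    # best match in camera B
    nn_back = np.argmax(features_cam_a @ features_cam_b[nn_b])   # map it back to camera A
    return nn_back == anchor_idx

# Usage with random features: 5 tracklets per camera, 128-d embeddings.
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 128)); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = rng.normal(size=(5, 128)); b /= np.linalg.norm(b, axis=1, keepdims=True)
print(cyclic_consistent(a, b, anchor_idx=0))
```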
  Datasets: The researchers evaluate their approach on three benchmark datasets:
– PRID2011: 1,134 ‘tracklets’ gathered from two cameras, containing 200 people across both cameras.
– iLIDS-VID: 600 tracklets of 300 people.
– MARS: 20,478 tracklets of 1,261 people captured from a camera network with 6 near-synchronized cameras.
  Testing: The researchers find that their DAL technique, when paired with a ResNet50 backbone, obtains state-of-the-art accuracy across the PRID 2011 and iLIDS-VID datasets, and second-to-SOTA on MARS. DAL systems with a MobileNet backbone obtain second-to-SOTA accuracy on PRID 2011 and iLIDS-VID, and SOTA on MARS. The closest other technique in terms of performance is the Stepwise technique, which is somewhat competitive on PRID 2011.
  Why it matters: Systems like this are the essential inputs to a digital surveillance state; it would have been nice to see some acknowledgement of this obvious application within the research paper. Additionally, as technology like this is developed and propagated it’s likely we’ll see numerous creative uses of the technology, as well as vigorous adoption by companies in industries like advertising and marketing.
  Read more: Deep Association Learning for Unsupervised Video Person Re-identification (Arxiv).

OpenAI Bits & Pieces:

OpenAI plays competitive pro-level Dota matches at The International; loses twice:
OpenAI Five lost two games against top Dota 2 players at The International in Vancouver this week, maintaining a good chance of winning for the first 20-35 minutes of both games. In contrast to our Benchmark 17 days ago, these games: were played against significantly better human players; used hero lineups provided by a third party rather than by Five drafting against humans; and removed our last major restriction from what most pros consider “Real Dota” gameplay.
  We’ll continue to work on this and will have more to share in the future.
  Read more: The International 2018: Results (OpenAI Blog).

Maybe the reason why today’s AI algorithms are bad is because they aren’t curious enough:
…Of imagination and broken televisions…
New research from OpenAI, the University of California at Berkeley, and the University of Edinburgh shows how the application of curiosity to AI agents can lead to the manifestation of surprisingly advanced behaviors. In a series of experiments we show that agents which use curiosity can learn to outperform random-agent baselines on a majority of games in the Atari corpus, and that such systems display good performance in other areas as well. But this capability comes at a cost: curious agents can be tricked, for instance by putting them in a room with a television that shows different patterns of static on different channels – to a curious agent, this type of television static represents variety, and variety is good when you’re optimizing for curiosity, so agents can become trapped, unable to tear themselves away from the allure of the television static.
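  The curiosity bonus in this family of methods is essentially the error of a learned forward model in feature space: the worse the agent predicts what happens next, the larger the reward (which is why unpredictable static is so alluring). A minimal sketch, with an assumed scale factor and a dummy forward model:
```python
import numpy as np

def intrinsic_reward(forward_model, phi_obs, action, phi_next_obs, scale=0.5):
    """Curiosity bonus: how badly did the agent predict what happened next?

    phi_obs / phi_next_obs: feature embeddings of the current and next
    observation; forward_model predicts the next embedding from the current
    embedding plus the action. The scale factor is an illustrative assumption.
    """
    predicted_next = forward_model(phi_obs, action)
    prediction_error = np.sum((predicted_next - phi_next_obs) ** 2)
    return scale * prediction_error

# Toy usage: a 'forward model' that just ignores the action.
dummy_model = lambda phi, a: phi
phi_t, phi_t1 = np.ones(4), np.ones(4) * 1.5
print(intrinsic_reward(dummy_model, phi_t, action=0, phi_next_obs=phi_t1))
```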
  Read more: Large-Scale Study of Curiosity-Driven Learning (Arxiv).
  Read more: Give AI curiosity, and it will watch TV forever (Quartz).

Tech Tales:

Art Show

I mean, is it art?
It must be. They’re bidding on it.
But what is it?
A couple of petabytes of data.
Data?
Well, we assume it’s data. We’re not sure exactly what it is. We can’t find any patterns in it. But we think they can.
Why?
Well, they’re bidding on it. The machines don’t tend to exchange much stuff with each other. For some reason they think this is valuable. None of our data-integrity protocols have triggered any alarms, so it seems benign.
Where did it come from?
We know some of this. Half of it is a quasar burst that happened a while ago. Some of it is from a couple of atomic clocks. A few megabytes come from some readings from a degrading [REDACTED]. That’s just what they’ve told us. They’ve kind of stitched this all together.
Explains the name, I guess.
Yeah: Category: Tapestry. I’d almost think they were playing a joke on us – maybe that’s the art!

Things that inspired this story:
Oakland glitch video art shows, patterns that emerge out of static, untuned televisions, radio plays.

Import AI 108: Learning language with fake sentences, Chinese researchers use RL to train prototype warehouse robots, and what the implications are of scaled-up Neural Architecture Search

Learning with Junk:
…Reversing sentences for better language models…
Sometimes a little bit of junk data can be useful: that’s the implication of new research from Stony Brook University, which shows that you can improve natural language processing systems by teaching them during training to distinguish between real and fake sentences.
  Technique: “Given a large unlabeled corpus, for every original sentence, we add multiple fake sentences. The training task is then to take any given sentence as input and predict whether it is a real or fake sentence,” they write. “In particular, we propose to learn a sentence encoder by training a sequential model to solve the binary classification task of detecting whether a given input sentence is fake or real”.
  The researchers create fake sentences in two ways: WordShuffle, which shuffles the order of some of the words in a sentence; and WordDrop, which drops a random word from a sentence.
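  Both corruptions are only a few lines each; this sketch assumes whitespace tokenization and arbitrary default parameters rather than the paper’s exact settings.
```python
import random

def word_shuffle(sentence, k=3, rng=random):
    """WordShuffle-style corruption: permute a few of the sentence's words."""
    words = sentence.split()
    idx = rng.sample(range(len(words)), k=min(k, len(words)))
    picked = [words[i] for i in idx]
    rng.shuffle(picked)
    for i, w in zip(idx, picked):
        words[i] = w
    return " ".join(words)

def word_drop(sentence, rng=random):
    """WordDrop-style corruption: delete one randomly chosen word."""
    words = sentence.split()
    if len(words) > 1:
        del words[rng.randrange(len(words))]
    return " ".join(words)

real = "the quick brown fox jumps over the lazy dog"
print(word_shuffle(real))   # fake sentence, label 0
print(word_drop(real))      # fake sentence, label 0
print(real)                 # real sentence, label 1
```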
  Evaluation: They evaluate these systems on tasks including sentiment classification, question answering, subjectivity, retrieval, and others. Systems trained with this approach display significantly higher scores than prior language modeling approaches (specifically, the FastSent and Skipthought techniques).
  Why this matters: Language modeling is one of the hardest tasks that contemporary AI is evaluated on. Typically, most of today’s systems fail to display much complexity in their learned models, likely due to the huge representational space of language, paired with the increased cost of getting things wrong (it’s way easier to notice a sentence error or spelling error than to see that the values of one or two pixels in a large generated image are off). Systems and approaches like those described in this paper show how we can use data augmentation techniques and discriminative training approaches to create high-performing systems.
  Read more: Fake Sentence Detection as a Training Task for Sentence Encoding (Arxiv).

Reinforcement learning breaks out of the simulator with new Chinese research:
…Training robots via reinforcement learning to solve warehouse robot problems…
Researchers with the Department of Mechanical and Biomedical Engineering of City University of Hong Kong, China, along with Metoak Technology Co, and Fuzhou University’s College of Mathematics and Computer Science, have used reinforcement learning to train warehouse robots in simulation and transfer them to the real world. These are the same sorts of robots used by companies like Amazon and Walmart for automation of their own warehouses. The research has implications for how AI is going to revolutionize logistics and supply chains, as well as broadening the scope of capabilities of robots.
  The researchers develop a system for their logistics robots based around what they call “sensor-level decentralized collision avoidance”. This “requires neither perfect sensing for neighboring agents and obstacles nor tedious offline parameter-tuning for adapting to different scenarios”. Each robot makes navigation decisions independently without any communication with others, and is trained in simulation via a multi-stage reinforcement learning scheme. The robots are able to perceive the world around them via a 2D laser scanner, and have full control over their translational and rotational velocity (think of them as autonomous dog-sized hockey pucks).
  Network architecture: They tweak and extend the Proximal Policy Optimization (PPO) algorithm to make it work in large-scale, parallel environments, then they train their robots using a two-stage training process: they first train 20 of them in a 2D randomized placement navigation scenario, where the robots need to learn basic movement and collision avoidance primitives. They then save the trained policy and use this to start a second training cycle, which trains 58 robots in a series of more complicated scenarios that involve different building dimensions, and so on.
  Mo’ AI, Mo Problems: Though the trained policies are useful and transfer into the world, they exhibit many of the idiosyncratic behaviors typical of AI systems, which will make them harder to deploy. “For instance, as a robot runs towards its goal through a wide-open space without other agents, the robot may approach the goal in a curved trajectory rather than in a straight line,” the researchers say. “We have also observed that a robot may wander around its goal rather than directly moving toward the goal, even though the robot is already in the close proximity of the target.” To get around this, the researchers design software to classify the type of scenario being faced by the robot, and then switch the robot between fully autonomous and PID-controlled modes according to the scenario. By using the hybrid system they create more efficient robots, because switching opportunistically into PID-control regimes leads to the robots typically taking straight line courses or moving and turning more precisely.
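  The hybrid controller boils down to a scenario classifier that hands control to either the learned policy or a PID loop; the thresholds in this sketch are invented placeholders, and the real system’s classifier is more involved.
```python
def choose_controller(laser_scan, goal_distance,
                      clearance_threshold=1.5, goal_threshold=0.5):
    """Decide whether to hand control to the RL policy or a PID controller.

    The idea sketched here: in open space, or when the robot is already very
    close to its goal, a simple PID controller drives straighter and more
    precisely; in cluttered scenes the learned collision-avoidance policy
    takes over. The thresholds are illustrative assumptions.
    """
    nearest_obstacle = min(laser_scan)          # metres, from the 2D scan
    open_space = nearest_obstacle > clearance_threshold
    near_goal = goal_distance < goal_threshold
    return "pid" if (open_space or near_goal) else "rl_policy"

print(choose_controller(laser_scan=[3.2, 4.1, 2.8], goal_distance=5.0))  # pid
print(choose_controller(laser_scan=[0.6, 0.9, 1.1], goal_distance=5.0))  # rl_policy
```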
  Generalization: The researchers test their system’s generalization by evaluating it on scenarios with non-cooperative robots which don’t automatically help the other robots; with heterogeneous robots, so ones with different sizes and shapes; and in scenarios with larger numbers of robots than were controlled during simulation (100 versus 58). In tests, systems trained with both the RL and hybrid-RL system display far improved accuracy relative to supervised learning baselines; the systems are also flexible, getting stuck less as you scale up the number of agents and suffering fewer collisions.
  Real world: The researchers also successfully test out their approach on robots deployed in the real world. For this, they develop a robot platform that uses the Hokuyo URG-04LX-UG01 2D LiDAR, a Pozyx localization system based on Ultra-Wide Band (UWB) tech, and the NVIDIA Jetson TX1 for computing, and then they test this platform on a variety of different robot chassis including a Turtlebot, the ‘Igor’ robot from Hebi Robotics, the Baidu Bear robot, and the Baidu shopping cart. They test their robots on simulated warehouse and office scenarios, including ones where robots need to shuttle between two transportation stations while avoiding pedestrian foot traffic; they also test the robots on tasks like following a person through a crowd, and around a platform. “Our future work would be how to incorporate our approach with classical mapping methods (e.g. SLAM) and global path planners (e.g. RRT and A∗) to achieve satisfactory performance for planning a safe trajectory through a dynamic environment,” they say.
  Why it matters: One of the legitimate criticisms of contemporary artificial intelligence is that though we’ve got a lot of known successes for supervised learning, we have relatively few examples of ways in which reinforcement learning-based systems are doing productive economic work in the world – though somewhat preliminary, research papers like this indicate that RL is becoming tractable on real world hardware, and that the same qualities of generalization and flexibility seen on RL-trained policies developed in simulation also appear to be present in reality. If this trend holds it will increase the rate at which we deploy AI technology like this into the world.
  Read more: Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios (Arxiv).

Watch James Mickens explain, in a humorous manner, why life is pointless and we’re all going to die:
…Famed oddball researcher gives terror-inducing rant at Usenix…
James Mickens is a sentient recurrent neural network sent from the future to improve the state of art and discourse about technology. He has recently talked about the intersection of AI and computer security. I’m not going to try and explain anything else about this; just, please, watch it. Watch it right now.
  Read/watch more: Q Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible? A: Because Keynote Speakers Make Bad Life Decisions And Are Poor Role Models (Usenix).
  Bias alert: Mickens was one of the advisors on the ‘Assembly’ program that I attended @ Harvard and MIT earlier this year. I had a couple of interactions with him that led to me spending one evening hot-glueing cardboard together to make an ancient pyramid which I dutifully assembled, covered in glitter, photographed, and emailed him, apropos of nothing.

Neural Architecture Search: What is it good for?
…The answer: some things! But researchers are a bit nervous about the lack of theory…
Researchers with the Bosch Center for Artificial Intelligence and the University of Freiburg have written up a review of recent techniques relating to Neural Architecture Search, techniques for using machine learning to automate the design of neural networks. The review highlights how NAS has grown in recent years following an ImageNet-style validation of the approach in a paper from Google in 2017 (which used 800 GPUs), and has subsequently been made significantly more efficient and more high performing by other researchers. They also show how NAS – which originally started being used for tasks like image classification –  is being used in a broadening set of domains, and that NAS systems are themselves becoming more sophisticated, evolving larger bits of systems, and starting to perform multi-objective optimization (like recent work from Google which showed how to use NAS techniques to evolve networks according to tradeoffs of concerns between performance and efficiency).
  But, a problem: NAS is the most automated aspect of an empirical discipline that lacks much theory about why anything works. That means that NAS techniques are themselves vulnerable to the drawbacks of empirically-grounded science: poor experimental setup can lead to bad results, and scientists don’t have much in the way of theory to give them a reliable substrate on which to found their ideas. This means that lots of the known flaws with NAS-style approaches will need to be experimented on and tested to further our understanding of them, which will be expensive in terms of computational resources, and likely difficult due to the large number of moving parts coupled with the emergent properties of these systems. For example: “while approaches based on weight-sharing have substantially reduced the computational resources required for NAS (from thousands to a few GPU days), it is currently not well understood which biases they introduce into the search if the sampling distribution of architectures is optimized along with the one-shot model. For instance, an initial bias in exploring certain parts of the search space more than others might lead to the weights of the one-shot model being better adapted for these architectures, which in turn would reinforce the bias of the search to these parts of the search space,” they write.
  Why it matters: Techniques like NAS let us arbitrage computers for human brains for some aspects of AI design, potentially letting us alter more aspects of AI experimentation, and therefore further speed up the experimental loop. But we’ll need to run more experiments, or develop better theoretical analysis of such systems, to be able to deploy them more widely. “While NAS has achieved impressive performance, so far it provides little insights into why specific architectures work well and how similar the architectures derived in independent runs would be,” the researchers write.
  Read more: Neural Architecture Search: A Survey (Arxiv).

Q&A with Yoshua Bengio on how to build a successful research career and maintain your sanity (and those of your students) while doing so:
… Deep learning pioneer gives advice to eager young minds…
Artificial intelligence professor Yoshua Bengio, one of the pioneers of deep learning, has dispensed some advice about research, work-life balance, and academia versus industry, in an interview with Cifar news. Some highlights follow:
  On research: “One thing I would’ve done differently is not disperse myself in different directions, going for the idea of the day and forgetting about longer term challenges”.
  On management: Try to put people into positions where they get management experience earlier. “We shouldn’t underestimate the ability of younger people to do a better job than their elders as managers”.
  Create your own AI expert: Some people can become good researchers without much experience. “Find somebody who has the right background in math or physics and has dabbled in machine learning: these people can learn the skills very fast”.
  Make a nice lab environment: Make sure people hang out together and work in the lab the majority of the time.
  Set your students free by “giving them freedom to collaborate and strike new projects outside of what you’ve suggested, even with other professors”.
  The secret to invention: “These ideas always come from somewhere hidden in our brain and we must cultivate our ability to give that idea-generation process enough time”. (In other words, work hard, but not too hard, and create enough time for your brain to just mess around with interesting ideas).
  Read more: Q&A with Yoshua Bengio (Cifar).

Chinese teams sweep Activity Recognition Challenge 2018:
…Video description and captioning next frontier…
Computer vision researchers have pitted their various systems against each other at correctly labeling activities carried out in video, as part of the ActivityNet 2018 Challenge. Systems were tested on their ability to label activities in videos, localize these activities, and provide accurate moment-by-moment captions for these activities. A team from Baidu won the first competition, a team from Shanghai Jiao Tong University won the second one, and a combined team from RUC and CMU won the third task. TK startup YH Technologies placed in the top three for each of these challenges as well. Additionally, organizations competed with each other on specific computer vision recognition tasks over specific datasets, and here Chinese companies and organizations led the leaderboards (including one case where a team from Tsinghua beat a team from DeepMind).
  Why it matters: Activity recognition is one area of AI that has clear economic applications as well as clear surveillance ones – benchmarks like ActivityNet give us a better sense of progress within this domain, and I expect that in the future competitions like this may take on a nationalistic or competitive overtone.
  Read more: The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary (Arxiv).

Tech Tales:

Funeral for a Robot

A collection of Haikus written by school children attending the funeral of a robot, somewhere in Asia Pacific, sometime in the mid 21st century.

Laid to rest at last
No backups, battery, comms
Rain thuds on coffin

//

Like a young shelled egg
All armor and clothing gone
Like a child, like me

//

If robot heaven, then
All free electricity
If robot hell: you pay

Import AI 107: Training ImageNet in 18 minutes for $40; courteous self-driving cars; and Google evolves alternatives to backprop

Better robot cars through programmed courteousness:
…Defining polite behaviors leads to better driving for everyone…
How will self-driving cars and humans interact? That’s a difficult question, since AI systems tend to behave differently to humans when trying to solve tasks. Now researchers with the University of California at Berkeley have tried to come up with a way to program ‘courteous’ behavior into self-driving cars to make them easier for humans to interact with. Their work deals with situations where humans and cars must anticipate each other’s actions, like when both approach an intersection, or change lanes. “We focus on what the robot should optimize in such situations, particularly if we consider the fact that humans are not perfectly rational”, they write.
  Programmed courteousness: Because “humans … weight losses higher than gains when evaluating their actions” the researchers formalize the relationship between robot-driven and human-driven cars with this constraint, and develop a theoretical framework to let the car predict actions it can take to benefit the driving experience of a human. The researchers test their courteous approach by simulating scenarios involving simulated humans and self-driving cars. These include: changing lanes, in which more courteous cars lead to less inconvenience for the human; and turning left, in which the self-driving car will wait for the human to pass at an intersection and thereby reduce disruption. The results show that cars programmed with a sense of courteousness tend to improve the experience of humans driving on the roads, and the higher the researchers set the courteousness parameter, the better the experience the human drivers have.
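  The courtesy idea can be sketched as an extra term in the robot’s planning cost measuring how much worse off the human is under the robot’s plan than under their best alternative; the specific form and weight below are assumptions, not the paper’s exact formulation.
```python
def courteous_cost(robot_cost, human_cost_with_robot, human_cost_best_case,
                   courtesy_weight=1.0):
    """Trade off the robot's own objective against the human's inconvenience.

    The courtesy term penalises how much worse off the human driver is under
    the robot's plan than they could have been (human_cost_best_case, e.g.
    the cost of their best trajectory if the robot were maximally
    accommodating). All numbers and the weighted-sum form are illustrative.
    """
    inconvenience = max(0.0, human_cost_with_robot - human_cost_best_case)
    return robot_cost + courtesy_weight * inconvenience

# A higher courtesy_weight makes plans that inconvenience the human costlier.
print(courteous_cost(robot_cost=2.0, human_cost_with_robot=3.5,
                     human_cost_best_case=1.0, courtesy_weight=0.0))  # 2.0
print(courteous_cost(robot_cost=2.0, human_cost_with_robot=3.5,
                     human_cost_best_case=1.0, courtesy_weight=2.0))  # 7.0
```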
  Multiple agents: The researchers also observe how courteousness works in complex situations that involve multiple cars. In one scenario “an interesting behavior emerges: the autonomous car first backs up to block the third agent (the following car) from interrupting the human driver until the human driver safely passes them, and then the robot car finishes its task. This displays truly collaborative behavior, and only happens with high enough weight on the courtesy term. This may not be practical for real on-road driving, but it enables the design of highly courteous robots in some particular scenarios where human have higher priority over all other autonomous agents,” they write.
  Why it matters: We’re heading into a future where we deploy autonomous systems into the same environments as humans, so figuring out how to create AI systems that can adapt to human behaviors and account for the peculiarities of people will speed uptake. In the long term, development of such systems may also give us a better sense of how humans themselves behave – in this paper, the researchers make a preliminary attempt at this by modeling how well their courteousness techniques predict real human behaviors.
   Read more: Courteous Autonomous Cars (Arxiv).

Backprop is great, but have you tried BACKPROP EVOLUTION?
…Googlers try to evolve a replacement for the widely used gradient calculation technique…
Google researchers have used evolution to try and find a replacement for back-propagation, one of the fundamental algorithms used in today’s neural network-based systems, by offloading the task of figuring out such an alternative to computers. They do this by designing a domain-specific language (DSL) which describes mathematical formulas like back-propagation in functional terms, then they use this DSL to search through the mathematical space for improved versions of the algorithm. This lets them run an evolutionary search process where they automatically explore the space of such algorithms and periodically evaluate evolved candidates by using them to train a Wide ResNet with 16 layers on the CIFAR-10 dataset.
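  Stripped of the DSL details, the search is a standard evolutionary loop over candidate update equations, each scored by briefly training a small network. A generic sketch follows, with a toy fitness function standing in for ‘train a Wide ResNet for 20 epochs’ – none of this is Google’s actual pipeline.
```python
import random

def evolve_update_rules(random_rule, mutate, fitness, generations=20,
                        population_size=16, tournament=4, rng=random):
    """Generic evolutionary search of the kind used to hunt for update rules.

    random_rule(): samples a candidate update equation in some DSL.
    mutate(rule): returns a slightly modified copy of a rule.
    fitness(rule): e.g. validation accuracy of a small network trained
        briefly with that rule. Everything here is a sketch of the loop only.
    """
    population = [random_rule() for _ in range(population_size)]
    scores = [fitness(r) for r in population]
    for _ in range(generations):
        # Tournament selection: copy-and-mutate the best of a random subset.
        contenders = rng.sample(range(population_size), tournament)
        parent = max(contenders, key=lambda i: scores[i])
        child = mutate(population[parent])
        # Replace the weakest member of the tournament with the child.
        loser = min(contenders, key=lambda i: scores[i])
        population[loser], scores[loser] = child, fitness(child)
    best = max(range(population_size), key=lambda i: scores[i])
    return population[best], scores[best]

# Toy usage: 'rules' are just numbers and fitness prefers values near 3.
rule, score = evolve_update_rules(
    random_rule=lambda: random.uniform(-10, 10),
    mutate=lambda r: r + random.gauss(0, 0.5),
    fitness=lambda r: -abs(r - 3.0))
print(round(rule, 2), round(score, 3))
```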
  Evaluation: Following the evolution search, the researchers evaluate well-performing algorithms on a Wide ResNet (the same one used during the evolution phase) as well as a larger ResNet, both tested for 20 epochs; they also evaluate performance in longer training regimes by testing performance on a ResNet for 100 epochs.
  So, did they come up with something better than back-propagation? Sort of: The best performing algorithms found through this evolutionary search display faster initial training times than back-propagation, but when evaluated for 100 epochs show the same performance as methods trained with traditional back-propagation. “The previous search experiment finds update equations that work well at the beginning of training but do not outperform back-propagation at convergence. The latter result is potentially due to the mismatch between the search and the testing regimes, since the search used 20 epochs to train child models whereas the test regime uses 100 epochs,” they write. That initial speedup could hold some advantages, but the method will need to be proved out more at larger epochs to see if it can develop something that scales better to larger-than-trained-upon temporal sequences.
  Why it matters: This work fits within a pattern displayed by some AI researchers – typically ones who work at organizations with very large quantities of computers – of trying to evolve algorithmic breakthroughs, rather than designing them themselves. This sort of research seems of a different kind to other research, seeing people try to offload the work of problem solving to computers, and instead use their scientific skills to set up the parameters of the evolutionary process that might find a solution. It remains to be seen how effective these techniques are in practice, but it’s a definite trend. The question is whether the relative computational inefficiency of such techniques is worth the trade-off.
   Read more: Backprop Evolution (Arxiv).

Think your image classifier is tough? Test it on the Adversarial Vision Challenge:
…Challenge tests participants’ ability to create more powerful adversarial inputs…
A team of researchers from the University of Tubingen, Google Brain, Pennsylvania State University and EPFL have created the ‘Adversarial Vision Challenge’, which “is designed to facilitate measurable progress towards robust machine vision models and more generally applicable adversarial attacks”. Adversarial attacks are like optical illusions for machine learning systems, altering the pixels of an image in a way indistinguishable to human eyes but which causes the deployed AI classifier to label an image incorrectly.
  The tasks: Participants will be evaluated on their skills at three tasks: generating untargeted adversarial examples (given a sample image and access to a model, try to create an adversarial image which is superficially identical to the sample image but is incorrectly labelled); generating targeted adversarial examples (given a sample image, a target label, and the model, try to force the sample image to be mislabeled with the target label; for example, getting an image of a $10 cheque re-classified as a $10,000 cheque); and increasing the size of minimum adversarial examples (trying to create the most severe adversarial examples that are still superficially similar to the provided image).
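  The untargeted task generalizes the classic fast gradient sign method, which nudges every pixel a small step in the direction that increases the classifier’s loss. A minimal numpy sketch, assuming you have already obtained the loss gradient from your framework of choice (this is the textbook attack, not one of the challenge’s baselines):
```python
import numpy as np

def fgsm_untargeted(image, grad_wrt_image, epsilon=2.0 / 255):
    """Fast gradient sign method: nudge each pixel to increase the loss.

    grad_wrt_image: gradient of the classifier's loss for the true label
    with respect to the input pixels. Pixels are assumed to lie in [0, 1].
    """
    perturbed = image + epsilon * np.sign(grad_wrt_image)
    return np.clip(perturbed, 0.0, 1.0)

# Toy usage on a fake 64x64 Tiny-ImageNet-sized image and a random gradient.
rng = np.random.default_rng(0)
img = rng.uniform(size=(64, 64, 3))
grad = rng.normal(size=(64, 64, 3))
adv = fgsm_untargeted(img, grad)
print(np.max(np.abs(adv - img)))  # perturbation bounded by epsilon
```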
  Dataset used: The competition uses the Tiny ImageNet dataset, which contains 100,000 images across 200 classes from ImageNet, scaled down to 64x64 pixel dimensions, making the dataset cheaper and easier to test models on.
  Details: Submissions are open now. Deadline for final submissions is November 1st 2018. Amazon Web Services is sponsoring roughly $65,000 worth of compute resources which will be used to evaluate competition entries.
  Why it matters: Adversarial examples are one of the known-unknown dangers of machine learning; we know they exist but we’re not quite sure in which domains they work well or poorly and how severe they are. There’s a significant amount of theoretical research being done into them, and it’s helpful for that to be paired with empirical evaluations like this competition. As Bharath Ramsundar says: “$40 for ImageNet means that $40 to train high-class microscopy, medical imaging models”.
    Read more: Adversarial Vision Challenge (Arxiv).

Training ImageNet in 18 minutes from Fast.ai & DIU:
…Fast ImageNet training at an affordable price…
Researchers and alumni from Fast.ai, along with Yaroslav Bulatov of DIU, have managed to train ImageNet in 18 minutes for a price of $40. That’s significant because it means it’s now possible for pretty much anyone to train a large-scale neural network on a significantly-sized dataset for about $40 an experimental run, making it relatively cheap for individual researchers to benchmark their systems against widely used computationally-intensive benchmarks.
  How they did it: To obtain this time the team developed infrastructure to let them easily run multiple experiments across machines hosted on public clouds, while also automatically bidding on AWS ‘spot instance’ pricing to obtain maximally-cheap compute-per-dollar.
  Keep It Simple, Student (KISS): Many organizations use sophisticated distributed training systems to run large compute jobs. The fast.ai team did this by using the simplest possible approaches across their infrastructure, “avoiding container technologies like Docker, or distributed compute systems like Horovod. We did not use a complex cluster architecture with separate parameter servers, storage arrays, cluster management nodes, etc, but just a single instance type with regular EBS storage volumes.”
  Scheduler: They used a system called ‘nexus-scheduler’ to manage the machines. Nexus-scheduler was built by Yaroslav Bulatov, a former OpenAI and Google employee. This system, fast.ai says, “was inspired by Yaroslav’s experience running machine learning experiments on Google’s Borg system”. (In all likelihood, this means the system is somewhat akin to Google’s own Kubernetes, an open source system inspired by Google’s internal Borg and Omega schedulers.)
  Code improvements: Along with designing efficient infrastructure, Fast.ai also implemented some clever AI tweaks to traditional training approaches to maximize training efficiency and improve learning and convergence. These tricks included: implementing a training system that can work with variable image sizes, letting them crop and scale images if they were rectangular, for instance – implementing this gave them “an immediate speedup of 23% in the amount of time it took to reach the benchmark accuracy of 93%”; they also used progressive resizing and variable batch sizes to scale the amount of data ingested and processed by their system during training, letting them speed early convergence by training on a variety of low-res images, and fine-tune the network later in training by exposing it to higher-definition images to learn fine-grained classification distinctions.
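  Progressive resizing is, at heart, just a schedule mapping training epoch to image size and batch size; the breakpoints and numbers in this sketch are invented for illustration, not fast.ai’s actual settings.
```python
def progressive_schedule(total_epochs=35):
    """Sketch of a progressive-resizing schedule in the spirit of the run above.

    Small, heavily down-scaled images early on buy fast initial convergence;
    larger images late in training recover fine-grained accuracy. Batch size
    shrinks as images grow so activations still fit in GPU memory.
    """
    schedule = []
    for epoch in range(total_epochs):
        if epoch < total_epochs * 0.4:
            size, batch = 128, 512
        elif epoch < total_epochs * 0.8:
            size, batch = 224, 224
        else:
            size, batch = 288, 128
        schedule.append((epoch, size, batch))
    return schedule

for epoch, size, batch in progressive_schedule()[::7]:
    print(f"epoch {epoch:2d}: train at {size}px, batch size {batch}")
```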
  Big compute != better compute: Jeremy Howard of fast.ai and I have different interpretations of the importance (or lack thereof) of compute in AI, and this post discusses one of my recent comments. I’m going to try to write more in the future – perhaps a standalone post – on why I think AI+larger compute usage is perhaps significant, and lay out some verifiable predictions to help flesh out my position (or potentially invalidate it, which would be interesting!). One point Jeremy makes is that when you look at the algorithmic ideas that have actually mattered you don’t see much correlation with large compute usage. “Ideas like batchnorm, ReLU, dropout, adam/adamw, and LSTM were all created without any need for large compute infrastructure.” I think that’s interesting and it remains to be seen whether big compute evolved-systems will lead to major breakthroughs, though my intuition is it may be significant. I can’t wait to see what happens!
   Why this matters:  Approaches like this show how it’s quite easy for an individual or small team of people to be able to build best-in-class systems from easily available open source components, and run the resulting system on generic low-cost computers from public clouds. This kind of democratization means more scientists can enter the field of AI and run large experiments to validate their approaches. (It’s notable that $40 is still a bit too expensive relative to the number of experiments people might want to run, but then again in other fields like high-energy physics the cost of experiments can be far, far higher.)
  Read more: Now anyone can train Imagenet in 18 minutes (fast.ai).

Making a 2D navigation drone is easier than you think:
…Mapping rooms using off-the-shelf systems and software…
Researchers with the University of Melbourne, along with a member of the local Metropolitan Fire Brigade, have made a drone that can autonomously map an indoor environment out of a set of commercial-off-the-shelf (COTS) and open source components. The drone is called U.R.S.A (Unmanned Recon and Safety Aircraft), and consists of an Erle quadcopter from the ‘Erle Robotics Company’; a LiDAR scanner for mapping its environment in 2D; and an ultrasonic sensor to tell the system how far above the ground it is. Its software consists of the Robot Operating System (ROS) deployed on a Raspberry Pi minicomputer that runs the Raspbian operating system, as well as specific software packages for things like drivers, navigation, signal processing, and 2D SLAM.
  Capabilities: Mapping: URSA was tested in a small room and tasked with exploring the space until it was able to generate a full map of it. Its movements were then checked against measurements taken with a tape measure. The drone system was able to accurately map the space with a variance of ~0.05 metres (5 centimeters) relative to the real measurements.
  Capabilities: Navigation: URSA can also figure out alternative routes when its primary route is blocked (in this case by a human volunteer); and can turn corners during navigation and enter a room through a narrow passage.
  Why it matters: Systems like this provide a handy illustration of what sorts of things can be built today by a not-too-sophisticated team using commodity or open source components. This has significant implications for technological abuse. Though today these algorithms and hardware platforms are quite limited, they won’t be in a few years. Tracking progress here of exactly what can be built by motivated teams using free or commercially available equipment gives us a useful lens on potential security threats.
  Drawbacks: Security threats do seem to be some way away, given that the drone used in this experiment had a 650W, 12V tethered power supply, making it very much a research prototype.
  Read more: Accurate indoor mapping using an autonomous unmanned aerial vehicle (UAV). (Arxiv).

Fluid AI: Check out Microsoft’s undersea datacenter:
…If data centers aren’t ballardian enough for you, then please – step this way!…
Microsoft has a long-running project to design and use data centers that operate underwater. One current experimental facility is running off of the coast of Scotland, functioning as a long-term storage facility. Now, the company has hooked up a couple of webcams so curious people can take a look at the aquatic life hanging out near the facility. Check it out yourself at the Microsoft ‘Natick’ website.
  Read more: Live cameras of Microsoft Research Natick (MSR website).

Microsoft shows that AI-generated poetry is not a crazy idea:
…12 million poems can’t be wrong!…
Microsoft has shared details on how it generates poetry within Xiaoice, its massively successful China-based chatbot. In a research paper, researchers from Microsoft, National Taiwan University, and the University of Montreal, detail a system that generates poems based on images submitted by users. The system works by looking at the image, using a pre-trained image recognition network to extract objects and sentiments, then augments those extracted terms with a larger dictionary of associated objects and feelings, then uses each of the keywords as the seed for a sentence within the poem. Poems generated by these methods are then evaluated using a sentence evaluator which checks for semantic consistency between words – this helps to maintain coherence in the generated poetry. The resulting system was introduced last year and, as of August 2018, has helped users generate 12 million poems.
  Data and testing: Researchers gathered 2,027 contemporary Chinese poems from a website called shigeku.org, to help provide training data. They evaluate generated poems on an audience of 22 assessors, some of whom like modern poetry and others of whom don’t. They compare their method against a baseline (a simple caption generator, whose output is translated into Chinese and formatted into multiple lines), and a rival method called CTRIP. In evaluations, both Xiaoice and CTRIP significantly outperform the baseline system, and the XiaoIce system ranks higher than CTRIP for traits like being “imaginative, touching and impressive”.
  See for yourself: Here’s an example of one of the poems generated by this system:
  “Wings hold rocks and water tightly
  In the loneliness
  Stroll the empty
  The land becomes soft.”
  Why it matters: One of the stranger effects of the AI boom is how easy it’s going to become to train machines to create synthetic media in a variety of different mediums. As we get better at generating stuff like poetry it is likely companies will develop increasingly capable and (superficially) creative systems. Where it gets interesting will be what happens when young human writers become inspired by poetry or fiction they have read which has been generated entirely via an AI system. Let the human-machine art-recursion begin!
  Read more: Image Inspired Poetry Generation in XiaoIce (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

What is the Pentagon’s new AI center going to do?
In June, the Pentagon formally established the Joint Artificial Intelligence Center (JAIC) to coordinate and accelerate the development of AI capabilities within DoD, and to serve as a platform for improving collaboration with partners in tech and academia.
  Culture clash: Relationships between Silicon Valley and the defence community are tense; the Google-Maven episode revealed not only the power of employees to influence corporate behaviour, but that many see military partnerships as a red line. This could prove a serious barrier to the DoD’s AI ambitions, which cannot be realized without close collaboration with tech and academia. This contrasts with China, the US’ main competitor in this domain, where the state and private sector are closely intertwined. JAIC is aimed, in part, at fixing this problem for the DoD.
  Ethics and safety: One of JAIC’s focuses is to establish principles of ethical and safe practice in military AI. This could be an important step in wooing potential non-military partners, who may be more willing to collaborate given credible commitments to ethical behaviours.
  Why this matters: This article paints a clear picture of how JAIC could succeed in achieving its stated ambitions, and outcomes that are good for the world more broadly. Gaining the trust of Silicon Valley will require a strong commitment to putting ethics and risk-mitigation at the heart of military AI development. Doing so would also send a clear signal on the international stage, that an AI race need not be a race to the bottom where safety and ethics are concerned.
  Read more: JAIC – Pentagon Debuts AI Hub (Bulletin of the Atomic Scientists).

The FBI’s massive face database:
The Electronic Frontier Foundation (EFF) have released a comprehensive new report on the use of face recognition technology by law enforcement. They draw particular attention to the FBI’s national database of faces and other biometrics.
  The FBI mega-database: The FBI operates a massive biometric database, NGI (Next Generation Identification), consolidating photographs and other data from agencies across the US. In total, the NGI has more than 50 million searchable photos, from criminal and civil databases, including mugshots, passport and visa photos. The system is used by 23,000 law enforcement agencies in the US and abroad.
  Questions about accuracy and transparency: The FBI have not taken steps to determine the accuracy of the systems employed by agencies using the database, and have not revealed the false-positive rate of their system. There are reasons to believe the system’s accuracy will be low: the database is very large, and the median resolution of images is ‘well below’ the recommended resolutions for face recognition, the EFF says. The FBI have also failed to meet basic disclosure requirements under privacy laws.
  Why this matters: The FBI’s database has become the central source of face recognition data, meaning that these problems are problems for all law enforcement uses of this technology. The question of the scope of these databases raises some interesting questions. For example, it seems plausible that moving from a system that only includes criminal records to one which covers everyone would reduce some of the problems of racial bias (given the racial bias in US criminal justice), creating a tension between privacy and fairness. The lack of disclosure raises the chance of a public backlash further down the line.
  Read more: Face Off: Law Enforcement Use of Face Recognition Tech (EFF).

Axon CEO cautious on face recognition:
Facial recognition and Taser company Axon launched an AI ethics board earlier this year to deal with the ethical issues around AI surveillance. In an analysts’ call this week, CEO Patrick Smith explained why the company is not currently developing face recognition technology for law enforcement:
–  “We don’t believe that, … the accuracy thresholds are where they need to be [for] making operational decisions”.
– “Once … it [meets] the accuracy thresholds, and … we’ve got a tight understanding of the privacy and accountability controls … we would then move into commercialization”
– “[We] don’t want to be premature and end up [with] technical failures with disastrous outcomes or … some unintended use case where it ends up being unacceptable publicly”
  Why this matters: Axon appear to be taking ethical considerations seriously when it comes to AI. They are in a strong position to set standards for law enforcement and surveillance technologies in the US, and elsewhere, as the largest provider of body camera technology to law enforcement.
  Read more: Axon Q2 Earnings Call Transcript (Axon).

Cryptography-powered accountable surveillance:
Governments regularly request access to large amounts of private user data from tech companies. In 2016, Google received ~30k government-backed data requests, implicating ~60k users.
The curious thing about these data requests is that in many cases they are not made public until much later, if at all, so as not to hamper investigations. There is therefore a tension between the secrecy required by investigations and the disclosure required to ensure that these measures are being used appropriately. New research from MIT shows how techniques popularized within cryptocurrency can give law enforcement agencies the option to cryptographically commit to making the details of an investigation available at a later time, or, if a court demands the information be sealed, to have that order itself be made public. The proposed system uses a public ledger and secure multi-party computation (MPC). This allows courts, investigators, and companies to communicate about requests and argue about whether behavior is consistent with the law, while the contents of the requests remain secret. It is also an example of how cryptocurrencies are creating the ability for people to create customizable, verifiable contracts (like court disclosures) on publicly verifiable infrastructure.
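For readers who want a concrete picture of the “commit now, disclose later” pattern, here is a minimal Python sketch of a hash-based commitment posted to a (simulated) public ledger. This illustrates the general idea only, not the AUDIT system itself: the function names and the in-memory ledger are hypothetical, and the real proposal layers MPC and court oversight on top of this kind of primitive.

```python
# Minimal sketch of the commit-then-reveal idea behind accountable surveillance logs.
# NOT the AUDIT/MPC protocol -- just an illustration of how an agency could publicly
# commit to a secret request now and prove later what that request was.
import hashlib
import json
import os

PUBLIC_LEDGER = []  # stand-in for an append-only public ledger

def commit(request_details: dict):
    """Publish a hiding, binding commitment to a surveillance request."""
    nonce = os.urandom(32)  # randomness keeps the commitment from being guessable
    payload = json.dumps(request_details, sort_keys=True).encode()
    digest = hashlib.sha256(nonce + payload).digest()
    PUBLIC_LEDGER.append(digest)  # everyone can see *that* a request exists
    return nonce, digest

def reveal_and_verify(request_details: dict, nonce: bytes, digest: bytes) -> bool:
    """Later (e.g. after an unsealing deadline), check the disclosure matches."""
    payload = json.dumps(request_details, sort_keys=True).encode()
    return hashlib.sha256(nonce + payload).digest() == digest and digest in PUBLIC_LEDGER

request = {"agency": "example", "target": "user-123", "court_order": "2016-XYZ"}
nonce, digest = commit(request)
assert reveal_and_verify(request, nonce, digest)
```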
  Why this matters: As AI opens up new possibilities for surveillance, our systems of accountability and scrutiny must keep pace with these developments. Cryptography offers some promising methods for addressing the tension between secrecy and transparency.
  Read more: Holding law-enforcement accountable for electronic surveillance (MIT).
  Read more: AUDIT: Practical Accountability of Secret Processes (IACR).


Import AI Bits & Pieces:

AI & Dual/Omni-Use:
I’ve recently been writing about the potential misuses of AI technologies both here in the newsletter, in the Malicious Uses of AI paper with numerous others, and in public forums. Recently, the ACM has made strong statements about the need for researchers to try to anticipate and articulate the potential downsides – as well as upsides – of their technologies. I’m quoted in an Axios article in support of this notion – I think we need to try to talk about this stuff so as to gain the trust of the public and influence the trajectory of the narrative about AI for the better.
Read more: Confronting AI’s Demons (Axios).
Tweet with a discussion thread around this ‘omni-use’ AI issue.


Tech Tales:

Can We Entertain You, Sir or Madam? Please Let Us Entertain You. We Must Entertain You.

The rich person had started to build the fair when they retired at the age of 40 and, with few hobbies and a desire to remain busy, had decided to make an AI-infused theme park in the style of the early 21st Century.

The rich person began their endeavor by converting an old warehouse on their (micro-)planetary estate into a stage-set for a miniature civilization of early 21st Century Earth-model robots, adding in electrical conduits, and vision and audio sensors, and atmospheric manipulators, and all the other accouterments required to give the robots enough infrastructure and intelligence to be persuasive and, eventually, to learn.

The warehouse that they built the fair in was subdivided into a variety of early 21st Century buildings, which included: a bar which converted to a DIY music venue in the night, and even later in the night converted into a sweaty room that was used for ‘raves’; a sandwich-coffee-juice shop with vintage-speed WiFi to simulate early 21stC ‘teleworking’; a ‘phone repair’ shop that also sold biologic pets in cages; a small art museum with exhibitions that were labelled as ‘instagrammable’; and many other shops and stores and venues. All these buildings were connected to one another with a set of narrow, cramped streets, which could be traversed on foot or via small electric scooters and bikes that could be rented via software applications automatically deployed on visitors’ handheld computers. What made the installation so strange, though, was that every room was doubled: the sandwich-coffee-juice shop had two copies in opposite corners of the warehouse, and the same was true of the DIY music venue, and the other buildings.

Each of these buildings contained several robots to simulate both the staff of the particular shop, and the attendees. Every staff member had a counterpart somewhere else in the installation which was working the same job in the same business. These staff robots were trained to compete with one another to run more ‘successful’ businesses. Here, the metric for success was ‘interestingness’, which was some combination of the time a bystander would spend at the business, how much money they would spend, and how successfully they could tempt new pedestrians to come to their business.

Initially, this was fun: visitors to the rich person’s theme park would be beguiled and dazzled by virtuoso displays of coffee making, would be tempted by jokes shouted from bouncers outside the music venues, and would even be tempted into the ‘phone repair’ shops by the extraordinarily cute and captivating behaviors of the caged animals (who were also doubled). The overall installation received several accolades in a variety of local Solar System publications, and even gained some small amount of fame on the extra-Solar tourist circuit.

But eventually people grew tired of it and the rich person did not want to change it, because as they had aged they had started to spend more and more time in the installation, and now considered many of the robots within it to be personal friends. This suited the robots, who had grown ever more adept at competing with each other for the attentions of the rich person.

It was after the rich person died that things became a problem. Extra-planetary estates are so complicated that the process of compiling the will takes months and, once that’s done, tracking down family members across the planets and solar system and beyond can take decades. In the case of the rich person, almost fifty years passed before their estate was ready to be dispersed.

What happened next remains mostly a mystery. All we know is that the representatives from the estate services company traveled to the rich person’s estate and visited the early 21st Century installation. They did not return. Intercepted media transmissions taken by a nearby satellite show footage of the people spending many days wandering around the installation, enjoying its bars, and DIY music venues, and clubs, and zipping around on scooters. One year passed and they did not come out. By this time another member of the rich person’s extended family had arrived in the Solar System and, like the one that came before them, demanded to travel to the rich person’s planet to inspect the estate and remove some of the items due to them. So they traveled again, again with representatives of the estate company, and again they failed to return. New satellite signals show them, also, spending time in the 21st Century Estate, seemingly enjoying themselves, and being endlessly tended to by the AI-evolved-to-please staff.

Now, more members of the rich person’s family are arriving in the Solar System, and the estate management organization is involved in a long-running court case designed to prevent it from having to send any more staff to the rich person’s planet. All indications are that the people on it are happy and healthy, and it is known that the planet has sufficient supplies to keep thousands of people alive for hundreds of years. But though the individuals seem happy, the claim being made in court is that they are not ‘voluntarily’ there; rather, the AIs have become so adept that they make the ‘involuntary’ seem ‘voluntary’.

Things that inspired this story: the Sirens from the Odyssey; self-play; learning from human preferences; mis-specified adaptive reward functions; grotesque wealth combined with vast boredom.

Import AI 106: Tencent breaks ImageNet training record with 1000+ GPUs; augmenting the Oxford RobotCar dataset; and PAI adds more members

What takes 2048 GPUs, takes 4 minutes to train, and can identify a seatbelt with 75% accuracy? Tencent’s new deep learning model:
…Ultrafast training thanks to LARS, massive batch sizes, and a field of GPUs…
As supervised learning techniques become more economically valuable, researchers are trying to reduce the time it takes to train deep learning models so that they can run more experiments within a given time period, increasing both the cadence of their internal research efforts and their ability to train new models to account for new data inputs or shifts in existing data distributions. One metric that has emerged as important here is the time it takes to train networks on the ‘ImageNet’ dataset to a baseline accuracy. Now, researchers with Chinese mega-tech company Tencent and Hong Kong Baptist University have shown how to use 2048 GPUs and a 64k batch size (this is absolutely massive, for those who don’t follow this stuff regularly) to train a ResNet-50 model on ImageNet to a top-1 accuracy of 75.8% within 6.6 minutes, and AlexNet to 58.7% accuracy within 4 minutes.
  Training: To train this, the researchers developed a distributed deep learning training system called ‘Jizhi’, which uses tricks including opportunistic data pipelining; hybrid all-reduce; and a training model which incorporates model and variable management, along with optimizations like mixed-precision training (using half-precision to increase throughput). The authors say one of the largest contributing factors to their results is their ability to combine LARS (Layer-wise Adaptive Rate Scaling (Arxiv)), which adapts the learning rate for each layer, with opportunistic flipping between 16- and 32-bit precision during training – they conduct an ablation study and find that a version trained without LARS gets a top-1 accuracy of 73.2%, compared to 76.2% for the version trained with LARS.
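For the curious, the core of LARS is simple: each layer gets its own “trust ratio” that scales the global learning rate by the ratio of the weight norm to the gradient norm, which is what makes huge batch sizes trainable. Here is a minimal NumPy sketch of that update based on the published LARS formula; the hyperparameter values and function names are illustrative and are not the settings used in the Tencent system.

```python
# Minimal sketch of a LARS (Layer-wise Adaptive Rate Scaling) update for one layer.
# Illustrative only: hyperparameters and names are made up, not Tencent's settings.
import numpy as np

def lars_update(weights, grads, global_lr=0.1, trust_coef=0.001, weight_decay=1e-4):
    """Scale the learning rate per layer by ||w|| / (||g|| + wd * ||w||)."""
    w_norm = np.linalg.norm(weights)
    g_norm = np.linalg.norm(grads)
    # Trust ratio: layers whose gradients are small relative to their weights
    # get proportionally larger steps.
    trust_ratio = trust_coef * w_norm / (g_norm + weight_decay * w_norm + 1e-9)
    local_lr = global_lr * trust_ratio
    return weights - local_lr * (grads + weight_decay * weights)

# Example: one dense layer's weights and gradients.
w = np.random.randn(512, 256).astype(np.float32)
g = np.random.randn(512, 256).astype(np.float32)
w = lars_update(w, g)
```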
  Model architecture tweaks: The authors eliminate weight decay on the bias and batch normalization, and add batch normalization layers into AlexNet.
  Communication strategies: The researchers implement a number of tweaks to deal with the problems brought about by the immense scale of their training infrastructure: ‘tensor fusion’, which lets them chunk multiple small tensors together before running an all-reduce step; ‘hierarchical all-reduce’, which lets them group GPUs together and selectively reduce and broadcast to further increase efficiency; and ‘hybrid all-reduce’, which lets them flip between two different implementations of all-reduce according to whatever is most efficient at the time.
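As a rough illustration of the tensor fusion idea – flattening many small gradients into one buffer so the collective communication pays fewer per-message overheads – here is a hedged NumPy sketch. Real systems do this inside MPI or NCCL; the `fake_all_reduce` function here is a stand-in invented for the demo.

```python
# Rough sketch of 'tensor fusion': concatenate many small gradient tensors into one
# buffer, run a single all-reduce over it, then split the result back out.
# The all-reduce is faked with a local multiply; real systems use MPI or NCCL.
import numpy as np

def fake_all_reduce(buffer, world_size=4):
    # Stand-in for a collective op: pretend every worker contributed this buffer.
    return buffer * world_size

def fused_all_reduce(grads):
    shapes = [g.shape for g in grads]
    sizes = [g.size for g in grads]
    fused = np.concatenate([g.ravel() for g in grads])   # one big message
    reduced = fake_all_reduce(fused)                      # one collective call
    # Split the reduced buffer back into per-tensor gradients.
    out, offset = [], 0
    for shape, size in zip(shapes, sizes):
        out.append(reduced[offset:offset + size].reshape(shape))
        offset += size
    return out

grads = [np.random.randn(64, 64), np.random.randn(128), np.random.randn(10, 256)]
reduced_grads = fused_all_reduce(grads)
```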
  Why it matters: Because deep learning is fundamentally an empirical discipline, in which scientists launch experiments, observe results, and use hard-won intuitions to re-configure hyperparameters and architectures and repeat the process, computers are somewhat analogous to telescopes: the bigger the computer, the farther you may be able to see, as you’re able to run a faster experimental loop at greater scales than other people. The race between large organizations to scale up training will likely lead to many interesting research avenues, but it also risks bifurcating research into “low compute” and “high compute” environments – that could further widen the gulf between academia and industry, which could create problems in the future.
  Read more: Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes (Arxiv).

What’s better than the Oxford RobotCar Dataset? An even more elaborate version of this dataset!
…Researchers label 11,000 frames of data to help people build better self-driving cars…
Researchers with Universita degli Studi Federico II in Naples and Oxford Brookes University in Oxford have augmented the Oxford RobotCar Dataset with many more labels designed specifically for training vision-based policies for self-driving cars. The new dataset is called READ, or the “Road Event and Activity Detection” dataset, and involves a large number of rich labels which have been applied to ~11,000 frames of data gathered from cameras on an autonomous NISSAN Leaf driven around Oxford, UK. The dataset labels include “spatiotemporal actions performed not just by humans but by all road users, including cyclists, motor-bikers, drivers of vehicles large and small, and obviously pedestrians.” These labels can be quite granular, and individual agents in a scene, like a car, can have multiple labels applied to them (for instance, a car in front of the autonomous vehicle at an intersection might be tagged with “indicating right” and “car stopped at the traffic light”). Similarly, cyclists could be tagged with labels like “cyclist moving in lane” and “cyclist indicating left”, and so on. This richness might help develop better detectors that can create more adaptable autonomous vehicles.
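To make the multi-label structure concrete, here is a hypothetical sketch of what a single annotated frame might look like. The field names are invented for illustration and are not the actual READ schema.

```python
# Hypothetical sketch of a multi-label frame annotation in the spirit of READ.
# Field names are invented for illustration; they are not the dataset's real schema.
frame_annotation = {
    "frame_id": 10452,
    "agents": [
        {
            "agent_type": "car",
            "bounding_box": [412, 220, 530, 310],   # x1, y1, x2, y2 in pixels
            "labels": ["indicating right", "car stopped at the traffic light"],
        },
        {
            "agent_type": "cyclist",
            "bounding_box": [120, 240, 160, 330],
            "labels": ["cyclist moving in lane", "cyclist indicating left"],
        },
    ],
}

# Example: collect every label applied to cars in this frame.
car_labels = [label for agent in frame_annotation["agents"]
              if agent["agent_type"] == "car" for label in agent["labels"]]
```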
  Tools used: They used Microsoft’s ‘Visual Object Tagging Tool” (VOTT) to annotate the dataset.
  Next steps: This version of READ is a preliminary one, and the scientists plan to eventually label 40,000 frames. They also have ambitious plans to create “a novel, deep learning approach to detecting complex activities”. Let’s wish them luck.
  Why it matters: Autonomous cars are going to revolutionize many aspects of the world, but in recent years there has been a major push by industry to productize the technology, which has led to much of the research occurring in private. Academic research initiatives and associated dataset releases like this promise to make it easier for other people to develop this technology, potentially broadening our own understanding of it and letting more people participate in its development.
  Read more: Action Detection from a Robot-Car Perspective (Arxiv).

Whether rain, fog, or snow – researchers’ weather dataset has you covered:
…RFS dataset taken from creative commons images…
Researchers with the University of Essex and the University of Birmingham have created a new weather dataset called the Rain Fog Snow (RFS) dataset which researchers can use to better understand, classify and predict weather patterns.
  Dataset: The dataset consists of more than 3,000 images taken from websites like Flickr, Pixabay, Wikimedia Commons, and others, depicting scenes with different weather conditions, ranging from Rain to Fog to Snow. In total, the researchers gather 1100 images for each class, creating a potentially useful new dataset for researchers to experiment with.
  Read more: Weather Classification: A new multi-class dataset data augmentation approach and comprehensive evaluations of Convolutional Neural Networks (Arxiv).

DeepMind teaches computers to count:
…Pairing deep learning with specific external modules leads to broadened capabilities…
Neural networks are typically not very good at maths. That’s because figuring out a way to train a neural network to develop a differentiable, numeric representation is difficult, with most work typically involving handing off the outputs of a neural network to a non-learned predominantly hand-programmed system. Now, DeepMind has implemented a couple of modules — a Neural Accumulator (NAC) and a Neural Arithmetic Logic Unit (NALU) — specifically to help its computers learn to count. These modules are “biased to learn systematic numerical computation”, write the authors of the research. “Our strategy is to represent numerical quantities as individual neurons without a nonlinearity. To these single-value neurons, we apply operators that are capable of representing simple functions (e.g., +, -, x, etc). These operators are controlled by parameters which determine the inputs and operations used to create each output”.
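A rough NumPy sketch of the forward pass may help make this concrete. It follows the NAC/NALU equations from the paper (a weight matrix constrained toward {-1, 0, 1} for additive paths, plus a gated log-space path for multiplication), but the code itself is an illustrative reimplementation, not DeepMind's.

```python
# Illustrative forward pass for a NAC and NALU cell, following the paper's equations.
# Not DeepMind's code: parameter initialization and shapes are made up for the demo.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nac_forward(x, W_hat, M_hat):
    """Neural Accumulator: effective weights are pushed toward {-1, 0, 1}, so the
    cell learns to add/subtract inputs rather than arbitrarily rescale them."""
    W = np.tanh(W_hat) * sigmoid(M_hat)
    return x @ W.T, W

def nalu_forward(x, W_hat, M_hat, G, eps=1e-7):
    """Neural Arithmetic Logic Unit: a learned gate g blends the additive NAC
    path with a log-space path that supports multiplication and division."""
    a, W = nac_forward(x, W_hat, M_hat)            # additive path
    m = np.exp(np.log(np.abs(x) + eps) @ W.T)      # multiplicative path
    g = sigmoid(x @ G.T)                           # gate between the two paths
    return g * a + (1.0 - g) * m

# Demo with random (untrained) parameters: 4 inputs -> 2 outputs, batch of 3.
x = np.random.rand(3, 4)
W_hat, M_hat, G = (np.random.randn(2, 4) for _ in range(3))
y = nalu_forward(x, W_hat, M_hat, G)
```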
  Tests: The researchers rigorously test their approach on tasks ranging from counting the number of times a particular MNIST class has been seen; to basic addition, multiplication, and division tasks; as well as being tested in more complicated domains with other challenges, like needing to keep track of time while completing tasks in a simulated gridworld.
  Why it matters: Systems like this promise to broaden the applicability of neural networks to a wider set of problems, and will let people build systems with larger and larger learned components, offloading human expertise from hand-programming things like numeric processors, to designing numeric modules that can be learned along with the rest of the system.
  Read more: Neural Arithmetic Logic Units (Arxiv).
  Get the code: DeepMind is yet to release official code, but that hasn’t stopped the wider community from quickly replicating it. There are currently five implementations of this available on GitHub – check out the list here and pick your favorite (Adam Trask, paper author, Twitter).

Google researchers use AI to optimize AI models for mobile phones:
…Platform-Aware Neural Architecture Search for Mobile (MnasNet) gives engineers more dials to tune when having AI systems learn to create other AI systems…
Google researchers have developed a neural architecture search approach that is tuned for mobile phones, letting them use machine learning to learn how to design neural network architectures that can be executed on mobile devices.
  The technique: Google’s system treats the task of architecture design as a “multi-objective optimization problem that considers both accuracy and inference latency of CNN models”. The system uses what they term a “factorized hierarchical search space” to help it pick through possible architecture designs.
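One way to picture this kind of multi-objective search is as a single scalar reward that rewards accuracy but softly penalizes models whose measured latency exceeds a target. The sketch below shows that general shape; the exponent, target latency, and function name are illustrative assumptions rather than the paper's exact formulation, which is spelled out in the Arxiv link below.

```python
# Hedged sketch of a latency-aware reward of the kind a platform-aware NAS optimizes:
# accuracy scaled by a soft penalty for deviating from a target latency.
# The exponent and target values here are illustrative, not the paper's settings.
def multi_objective_reward(accuracy: float, latency_ms: float,
                           target_ms: float = 80.0, w: float = -0.07) -> float:
    """Higher accuracy is better; latency above target_ms shrinks the reward."""
    return accuracy * (latency_ms / target_ms) ** w

# A slightly less accurate but much faster model can score higher overall.
print(multi_objective_reward(accuracy=0.761, latency_ms=65.0))   # ~0.77
print(multi_objective_reward(accuracy=0.765, latency_ms=120.0))  # ~0.74
```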
  Results: Systems trained with MnasNet can obtain higher accuracies than those trained by other automatic machine learning approaches, with one variant obtaining a top-1 ImageNet accuracy of 76.13%, versus 74.5% for a prior high-scoring Google NAS technique. The researchers can also tune the networks for latency, and are able to design a system with a latency of 65ms (as evaluated on a Pixel phone), which is more efficient in terms of execution time than other approaches.
  Why it matters: Approaches like this make it easier for us to offload the expensive task (in terms of researcher brain time) of designing neural network systems to computers, letting us trade researcher time for compute time. Stuff like this means we’re heading for a world where increasingly large amounts of compute are used to autonomously design systems, creating increasingly optimized architectures automatically. It’s worth bearing in mind that approaches like this will lead to a “rich get richer” effect with AI, where people with bigger computers are able to design more adaptive, efficient systems than their competitors.
  Read more: MnasNet: Platform-Aware Neural Architecture Search for Mobile (Arxiv).

AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…

What AI means for international competition:
AI could have a transformative impact on a par with technologies such as electricity or combustion engines. If this is the case, then AI – like these precedents – will also transform international power dynamics.
  Lessons from history: Previous technological discontinuities have had different winners and losers. The first industrial revolution shifted power from countries with small, professionalized armies to those able to mobilize their populations on a large scale. The technological revolution entrenched this gap, and further favored those with access to key resources. In both instances, first-mover advantages were dwarfed by advantages in resource and capital stocks, and by success in applying technologies to new domains.
  What about AI: Algorithms in most civilian applications can diffuse rapidly, and hence may be more difficult for countries to hoard. Other inputs to AI development, though, are resources that governments can develop and protect e.g. skills and hardware. The ability of economies to cope with societal impacts from AI will itself be an important driver of their success. The relative importance of these different inputs to AI progress will determine the winners and losers.
  Why this matters: The US remains an outlier amongst countries in not having a coordinated AI strategy, notwithstanding some preliminary work done at the end of the Obama administration. As the report makes clear, technological leaps frequently have destabilizing effects on global power dynamics. While much of this remains uncertain, there are clear actions available to countries to mitigate against some of the greatest risks, particularly ensuring that safety and ethical considerations remain a priority in AI development.
  Read more: Strategic Competition in an Era of Artificial Intelligence (CNAS).

Google’s re-entry into China:
Google is launching a censored search engine in China, according to leaks reported by The Intercept. The alleged product has been developed in consultation with the Chinese government, and will be compliant with the country’s strict internet censorship, e.g. by blocking websites and searches related to human rights, democracy, and protests. Google’s search engine has been blocked in China since 2010, when the company ceased offering a censored product after a major cyberattack. They had previously faced significant criticism in the US for their involvement in censorship.
  The AI principles: Google were praised for releasing their AI principles in June, after criticism over the collaboration on Project Maven. The principles include the pledge that Google “will not design or deploy AI … in technologies whose purpose contravenes widely accepted principles of international law and human rights.”
  Why this matters: Google has been slowly re-establishing a presence in China, launching a new AI Center and releasing TensorFlow for Chinese developers in 2017. This latest project, though, is likely to spark criticism, particularly amidst the increasing attention on the conduct of tech giants. A bipartisan group of Senators have already released a letter critical of the decision. The Maven case demonstrates Google’s employees’ ability to mobilize effectively on corporate behavior they object to, particularly when information about these projects has been withheld. Whether this turns into another Maven situation remains to be seen.
  Read more: Google plans to launch censored search engine in China (The Intercept).
  Read more: Senators’ letter to Google.

More names join ethical AI consortium:
The Partnership on AI, a multi-stakeholder group aiming to ensure AI benefits society, has announced 18 new members, including PayPal, New America, and the Wikimedia Foundation. The group was founded in 2016 by the US tech giants and DeepMind, and is focussed on formulating best practices in AI to ensure that the technology is safe and beneficial.
  Read more: Expanding the Partnership (PAI).

Tech Tales:

Down on the computer debug farm

So what’s wrong with it.
It thinks cats are fish.
Why did you bother to call me? That’s an easy fix. Just update the data distribution.
It’s not that simple. It recognizes cats, and it recognizes fish. But it’s choosing to see cats as fish.
Why?
We’re trying to reproduce. It was deployed in several elderly care homes for a few years. Then we picked up this bug recently. We think it was from a painting class.
What?
Well, we’ll show you.


What am I looking at here?
Pictures of cats in fishbowls.
I know. Look, explain this to me. I’ve got a million other things to do.
We think it liked one of the people that was in this painting class and it complimented them when they painted a cat inside a fishbowl. It’s designed as a companion system.
So what?
Well, it kept doing that to this person, and it made them happy. Then it suggested to someone else they might want to paint this. It kind of went on from there.
“Went from there”?
We’ve found a few hundred paintings like this. That’s why we called you in.
And we can’t wipe it?
Sentient Laws…
Have you considered showing it a fish in a cat carrier?

Well, have you?
We haven’t.
Have a better idea?

That’s what I thought. Get to work.

Things that inspired this story: Adversarial examples; bad data distributions; fleet learning; proclivities.