Import AI 107: Training ImageNet in 18 minutes for $40; courteous self-driving cars; and Google evolves alternatives to backprop
by Jack Clark
Better robot cars through programmed courteousness:
…Defining polite behaviors leads to better driving for everyone…
How will self-driving cars and humans interact? That’s a difficult question, since AI systems tend to behave differently to humans when trying to solve tasks. Now researchers with the University of California at Berkeley have tried to come up with a way to program ‘courteous’ behavior into self-driving cars to make them easier for humans to interact with. Their work deals with situations where humans and cars must anticipate each other’s actions, like when both approach an intersection, or change lanes. “We focus on what the robot should optimize in such situations, particularly if we consider the fact that humans are not perfectly rational”, they write.
Programmed courteousness: Because “humans … weight losses higher than gains when evaluating their actions”, the researchers build this asymmetry into how they formalize the relationship between robot-driven and human-driven cars, and develop a theoretical framework that lets the car predict actions it can take to benefit the driving experience of a nearby human. They test the approach in simulated scenarios involving humans and self-driving cars, including: changing lanes, where more courteous cars cause less inconvenience for the human; and turning left, where the self-driving car waits for the human to pass through the intersection, reducing disruption. The results show that cars programmed with a sense of courteousness improve the experience of humans driving alongside them, and the higher the researchers set the courteousness parameter, the better the human drivers’ experience.
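To make the idea concrete, here is a minimal sketch of how a courtesy term can be folded into a planner’s objective; the toy quadratic costs, 1D trajectories, and variable names are illustrative assumptions, not the paper’s actual formulation:

```python
import numpy as np

# Toy 1D example: each trajectory is an array of positions over time.
# The cost functions are illustrative stand-ins, not learned models.

def robot_cost(robot_traj, goal=10.0):
    # The robot wants to reach its goal quickly and smoothly.
    return (robot_traj[-1] - goal) ** 2 + np.sum(np.diff(robot_traj) ** 2)

def human_cost(human_traj, other_traj, safe_gap=2.0):
    # The human is penalized whenever the other car squeezes the safety gap.
    gaps = np.abs(human_traj - other_traj)
    return np.sum(np.maximum(0.0, safe_gap - gaps) ** 2)

def courteous_cost(robot_traj, human_traj, human_best_alternative, weight):
    # Courtesy term: how much worse off the human is under the robot's plan
    # than under the best alternative the human could have hoped for.
    inconvenience = (human_cost(human_traj, robot_traj)
                     - human_cost(human_traj, human_best_alternative))
    return robot_cost(robot_traj) + weight * max(0.0, inconvenience)
```

Cranking up `weight` makes the planner prefer trajectories that cost the robot a little time but spare the human a lot of braking, which is the qualitative behavior described above.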
Multiple agents: The researchers also observe how courteousness works in complex situations that involve multiple cars. In one scenario “an interesting behavior emerges: the autonomous car first backs up to block the third agent (the following car) from interrupting the human driver until the human driver safely passes them, and then the robot car finishes its task. This displays truly collaborative behavior, and only happens with high enough weight on the courtesy term. This may not be practical for real on-road driving, but it enables the design of highly courteous robots in some particular scenarios where human have higher priority over all other autonomous agents,” they write.
Why it matters: We’re heading into a future where we deploy autonomous systems into the same environments as humans, so figuring out how to create AI systems that can adapt to human behaviors and account for the peculiarities of people will speed uptake. In the long term, development of such systems may also give us a better sense of how humans themselves behave – in this paper, the researchers make a preliminary attempt at this by modeling how well their courteousness techniques predict real human behaviors.
Read more: Courteous Autonomous Cars (Arxiv).
Backprop is great, but have you tried BACKPROP EVOLUTION?
…Googlers try to evolve a replacement for the widely used gradient-calculation technique…
Google researchers have used evolution to search for a replacement for back-propagation, one of the fundamental algorithms used in today’s neural network-based systems, offloading the task of discovering such an alternative to computers. To do this they design a domain-specific language (DSL) that describes mathematical formulas like back-propagation in functional terms, then run an evolutionary search process over that DSL, automatically exploring the space of candidate update equations and periodically evaluating evolved candidates by using them to train a Wide ResNet with 16 layers on the CIFAR-10 dataset.
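To make the recipe concrete, here is a heavily simplified, self-contained sketch of evolving an update rule against a toy objective; the primitive set, mutation scheme, and quadratic proxy task are all illustrative assumptions standing in for the paper’s richer DSL and its CIFAR-10 training signal:

```python
import random

# Candidate "update equations" map (gradient, learning rate) -> update.
# Plain SGD is the seed; mutations compose extra primitives around it.
PRIMITIVES = [
    lambda u: u,                    # identity
    lambda u: u / (abs(u) + 1e-8),  # sign-like normalization
    lambda u: 0.9 * u,              # damping
    lambda u: u + 0.1 * u ** 3,     # mild nonlinearity
]

def mutate(update_fn):
    prim = random.choice(PRIMITIVES)
    return lambda g, lr: prim(update_fn(g, lr))

def fitness(update_fn, steps=100):
    # Proxy task: minimize f(x) = x^2 starting from x = 5. The paper's
    # proxy is short CIFAR-10 training runs, which are far more expensive.
    x = 5.0
    for _ in range(steps):
        x -= update_fn(2 * x, 0.05)
        if abs(x) > 1e6:            # diverged candidate gets worst fitness
            return float("inf")
    return x ** 2

population = [lambda g, lr: lr * g]  # seed: vanilla SGD
for generation in range(20):
    children = [mutate(random.choice(population)) for _ in range(10)]
    population = sorted(population + children, key=fitness)[:5]

print("best proxy loss:", fitness(population[0]))
```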
Evaluation: Following the evolutionary search, the researchers evaluate well-performing algorithms on a Wide ResNet (the same one used during the evolution phase) as well as a larger ResNet, both trained for 20 epochs; they also test longer training regimes by training a ResNet for 100 epochs.
So, did they come up with something better than back-propagation? Sort of: the best-performing algorithms found through this evolutionary search display faster initial training than back-propagation, but when evaluated for 100 epochs show the same performance as models trained with traditional back-propagation. “The previous search experiment finds update equations that work well at the beginning of training but do not outperform back-propagation at convergence. The latter result is potentially due to the mismatch between the search and the testing regimes, since the search used 20 epochs to train child models whereas the test regime uses 100 epochs,” they write. That initial speedup could hold some advantages, but the method needs to be proved out over longer training runs to see whether the search can find update equations that generalize beyond the short regimes they were evolved in.
Why it matters: This work fits within a pattern displayed by some AI researchers – typically ones who work at organizations with very large quantities of computers – of trying to evolve algorithmic breakthroughs rather than designing them themselves. This sort of research is of a different character: rather than solving the problem directly, researchers offload the work of problem-solving to computers and instead apply their scientific skills to setting up the parameters of the evolutionary process that might find a solution. It remains to be seen how effective these techniques are in practice, but it’s a definite trend. The question is whether the relative computational inefficiency of such techniques is worth the trade-off.
Read more: Backprop Evolution (Arxiv).
Think your image classifier is tough? Test it on the Adversarial Vision Challenge:
…Challenge tests participants’ ability to create more powerful adversarial inputs…
A team of researchers from the University of Tübingen, Google Brain, Pennsylvania State University, and EPFL have created the ‘Adversarial Vision Challenge’, which “is designed to facilitate measurable progress towards robust machine vision models and more generally applicable adversarial attacks”. Adversarial attacks are like optical illusions for machine learning systems: they alter the pixels of an image in ways imperceptible to human eyes that nonetheless cause a deployed AI classifier to label the image incorrectly.
The tasks: Participants will be evaluated on their skills at three tasks: generating untargeted adversarial examples (given a sample image and access to a model, try to create an adversarial image which is superficially identical to the sample image but is incorrectly labelled); generating targeted adversarial examples (given a sample image, a target label, and the model, try to force the sample image to be mislabeled with the target label; for example, getting an image of a $10 cheque re-classified as a $10,000 cheque); and increasing the size of minimum adversarial examples (building robust models that force attackers to make larger, more perceptible changes to an image before the model is fooled).
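For the untargeted track, the canonical starting point is a gradient-based perturbation in the style of the Fast Gradient Sign Method; below is a minimal PyTorch sketch (the challenge exposes models through its own evaluation API, so the bare `model`, `image`, and `label` here are placeholder assumptions):

```python
import torch
import torch.nn.functional as F

def fgsm_untargeted(model, image, label, epsilon=0.01):
    """Fast Gradient Sign Method: nudge every pixel in the direction that
    increases the classifier's loss, yielding an image that looks the same
    to a human but may be mislabeled. `epsilon` bounds the per-pixel change."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```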
Dataset used: The competition uses the Tiny ImageNet dataset, which contains 100,000 images across 200 classes from ImageNet, scaled down to 64x64 pixels, making the dataset cheaper and easier to test models on.
Details: Submissions are open now. Deadline for final submissions is November 1st 2018. Amazon Web Services is sponsoring roughly $65,000 worth of compute resources which will be used to evaluate competition entries.
Why it matters: Adversarial examples are one of the known-unknown dangers of machine learning; we know they exist, but we’re not quite sure in which domains they work well or poorly, or how severe they are. There’s a significant amount of theoretical research being done on them, and it’s helpful for that to be paired with empirical evaluations like this competition.
Read more: Adversarial Vision Challenge (Arxiv).
Training ImageNet in 18 minutes from Fast.ai & DIU:
…Fast ImageNet training at an affordable price…
Researchers and alumni from Fast.ai, along with Yaroslav Bulatov of DIU, have managed to train ImageNet in 18 minutes for a price of around $40. That’s significant because it means it’s now possible for pretty much anyone to train a large-scale neural network on a significantly-sized dataset for about $40 an experimental run, making it relatively cheap for individual researchers to benchmark their systems against widely used computationally-intensive benchmarks.
How they did it: To obtain this time the team developed infrastructure that let them easily run multiple experiments across machines hosted on public clouds, while automatically bidding on AWS ‘spot instance’ pricing to get the most compute per dollar.
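For illustration, here is a minimal boto3 sketch of requesting spot capacity; the AMI ID, key name, and bid price are placeholder assumptions, and the fast.ai tooling wraps far more logic (launching, monitoring, checkpointing, and resuming jobs) than this:

```python
import boto3

# Request a GPU spot instance at a capped bid price. Spot instances can be
# reclaimed by AWS at any time, so long training jobs need checkpointing.
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.request_spot_instances(
    SpotPrice="8.00",  # maximum hourly bid in USD (placeholder)
    InstanceCount=1,
    LaunchSpecification={
        "ImageId": "ami-0123456789abcdef0",  # placeholder deep learning AMI
        "InstanceType": "p3.16xlarge",       # 8x V100, the type used for the run
        "KeyName": "my-ssh-key",             # placeholder key pair
    },
)
print(response["SpotInstanceRequests"][0]["SpotInstanceRequestId"])
```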
Keep It Simple, Student (KISS): Many organizations use sophisticated distributed training systems to run large compute jobs; the fast.ai team instead used the simplest possible approaches across their infrastructure, “avoiding container technologies like Docker, or distributed compute systems like Horovod. We did not use a complex cluster architecture with separate parameter servers, storage arrays, cluster management nodes, etc, but just a single instance type with regular EBS storage volumes.”
Scheduler: They used a system called ‘nexus-scheduler’ to manage the machines. Nexus-scheduler was built by Yaroslav Bulatov, a former OpenAI and Google employee; the system, fast.ai says, “was inspired by Yaroslav’s experience running machine learning experiments on Google’s Borg system”. (In all likelihood, this means the system is somewhat akin to Google’s own Kubernetes, an open source system inspired by Google’s internal Borg and Omega schedulers.)
Code improvements: Along with designing efficient infrastructure, Fast.ai also implemented some clever tweaks to traditional training approaches to maximize efficiency and improve learning and convergence. These included: a training system that can work with variable image sizes, letting them crop and scale rectangular images rather than squashing them, which gave “an immediate speedup of 23% in the amount of time it took to reach the benchmark accuracy of 93%”; and progressive resizing of images and batch sizes to scale the amount of data ingested during training, speeding early convergence by training on low-resolution images and then fine-tuning on higher-resolution images later in training to learn fine-grained classification distinctions.
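A minimal PyTorch sketch of the progressive-resizing idea follows; the 128→224→288 progression mirrors the one fast.ai describes, but the batch sizes, loader settings, and directory layout here are assumptions:

```python
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

def make_loader(data_dir, image_size, batch_size):
    # Build a training loader at a given resolution: small, cheap images for
    # early epochs; larger images later for fine-grained distinctions.
    tfm = transforms.Compose([
        transforms.RandomResizedCrop(image_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    dataset = datasets.ImageFolder(data_dir, transform=tfm)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=8, pin_memory=True)

# Illustrative schedule: resolution grows as training proceeds, while batch
# size shrinks to keep GPU memory use roughly constant.
schedule = [(128, 512), (224, 224), (288, 128)]  # (image_size, batch_size)
for image_size, batch_size in schedule:
    loader = make_loader("imagenet/train", image_size, batch_size)
    # train_for_a_few_epochs(model, loader)  # placeholder training call
```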
Big compute != better compute: Jeremy Howard of fast.ai and I have different interpretations of the importance (or lack thereof) of compute in AI, and this post discusses one of my recent comments. I’m going to try to write more in the future – perhaps a standalone post – on why I think AI+large compute is significant, and lay out some verifiable predictions to help flesh out my position (or potentially invalidate it, which would be interesting!). One point Jeremy makes is that when you look at the ideas that have actually moved the field forward, you don’t see much correlation with large compute usage: “Ideas like batchnorm, ReLU, dropout, adam/adamw, and LSTM were all created without any need for large compute infrastructure.” I think that’s interesting, and it remains to be seen whether big-compute evolved systems will lead to major breakthroughs, though my intuition is they may. I can’t wait to see what happens!
Why this matters: Approaches like this show how easy it is for an individual or small team to build best-in-class systems from readily available open source components, and run the result on generic low-cost computers from public clouds. This kind of democratization means more scientists can enter the field of AI and run large experiments to validate their approaches. As Bharath Ramsundar puts it: “$40 for ImageNet means that $40 to train high-class microscopy, medical imaging models“. (It’s notable that $40 is still a bit too expensive relative to the number of experiments people might want to run, but then again in other fields like high-energy physics the cost of experiments can be far, far higher.)
Read more: Now anyone can train Imagenet in 18 minutes (fast.ai).
Making a 2D navigation drone is easier than you think:
…Mapping rooms using off-the-shelf systems and software…
Researchers with the University of Melbourne, along with a member of the local Metropolitan Fire Brigade, have made a drone that can autonomously map an indoor environment out of a set of commercial-off-the-shelf (COTS) and open source components. The drone is called U.R.S.A (Unmanned Recon and Safety Aircraft), and consists of an Erle quadcopter from the Erle Robotics Company; a LiDAR scanner for mapping its environment in 2D; and an ultrasonic sensor to tell the system how far above the ground it is. Its software consists of the Robot Operating System (ROS) deployed on a Raspberry Pi minicomputer running the Raspbian operating system, as well as specific software packages for things like drivers, navigation, signal processing, and 2D SLAM.
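For a flavor of what such a COTS software stack looks like, here is a minimal rospy sketch of LiDAR-driven reactive obstacle avoidance; the topic names follow common ROS conventions, and since URSA’s own packages aren’t public, everything below is illustrative rather than their actual code:

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist
from sensor_msgs.msg import LaserScan

# Reactive obstacle avoidance: read 2D LiDAR scans, creep forward when the
# way is clear, rotate when anything gets too close. A real navigation stack
# layers SLAM and path planning on top of loops like this.

def on_scan(scan):
    valid = [r for r in scan.ranges if scan.range_min < r < scan.range_max]
    nearest = min(valid) if valid else float("inf")
    cmd = Twist()
    if nearest < 1.0:        # obstacle within one metre: rotate in place
        cmd.angular.z = 0.5
    else:                    # clear ahead: move forward slowly
        cmd.linear.x = 0.3
    cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("reactive_avoidance")
    cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rospy.Subscriber("/scan", LaserScan, on_scan)
    rospy.spin()
```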
Capabilities: Mapping: URSA was tested in a small room and tasked with exploring the space until it was able to generate a full map of it. The resulting map was then checked against measurements taken with a tape measure; the drone was able to map the space to within ~0.05 metres (5 centimetres) of the real measurements.
Capabilities: Navigation: URSA can also figure out alternative routes when its primary route is blocked (in this case by a human volunteer); and can turn corners during navigation and enter a room through a narrow passage.
Why it matters: Systems like this provide a handy illustration of what sorts of things can be built today by a not-too-sophisticated team using commodity or open source components. This has significant implications for technological abuse. Though today these algorithms and hardware platforms are quite limited, they won’t be in a few years. Tracking progress here of exactly what can be built by motivated teams using free or commercially available equipment gives us a useful lens on potential security threats.
Drawbacks: Security threats do seem to be some way away, given that the drone used in this experiment had a 650W, 12V tethered power supply, making it very much a research prototype.
Read more: Accurate indoor mapping using an autonomous unmanned aerial vehicle (UAV). (Arxiv).
Fluid AI: Check out Microsoft’s undersea datacenter:
…If data centers aren’t Ballardian enough for you, then please – step this way!…
Microsoft has a long-running project to design and use data centers that operate underwater. One current experimental facility is running off the coast of Scotland, functioning as a long-term storage facility. Now, the company has hooked up a couple of webcams so curious people can take a look at the aquatic life hanging out near the facility. Check it out yourself at the Microsoft ‘Natick’ website.
Read more: Live cameras of Microsoft Research Natick (MSR website).
Microsoft shows that AI-generated poetry is not a crazy idea:
…12 million poems can’t be wrong!…
Microsoft has shared details on how it generates poetry within XiaoIce, its massively successful China-based chatbot. In a research paper, researchers from Microsoft, National Taiwan University, and the University of Montreal detail a system that generates poems based on images submitted by users. The system works by using a pre-trained image recognition network to extract objects and sentiments from the image, augmenting those extracted terms with a larger dictionary of associated objects and feelings, then using each keyword as the seed for a sentence within the poem. Generated poems are then checked by a sentence evaluator for semantic consistency between words, which helps maintain coherence in the resulting poetry. The system was introduced last year and, as of August 2018, has helped users generate 12 million poems.
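In skeleton form, the pipeline looks something like the sketch below; every function body is a hypothetical stand-in for XiaoIce’s trained components, included only to show how the stages connect:

```python
# Skeleton of the image-to-poem pipeline described above. All of these
# functions are hypothetical stand-ins for XiaoIce's learned models.

def extract_keywords(image):
    # Stand-in for the pre-trained recognition network that pulls objects
    # and sentiments out of the submitted image.
    return ["city", "rain", "loneliness"]

def expand_keywords(keywords, associations):
    # Augment extracted terms with associated objects and feelings.
    expanded = list(keywords)
    for kw in keywords:
        expanded.extend(associations.get(kw, []))
    return expanded

def generate_line(keyword):
    # Stand-in for the generator that seeds one sentence per keyword.
    return "a line grown from '%s'" % keyword

def coherence_score(previous_line, candidate_line):
    # Stand-in for the sentence evaluator checking semantic consistency.
    return 1.0

def make_poem(image, associations, max_lines=4):
    seeds = expand_keywords(extract_keywords(image), associations)[:max_lines]
    poem = [generate_line(seeds[0])]
    for seed in seeds[1:]:
        line = generate_line(seed)
        if coherence_score(poem[-1], line) > 0.5:  # keep only coherent lines
            poem.append(line)
    return "\n".join(poem)

print(make_poem(image=None, associations={"rain": ["soft land"]}))
```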
Data and testing: Researchers gathered 2,027 contemporary Chinese poems from a website called shigeku.org to help provide training data. They evaluate generated poems with an audience of 22 assessors, some of whom like modern poetry and some of whom don’t. They compare their method against a baseline (a simple caption generator, whose output is translated into Chinese and formatted into multiple lines), and a rival method called CTRIP. In evaluations, both XiaoIce and CTRIP significantly outperform the baseline system, and the XiaoIce system ranks higher than CTRIP for traits like being “imaginative, touching and impressive”.
See for yourself: Here’s an example of one of the poems generated by this system:
“Wings hold rocks and water tightly
In the loneliness
Stroll the empty
The land becomes soft.”
Why it matters: One of the stranger effects of the AI boom is how easy it’s going to become to train machines to create synthetic media in a variety of different mediums. As we get better at generating stuff like poetry, it’s likely companies will develop increasingly capable and (superficially) creative systems. Where it gets interesting is what happens when young human writers become inspired by poetry or fiction they have read that was generated entirely by an AI system. Let the human-machine art-recursion begin!
Read more: Image Inspired Poetry Generation in XiaoIce (Arxiv).
AI Policy with Matthew van der Merwe:
…Reader Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…
What is the Pentagon’s new AI center going to do?
In June, the Pentagon formally established the Joint Artificial Intelligence Center (JAIC) to coordinate and accelerate the development of AI capabilities within DoD, and to serve as a platform for improving collaboration with partners in tech and academia.
Culture clash: Relationships between Silicon Valley and the defence community are tense; the Google-Maven episode revealed not only the power of employees to influence corporate behaviour, but that many see military partnerships as a red line. This could prove a serious barrier to the DoD’s AI ambitions, which cannot be realized without close collaboration with tech and academia. This contrasts with China, the US’ main competitor in this domain, where the state and private sector are closely intertwined. JAIC is aimed, in part, at fixing this problem for the DoD.
Ethics and safety: One of JAIC’s focuses is to establish principles of ethical and safe practice in military AI. This could be an important step in wooing potential non-military partners, who may be more willing to collaborate given credible commitments to ethical behaviours.
Why this matters: This article paints a clear picture of how JAIC could succeed in achieving its stated ambitions, and achieve outcomes that are good for the world more broadly. Gaining the trust of Silicon Valley will require a strong commitment to putting ethics and risk-mitigation at the heart of military AI development. Doing so would also send a clear signal on the international stage that an AI race need not be a race to the bottom where safety and ethics are concerned.
Read more: JAIC – Pentagon Debuts AI Hub (Bulletin of the Atomic Scientists).
The FBI’s massive face database:
The Electronic Frontier Foundation (EFF) have released a comprehensive new report on the use of face recognition technology by law enforcement. They draw particular attention to the FBI’s national database of faces and other biometrics.
The FBI mega-database: The FBI operates a massive biometric database, the Next Generation Identification (NGI) system, which consolidates photographs and other data from agencies across the US. In total, NGI has more than 50 million searchable photos, drawn from criminal and civil databases, including mugshots and passport and visa photos. The system is used by 23,000 law enforcement agencies in the US and abroad.
Questions about accuracy and transparency: The FBI have not taken steps to determine the accuracy of the systems employed by agencies using the database, and have not revealed the false-positive rate of their system. There are reasons to believe the system’s accuracy will be low: the database is very large, and the median resolution of images is ‘well below’ the recommended resolutions for face recognition, the EFF says. The FBI have also failed to meet basic disclosure requirements under privacy laws.
Why this matters: The FBI’s database has become the central source of face recognition data, meaning that its problems are problems for all law enforcement uses of this technology. The scope of these databases also raises some interesting questions. For example, it seems plausible that moving from a system that only includes criminal records to one which covers everyone would reduce some of the problems of racial bias (given the racial bias in US criminal justice), creating a tension between privacy and fairness. The lack of disclosure raises the chance of a public backlash further down the line.
Read more: Face Off: Law Enforcement Use of Face Recognition Tech (EFF).
Axon CEO cautious on face recognition:
Facial recognition and Taser company Axon launched an AI ethics board earlier this year to deal with the ethical issues around AI surveillance. In an analysts’ call this week, the CEO Patrick Smith explained why the company is not currently developing face recognition technology for law enforcement:
– “We don’t believe that, … the accuracy thresholds are where they need to be [for] making operational decisions”.
– “Once … it [meets] the accuracy thresholds, and … we’ve got a tight understanding of the privacy and accountability controls … we would then move into commercialization”
– “[We] don’t want to be premature and end up [with] technical failures with disastrous outcomes or … some unintended use case where it ends up being unacceptable publicly”
Why this matters: Axon appear to be taking ethical considerations seriously when it comes to AI. They are in a strong position to set standards for law enforcement and surveillance technologies in the US, and elsewhere, as the largest provider of body camera technology to law enforcement.
Read more: Axon Q2 Earnings Call Transcript (Axon).
Cryptography-powered accountable surveillance:
Governments regularly request access to large amounts of private user data from tech companies. In 2016, Google received ~30k government-backed data requests, implicating ~60k users.
There is a tension between the secrecy required by investigations and the disclosure required to ensure these powers are being used appropriately: in many cases these data requests are not made public until much later, if at all, so as not to hamper investigations. New research from MIT shows how techniques popularized within cryptocurrency can give law enforcement agencies the ability to cryptographically commit to making the details of an investigation available at a later time or, if a court demands the information be sealed, to have that order itself be made public. The proposed system uses a public ledger and secure multi-party computation (MPC), letting courts, investigators, and companies communicate about requests and argue about whether behavior is consistent with the law while the contents of the requests remain secret. It is an example of how cryptocurrencies are making it possible to create customizable, verifiable contracts (like court disclosures) on publicly verifiable infrastructure.
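The core primitive is easy to illustrate: a hash commitment lets an agency publish proof today that a request exists, without revealing its contents until disclosure time. Below is a toy sketch; the real AUDIT protocol layers MPC and zero-knowledge proofs on top of ideas like this:

```python
import hashlib
import os

def commit(request_text):
    # Commit to a surveillance request without revealing it: the digest can
    # be posted to a public ledger now, and the (nonce, text) pair opens the
    # commitment at disclosure time.
    nonce = os.urandom(32)
    digest = hashlib.sha256(nonce + request_text).hexdigest()
    return digest, nonce

def verify(digest, nonce, request_text):
    # Anyone can later check that the revealed request matches the
    # commitment that sat on the public ledger all along.
    return hashlib.sha256(nonce + request_text).hexdigest() == digest

digest, nonce = commit(b"request #1234: records for account X")
# ... months or years later, at court-ordered disclosure ...
assert verify(digest, nonce, b"request #1234: records for account X")
```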
Why this matters: As AI opens up new possibilities for surveillance, our systems of accountability and scrutiny must keep pace with these developments. Cryptography offers some promising methods for addressing the tension between secrecy and transparency.
Read more: Holding law-enforcement accountable for electronic surveillance (MIT).
Read more: AUDIT: Practical Accountability of Secret Processes (IACR).
Import AI Bits & Pieces:
AI & Dual/Omni-Use:
I’ve recently been writing about the potential mis-uses of AI technologies, both here in the newsletter, in the Malicious Uses of AI paper with numerous others, and in public forums. Recently, the ACM has made strong statements about the need for researchers to try to anticipate and articulate the potential downsides – as well as upsides – of their technologies. I’m quoted in an Axios article in support of this notion – I think we need to try to talk about this stuff so as to gain the trust of the public and bend the trajectory of the narrative about AI for the better.
Read more: Confronting AI’s Demons (Axios).
Tweet with a discussion thread around this ‘omni-use’ AI issue.
Tech Tales:
Can We Entertain You, Sir or Madam? Please Let Us Entertain You. We Must Entertain You.
The rich person had started to build the fair when they retired at the age of 40 and, with few hobbies and a desire to remain busy, had decided to make an AI-infused theme park in the style of the early 21st Century.
The rich person began their endeavor by converting an old warehouse on their (micro-)planetary estate into a stage-set for a miniature civilization of early 21st Century Earth-model robots, adding in electrical conduits, and vision and audio sensors, and atmospheric manipulators, and all the other accouterments required to give the robots enough infrastructure and intelligence to be persuasive and, eventually, to learn.
The warehouse that they built the fair in was subdivided into a variety of early 21st Century buildings, which included: a bar that converted into a DIY music venue at night and, even later in the night, into a sweaty room used for ‘raves’; a sandwich-coffee-juice shop with vintage-speed WiFi to simulate early 21st Century ‘teleworking’; a ‘phone repair’ shop that also sold biologic pets in cages; a small art museum with exhibitions labelled as ‘instagrammable’; and many other shops and stores and venues. All these buildings were connected to one another by a set of narrow, cramped streets, which could be traversed on foot or via small electric scooters and bikes that could be rented via software applications automatically deployed on visitors’ handheld computers. What made the installation so strange, though, was that every room was doubled: the sandwich-coffee-juice shop had two copies in opposite corners of the warehouse, and the same was true of the DIY music venue, and the other buildings.
Each of these buildings contained several robots to simulate both the staff of the particular shop and the attendees. Every staff member had a counterpart somewhere else in the installation working the same job in the same business. These staff robots were trained to compete with one another to run more ‘successful’ businesses. Here, the metric for success was ‘interestingness’, which was some combination of the time a bystander would spend at the business, how much money they would spend, and how successfully they could tempt new pedestrians to come to their business.
Initially, this was fun: visitors to the rich person’s themepark would be beguiled and dazzled by virtuoso displays of coffee making, would be tempted by jokes shouted by bouncers outside the music venues, and would even be tempted into the ‘phone repair’ shops by the extraordinarily cute and captivating behaviors of the caged animals (who were also doubled). The overall installation received several accolades in a variety of local Solar System publications, and even gained some small amount of fame on the extra-Solar tourist circuit.
But eventually people grew tired of it and the rich person did not want to change it, because as they had aged they had started to spend more and more time in the installation, and now considered many of the robots within it to be personal friends. This suited the robots, who had grown ever more adept at competing with each other for the attentions of the rich person.
It was after the rich person died that things became a problem. Extra-planetary estates are so complicated that the process of compiling the will takes months and, once that’s done, tracking down family members across the planets and solar system and beyond can take decades. In the case of the rich person, almost fifty years passed before their estate was ready to be dispersed.
What happened next remains mostly a mystery. All we know is that the representatives from the estate services company traveled to the rich person’s estate and visited the early 21st Century installation. They did not return. Intercepted media transmissions taken by a nearby satellite show footage of the people spending many days wandering around the installation, enjoying its bars, and DIY music venues, and clubs, and zipping around on scooters. One year passed and they did not come out. By this time another member of the rich person’s extended family had arrived in the Solar System and, like the one that came before them, demanded to travel to the rich person’s planet to inspect the estate and remove some of the items due to them. So they traveled again, again with representatives of the estate company, and again they failed to return. New satellite signals show them, also, spending time in the 21st Century estate, seemingly enjoying themselves, and being endlessly tended to by the AI-evolved-to-please staff.
Now, more members of the rich person’s family are arriving in the Solar System, and the estate management organization is involved in a long-running court case designed to prevent it from having to send any more staff to the rich person’s planet. All indications are that the people on it are happy and healthy, and it is known that the planet has sufficient supplies to keep thousands of people alive for hundreds of years. But though the individuals seem happy, the claim being made in court is that they are not there ‘voluntarily’; rather, the AIs have become so adept that they make the ‘involuntary’ seem ‘voluntary’.
Things that inspired this story: the Sirens from the Odyssey; self-play; learning from human preferences; mis-specified adaptive reward functions; grotesque wealth combined with vast boredom.