Import AI 142: Import AI 142: Berkeley spawns cheap ‘BLUE’ arm; Google trains neural nets to prove math theorems; seven questions about GANs

Google reveals HOList, a platform for doing theorem proving research with deep learning-based methods:
…In the future, perhaps more math theorems will be proved by AI systems than humans…
Researchers with Google want to develop and test AI systems that can learn to solve mathematical theorems, so have made tweaks to theorem proving software to make it easier for AI systems to interface with. In addition, they’ve created a new theorem proving benchmark to spur development in this part of AI.

HOL List: The software they base their system on is called HOL Light. For this project, they develop “an instrumented, pre-packaged version of HOL Light that can be used as a large scale distributed environment of reinforcement learning for practical theorem proving using our new, well-defined, stable Python API”. This software ships with 41 “tactics” which are basically algorithms to use to help prove math theorems.

Benchmarks: The researchers have also released a new benchmark on HOL Light, and they hope this will “enable research and measuring progress of AI driven theorem proving in large theories”. The benchmarks are initially designed to measure performance on a few tasks, including: predicting the same methodologies used by humans to create a proof; and trying to prove certain subgoals or aspects of proofs without access to full information.

DeepHOL: They design a neural network-based theorem prover called DeepHOL which tries to concurrently encode the goals and premises while generating a proof. “In essence, we propose a hybrid architecture that both predicts the correct tactic to be applied, as well as rank the premise parameters required for meaningful application of tactics”. They test out a variety of different neural network-based approaches within this overall architecture and train them via reinforcement learning, with the best system able to prove 58% of the proofs in the training set – no slam-dunk, but very encouraging considering these are learning-based methods.

Why this matters: Theorem proving feels like a very promising way to test the capabilities of increasingly advanced machines, especially if we’re able to develop systems that start to generate new proofs. This would be a clear validation of the ability for AI systems to create novel scientific insights in a specific domain, and I suspect would give us better intuitions about AI’s ability to transform science more generally as well.  “We hope that our initial effort fosters collaboration and paves the way for strong and practical AI systems that can learn to reason efficiently in large formal theories,” they write.
  Read more: HOList: An Environment for Machine Learning of Higher-Order Theorem Proving (Extended Version).

#####################################################

Think GANs are interesting? Here are seven underexplored questions:
…Googler searches for the things we know we don’t know…
Generative adversarial networks have become a mainstay component of recent AI research given their utility in creative applications, where you need to teach a neural network about some data well enough that it can generate synthetic data that looks similar to the source, whether videos or images or audio.

But GANs are quite poorly understood, so researcher Augustus Odena has published an essay on Distill listing seven open questions about GANs.

The seven questions: These are:
– What are the trade-offs between GANs and other generative models?
– What sorts of distributions can GANs model?
– How can we scale GANs beyond image synthesis?
– What can we say about the global convergence of the training dynamics?
– How should we evaluate GANs and when should we use them?
– How does GAN training scale with batch size?
– What is the relationship between GANs and adversarial examples?

Why this matters: Better understanding how to answer these questions will help researchers better understand the technology, which will allow us to make better predictions about economics costs of training GAN systems, likely failures to expect, and point to future directions for work. It’s refreshing to see researchers publish exclusively about the problems and questions related to a technique, and I hope to see more scholarship like this.
  Read more: Open Questions about Generative Adversarial Networks (Distill).

#####################################################

Human doctors get better with aid of AI-based diagnosis system:
…MRNet dataset, competition, and research, should spur research into aiding clinicians with pre-trained medical-problem-spotting neural nets…
Stanford University researchers have developed a neural network-based technique to assess Knee MR scans for abnormalities and a few specific diagnoses (eg, ligament tears). They find that clinicians which have access to this model have a lower rate of mistaken diagnoses than those without access to it. When using this model “for every 100 healthy patients, ~5 are saved from being unnecessarily considered for surgery,” they write.

MRNet dataset: Along with their research, they’ve also released an underlying dataset: MRNet, a collection of 1,370 knee MRI exams performed at Stanford University Medical Center, spread across normal and abnormal knees.

Competition: “We are hosting a competition to encourage others to develop models for automated interpretation of knee MRs,” the researchers write. “Our test set (called internal validation set in the paper) has its ground truth set using the majority vote of 3 practicing board-certified MSK radiologists”.

Why this matters: Many AI systems are going to augment rather than substitute for human skills, and I expect this to be especially frequent in medicine, where we can expect to give clinicians more and more AI advisor systems to use when making diagnoses. In addition, datasets are crucial to the development of more sophisticated medical AI systems and competitions tend to drive attention towards a specific problem – so the release of both in addition to the paper should spur research in this area.
  Read more and register to download the dataset here: MRNet Dataset (Stanford ML Group).
  Read more about the underlying research: MRNet: Deep-learning-assisted diagnosis for knee magnetic resonance imaging (Stanford ML Group).

#####################################################

As AI hype fades, applications arrive:
…Now we’ve got to superhuman performance we need to work on human-computer interaction…
Jeffrey Bigham, a human-computer interaction researcher, thinks that AI is heading into an era of less hype – and that’s a good thing. This ‘AI autumn’ is a natural successor to the period we’re currently in, since we’re moving from the development to the deployment phase of many AI technologies.

Goodbye hype, hello applications:
“Hype deflates when humans are considered,” Bigham writes. “Self-driving cars seem much less possible when you think about all the things human drivers do in addition to the driving on well-known roads in good lighting conditions. They find passengers, they get gas, they fix the car sometimes, they make sure drunk passengers aren’t in danger, they walk elderly passengers into the hospital, etc”.

Why this matters: “If hype is at the rapidly melting tip of the iceberg, then the great human-centered applied work is the super large mass floating underneath supporting everything,” he writes. And, as most people know, working with humans is challenging and endlessly surprising, so the true test of AI capabilities will be to first reach human parity at certain things, then be deployed in ways that make sense to humans.
  Read more: The Coming AI Autumn (Jeffrey Bigham blog).

#####################################################

Berkeley researchers design BLUE, a (deliberately) cheap robot for AI research:
…BLUE promises human-like capabilities in a low-cost platform…
Berkeley researchers have developed the Berkeley robot for Learning in Unstructured Environments (BLUE), robotic arm designed for AI research and deployments. The robot was developed by a team of more than 15 researchers over the last three years. It is designed to cost around ~$5000 to build when built in batches of 1,500 units, and many design-choices have been constrained by the goal of making it both cheap to build and safe to operate around humans.

The robot can be used to train AI approaches on a cheap robotics platform, and works with teleoperation systems so it can be trained directly from human behaviors.

BLUE has seven degrees of freedom, distributed across three joints in the shoulder, one in the elbow, and three in the wrist. When designing BLUE, the researchers optimized for a “useful” robot – this required sufficient precision to be human-like (in this case, it can move with a precision of around 4 millimeters, which is far less than ultra-precise industrial robots) cheap enough to be manufactured at scale, and capable of a general class of manipulation tasks in unconstrained (aka, the opposite of a factory production line) environments.

Low-cost design: The BLUE robots use quasi-direct drive actuation (QDD), an approach that has most recently become popular in legged locomotion systems. They also designed a cheap, parallel jaw gripper (“we chose parallel jaws for their predictability, robustness, simplicity (low cost), and ease of simulation”).

Why this matters: In recent years, techniques based on deep learning have started to give robots unprecedented perception and manipulation capabilities. One thing that has held back deployment, though, is the absence of cheap robot platforms which researchers can experiment with. BLUE seems to have the nice properties of being built by researchers to reflect AI needs, while also being designed to be manufactured at scale. “Next up for the project is continued stress testing and ramping manufacturing,” they write. “The goal is to get these affordable robots into as many researchers’ hands as possible”.
  Read more: Project Blue (Berkeley website).
  Read the research paper: Quasi-Direct Drive for Low-Cost Compliant Robotic Manipulation (Arxiv).

#####################################################

Network architecture search gets more efficient with Single-Path NAS:
…Smarter search techniques lower the computational costs of AI-augmented search…
Researchers with Carnegie Mellon University, Microsoft, and the Harbin Institute of Technology have figured out a more efficient way to get computers to learn how to design AI systems for deployment on phones.

The approach, called Single-Path NAS, makes it more efficient to spend compute to search for more sophisticated AI models. The key technical trick is, at each layer of the network, to search over “an over-parameterized ‘superkernel’ in each ConvNet layer’. What this means in practice is the researchers have made it more efficient to rapidly iterate through different types of AI component at each layer of the network, making their approach more efficient than other NAS techniques.
  “Without having to choose among different paths/operations as in multi-path methods, we instead solve the NAS problem as finding which subset of kernel weights to use in each ConvNet layer”, they explain.

Hardware-aware: The researchers add a constraint during training that lets them optimize for the latency of the resulting architecture – this lets them automatically search for an architecture that best maps to the underlying hardware capabilities.

Testing, testing, testing: They test their approach on a Pixel 1 phone – a widely-used premium Android phone, developed by Google. They benchmark by using Single-Path NAS to design networks for image classification on ImageNet and compare it against state-of-the-art systems designed by human researchers as well as ones discovered via other neural architecture search techniques.  

  Results: Their approach gets an accuracy of 74.96%, which they claim is “the new state-of-the-art ImageNet accuracy among hardware-efficient NAS methods. Their system also takes about 8 epoches to train, compared to hundreds (or thousands) for other methods.

Why this matters: Being able to offload the cost of designing new network architectures from human designers to machine designers has the potential to further accelerate AI research progress and AI application deployment. This sort of technique fits into the broader trend of the industrialization of AI (which has been covered in this newsletter in a few different ways) – worth noting that the authors of this technique are spread across multiple companies and institutions, from CMU, to Microsoft, to the Harbin Institute of Technology in Harbin, China.
  Read more: Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 hours (Arxiv).
  Get the code: Single-Path-NAS (GitHub).

#####################################################

How should the Department of Defense use Artificial Intelligence? Tell them your thoughts:
Can the DoD come up with principles for how it uses AI? There are ways you can help…
The Defense Innovation Board, an advisory committee to the Secretary of Defense, is trying to craft a set of AI principles that the DoD can use as it integrates AI technology into its systems – and it wants help from the world.

  Share your thoughts at Stanford this month: If you’re in the Bay Area, you may want to come to ‘The Ethical and Responsible Use of Artificial Intelligence for the Department of Defense (DoD)” at Stanford University on April 25th 2019, where you can give thoughts on how the DoD may want to consider using (or not using) AI. You can also submit public comments online.

  Why this matters: Military organizations around the world are adopting AI technology, and it’s unusual to see a military organization publicly claim to be so interested in the views of people outside its own bureaucracy. I think it’s worth people submitting thoughts here (especially if they’re constructively critical), as this will provide us evidence for how or if the general public can guide the ways in which these organizations use AI.
  Read more about the AI Principles project here (DiB website).

#####################################################

OpenAI Bits & Pieces:

OpenAI Five wins matches against pros, cooperates with humans:
  This weekend, OpenAI’s neural network-based system for playing Dota 2, OpenAI Five, beat a top professional team in San Francisco. Additionally, we showed how the same system can play alongside humans.
  OpenAI Five Arena: We also announced OpenAI Five Arena, a website which people can use to play with or against our Dota 2 agents. Sign up via: arena.openai.com. Wish us luck as we try to play against the entire internet next week.

#####################################################

Tech Tales:

The Big Art Machine

The Big Art Machine, or as everyone called it, The BAM, was a robot about thirty feet tall and a hundred and fifty feet long. It looked kind of like a huge, metal centipede, except instead of having a hundred legs, it had a hundred far more sophisticated appendages – each able to manipulate the world around it, and change its own dimensions through a complicated series of interlocking, metal plates.

The BAM worked like this: you and a hundred or so of your friends would pile into the machine and each of you would sit in a small, sealed room housed at the intersection between each of its hundred appendages and its main body. Each of these rooms contained a padded chair, and each chair came with a little swing-out screen, and on this screen you’d see two movie clips of how your appendage could move – you’d pick whichever one you preferred, then it’d show you another one, and so on.

The BAM was a big AI thing, essentially. Each of the limbs started out dumb and uncoordinated, and at first people would just focus on calibrating their own appendage, then they’d teach their own appendage to perhaps strike the ground, or try and pull something forward, or so on. There were no rules. Mostly, people would try to get the BAM to walk or – very, very occasionally – run. After enough calibration, the operators of each of the appendages would get a second set of movies on their screen – this time, movies of how their appendage plus another appendage elsewhere on the BAM might move together. In this way, the crowd would over time select harmonious movements, built out of idiosyncratic underlays.

So hopefully this gives you an idea for how difficult it was to get the BAM to do anything. If you’ve ever hosted a party for a hundred people before and tried to get them to agree on something – music, a drinking game, even just listening to one person give a speech – then you’ll know how difficult getting the BAM to do anything is. Which is why we were so surprised that one day a team of people got into the BAM and, after the first few hours of aimless clanking and probing, it started to walk, then it started to run, and then… we lost it.

Some people say that they taught it to swim, and took it into the ocean. Others say that it’s not beyond the realms of feasibility that it was possible to teach the thing to fly – though the coordination required and the time it would take to explore its way to such a particular combination of movements was so lengthy that many thought it impossible. Now, we tell stories about the BAM as a lesson in collective action and calibration, and children when they learn about it in school immediately dream of building machines in which thousands of people work together, calibrating around some purpose that comes from personal chaos.

Things that inspired this story: Learning from human preferences; heterogeneous data; the beautiful and near-endless variety of ways in which humans approach problems; teamwork; coordination; inverse reinforcement learning; robotics, generative models.