Import AI 163: Oxford researchers release self-driving car dataset; the rumors are true – non-experts can use AI; plus, a meta-learning robot therapist!

by Jack Clark

How badly can reality mess with object detection algorithms? A lot, it turns out:
…Want to stress-test your street sign object detection system? Use CURE-TSD-Real…
“The new system-breaking tests have arrived!” I imagine a researcher at a self-driving car company shouting, upon seeing the release of ‘CURE-TSD-Real’, a new dataset developed by researchers at Georgia Tech. CURE-TSD-Real collects footage of street signs, then algorithmically augments the footage to generate a variety of different, challenging examples to test systems against.

CURE-TSD-Real ingredients: The dataset contains 2,989 distinct videos containing around 650,000 annotated signs. The dataset is also diverse – relative to other datasets – containing a range of traffic and perception conditions including rain, snow, shadow, haze, illumination, decolorization, blur, noise, codec error, dirty lens, occlusion, and overcast. The videos were collected in Belgium. The dataset is arranged into ‘levels’, where higher levels correlate to tests where a larger proportion of the images contain distortions, and so on.
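   To give a flavor of this kind of stress-testing, here’s a minimal sketch – not the CURE-TSD-Real generation pipeline itself – that applies two of the challenge types listed above (blur and sensor noise) to a frame before handing it to a detector; `run_detector` is a placeholder for whatever sign-detection model you want to evaluate:

```python
# Illustrative sketch only: applies two of the challenge types listed above
# (blur, noise) to a frame so you can compare detector output on clean vs.
# distorted inputs. This is NOT the CURE-TSD-Real generation code.
import cv2
import numpy as np

def add_gaussian_noise(frame: np.ndarray, sigma: float = 25.0) -> np.ndarray:
    """Simulate sensor noise by adding zero-mean Gaussian noise."""
    noise = np.random.normal(0.0, sigma, frame.shape)
    return np.clip(frame.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def add_blur(frame: np.ndarray, kernel: int = 15) -> np.ndarray:
    """Simulate defocus / motion smear with a Gaussian blur (kernel must be odd)."""
    return cv2.GaussianBlur(frame, (kernel, kernel), 0)

def stress_test(frame: np.ndarray, run_detector) -> dict:
    """Run a detector on clean and distorted versions of the same frame.
    `run_detector` is a stand-in for your own sign-detection model."""
    return {
        "clean": run_detector(frame),
        "noisy": run_detector(add_gaussian_noise(frame)),
        "blurred": run_detector(add_blur(frame)),
    }
```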

Breaking baselines with CURE-TSD-Real: In tests, the researchers show that the presence of these tricky conditions can reduce performance by anywhere between 20% and 60%, depending on the evaluation criteria being used. Conditions like shadows resulted in relatively little degradation (around 16%), whereas conditions like codec errors and exposure changes could damage performance by as much as 80%.

Why this matters: One of the best ways to understand something is to break it, and datasets like CURE-TSD-Real make it easier than ever for researchers to test their systems against challenging conditions, then observe how they do.
   Get the data from here (official CURE-TSD GitHub).
   Read more: Traffic Sign Detection under Challenging Conditions: A Deeper Look Into Performance Variations and Spectral Characteristics (Arxiv).

####################################################

What it takes to trick a machine learning classifier:
…MLSEC competition winner explains what they did and how they did it…
If we start deploying large amounts of machine learning into computer security, how might hackers respond? At this year’s ‘DEFCON’ hacking conference, the ‘MLSEC’ (ImportAI #159) competition challenged hackers to work out how to smuggle 50 distinct malicious executables past machine learning classifiers. Now, the winner of the competition has written a blog post explaining how they won.

What it takes to defeat a machine learning classifier: It’s worth reading the post in full, but one of the particularly nice exploits is that they took a look at benign executable files and “found a large chunk of strings which appeared to contain Microsoft’s End User License Agreement (EULA)” – benign-looking data they could then emphasize in their malicious files. This is a nice example of how many machine learning exploits work: find something in the data that causes the system to consistently predict one thing, then find a way to emphasize that data.
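   As a toy illustration of that general trick – and nothing to do with the actual MLSEC classifiers – here’s a sketch of how appending strings harvested from benign executables can drag a naive, string-based ‘benign-ness’ score upward without changing what the malware does; the file names and the scoring function are made up for the example:

```python
# Toy illustration of the 'emphasize benign-looking data' trick described
# above. The scoring model here is invented for the example; real MLSEC
# classifiers are far more sophisticated.
import re

def printable_strings(data: bytes, min_len: int = 5):
    """Extract printable ASCII strings, roughly like the Unix `strings` tool."""
    return re.findall(rb"[ -~]{%d,}" % min_len, data)

def benign_score(data: bytes, benign_vocab: set) -> float:
    """Hypothetical score: fraction of extracted strings also seen in benign files."""
    strings = printable_strings(data)
    if not strings:
        return 0.0
    return sum(1 for s in strings if s in benign_vocab) / len(strings)

# Strings harvested from a benign executable (e.g. chunks of a EULA).
# "benign.exe" / "malware.exe" are placeholder file names.
benign_vocab = set(printable_strings(open("benign.exe", "rb").read()))

malware = open("malware.exe", "rb").read()
print("before:", benign_score(malware, benign_vocab))

# Appending data to the end of a PE file usually doesn't affect execution,
# but it shifts a naive string-based score toward 'benign'.
padded = malware + b"\n".join(sorted(benign_vocab))
print("after:", benign_score(padded, benign_vocab))
```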

Why this matters: Competitions like MLSEC generate evidence about the effectiveness of various machine learning exploits and defenses; writeups from competition winners are a neat way to understand the tools people use in this domain, and to develop intuitions about how computer security might work in the future.
   Read more: Evading Machine Learning Malware Classifiers (Medium).

####################################################

Can medical professionals use AI without needing to code?
…Study suggests our tools are good enough for non-expert use, but our medical datasets are lacking…
AI is getting more capable and is starting to impact society – that’s the message I write here in one form or another each week. But is it useful to have powerful technology if no one can use it? That’s a problem I sometimes worry about; though the tech is progressing rapidly, it’s still really hard for a large number of people to use, and this makes it harder for us as a society to use the technology to maximum social benefit. Now, new research from scientists affiliated with the National Health Service (NHS) and DeepMind shows how non-AI-expert medical professionals can use AI tools in their work.

What they did: The research centers on the use of Google’s ‘Cloud AutoML’ service, which is basically a nice UI sitting on top of some fancy neural architecture search technology, theoretically letting people upload a dataset, fiddle with some tuning dials, and let the AI optimize its own architecture for the task. Is it really that easy? It might be: the study focuses on two physicians “with no previous coding or machine learning experience” who spent around 10 hours studying basic shell script programming, the Google Cloud AutoML online documentation and GUI, and preparing the five input datasets they’d use in tests. They also compared the models developed via Google Cloud AutoML with strong AI baselines derived from medical literature. Four out of five models “showed comparable discriminative performance and diagnostic properties to state-of-the-art performing deep learning algorithms”, they wrote.
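   For intuition about what ‘comparable discriminative performance and diagnostic properties’ gets measured with, here’s a minimal sketch of the kind of comparison involved – area under the ROC curve, sensitivity, and specificity – using scikit-learn; the prediction arrays are placeholders, not the study’s data:

```python
# Minimal sketch of comparing 'discriminative performance and diagnostic
# properties' of two classifiers, e.g. an AutoML-built model vs. a
# literature baseline. The arrays below are placeholders, not study data.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

def diagnostic_report(y_true, y_prob, threshold: float = 0.5) -> dict:
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "auc": roc_auc_score(y_true, y_prob),   # discriminative performance
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
    }

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])     # ground-truth labels (toy)
automl_probs = np.array([0.2, 0.8, 0.7, 0.3, 0.9, 0.1, 0.6, 0.4])
baseline_probs = np.array([0.3, 0.7, 0.6, 0.4, 0.8, 0.2, 0.55, 0.45])

print("AutoML:  ", diagnostic_report(y_true, automl_probs))
print("Baseline:", diagnostic_report(y_true, baseline_probs))
```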

Medical data is harder than you think: “The quality of the open-access datasets (including insufficient information about patient flow and demographics) and the absence of measurement for precision, such as confidence intervals, constituted the major limitations of this study”.

Why this matters: For AI to change society, society needs to be able to utilize AI systems; studies like this show that we’re starting to develop sufficiently powerful and easy-to-use systems that non-experts can apply the technology in their own domains. However, the availability of things like high-quality, open datasets could hold back broader adoption of these tools – it’s not useful to have an easy-to-use tool if you lack the ingredients to make exquisite things with it.
   Read more: Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study (Elsevier).

####################################################

Radar + Self-Driving Cars:
…Addition to Oxford RobotCar Dataset gives academics more data to play with…
Oxford University researchers have added radar data to a self-driving car dataset. The data was gathered using a Navtech CTS350-X scanning radar via 32 traversals of (roughly) the same route around Oxford, UK. The data was gathered under different traffic, weather, and lighting conditions in January 2019. Radar isn’t used as much in self-driving car research as data gathered via traditional cameras and/or LIDAR; “although this modality has received relatively little attention in this context, we anticipate that this release will help foster discussion of its uses within the community and encourage new and interesting areas of research not possible before,” they write.

Why this matters: Data helps to fuel research, and different types of data are especially useful to researchers when they can be studied in conjunction with one another. Multi-modal datasets like the Oxford RobotCar Dataset will become increasingly important to AI research.
   Read more: The Oxford Radar RobotCar Dataset: A Radar Extension to the Oxford RobotCar Dataset (Arxiv).
   Get the data from here (official Oxford RobotCar Dataset site).

####################################################

Testing language engines with TABFACT:
…Can your system work out what is entailed and what is refuted by Wikipedia data?…
TABFACT consists of 118,439 annotated statements in reference to 16,621 Wikipedia tables. The statements can be ones that are entailed by the underlying dataset (a Wikipedia table) or refuted by it. To get a sense of what TABFACT data might look like, imagine a Wikipedia table that lists the particulars of dogs that have won a dog beauty competition – in TABFACT, this table would be accompanied by some statements that are entailed by the table (e.g., Bonzo took first place) and statements that are refuted by it (e.g., Bonzo took third place). TABFACT is split into ‘simple’ and ‘complex’ statements, giving researchers a two-tier curriculum to test their systems against.
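   Here’s a toy rendering of the kind of (table, statement, label) triple TABFACT pairs up, using the dog-show example above; this is for intuition only and isn’t the repository’s actual file schema:

```python
# Toy (table, statement, label) example in the spirit of TABFACT.
# For intuition only; the real dataset uses its own file format.
table = {
    "caption": "2019 Dog Beauty Competition",
    "columns": ["dog", "breed", "place"],
    "rows": [
        ["Bonzo", "beagle", "1st"],
        ["Rex",   "poodle", "2nd"],
        ["Fido",  "collie", "3rd"],
    ],
}

statements = [
    ("Bonzo took first place", "ENTAILED"),   # supported by the table
    ("Bonzo took third place", "REFUTED"),    # contradicted by the table
]
```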

Two ways to attack TABFACT: So, how can we develop systems to do well on challenges like TABFACT? Here, the researchers pursue a couple of strategies: Table-BERT, which is basically an off-the-shelf BERT pre-trained model, fine-tuned against TABFACT data; and LPA (Latent Program Algorithm), which is a program synthesis approach.
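   Here’s a rough sketch of the Table-BERT idea – linearize the table into text, pair it with the statement, and classify entailed versus refuted with a pre-trained BERT (via the Hugging Face transformers library). It’s a simplification for illustration, not the authors’ exact linearization or training setup, and the model would need fine-tuning on TABFACT before its predictions mean anything:

```python
# Sketch of the Table-BERT approach: linearize a table into text, pair it
# with a statement, and classify entailed vs. refuted with a fine-tuned BERT.
# Simplified for illustration; not the paper's exact linearization scheme.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

def linearize(table: dict) -> str:
    """Naive row-by-row linearization of a table into a single string."""
    parts = [table["caption"]]
    for row in table["rows"]:
        parts.append("; ".join(f"{c} is {v}" for c, v in zip(table["columns"], row)))
    return ". ".join(parts)

table = {
    "caption": "dog beauty competition results",
    "columns": ["dog", "place"],
    "rows": [["Bonzo", "first"], ["Rex", "second"]],
}

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer(linearize(table), "Bonzo took first place",
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # label order [refuted, entailed] is our convention
print("entailed" if logits.argmax(-1).item() == 1 else "refuted")
```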

Humans VS Machines VS TABFACT: In tests, the researchers show humans obtain an accuracy of around 92% when asked to correctly classify TABFACT statements, compared to 50% for random guessing and around 68% for both Table-BERT and LPA.

Why this matters: It’s interesting that Table-BERT and LPA obtain similar scores, given that one is basically a big blob of generic neural stuff (a pre-trained language model) that is lightly retrained against the target dataset (TABFACT), while LPA is a much more sophisticated system with much more structure encoded into it by its human designers. I wonder how far pre-trained language models might go in domains like this, and how well they ultimately might perform relative to hand-written systems like LPA?
   Read more: TabFact: A Large-scale Dataset for Table-based Fact Verification (Arxiv).
   Get the TABFACT data and code (official TABFACT GitHub repository).

####################################################

Detecting great apes with a three-module neural net:
…Spotting apes with cameras accompanied by neural net sensors…
Researchers with the University of Bristol have created an AI system to automatically spot and analyze great apes in the wild, presaging a future where semi-autonomous classifiers observe and analyze the world.

How it works: To detect the apes, the researchers build a system consisting of three main components – a feature pyramid network backbone, a temporal context module, and a spatial context module. “Each of these modules is driven by a self-attention mechanism tasked to learn how to emphasize most relevant elements of a feature given its context,” they explain. “In particular, these attention components are effective in learning how to ‘blend’ spatially and temporally distributed visual cues in order to reconstruct object locations under dispersed partial information; be that due to occlusion or lighting”.
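   For intuition about the ‘blend spatially and temporally distributed cues’ idea, here’s a toy PyTorch module that mixes per-frame backbone features with self-attention, so context from neighboring frames can compensate for occlusion or poor lighting in any one frame – an illustration only, not the authors’ architecture:

```python
# Toy sketch of attention-based temporal feature blending: per-frame feature
# vectors from a backbone are mixed via self-attention so that context from
# neighboring frames can fill in for occlusion or poor lighting in one frame.
# Illustration only; not the paper's architecture.
import torch
import torch.nn as nn

class TemporalContextModule(nn.Module):
    def __init__(self, feat_dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim) – one feature vector per frame.
        blended, _ = self.attn(frame_feats, frame_feats, frame_feats)
        return self.norm(frame_feats + blended)  # residual connection

# Example: blend features for a clip of 8 frames.
feats = torch.randn(1, 8, 256)               # stand-in backbone features
print(TemporalContextModule()(feats).shape)   # torch.Size([1, 8, 256])
```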

Testing: They test their system against 500 videos of great apes, consisting of 180,000 frames in total. These videos include “significant partial occlusions, challenging lighting, dynamic backgrounds, and natural camouflage effects,” the authors explain. They show that baselines which use residual networks (ResNets) get around 80% accuracy, and the addition of the temporal and spatial modules leads to a significant boost in performance to a little over 90% accuracy. Additionally, in qualitative evaluations the researchers “found that the SCM+TCM setup consistently improves detection robustness compared to baselines in such cases”.

Why this matters: AI is going to let us watch and analyze the planet. I’m optimistic that as we work out how to make it cheaper and easier for people to automatically monitor things like wildlife populations, we’ll be able to produce more data to motivate people to preserve our ecosystem(s). I think one of the ‘grand opportunities’ of large-scale AI development is the creation of a planet-scale ‘sense&respond’ infrastructure for wildlife analysis and protection.
   Read more: Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending (Arxiv).

####################################################

Tech Tales:

The Meta-Learning Therapist.

“Why don’t you just imagine yourself jumping out of the window?”
“How would that help? I’m getting divorced, I’m not suicidal!”
“I apologize, I’m still calibrating. Are you eating and sleeping well?”
“I’m eating a lot of fast food, but I’m getting regular meals. The sleep is okay.”
“That is great to hear. Do you dream of snakes?”
“No, sometimes I dream of my wife.”
“Does your wife dream about snakes?”
“If she did, what would that tell you?”
“I apologize, I’m still calibrating. What do you think your wife dreams about?”
“I think she has a lot of dreams that don’t include me.”
“And how does that make you feel?”
“It makes me feel like it’s more likely she is going to divorce me.”
“How do you feel about divorce? Some people find it quite liberating.”
“I’m sure the ones that find it liberating are the ones that are asking for the divorce. I’m not asking for it, so I don’t feel good about it.”
“And you came here because…?”
“My doctor prescribed me a session. I haven’t ever had a human therapist. I don’t think I’d want one. I figured – why not?”
“And how are you feeling about it?”
“I’m more interested in how you are feeling about it…”
“…”
“…that’s a question. Will you answer?”
“Yes. I feel like I understand you better than I did at the start of the conversation. I think we’re ready to begin our session.”
“We hadn’t started?”
“I was calibrating. I think you’ll find our conversation from this point on to be much more satisfying. Now, please tell me about why you think your partner wishes to divorce you.”
“Well, it started a few years ago…”

Thanks to Joshua Achiam at OpenAI for the lunchtime conversation that inspired this story!
Things that inspired this story: Eliza; meta-learning; one-shot adaptation; memory buffers; decentralized, individualized learning with strings attached; psychiatry; our peculiar tolerance of being asked the ‘wrong’ questions in pursuit of the right ones.