Import AI 118: AirBnB splices neural net into its search engine; simulating robots that touch with UnrealROX; and how long it takes to build a quadcopter from scratch
by Jack Clark
Building a quadcopter from scratch in ten weeks:
…Modeling the drone ecosystem by what it takes to build one…
The University of California, San Diego, recently ran a course where students got the chance to design, build, and program their own drones. A paper about the course outlines how it is structured and gives us a sense of what it takes to build a drone today.
Four easy pieces: The course breaks building the drones into four phases: designing the PCB, implementing the flight control software, assembling the PCB, and getting the quadcopter flying. Each of these phases has numerous discrete steps which are detailed in the report. One of the nice things about the curriculum is the focus on the cost of errors: “Students ‘pay’ for design reviews (by course staff or QuadLint) with points deduced from their lab grade,” they write. “This incentivizes them to find and fix problems themselves by inspection rather than relying on QuadLint or the staff”.
The surprising difficulty of drone software: Building the flight control software proves to be one of the most challenging parts of the course: bugs can stem from numerous potential causes, which makes root cause analysis difficult.
Teaching tools: While developing the course the instructors noticed that they were spending a lot of time checking and evaluating PCB designs for correctness, so they designed their own program called ‘QuadLint’ to try to auto-analyze and grade these submissions. “QuadLint is, we believe, the first autograder that checks specific design requirements for PCB designs,” they write.
Costs: The report includes some interesting details on the cost of these low-powered drones, with the quadcopter itself costing about $35 per PCB plus $40 for the components. Currently, the most expensive component of the course is the remote ($150) and for the next course the teachers are evaluating cheaper options.
Small scale: The quadcopters all use a PCB to host their electronics and serve as an airframe. They measure less than 10 cm on a side and are suitable for flight indoors over short distances. “The motors are moderately powerful, “brushed” electric motors powered by a small lithium-polymer (LiPo) battery, and we use small, plastic propellers. The quadcopters are easy to operate safely, and a blow from the propeller at full speed is painful but not particularly dangerous. Students wear eye protection around their flying quadcopters.”
Why it matters: The paper notes that the ‘killer apps’ of the future “will lie at the intersection of hardware, software, sensing, robotics, and/or wireless communications”. This seems true – especially given the major uptake suggested by the success of companies like DJI, and the possibility of unit economics driving prices down. Tracking and measuring the cost and ease with which people can build and assemble drones out of (hard to track, commodity) components therefore gives us better intuitions about this aspect of drones+security. While the hardware and software are under-powered and somewhat pricey today, they won’t stay that way for long.
Read more: Trial by Flyer: Building Quadcopters From Scratch in a Ten-Week Capstone Course (Arxiv).
Amazon tries to make Alexa smarter via richer conversational data:
…Who needs AI breakthroughs when you’ve got a BiLSTM, lots of data, and patience?…
Amazon researchers are trying to give personal assistants like Alexa the ability to have long-term conversations about specific topics. The (rather unsurprising) finding they make in a new research paper is that you can “extend previous work on neural topic classification and unsupervised topic keyword detection by incorporating conversational context and dialog act features”, yielding personal assistants capable of longer and more coherent conversations than their forebears – if you can afford to annotate the data.
Data used: The researchers used data collected during the 2017 ‘Alexa Prize’ competition, which consists of over 100,000 utterances containing interactions between users and chatbots. They augmented this data by classifying the topic for each utterance into one of 12 categories (eg: politics, fashion, science & technology, etc), and also trying to classify the goal of the user or chatbot (eg: clarification, information request, topic switch, etc). They also asked other annotators to rank every single chatbot response with metrics relating to how comprehensible it was, how relevant the response was, how interesting it was, and whether a user might want to continue the conversation with the bot.
Baselines and BiLSTMs: The researchers implement two baselines (DAN, based on a bag-of-words neural model; ADAN, which is DAN extended with attention), and then develop two versions of a bidirectional LSTM (BiLSTM) system, where one uses context from the annotated dataset and the other doesn’t. They then evaluate all these methods by testing their baselines (which contain only the current utterance) against systems which incorporate context, systems which incorporate data, and systems which incorporate both context and data. The results show that a BiLSTM fed with context in sequence does almost twice as well as a baseline ADAN system that uses context and dialog, and almost 25% better than a DAN fed with both context and dialog.
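The core move – prepending prior turns to the current utterance before encoding – can be sketched as follows. This is a minimal illustration with a plain tanh RNN standing in for the LSTM cells; all weights, dimensions, and names here are hypothetical, not Amazon's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED, HIDDEN, N_TOPICS = 16, 8, 12   # 12 topic classes, as in the paper

# Randomly initialized weights; a real system would train these.
Wf_x, Wf_h = rng.normal(size=(HIDDEN, EMBED)), rng.normal(size=(HIDDEN, HIDDEN))
Wb_x, Wb_h = rng.normal(size=(HIDDEN, EMBED)), rng.normal(size=(HIDDEN, HIDDEN))
W_out = rng.normal(size=(N_TOPICS, 2 * HIDDEN))

def rnn_encode(seq, W_x, W_h):
    """Run a simple tanh RNN over a sequence of token embeddings."""
    h = np.zeros(HIDDEN)
    for x in seq:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

def topic_scores(context_utts, current_utt):
    """Contextual variant: prepend prior turns to the current utterance,
    encode the whole sequence in both directions, and score the topics."""
    seq = np.concatenate(context_utts + [current_utt], axis=0)
    h = np.concatenate([rnn_encode(seq, Wf_x, Wf_h),
                        rnn_encode(seq[::-1], Wb_x, Wb_h)])
    logits = W_out @ h
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()   # softmax over the 12 topic classes

# Two prior turns of 5 and 3 tokens, then a current turn of 4 tokens.
context = [rng.normal(size=(5, EMBED)), rng.normal(size=(3, EMBED))]
probs = topic_scores(context, rng.normal(size=(4, EMBED)))
```

The context-free baseline corresponds to calling `topic_scores([], current_utt)`; the paper's gain comes from the extra turns in the sequence.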
Why it matters: The results indicate that – if a developer can afford the labeling cost – it’s possible to augment language interaction datasets with additional information about context and topic to create more powerful systems, which seems to imply that in the language space we can expect to see large companies invest in teams of people to not just transcribe and label text at a basic level, but also perform more elaborate meta-classifications as well. The industrialization of deep learning continues!
Read more: Contextual Topic Modeling For Dialog Systems (Arxiv).
Why AI won’t be efficiently solving a 2D gridworld quest soon:
…Want humans to be able to train AIs? The key is curriculum learning and interactive learning, say BabyAI’s creators…
Researchers with the Montreal Institute for Learning Algorithms (MILA) have designed a free tool called BabyAI to let them test AI systems’ ability to learn generalizable skills from curriculums of tasks set in an efficient 2D gridworld environment – and the results show that today’s AI algorithms display poor data efficiency and generalization at this sort of task.
Data efficiency: BabyAI uses gridworlds for its environment, which the researchers have written to be efficient enough that researchers can use the platform without needing access to vast pools of compute; the BabyAI environments can be run at up to 3,000 frames per second “on a modern multi-core laptop” and can also be integrated with OpenAI Gym.
A specific language: BabyAI uses “a comparatively small yet combinatorially rich subset of English” called Baby Language. This is meant to help researchers write increasingly sophisticated strings of instructions for agents, while keeping the state space from exploding too quickly.
Levels as a curriculum: BabyAI ships with 19 levels which increase in difficulty of both the environment, and the complexity of the language required to solve it. The levels test each agent on a variety of 13 different competencies, ranging from things like being able to unlock doors, navigating to locations, ignoring distractors placed into the environment, navigating mazes, and so on. The researchers also design a bot which can solve any of the levels using a variety of heuristics – this bot serves as a baseline against which to train a model.
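To get a feel for the kind of environment involved, here is a toy Gym-style gridworld step loop; the grid size, action set, and sparse reward are illustrative stand-ins, not BabyAI's actual API:

```python
# A toy gridworld in the spirit of BabyAI; the details here are
# illustrative placeholders, not BabyAI's real interface.
class ToyGridworld:
    ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, size=8, goal=(6, 6)):
        self.size, self.goal = size, goal
        self.reset()

    def reset(self):
        self.agent = (0, 0)
        return self.agent

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        # Clamp the agent to the grid so it cannot leave the world.
        x = min(max(self.agent[0] + dx, 0), self.size - 1)
        y = min(max(self.agent[1] + dy, 0), self.size - 1)
        self.agent = (x, y)
        done = self.agent == self.goal
        reward = 1.0 if done else 0.0   # sparse reward on reaching the goal
        return self.agent, reward, done

env = ToyGridworld()
obs = env.reset()
for a in ["right", "down"]:
    obs, reward, done = env.step(a)
```

Levels in a curriculum would then vary the grid contents and the instruction language while keeping this step-loop interface fixed.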
So, are today’s AI techniques sophisticated enough to solve BabyAI? The researchers train an imitation learning-based baseline for each level and assess how well it does. The systems are able to learn to perform basic tasks, but struggle to imitate the expert at tasks that require multiple actions to solve. One of the most intriguing parts of the paper is the analysis of the relative efficiency of systems trained via both imitation and from pure reinforcement learning, which shows that today’s algorithms are wildly inefficient at learning pretty much anything: simple tasks like learning to go to a red ball hidden within a map take 40,000-60,000 demos when using imitation learning, and around 453,000 to 470,000 episodes when learning using reinforcement learning without an expert teacher to attempt to mimic. The researchers also show that using pre-training (where you learn on other tasks before attempting certain levels) does not yield particularly impressive performance, with pre-training yielding at most a 3X speedup.
Why it matters: Platforms like BabyAI give AI researchers fast, efficient tools to use when tackling hard research projects, while also highlighting the deficiency of many of today’s algorithms. The transfer learning results “suggest that current imitation learning and reinforcement learning methods scale and generalize poorly when it comes to learning tasks with a compositional structure,” they write. “An obvious direction of future research to find strategies to improve data efficiency of language learning.”
Get the code for BabyAI (GitHub).
Read more: BabyAI: First Steps Towards Grounded Language Learning with a Human In the Loop (Arxiv).
Simulating robots that touch and see in AAA-game quality detail:
…The new question AI researchers will ask: But Can It Simulate Crysis?…
Researchers with the 3D Perception Lab at the University of Alicante have designed UnrealROX, a high-fidelity simulator based on Unreal Engine 4, built for simulating and training AI agents embodied in (simulated) touch-sensitive robots.
Key ingredients: UnrealROX has the following main ingredients: a simulated grasping system that can be applied to a variety of finger configurations; routines for controlling robotic hands and bodies using commercial VR setups like the Oculus Rift and HTC Vive; a recorder to store full sequences from scenes; and customizable camera locations.
Drawback: The overall simulator can run at 90 frames-per-second, the researchers note. While this may sound impressive it’s not particularly useful for most AI research unless you can run it far faster than that (compare this with BabyAI, which runs at 3,000 FPS).
Simulated robots with simulated hands: UnrealROX ships with support for two robots: a simulated ‘Pepper’ robot from the company Aldebaran, and a spruced-up version of the mannequin that ships with UE4. Both of these robots have been designed with extensible, customizable grasping systems, letting them reach out and interact with the world around them. “The main idea of our grasping subsystem consists in manipulating and interacting with different objects, regardless of their geometry and pose.”
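One common way to make grasping geometry-independent is to attach an object to the hand once finger contacts oppose each other; the sketch below illustrates that general idea only, and the function name, threshold, and logic are hypothetical rather than UnrealROX's actual code:

```python
import numpy as np

def grasp_is_stable(contact_normals, opposition_threshold=-0.5):
    """Declare a grasp stable when any two finger-contact normals push
    from roughly opposing directions, regardless of object geometry.
    An illustrative heuristic, not UnrealROX's implementation."""
    normals = [np.asarray(n, dtype=float) for n in contact_normals]
    normals = [n / np.linalg.norm(n) for n in normals]
    for i in range(len(normals)):
        for j in range(i + 1, len(normals)):
            # Opposing contacts have strongly negative dot products.
            if normals[i] @ normals[j] < opposition_threshold:
                return True
    return False

# Thumb pressing from +x, index finger from -x: an opposing pair.
grasp_is_stable([(1, 0, 0), (-1, 0, 0)])     # stable
grasp_is_stable([(1, 0, 0), (0.9, 0.1, 0)])  # both contacts on one side
```

A check like this is what lets a single grasping subsystem work across arbitrary object shapes and poses, since it only inspects contact directions.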
Simulators, what are they good for? UnrealROX may be of particular interest to researchers that need to create and record very specific sequences of behaviors on robots, or who wish to test the ability to learn useful policies from a relatively small amount of high-fidelity information. But it seems likely that the relative slowness of the simulator will make it difficult to use for most AI research.
Why it matters: The current proliferation of simulated environments represents a kind of simulation-boom in AI research that will eventually produce a cool historical archive of the many ways in which we might think robots could interact with each other and the world. Whether UnrealROX is used or not, it will contribute to this historical archive.
Read more: UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation (Arxiv).
AirBnB augments main search engine with neural net, sees significant performance increase:
…The Industrialization of Deep Learning continues…
Researchers with home/apartment-rental service AirBnB have published details on how they transitioned AirBnB’s main listings search engine to a neural network-based system. The paper highlights how deploying AI systems in production differs from deploying AI systems in research. It also sees AirBnB follow Google, which in 2015 augmented its search engine with ‘RankBrain’, a neural network-based system that almost overnight became one of the most significant factors in selecting which search results to display to a user. “This paper is targeted towards teams that have a machine learning system in place and are starting to think about neural networks (NNs),” the researchers write.
Motivation: “The very first implementation of search ranking was a manually crafted scoring function. Replacing the manual scoring function with a gradient boosted decision tree (GBDT) model gave one of the largest step improvements in homes bookings in Airbnb’s history,” the researchers write. This performance boost eventually plateaued, prompting them to implement neural network-based approaches to improve search further.
Keep it simple, (& stupid): One of the secrets about AI research is the gulf between frontier research and production use-cases, where researchers tend to prioritize novel approaches that work on small tasks, and industry and/or large-scale operators prioritize simple techniques that scale well. This fact is reflected in this research, where the researchers started work with a single layer neural net model, moved on to a more sophisticated system, then opted for a scale-up solution as their final product. “We were able to deprecate all that complexity by simply scaling the training data 10x and moving to a DNN with 2 hidden layers.”
Input features: For typical configurations of the network the researchers gave it 195 distinct input ‘features’ to learn about, which included properties of listings like price, amenities, historical booking count; as well as features from other smaller models.
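The shape of the final model – 195 listing features flowing through two hidden layers to a single relevance score – can be sketched like this. The hidden sizes, initialization, and names are placeholders, not AirBnB's production configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES = 195          # listing features, per the paper
H1, H2 = 64, 32           # hidden sizes here are arbitrary placeholders

W1, b1 = rng.normal(size=(H1, N_FEATURES)) * 0.1, np.zeros(H1)
W2, b2 = rng.normal(size=(H2, H1)) * 0.1, np.zeros(H2)
w_out, b_out = rng.normal(size=H2) * 0.1, 0.0

def relevance_score(features):
    """Score one listing for ranking: two ReLU hidden layers
    feeding a single scalar output."""
    h1 = np.maximum(0.0, W1 @ features + b1)
    h2 = np.maximum(0.0, W2 @ h1 + b2)
    return w_out @ h2 + b_out

# Rank a batch of candidate listings by descending score.
listings = rng.normal(size=(5, N_FEATURES))
ranking = sorted(range(5), key=lambda i: -relevance_score(listings[i]))
```

The simplicity is the point: per the paper, scaling the training data and this plain two-hidden-layer architecture replaced a stack of more complex models.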
Failure: The paper includes a quite comprehensive list of some of the ways in which the Airbnb researchers failed when trying to implement new neural network systems. Many of these failures are due to things like overfitting, or trying to architect too much complexity into certain parts of the system.
Results: AirBnB doesn’t reveal the specific quantitative performance boost, as this would leak proprietary commercial information, but does include a couple of graphs showing that the 2-layer simple neural network leads to a very meaningful relative gain in the number of bookings made using the system, indicating that the neural net-infused search is presenting people with more relevant listings which they are more likely to book. “Overall, this represents one of the most impactful applications of machine learning at Airbnb,” they write.
Why it matters: AirBNB’s adoption of deep learning for its main search engine further indicates that deep learning is well into its industrialization phase, where large companies adopt the technology and integrate it into their most important products. Every time we get a paper like this the chance of an ‘AI Winter’ decreases, as it creates another highly motivated commercial actor that will continue to invest in AI research and development, regardless of trends in government and/or defence funding.
Read more: Applying Deep Learning to Airbnb Search (Arxiv).
Read more: Google Turning Its Lucrative Web Search Over to AI Machines (Bloomberg News, 2015).
Refining low-quality web data with CurriculumNet:
…AI startup shows how to turn bad data into good data, with a multi-stage weakly supervised training scheme…
Researchers with Chinese computer vision startup Malong have released code and data for CurriculumNet, a technique to train deep neural networks on large amounts of data with variable annotations, collected from the internet. Approaches like this are useful if researchers don’t have access to a large, perfectly labeled dataset for their specific task. But the tradeoff is that the labels on datasets gathered in this way are far noisier than those from hand-built datasets, presenting researchers with the challenge of extracting enough signal from the noise to be able to train a useful network.
CurriculumNet: The researchers train their system on the WebVision database, which contains over 2,400,000 images with noisy labels. Their approach works by training an Inception_v2 model over the whole dataset, then studying the feature space which all the images are mapped into; CurriculumNet sorts these images into clusters, then sorts each cluster into three subsets according to how similar the images in the set are to each other in feature space, with the intuition being that subsets full of highly similar images will be easier to learn from than those which are very diverse. They then train a model over this curriculum, starting with the subsets containing similar image features, then mixing in the noisier subsets. By iteratively learning a classifier from good labels, then adding in noisier ones, the researchers say they are able to increase the generalization of their trained systems.
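The curriculum construction can be approximated as follows. This sketch ranks images by distance to their label's feature centroid as a stand-in for the clustering the paper actually uses, and every name in it is hypothetical:

```python
import numpy as np

def curriculum_subsets(features, labels, n_subsets=3):
    """For each noisy label, rank images by distance to the label's
    feature centroid and split them into n_subsets, most typical first.
    A sketch of the idea only; CurriculumNet's actual clustering differs."""
    order = {}
    for label in np.unique(labels):
        idx = np.where(labels == label)[0]
        centroid = features[idx].mean(axis=0)
        dists = np.linalg.norm(features[idx] - centroid, axis=1)
        ranked = idx[np.argsort(dists)]        # most typical images first
        order[label] = np.array_split(ranked, n_subsets)
    return order   # order[label][0] = cleanest, order[label][-1] = noisiest

# Toy features: two tight groups standing in for extracted CNN features.
feats = np.vstack([np.zeros((4, 2)), np.ones((4, 2))])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
subsets = curriculum_subsets(feats, labels)
# Training would start on subsets[k][0], then mix in [1] and [2].
```

The curriculum intuition is that images near their centroid are more likely to be correctly labeled, so the classifier sees reliable signal before the noisy tail.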
Testing: They test CurriculumNet on four benchmarks: WebVision, ImageNet, Clothing1M, and Food101. They find that systems trained using the largest amount of noisy data converge to higher accuracies than those trained without, seeing reductions in error of multiple percentage points on WebVision (“these improvements are significant on such a large-scale challenge,” they write). CurriculumNet gets state-of-the-art results for top-1 accuracy on WebVision, with performance increasing even further when they train on more data (such as combining ImageNet and WebVision).
Why it matters: Systems like CurriculumNet show how researchers can use poorly-labeled data, combined with clever training ideas, to increase the value of lower-quality data. Approaches like this can be viewed as being analogous to a clever refinement process applied when extracting a natural resource.
Read more: CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images (Arxiv).
Get the trained models from Malong’s Github page.
[2025: Podcast interview with the inventor of GFY]
Reality Bites, So Change It.
Or: There Can Be Hope For Those of Us Who Were Alone And Those We Left Behind
My Father was struck by a truck and killed while riding his motorbike in the countryside; no cameras, no witnesses; he was alone. There was an investigation but no one was ever caught. So it goes.
At the funeral I told stories about the greatness of my Father and I helped people laugh and I helped people cry. But I could not help myself because I could not see his death. It was as though he opened a door and disappeared before walking through it and the door never closed again; a hole in the world.
I knew many people who had lost friends and parents to cancer or other illnesses and their stories were quite horrifying: black vomit before the end; skeletons with the faces of parents; tales of seeing a dying person when they didn’t know they were being watched and seeing rage and fear and anguish on their face. The retellings of so many bad jokes about not needing to pay electricity bills, wheezed out over hospital food.
I envied these people, because they all had a “goodbye story” – that last moment of connection. They had the moment when they held a hand, or stared at a chest as it heaved in one last breath, or confessed a great secret before the chance was gone. Even if they weren’t there at the last they had known it was coming.
I did not have my goodbye, or the foreshadowing of one. Imagine that.
So that is why I built Goodbye For You(™), or GFY for short. GFY is software that lets you simulate and spend the last few moments with a loved one. It requires data and effort and huge amounts of patience… but it works. And as AI technology improves, so does the ease of use and fidelity of GFY.
Of course, it is not quite real. There are artifacts: improbable flocks of birds, or leaves that don’t fall quite correctly, or bodies that don’t seem entirely correct. But the essence is there: With enough patience and enough of a record of the deceased, GFY can let you reconstruct their last moment, put on a virtual reality haptic-feedback suit, and step into it.
You can speak with them… at the end. You can touch them and they can touch you. We’re adding smell soon.
I believe it has helped people. Let me try to explain how it worked the first time, all those years ago.
I was able to see the truck hit his bike. I saw his body fly through the air. I heard him say “oh no” the second after impact as he was catapulted off his bike and towards the side of the road. I heard his ribs break as he landed. I saw him crying and bleeding. I was able to approach his body. He was still breathing. I got on my knees and bent over him and I cried and the VR-helmet saw my tears in reality and simulated these tears falling onto his chest – and he appeared to see them, then looked up at me and smiled.
He touched my face and said “my child” and then he died.
Now I have that memory and I carry it in my heart as a candle to warm my soul. After I experienced this first GFY my dreams changed. It felt as though I had found a way to see him open the door – and leave. And then the door shut.
Grief is time times memory times the rejuvenation of closure: of a sense of things that were once so raw being healed and knitted back together. If you make the memory have closure things seem to heal faster.
Yes, I am still so angry. But when I sleep now I sometimes dream of that memory, and in my imagination we say other things, and in this way continue to talk softly through the years.
Things that inspired this story: The as-yet-untapped therapeutic opportunities afforded by synthetic media generation (especially high-fidelity conditional video); GAN progression from 2014 to 2018; compute growth both observed and expected for the next few years; Ander Monson’s story “I am getting comfortable with my grief”.