Import AI 152: Robots learn to plug USB sticks in; Oxford gets $$$ for AI research; and spotting landslides with deep learning
by Jack Clark
Translating African languages is going to be harder than you think:
…Massive variety of languages? Check. Small or poorly built datasets? Check. Few resources assigned to the problem? Also check!…
African AI researchers have sought to demonstrate the value of translating African languages into English and vice versa, while highlighting the difficulty of this essential task. “Machine translation of African languages would not only enable the preservation of such languages, but also empower African citizens to contribute to and learn from global scientific, social, and educational conversations, which are currently predominantly English-based,” they write. “We train models to perform machine translation of English to Afrikaans, isiZulu, Northern Sotho (N.Sotho), Setswana and Xitsonga”.
Small datasets: One of the most striking things about the datasets they gather is how small they are, ranging in size from as little as 26,728 sentences (isiZulu) to 123,868 sentences (Setswana). To get a sense of scale, the European Parliament dataset (one of the gold-standard corpora for translation research) contains millions of sentences for many of the most common European languages (French, German, etc).
Training translation models: They train a couple of baseline translation systems on these datasets: one uses a Convolutional Sequence-to-Sequence (ConvS2S) model and the other uses a Tensor2Tensor implementation of a Transformer. Transformer-based systems obtain higher scores than ConvS2S in all cases, with the performance difference reaching as much as a ten-point absolute improvement in BLEU score.
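To get a feel for how such a comparison is scored, here's a minimal sketch using the sacrebleu Python package; the Afrikaans sentence and the two system outputs are invented for illustration, not drawn from the paper's test sets.

```python
import sacrebleu

# One invented English->Afrikaans test pair; real evaluations use thousands
# of held-out sentences per language.
references = [["die kat sit op die mat"]]        # one reference stream
convs2s_out = ["die kat is op die mat"]          # hypothetical ConvS2S output
transformer_out = ["die kat sit op die mat"]     # hypothetical Transformer output

print(sacrebleu.corpus_bleu(convs2s_out, references).score)      # lower BLEU
print(sacrebleu.corpus_bleu(transformer_out, references).score)  # 100.0, exact match
```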
Why this matters: Trained translation models are going to become akin to international telephony infrastructure: different entities will invest different resources to build systems that let them communicate across borders, except rather than traversing physical distance, they're investing to traverse linguistic (and, to some extent, cultural) distance. The quality of this infrastructure will therefore have a significant influence on how connected or disconnected different languages and their associated cultures are from the global community. As this paper shows, some languages face difficulties that others don't, and we should keep this context in mind as we think about how to equitably distribute the benefits of AI systems.
Read more: A Focus on Neural Machine Translation for African Languages (Arxiv).
Get the source code and data from the project GitHub page here (GitHub).
#####################################################
Spotting landslides with deep learning:
…What happens when we train a sensor to look at the entire world…
Researchers with the University of Sannio in Italy and MIT in the USA have prototyped a system for detecting landslides in satellite imagery, foreshadowing a world where anyone can train a basic predictive classifier against satellite data.
Dataset: They use the NASA Open Data Global Landslide Catalog to find landslides, then cross-reference these events against imagery from the ‘Sentinel-2’ satellite mission. From this they compose a (somewhat small) dataset covering around 20 different landslide incidents.
The technique: They use a simple 8-layer convolutional neural network, trained against the corpus to predict the presence of a landslide in a satellite image. Their system correctly predicts the presence of a landslide about 60% of the time. This poor performance is mostly due to the (currently) limited size of the dataset; it's worth remembering that satellite datasets are getting larger over time along with the proliferation of private-sector mini- and micro-satellite startups.
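For illustration, a small patch classifier in this spirit might look like the PyTorch sketch below; the layer widths, the 13 Sentinel-2 spectral bands, and the 64x64 patch size are my assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class LandslideCNN(nn.Module):
    """8 learned layers: six 3x3 convolutions plus two linear layers."""
    def __init__(self, in_bands: int = 13):  # Sentinel-2 captures 13 spectral bands
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 2),  # two classes: landslide / no landslide
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 13, 64, 64) multispectral image patch
        return self.classifier(self.features(x))
```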
Why this matters: As more and more digital satellite data becomes available, analysis like this will become commonplace. I think papers like this give us a sense of what that future research will look like – prepare for a world where millions of people are training one-off basic classifiers against vast streams of continuously updated Earth observation data.
Read more: Landslide Geohazard Assessment with Convolutional Neural Networks Using Sentinel-2 Imagery Data (Arxiv).
#####################################################
Facebook thinks it needs a Replica of reality for its research:
…High-fidelity ‘Replica’ scene simulator designed for sim2real AI experiments, VR, and more…
Researchers with Facebook, the Georgia Institute of Technology, and Simon Fraser University have built Replica, a photorealistic dataset of complex indoor scenes in which AI systems can be trained.
The dataset: Replica consists of 18 photo-realistic 3D indoor scene reconstructions – they’re not kidding about the realism, and invite readers to take a “Replica Turing Test” to judge for themselves; I did, and it’s extremely hard to tell Replica-simulated images from actual photos. Each scene includes RGB information, geometric information, and object segmentation information. Replica also uses HDR textures and reflector information to further increase the realism of a scene.
Replica + AI Habitat: Replica has been designed to plug in to the Facebook-developed ‘AI Habitat’ simulator (Import AI 141), an AI training platform that can support multiple simulators. Replica supports rendering outputs from the dataset at up to 10,000 frames per second – that speed is crucial if you’re trying to train sample-inefficient RL systems against it.
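To make that throughput figure concrete, here's how you might benchmark frames per second; the `renderer` object and its `render()` method are hypothetical stand-ins rather than the actual AI Habitat API.

```python
import time

def measure_fps(renderer, n_frames: int = 10_000) -> float:
    # Time how long it takes to draw n_frames observations from a loaded scene.
    start = time.perf_counter()
    for _ in range(n_frames):
        renderer.render()  # hypothetical call: render one RGB frame of a Replica scene
    return n_frames / (time.perf_counter() - start)
```

At the quoted 10,000 frames per second, an RL agent can consume millions of observations in a matter of minutes, which is what makes training sample-hungry agents against simulators like this practical at all.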
Why this matters: How much does reality matter? That’s a question AI researchers are grappling with, and there are two parallel lines of research emerging: in one, researchers develop high-fidelity simulations like Replica, train AI systems against them, and transfer those systems to reality; in the other, researchers use techniques like domain randomization to automatically augment lower-quality data, hoping to get generalization by training against a large quantity of varied data. Systems like Replica will help generate more evidence about the tradeoffs and benefits of each approach.
Read more: The Replica Dataset: A Digital Replica of Indoor Spaces (Arxiv).
Get the code for the dataset here (Facebook GitHub).
#####################################################
Robots take on finicky factory work: cable insertion!
…First signs of superhuman performance on a real-world factory task…
The general task these researchers are trying to solve is: “how can we enable robots to autonomously perform complex tasks without significant engineering effort to design perception and reward systems?”
What can be so difficult about connecting two things? As anyone who has built their own PC knows, fiddling around with connectors and ports can be challenging even for dexterous humans equipped with a visual classifier that has been trained for a couple of million years and fine-tuned against the experience of a lifetime. For robots, the challenges here are twofold: one, ports and connectors need to be lined up with great precision; and two, during insertion there are various unpredictable friction forces which can confound a machine.
Three connectors, three tests: They test their robots against three tasks of increasing difficulty: inserting a USB adapter into a USB port; aligning a multi-pin D-Sub adapter and port, requiring more robustness to friction; and aligning and connecting a ‘Model-E’ adapter which has “several edges and grooves to align” and also requires significant force.
Two solutions to one problem: For this work, they try to solve the task in two different ways: supervision from vision, where the robot is provided with a ‘goal state’ image at 32x32 resolution; and learning from a sparse reward (for the USB insertion task, this is whether an electrical connection is created). They also compare both of these methods against systems provided with perfect state information. They test systems based around two basic algorithms: Soft Actor-Critic (SAC) and TD3.
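To make those two supervision signals concrete, here is one simple way they could be written down; the function names, the pixel-distance goal reward, and the shapes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sparse_usb_reward(circuit_closed: bool) -> float:
    # 'Natural' sparse reward: 1 only once the USB electrical connection is live.
    return 1.0 if circuit_closed else 0.0

def goal_image_reward(obs_rgb: np.ndarray, goal_rgb: np.ndarray) -> float:
    # Vision-based signal: negative pixel distance to the 32x32 goal-state image.
    assert obs_rgb.shape == goal_rgb.shape == (32, 32, 3)
    diff = obs_rgb.astype(np.float32) - goal_rgb.astype(np.float32)
    return -float(np.mean(diff ** 2))
```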
The results are pretty encouraging: systems based around residual reinforcement learning (where the learned policy adds corrections on top of a conventional hand-tuned controller) outperform all other methods on the USB connector task, as well as on the D-Sub task. Most encouragingly, the AI system appears to outperform humans on the Model-E connector task in terms of accuracy.
Testing with noise: They explore the robustness of their techniques by adding noise to the goal – specifically, by shifting the target location for the connection by ±1mm. Even here the residual RL system does well, typically obtaining success rates between 60% and 80% across tasks, and sometimes outperforming humans given the same (deliberately imprecise) goal.
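That residual setup fits in a few lines: a hand-written base controller gets the plug close to the port, and the learned policy (e.g. SAC) adds a small corrective action on top. The sketch below uses a gym-style environment and illustrative names; it is not the authors' implementation.

```python
def residual_rl_episode(env, base_controller, residual_policy):
    # Residual RL: the command sent to the robot is the sum of a scripted
    # base action and a learned correction.
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        a_base = base_controller(state)        # e.g. a P-controller toward the socket
        a_res = residual_policy.sample(state)  # learned residual, small in magnitude
        state, reward, done, info = env.step(a_base + a_res)
        total_reward += reward                 # sparse: mostly 0 until insertion succeeds
    return total_reward
```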
Why this matters: One of the things stopping robots from being deployed more widely in industrial automation is the fact that most robots are terribly stupid and expensive; research like this makes them less stupid, and parallel research into developing AI systems that are robust to imprecision could drive more progress here. “One practical direction for future work is focusing on multi-stage assembly tasks through vision,” they write. Another challenging direction is multi-step tasks, which – if solved – “will pave the road to a higher robot autonomy in flexible manufacturing”.
Read more: Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards (Arxiv).
#####################################################
AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net…
More AI principles from China:
Last month, a coalition of Chinese groups published the Beijing AI Principles for ethical standards in AI research (see Import AI 149). Now we have two more sets of principles from influential Chinese groups. The Artificial Intelligence Industry Alliance (AIIA), which includes all the major private labs and universities, released a joint pledge on ‘industry self-discipline.’ And an expert committee from the Ministry of Science and Technology has released governance principles.
Some highlights: Both documents include commitments on safety and robustness, basic human rights, and privacy, and foreground the importance of AI being developed for the common benefit of humanity. Both advocate international cooperation on developing shared norms and principles. The expert group counsels ‘agile governance’ that responds to the fast development of AI capabilities and looks ahead to risks from advanced AI.
Why it matters: These principles suggest an outline of the approach the Chinese state will take when it comes to regulating AI, particularly since both groups are closely linked with the government. They join similar sets of principles from the EU, OECD, and a number of countries (still not the US, however). It is heartening to see convergence between approaches to the ethical challenges of advanced AI, which should bode well for international cooperation on these issues.
Read more: Chinese AI Alliance Drafts Self-Discipline ‘Joint Pledge’ (New America).
Read more: Chinese Expert Group Offers ‘Governance Principles’ for ‘Responsible AI’ (New America).
#####################################################
Major donation for AI ethics at Oxford:
Oxford University have announced a £150m ($190m) donation from billionaire Stephen Schwarzman, some of which will go towards a new ‘Institute for Ethics in AI.’ There are no details yet of what form the centre might take, nor how much of this funding will be earmarked for it. It will be housed in the Faculty of Philosophy, which is home to the Future of Humanity Institute.
Read more: University of Oxford press release.
#####################################################
Tech Tales:
Runner
So she climbed with gloves and a pack on her back. She hid from security robots. She traversed half-built stairs and rooms, always going higher. She got to the roof before dawn and put her bag down, opened it, then carefully drew out the drones. She had five and each was about the size of a watermelon when you included its fold-out rotors, though the central core for each was baseball-sized at best. She took out her phone and thumbed open the application that controlled the drones, then brought them online one by one.
They knew to follow her because of the tracker she had on her watch, and they were helped by the fact they knew her. They knew her face. They knew her gait.
She checked her watch and stood, bouncing up and down on the balls of her feet, as the sun began to threaten its appearance over the horizon. Light bled into the sky. Then: a rim of gold appeared in the distance, and she ran out onto one of the metal scaffolds of the building, high above the city, wind whipping at her hair, her feet gripping the condensation-slicked surface of the metal. Risky, yes, but also captivating.
“NOW STREAMING” one of the drones said, and she stared at another scaffold in front of her, separated by a two-meter gap over the nothing-core of the half-built building. She took a few steps back and crouched down into a sprinter’s pose, then jumped.
Things that inspired this story: Skydio drones; streaming culture; e-sports; the logical extension of social media influencing; the ambiguous tradeoff between fear and self-realization.