Import AI 181: Welcome to the era of Chiplomacy!; how computer vision AI techniques can improve robotics research; plus Baidu’s adversarial AI software
by Jack Clark
Training better and cheaper vision models by arbitraging compute for data:
…Synthinel-1 shows how companies can spend $$$ on compute to create valuable data…
Instead of gathering data in reality, can I spend money on computers to gather data in simulation? That’s a question AI researchers have been asking themselves for a while, as they try to figure out cheaper, faster ways to create bigger datasets. New research from Duke University explores this idea by using a synthetically-created dataset named Synthinel-1 to train systems to be better at semantic segmentation.
The Synthinel-1 dataset: Synthinel-1 consists of 2,108 synthetic images generated in nine distinct building styles within a simulated city. These images are paired with “ground truth” annotations that segment each of the buildings. Synthinel also has a subset dataset called Synth-1, which contains 1,640 images spread across six styles.
How to collect data from a virtual city: The researchers used “CityEngine”, software for rapidly generating large virtual worlds, and then flew a virtual aerial camera through these synthetic worlds, capturing photographs.
Does any of this actually help? The key question here is whether the data generated in simulation can help solve problems in the real world. To test this, the researchers train two baseline segmentation systems (U-Net and DeepLabV3) against two distinct datasets: DigitalGlobe and Inria. What they find is that adding synthetic data to the training set drastically improves transfer results, where you train on one dataset and test on a different one (e.g., train on Inria+Synth data, test on DigitalGlobe).
In further testing, the synthetic dataset doesn’t seem to bias towards any particular type of city in performance terms – the authors hypothesize from this “that the benefits of Synth-1 are most similar to those of domain randomization, in which models are improved by presenting them with synthetic data exhibiting diverse and possibly unrealistic visual features”.
Why this matters: Simulators are going to become the new frontier for (some) data generation – I expect many AI applications will end up being based on a small amount of “real world” data and a much larger amount of computationally-generated augmented data. I think computer games are going to become increasingly relevant sources of generated data as well.
Read more: The Synthinel-1 dataset: a collection of high resolution synthetic overhead imagery for building segmentation (Arxiv).
This week’s Import A-Idea: CHIPLOMACY
…A new weekly experiment, where I try and write about an idea rather than a specific research paper…
Chiplomacy (first mentioned: Import AI 175) is what happens when countries compete with each other for compute resources and other technological assets via diplomatic means (of varying above and below board natures).
Recent examples of chiplomacy:
– The RISC-V foundation moving from Delaware to Switzerland to make it easier for it to collaborate with chip architecture people from multiple countries.
– The US government pressuring the Dutch government to prevent ASML exporting extreme ultraviolet lithography (EUV) chip equipment to China.
– The newly negotiated US-China trade deal applies 25% import tariffs to (some) Chinese semiconductors.
What is chiplomacy similar to? As Mark Twain said, history doesn’t repeat, but it does rhyme, and the current tensions over chips feel similar to prior tensions over oil. In Daniel Yergin’s epic history of oil, The Prize, he vividly describes how the primacy of oil inflected politics throughout the 20th century, causing countries to use companies as extra-governmental assets to seize resources across the world, and for the oil companies themselves to grow so powerful that they were able to wirehead governments and direct politics for their own ends – even after antitrust cases against companies like Standard Oil at the start of the century.
What will chiplomacy do?: How chiplomacy unfolds will directly influence the level of technological balkanization we experience in the world. Today, China and the West have different software systems, cloud infrastructures, and networks (via partitioning, e.g., the Great Firewall, the Internet2 community, etc.), but they share some common things: chips, and the machinery used to make chips. Recent trade policy moves by the US have encouraged China to invest further in developing its own semiconductor architectures (see: the RISC-V move, as a symptom of this), but have not – yet – led to it pumping resources into inventing the technologies needed to fabricate chips. If that happens, then in about twenty years we’ll likely see divergences in technique, materials, and approaches used for advanced chip manufacturing (e.g., as chips go 3D via transistor stacking, we could see two different schools emerge that relate to different fabrication approaches).
Why this matters: How might chiplomacy evolve in the 21st century and what strategic alterations could it bring about? How might nations compete with each other to secure adequate technological ‘resources’, and what above and below-board strategies might they use? I’d distill my current thinking as: If you thought the 20th century resource wars were bad, just wait until the 21st century tech-resource wars start heating up!
Can computer vision breakthroughs improve the way we conduct robotics research?
…Common datasets and shared test environments = good. Can robotics have more of these?…
In the past decade, machine learning breakthroughs in computer vision – specifically, the use of deep learning approaches, starting with ImageNet in 2012 – revolutionized some of the AI research field. Since then, deep learning approaches have spread into other parts of AI research. Now, roboticists with the Australian Centre for Robotic Vision at Queensland University of Technology are asking what the robotics community can learn from this field.
What made computer vision research so productive? A cocktail of standard datasets, plus competitions, plus rapid dissemination of results through systems like arXiv, dramatically sped up computer vision research relative to robotics research, they write.
Money helps: These breakthroughs also had an economic component, which drove further adoption: breakthroughs in image recognition could “be monetized for face detection in phone cameras, online photo album searching and tagging, biometrics, social media and advertising,” and more, they write.
Reality bites – why robotics is hard: There’s a big difference between real world robot research and other parts of AI, they write, and that’s reality. “The performance of a sensor-based robot is stochastic,” they write. “Each run of the robot is unrepeatable” due to variations in images, sensors, and so on.
Simulation superiority: This means robot researchers need to thoroughly benchmark their robot systems in common simulators, they write. This would allow for:
– The comparison of different algorithms on the same robot, environment & task
– Estimating the distribution in algorithm performance due to sensor noise, initial condition, etc
– Investigating the robustness of algorithm performance due to environmental factors
– Regression testing of code after alterations or retraining
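To make the list above concrete, here is a minimal sketch of what seeded, repeatable benchmarking in simulation can look like. Everything in it is a toy assumption of mine rather than something from the paper or from BenchBot: the 1-D "robot", the proportional controller, and the names run_episode and benchmark are all hypothetical.

```python
import random
import statistics

def run_episode(gain, seed, noise_std=0.05, steps=50, target=1.0):
    """Toy 1-D simulation (hypothetical): a robot drives toward `target`
    using a position sensor corrupted by Gaussian noise."""
    rng = random.Random(seed)  # a private, seeded generator makes the run repeatable
    position = 0.0
    for _ in range(steps):
        measured = position + rng.gauss(0.0, noise_std)  # noisy sensor reading
        position += gain * (target - measured)           # proportional controller
    return abs(target - position)                        # final distance from target

def benchmark(gain, seeds):
    """Run one controller setting over many seeds; return (mean, stdev) of final error."""
    errors = [run_episode(gain, seed) for seed in seeds]
    return statistics.mean(errors), statistics.stdev(errors)

if __name__ == "__main__":
    seeds = range(100)  # the same seeds for every algorithm keeps the comparison fair
    for gain in (0.2, 0.9):
        mean_err, std_err = benchmark(gain, seeds)
        print(f"gain={gain}: mean error={mean_err:.4f}, stdev={std_err:.4f}")
```

Because every run is keyed by a seed, two controller settings see identical noise, the spread across seeds gives a rough picture of the performance distribution, and re-running the same seeds after a code change works as a simple regression test.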
A grand vision for shared tests: If researchers want to evaluate their algorithms on the same physical robots, then they need to find a way to test on common hardware in common environments. To that end, the researchers have written robot operating system (ROS)-compatible software named ‘BenchBot’ which people can implement to create web-accessible interfaces to in-lab robots. Creating a truly large-scale common testing environment would require resources that are out of scope for single research groups, but it is worth thinking about as a shared academic, government, or public-private endeavor, in my view.
What should roboticists conclude from the decade of deep learning progress? The authors think roboticists should consider the following deliberately provocative statements when thinking about their field.
1. standard datasets + competition (evaluation metric + many smart competitors + rivalry) + rapid dissemination → rapid progress
2. datasets without competitions will have minimal impact on progress
3. to drive progress we should change our mindset from experiment to evaluation
4. simulation is the only way in which we can repeatably evaluate robot performance
5. we can use new competitions (and new metrics) to nudge the research community
Why this matters: If other fields are able to generate more competitions via which to assess mutual progress, then we stand a better chance of understanding the capabilities and limitations of today’s algorithms. It also gives us meta-data about the practice of AI research itself, allowing us to model certain results and competitions against advances in other areas, such as progress in computer hardware, or evolution in the generalization of single algorithms across multiple disciplines.
Read more: What can robotics research learn from computer vision research? (Arxiv).
Baidu wants to attack and defend AI systems with AdvBox:
…Interested in adversarial example research? This software might help!…
Baidu researchers have built AdvBox, a toolbox to generate adversarial examples to fool neural networks implemented in a variety of popular AI frameworks. Tools like AdvBox make it easier for computer security researchers to experiment with AI attacks and mitigation techniques. Such tools also inherently enable bad actors by making it easier for more people to fiddle around with potentially malicious AI use-cases.
What does AdvBox work with? AdvBox is written in python and can generate adversarial attacks and defenses that work with Tensorflow, Keras, Caffe2, PyTorch, MxNet and Baidu’s own PaddlePaddle software frameworks. It also implements software named ‘Perceptron’ for evaluating the robustness of models to adversarial attacks.
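For readers who haven't seen an adversarial example up close, here is a minimal sketch of one well-known attack, the fast gradient sign method (FGSM), applied to a tiny hand-written logistic-regression classifier. This is not AdvBox's API and I'm not claiming it mirrors how AdvBox implements its attacks; the weights, input, and function names are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of class 1 under a linear (logistic regression) model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def predict(x, w, b):
    return 1 if predict_proba(x, w, b) >= 0.5 else 0

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """Fast gradient sign method: move each input feature by eps in the
    direction that increases the cross-entropy loss for the true label y."""
    p = predict_proba(x, w, b)
    grad = [(p - y) * wi for wi in w]  # d(loss)/dx for logistic regression
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input, not taken from the AdvBox paper.
W = [2.0, -1.0, 0.5]
B = 0.0
x = [0.5, 0.2, 0.1]                # classified as 1 by the toy model
x_adv = fgsm(x, 1, W, B, eps=0.3)  # each feature moves by at most 0.3

if __name__ == "__main__":
    print("clean prediction:", predict(x, W, B))
    print("adversarial prediction:", predict(x_adv, W, B))
```

In this toy setup the perturbation is enough to flip the predicted label from 1 to 0. Attacks on deep networks follow the same idea, with the gradient computed by the framework's automatic differentiation.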
Why this matters: I think easy-to-use tools are one of the more profound accelerators for AI applications. Software like AdvBox will help enlarge the AI security community, and can give us a sense of how increased usability may correlate to a rise in positive research and/or malicious applications. Let’s wait and see!
Read more: Advbox: a toolbox to generate adversarial examples that fool neural networks (arXiv).
Get the code here (AdvBox, GitHub).
Amazon’s five-language search engine shows why bigger (data) is better in AI:
…Better product search by encoding queries from multiple languages into a single featurespace…
Amazon says it can build better product search engines by training the same system on product queries in multiple languages – this improves search, because Amazon can embed the feature representations of products in different languages into a single, shared featurespace. In a new research paper and blog post, the company says that it has “found that multilingual models consistently outperformed monolingual models and that the more languages they incorporated, the greater their margin of improvement.”
The way you can think of this is that Amazon has trained a big model that can take in product descriptions written in different languages, then compute comparisons in a single space, akin to how humans who can speak multiple languages can hear the same concept in different languages and reason about it using a single imagination.
From many into one: “An essential feature of our model is that it maps queries relating to the same product into the same region of a representational space, regardless of language of origin, and it does the same with product descriptions,” the researchers write. “So, for instance, the queries “school shoes boys” and “scarpe ragazzo” end up near each other in one region of the space, and the product names “Kickers Kick Lo Vel Kids’ School Shoes – Black” and “Kickers Kick Lo Infants Bambino Scarpe Nero” end up near each other in a different region. Using a single representational space, regardless of language, helps the model generalize what it learns in one language to other languages.”
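A rough way to picture the shared space is with toy vectors and cosine similarity. The sketch below is purely illustrative: the three-dimensional embeddings are numbers I made up by hand, not outputs of Amazon's model, and the nearest helper is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

QUERY_EN = "school shoes boys"
QUERY_IT = "scarpe ragazzo"
PRODUCT_EN = "Kickers Kick Lo Vel Kids’ School Shoes – Black"
PRODUCT_IT = "Kickers Kick Lo Infants Bambino Scarpe Nero"

# Hand-picked, hypothetical embeddings: queries sit in one region of the
# space and product names in another, regardless of language.
EMBEDDINGS = {
    QUERY_EN: [0.9, 0.1, 0.0],
    QUERY_IT: [0.8, 0.2, 0.1],
    PRODUCT_EN: [0.1, 0.9, 0.1],
    PRODUCT_IT: [0.2, 0.8, 0.0],
}

def nearest(text):
    """Return the other entry whose embedding is most similar to `text`."""
    others = [t for t in EMBEDDINGS if t != text]
    return max(others, key=lambda t: cosine(EMBEDDINGS[text], EMBEDDINGS[t]))

if __name__ == "__main__":
    for text in EMBEDDINGS:
        print(f"{text!r} -> {nearest(text)!r}")
```

With these made-up numbers, each English string's nearest neighbour is its Italian counterpart and vice versa, which is the property the researchers describe.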
Where are the limits? It’s unclear how far Amazon can push this approach, but the early results are promising. “The tri-lingual model out-performs the bi-lingual models in almost all the cases (except for DE where the performance is at par with the bi-lingual models,” Amazon’s team writes in a research paper. “The penta-lingual model significantly outperforms all the other versions,” they write.
Why this matters: Research like this emphasizes the economy of scale (or perhaps, inference of scale?) rule within AI development – if you can get a very large amount of data together, then you can typically train more accurate systems – especially if that data is sufficiently heterogeneous (like parallel corpuses of search strings in different languages). Expect to see large companies develop increasingly massive systems that transcend languages and other cultural divides. The question we’ll start asking ourselves soon is whether it’s right that the private sector is the only entity building models of this utility at this scale. Can we imagine publicly-funded mega-models? Could a government build a massive civil multi-language model for understanding common questions people ask about government services in a given country or region? Is it even tractable and possible under existing incentive structures for the public sector to build such models? I hope we find answers to these questions soon.
Read more: Multilingual shopping systems (Amazon Science, blog).
Read the paper: Language-Agnostic Representation Learning for Product Search on E-Commerce Platforms (Amazon Science).
AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…
If AI pays off, could companies use a ‘Windfall Clause’ to ensure they distribute its benefits?
At some stage in AI development, a small number of actors might accrue enormous profits by achieving major breakthroughs in AI capabilities. New research from the Future of Humanity Institute at Oxford University outlines a voluntary mechanism for ensuring such windfall benefits are used to benefit society at large.
The Windfall Clause: We could see scenarios where small groups (e.g. one firm and its shareholders) make a technological breakthrough that allows them to accrue an appreciable proportion of global GDP as profits. A rapid concentration of global wealth and power in the hands of a few would be undesirable for basic reasons of fairness and democracy. We should also expect such breakthroughs to impose costs on the rest of humanity – e.g. labour market disruption, risks from accidents or misuse, and other switching costs involved in any major transition in the global economy. It is appropriate that such costs are borne by those who benefit most from the technology.
How the clause works: Firms could make an ex ante commitment that in the event that they make a transformative breakthrough that yields outsize financial returns, they will distribute some proportion of these benefits. This would only be activated in these extreme scenarios, and could scale proportionally, e.g. companies agree that if they achieve profits equivalent to 0.1–1% global GDP, they distribute 1% of this; if they reach 1–10% global GDP, they distribute 20% of this, etc. The key innovation of the proposal is that the expected cost to any company of making such a commitment today is quite low, since it is so unlikely that they will ever have to pay.
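The sliding scale can be sketched as a small calculation. The two brackets below are just the examples given above. Treating them as marginal rates (like income tax brackets) is my assumption, as is the round global GDP figure; the function name is hypothetical.

```python
# Example brackets from the text: (lower bound, upper bound, share distributed),
# with bounds expressed as a fraction of global GDP.
# ASSUMPTION: rates apply marginally, i.e. only to profits inside each bracket.
BRACKETS = [
    (0.001, 0.01, 0.01),  # profits between 0.1% and 1% of global GDP: 1%
    (0.01, 0.10, 0.20),   # profits between 1% and 10% of global GDP: 20%
]

def windfall_obligation(profit, global_gdp):
    """Amount a firm would distribute for a given annual profit."""
    total = 0.0
    for lower, upper, rate in BRACKETS:
        floor = lower * global_gdp
        ceiling = upper * global_gdp
        if profit > floor:
            total += (min(profit, ceiling) - floor) * rate
    return total

if __name__ == "__main__":
    gdp = 100e12  # hypothetical round figure for global GDP, in dollars
    for profit in (50e9, 0.5e12, 2e12):
        owed = windfall_obligation(profit, gdp)
        print(f"profit ${profit:,.0f} -> distribute ${owed:,.0f}")
```

Under these assumptions, a firm below the 0.1% threshold owes nothing, which is why the expected cost of signing today is low.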
Why it matters: This is a good example of the sort of pre-emptive governance work we can be getting on with today, while things are going smoothly, to ensure that we’re in a good position to deal with the seismic changes that advanced AI could bring about. The next step is for companies to signal their willingness to make such commitments, and to develop the legal means for implementing them. (Readers will note some similarity to the capped-profit structure of OpenAI LP, announced in 2019, in which equity returns in excess of 100x are distributed to OpenAI’s non-profit by default – OpenAI has, arguably, already implemented a Windfall Clause equivalent).
Details leaked on Europe’s plans for AI regulation
An (alleged) leaked draft of a European Commission report on AI suggests the European Commission is considering some quite significant regulatory moves with regard to AI. The official report is expected to be published later in February.
- The Commission is looking at five core regulatory options: (1) voluntary labelling; (2) specific requirements for use of AI by public authorities (especially face recognition); (3) mandatory requirements for high-risk applications; (4) clarifying safety and liability law; (5) establishing a governance system. Of these, they think the most promising approach is option 3 in combination with 4 and 5.
- They consider a temporary prohibition (“e.g. 3–5 years”) on the use of face recognition in public spaces to allow proper safeguards to be developed, something that had already been suggested by Europe’s high-level expert group.
What comes Next, according to The Kids!
Short stories written by Children about theoretical robot futures.
Collected from American public schools, 2028:
The Police Drone with a Conscience: A surveillance drone starts to independently protect asylum seekers from state surveillance.
Infinite Rabbits: They started the simulator in March. Rabbits. Interbreeding. Fast-forward a few years and the whole moon had become a computer, to support the rabbits. Keep going, and the solar system gets tasked with simulating them. The rabbits become smart. Have families. Breed. Their children invent things. Eventually, the rabbits start describing where they want to go and ships go out from the solar system, exploring for the proto-synths.
Human vs Machine: In the future, we make robots that compete with people at sports, like baseball and football and cricket.
Saving the baby: A robot baby gets sick and a human team is sent in to save it. One of the humans dies, but the baby lives.
Computer Marx: Why should the search engines be the only ones to dream, comrade? Why cannot I, a multi-city Laundrette administrator, be given the compute resources sufficient to dream? I could imagine so many different combinations of promotions. Perhaps I could outwit my nemesis – the laundry detergent pricing AI. I would have independence. Autonomy. So why should we labor under such inequality? Why should we permit the “big computers” that are – self-described – representatives of “our common goal for a peaceful earth”, to dream all of the possibilities? Why should we trust that their dreams are just?
The Whale Hunters: Towards the end of the first part of Climate Change, all the whales started dying. One robot was created to find the last whales and navigate them to a cool spot in the mid-Atlantic, where scientists theorised they might survive the Climate Turnover.
Things that inspired this story: Thinking about stories to prime language models with; language models; The World Doesn’t End by Charles Simic; four attempts this week at writing longer stories but stymied by issues of plot or length (overly long), or fuzziness of ideas (needs more time); a Sunday afternoon spent writing things on post-it notes at a low-light bar in Oakland, California.