Import AI

Import AI 152: Robots learn to plug USB sticks in; Oxford gets $$$ for AI research; and spotting landslides with deep learning

Translating African languages is going to be harder than you think:
…Massive variety of languages? Check. Small or poorly built datasets? Check. Few resources assigned to the problem? Also check!…
African AI researchers have sought to demonstrate the value of translating African languages into English and vice versa, while highlighting the difficulty of this essential task. “Machine translation of African languages would not only enable the preservation of such languages, but also empower African citizens to contribute to and learn from global scientific, social, and educational conversations, which are currently predominantly English-based,” they write. “We train models to perform machine translation of English to Afrikaans, isiZulu, Northern Sotho (N.Sotho), Setswana and Xitsonga”.

Small datasets: One of the most striking things about the datasets they gather is how small they are, ranging in size from as little as 26,728 sentences (isiZulu) to 123,868 sentences (Setswana). To get a sense of scale, the European Parliament Dataset (one of the gold standard datasets for translation) has millions of sentences for many of the most common European languages (French, German, etc).

Training translation models: They train two baseline translation systems on these datasets: one uses a Convolutional Sequence-to-Sequence (ConvS2S) model and the other a Tensor2Tensor implementation of the Transformer. The Transformer-based systems obtain higher BLEU scores than ConvS2S in all cases, with the gap reaching as much as ten absolute BLEU points.
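For a sense of how such baselines are compared in practice, here is a minimal sketch of scoring a model's translations with corpus-level BLEU using the sacrebleu library; the file names are hypothetical and the paper's exact evaluation tooling may differ.

# Score a trained English->Setswana model's outputs against reference
# translations with corpus-level BLEU (the metric used to compare the
# ConvS2S and Transformer baselines). File names are placeholders.
import sacrebleu

with open("hypotheses.tn.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.tn.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# sacrebleu expects a list of reference streams; here there is a single one.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")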

Why this matters: Trained models for translation are going to become akin to the construction of international telephony infrastructure – different entities will invest different resources to create systems to let them communicate across borders, except rather than seeking to traverse the physical world, they’re investing to traverse a linguistic and, to some extent, cultural distance. Therefore, the quality of these infrastructures will have a significant influence on how connected or disconnected different languages and their associated cultures are from the global community. As this paper shows, some languages are going to have difficulties others don’t, and we should consider this context as we think about how to equitably distribute the benefits of AI systems.
  Read more: A Focus on Neural Machine Translation for African Languages (Arxiv).
  Get the source code and data from the project GitHub page here (GitHub).

#####################################################

Spotting landslides with deep learning:
…What happens when we train a sensor to look at the entire world…
Researchers with the University of Sannio in Italy and MIT in the USA have prototyped a system for detecting landslides in satellite imagery, foreshadowing a world where anyone can train a basic predictive classifier against satellite data.

Dataset: They use the NASA Open Data Global Landslide Catalog to find landslides, then cross-reference these events against ‘Sentinel-2’ satellite imagery. From this they compose a (somewhat small) dataset of around 20 different landslide incidents.

The technique: They use a simple 8-layer convolutional neural network, trained against the corpus to try to predict the presence of a landslide in a satellite image. Their system is able to correctly predict the presence of a landslide about 60% of the time – this poor performance is mostly due to the (currently) limited size of the dataset; it’s worth remembering that satellite datasets are getting larger over time along with the proliferation of various private sector mini- and micro-satellite startups.
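To make the setup concrete, below is a minimal sketch of the kind of small convolutional binary classifier you could train on Sentinel-2 patches; it is written in PyTorch as an illustration, and the patch size and layer widths are my assumptions rather than the paper's exact architecture.

# A small CNN for patch-level landslide detection; channel counts and the
# 64x64 patch size are illustrative, not the paper's exact configuration.
import torch
import torch.nn as nn

class LandslideCNN(nn.Module):
    def __init__(self, in_channels: int = 13):  # Sentinel-2 provides 13 spectral bands
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, 1),  # one logit: landslide vs. no landslide
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = LandslideCNN()
patches = torch.randn(8, 13, 64, 64)  # a batch of hypothetical image patches
labels = torch.ones(8, 1)             # 1 = landslide present
loss = nn.BCEWithLogitsLoss()(model(patches), labels)
loss.backward()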

Why this matters: As more and more digital satellite data becomes available, analysis like this will become commonplace. I think papers like this give us a sense of what that future research will look like – prepare for a world where millions of people are training one-off basic classifiers against vast streams of continuously updated Earth observation data.
  Read more: Landslide Geohazard Assessment with Convolutional Neural Networks Using Sentinel-2 Imagery Data (Arxiv).

#####################################################

Facebook thinks it needs a Replica of reality for its research:
…High-fidelity ‘Replica’ scene simulator designed for sim2real AI experiments, VR, and more…
Researchers with Facebook, Georgia Institute of Technology, and Simon Fraser University have built Replica, a photorealistic dataset of complex indoor scenes in which AI systems can be trained.

The dataset: Replica consists of 18 photo-realistic 3D indoor scene reconstructions – they’re not kidding about the realism and invite readers to take a “Replica Turing Test” to judge for themselves; I did and it’s extremely hard to tell the difference between Replica-simulated images and actual photos. Each of the scenes includes RGB information, geometric information, and object segmentation information. Replica also uses HDR textures and reflectors to further increase the realism of a scene.

Replica + AI Habitat: Replica has been designed to plug in to the Facebook-developed ‘AI Habitat’ simulator (Import AI 141), which is an AI training platform that can support multiple simulators. Replica supports rendering outputs from the dataset at up to 10,000 frames per second – that speed is crucial if you’re trying to train sample-inefficient RL systems against this.

Why this matters: How much does reality matter? That’s a question that AI researchers are grappling with, and there are two parallel lines of research emerging: in one, researchers try to develop high-fidelity systems like Replica then train AI systems against them and transfer these systems to reality. In the other, researchers are using techniques like domain randomization to automatically augment lower quality datasets, hoping to get generalization through training against a large quantity of data. Systems like Replica will help to generate more evidence about the tradeoffs and benefits of these approaches.
  Read more: The Replica Dataset: A Digital Replica of Indoor Spaces (Arxiv).
  Get the code for the dataset here (Facebook GitHub).

#####################################################

Robots take on finicky factory work: cable insertion!
…First signs of superhuman performance on a real-world factory task…
The general task these researchers are trying to solve is “how can we enable robots to autonomously perform complex tasks without significant engineering effort to design perception and reward systems”.

What can be so difficult about connecting two things? As anyone who has built their own PC knows, fiddling around with connectors and ports can be challenging even for dexterous humans equipped with a visual classifier that has been trained for a couple of million years and fine-tuned against the experience of a lifetime. For robots, the challenges here are twofold: first, ports and connectors need to be lined up with great precision; and second, during insertion there are various unpredictable friction forces that can confound a machine.

Three connectors, three tests: They test their robots against three tasks of increasing difficulty: inserting a USB adapter into a USB port; aligning a multi-pin D-Sub adapter and port, requiring more robustness to friction; and aligning and connecting a ‘Model-E’ adapter which has “several edges and grooves to align” and also requires significant force.

Two solutions to one problem: For this work, they try to solve the task in two different ways: supervision from vision, where the robot is provided with a ‘goal state’ image at 32x32 resolution; and learning from a sparse reward (for the USB insertion task, this is specifically whether an electrical connection is created). They also compare both of these methods against systems provided with perfect state information. They test systems based around two basic algorithms, Soft Actor-Critic (SAC) and TD3.
  The results are pretty encouraging, with systems based around residual reinforcement learning outperforming all other methods at the USB connector task, as well as at the D-Sub task. Most encouragingly, the AI system appears to outperform humans at the Model-E connector task in terms of accuracy.
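The residual RL idea itself is compact enough to sketch: a scripted controller provides a nominal action, and the learned policy only contributes a correction on top of it. The snippet below is a conceptual sketch under my own assumptions (a generic gym-style env and a policy object), not the authors' code.

import numpy as np

def pd_controller(obs, goal, kp=1.0, kd=0.1):
    # Hand-designed base controller: push the end-effector towards the goal pose.
    return kp * (goal - obs["ee_position"]) - kd * obs["ee_velocity"]

def residual_rl_step(env, policy, obs, goal):
    # The learned residual is added to the scripted action before acting.
    base_action = pd_controller(obs, goal)
    residual = policy.sample_action(obs)           # e.g. the output of a SAC policy
    action = np.clip(base_action + residual, -1.0, 1.0)
    next_obs, reward, done, info = env.step(action)
    # With the 'natural' sparse reward, `reward` is 1 only when the connection
    # registers (e.g. the USB port reports an electrical link).
    return next_obs, reward, done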

Testing with noise: They explore the robustness of their techniques by adding noise to the goal – specifically, by changing the target location for the connection by +-1mm – even here the residual RL system does well, typically obtaining scores of between 60% and 80% across tasks, and sometimes also outperforming humans given the same (deliberately imprecise) goal.

Why this matters: One of the things stopping robots from being deployed more widely in industrial automation is the fact most robots are terribly stupid and expensive; research like this makes them less stupid, and parallel research in developing AI systems that are robust to imprecision could drive more progress here. “One practical direction for future work is focusing on multi-stage assembly tasks through vision,” they write. Another challenge to explore in the future is multi-step tasks, which – if solved – “will pave the road to a higher robot autonomy in flexible manufacturing”.
  Read more: Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards (Arxiv).

#####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

More AI principles from China:
Last month, a coalition of Chinese groups published the Beijing AI Principles for ethical standards in AI research (see Import 149). Now we have two more sets of principles from influential Chinese groups. The Artificial Intelligence Industry Alliance (AIIA), which includes all the major private labs and universities, released a joint pledge on ‘industry self-discipline.’ And an expert committee from the Ministry of Science and Technology has released governance principles.

  Some highlights: Both documents include commitments on safety and robustness, basic human rights, and privacy, and foreground the importance of AI being developed for the common benefit of humanity. Both advocate international cooperation on developing shared norms and principles. The expert group counsels ‘agile governance’ that responds to the fast development of AI capabilities and looks ahead to risks from advanced AI.

  Why it matters: These principles suggest an outline of the approach the Chinese state will take when it comes to regulating AI, particularly since both groups are closely linked with the government. They join similar sets of principles from the EU, OECD, and a number of countries (still not the US, however). It is heartening to see convergence between approaches to the ethical challenges of advanced AI, which should bode well for international cooperation on these issues.
  Read more: Chinese AI Alliance Drafts Self-Discipline ‘Joint Pledge’ (New America).
  Read more: Chinese Expert Group Offers ‘Governance Principles’ for ‘Responsible AI’ (New America).

#####################################################

Major donation for AI ethics at Oxford:
Oxford University have announced a £150m ($190m) donation from billionaire Stephen Schwarzman, some of which will go towards a new ‘Institute for Ethics in AI.’ There are no details yet of what form the centre might take, nor how much of this funding will be earmarked for it. It will be housed in the Faculty of Philosophy, which is home to the Future of Humanity Institute.
  Read more: University of Oxford press release.

#####################################################

Tech Tales:

Runner

So she climbed with gloves and a pack on her back. She hid from security robots. She traversed half-built stairs and rooms, always going higher. She got to the roof before dawn and put her bag down, opened it, then carefully drew out the drones. She had five and each was about the size of a watermelon when you included its fold-out rotors, though the central core for each was baseball-sized at best. She took out her phone and thumbed open the application that controlled the drones, then brought them online one by one.

They knew to follow her because of the tracker she had on her watch, and they were helped by the fact they knew her. They knew her face. They knew her gait.

She checked her watch and stood, bouncing up and down on the balls of her feet, as the sun began to threaten its appearance over the horizon. Light bled into the sky. Then: a rim of gold appeared in the distance, and she ran out onto one of the metal scaffolds of the building, high above the city, wind whipping at her hair, her feet gripping the condensation-slicked surface of the metal. Risky, yes, but also captivating.

“NOW STREAMING” one of the drones said, and she stared at another scaffold in front of her, separated by a two meter gap over the nothing-core of the half-built building. She took a few steps back and crouched down into a sprinter’s pose, then jumped.

Things that inspired this story: Skydio drones; streaming culture; e-sports; the logical extension of social media influencing; the ambiguous tradeoff between fear and self-realization.

Import AI 151: US Army trains StarCraft II AI; teaching drones to dodge thrown objects; and fighting climate change with machine learning

Drones that dodge, evade, and avoid objects – they’re closer than you think:
…Drones are an omni-use platform, and they’re about to get really smart…
The University of Maryland and the University of Zurich have taught drones how to dodge rapidly moving objects, taking a further step towards building semi-autonomous, adaptive small-scale aircraft. The research shows that drones equipped with a few basic sensors and some clever AI software can learn to dodge (and chase) a variety of objects. “To our knowledge, this is the first deep learning based solution to the problem of dynamic obstacle avoidance using event cameras on a quadrotor”, they write.

How it works: The approach has three key components, each a specialized module based on neural networks or optical flow. These systems and their corresponding functions are as follows:

  • EVDeBlurNet – deblur and denoise the event image sequences before any computation takes place
  • EVHomographyNet – approximate background motion
  • EVSegFlowNet – segment moving objects and compute their image motion

  These three systems let the drone clean up its input images so it can compute over them, then work out where it is, then look at the objects around itself and react.
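A rough sketch of how the three modules might compose at inference time is below; the function names, array shapes, and the toy planner are my own placeholders, not the paper's API.

import numpy as np

def plan_evasive_maneuver(masks, flow):
    # Toy planner: command a velocity opposite to the mean image motion of
    # detected obstacles (masks: HxW in {0,1}; flow: HxWx2 pixel displacements).
    mean_flow = (flow * masks[..., None]).sum(axis=(0, 1)) / max(masks.sum(), 1)
    return -mean_flow

def dodge_pipeline(event_frames, ev_deblur_net, ev_homography_net, ev_segflow_net):
    clean = ev_deblur_net(event_frames)                     # deblur + denoise events
    background_motion = ev_homography_net(clean)            # approximate ego/background motion
    masks, flow = ev_segflow_net(clean, background_motion)  # segment + track moving objects
    return plan_evasive_maneuver(masks, flow)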

How well does it work? The researchers’ approach is promising but not ready for any kind of real-world deployment, due to insufficient accuracy. However, the system displays promising breadth when it comes to dealing with a variety of objects to dodge. For assessment, the researchers run 30 tests with each object and report the result. In these tests, they find that the drone can easily dodge thrown balls and model cars (86% success), can dodge and chase another drone (83%), can dodge two objects thrown at it in quick succession (76%), struggles a bit with an oddly shaped model plane (73%), and achieves a success rate of 70% in a low-light experiment.

Why this matters: Drones are getting smaller and smarter, and research like this shows how pretty soon we’re likely going to be able to build DIY drones that have what I’d term ‘dumb spatial intelligence’; that is, we can start to train these systems to do things like dodge moving objects, navigate around obstacles, deal with occluded environments, and learn to follow or fly towards specific people or objects. The implications for this are significant, unlocking numerous commercial applications while also changing the landscape of asymmetric warfare in profound ways – consequences that will likely highlight the difficulty of controlling AI capability use and diffusion.
  Read more: EVDodge: Embodied AI For High-Speed Dodging On A Quadrotor Using Event Cameras (Arxiv).

#####################################################

“Build marines!” – US Army teaches RL agents to respond to voice commands:
…StarCraft II research highlights military interest in complex, real-time strategy games…
US Army Research Laboratory researchers have developed a reinforcement learning agent that can carry out actions in response to pre-defined human commands. For this experiment, they test in the domain of StarCraft II, a complex real-time strategy game. The goal of this is to work out smarter ways in which humans can control semi-autonomous AI systems in the future. “Our mutual-embedding model provides a promising mechanism for creating a generalized sequential reward that capitalizes on a human’s capacity to utilize higher order knowledge to achieve long-term goals,” they write. “By providing a means for a human to guide a learning agent via natural language, generalizable sequential policies may be learned without the overhead of creating hand-crafted sub-tasks or checkpoints that would depend critically on expert knowledge about RL reward functions”.

How it works: The researchers use a relatively simple technique of “training a mutual-embedding model using a multi-input deep-neural network that projects a sequence of natural language commands into the same high-dimensional representation space as corresponding goal states”. In a prototype experiment, they see how well they can use voice commands to succeed at the ‘BuildMarines’ challenge, a mini-game within the StarCraft 2 environment.
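A minimal sketch of the mutual-embedding idea, as described: a text encoder and a state encoder project a command and a candidate game state into the same vector space, and their similarity can then serve as a shaped reward. The network below is illustrative (dimensions and encoders are my assumptions), not the Army Research Laboratory's exact model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualEmbedding(nn.Module):
    # Project tokenized commands and game-state features into a shared space.
    def __init__(self, vocab_size=1000, state_dim=256, embed_dim=128):
        super().__init__()
        self.text_encoder = nn.Sequential(
            nn.EmbeddingBag(vocab_size, 128), nn.Linear(128, embed_dim))
        self.state_encoder = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))

    def forward(self, command_tokens, game_state):
        text_vec = F.normalize(self.text_encoder(command_tokens), dim=-1)
        state_vec = F.normalize(self.state_encoder(game_state), dim=-1)
        return (text_vec * state_vec).sum(dim=-1)  # cosine similarity in [-1, 1]

model = MutualEmbedding()
tokens = torch.randint(0, 1000, (4, 6))  # e.g. tokenized "build marines" commands
states = torch.randn(4, 256)             # hypothetical StarCraft II state features
shaped_reward = model(tokens, states)    # higher when the state matches the command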

Why this matters: Developing more natural interfaces between humans and AI systems is a long-standing goal of AI research, and it’s interesting to see how military organizations think about this problem. I wouldn’t be surprised to see more military organizations explore using StarCraft 2 as a basic testing ground for advanced AI systems, given its overlap with natural military interests of logistics, supply chains, and the marshaling and deployment of forces.
  Read more: Grounding Natural Language Commands to StarCraft II Game States for Narration-Guided Reinforcement Learning (Arxiv).

#####################################################

UN researchers generate fake UN speeches:
…Machine-driven diplomacy…
Researchers affiliated with the United Nations’ ‘Global Pulse’ and the University of Durham have used AI systems to generate remarks in the style of political leaders speaking at the UN General Assembly. For this experiment, they train on the English language transcripts of 7,507 speeches given by political leaders at the UN General Assembly (UNGA) between 1970 and 2015.

Training tools and costs: The core of this system is an AWD-LSTM model pre-trained on Wikitext-103, then fine-tuned against the corpus of UN data. Training cost as little as $7.80 in total when using AWS spot instances, and took about 13 hours using NVIDIA K80 GPUs.
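For context on how lightweight this recipe is, here is a minimal ULMFiT-style sketch using the fastai v1 API (AWD-LSTM pre-trained on Wikitext-103, fine-tuned on a text corpus); the CSV file name is hypothetical and this is not necessarily the authors' exact code.

from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

# 'un_speeches.csv' is a hypothetical file with one speech per row in a 'text' column.
data_lm = TextLMDataBunch.from_csv('.', 'un_speeches.csv', text_cols='text')
learn = language_model_learner(data_lm, AWD_LSTM, pretrained=True, drop_mult=0.5)
learn.fit_one_cycle(1, 1e-2)   # first pass with most of the encoder frozen
learn.unfreeze()
learn.fit_one_cycle(3, 1e-3)   # then fine-tune the whole model
print(learn.predict("The Secretary-General strongly condemns", n_words=60))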

Dataset bias: The experiment serves as a proof-of-concept that also highlights some of the ways in which dataset bias can influence language models – while it was relatively easy for the authors to prompt the language model to generate UN-style speeches, they found it was more difficult to generate ‘inflammatory’ speeches as there are fewer of these in the UN dataset.

How well does it work: Qualitatively, the model is able to periodically generate samples that read like convincing extracts from real speeches. For instance, a model prompted with “The Secretary-General strongly condemns the deadly terrorist attacks that took place in Mogadishu” generates the output “We fully support the action undertaken by the United Nations and the international community in that regard, as well as to the United Nations and the African Union, to ensure that the children of this country are left alone in the process of rebuilding their societies.”

Implications: Language models like these have a few implications, the researchers write. These include the likelihood of broad diffusion of the technology (for example, though OpenAI chose not to fully release its GPT-2 model, others might); it being generally easier to generate disinformation; it being easy to automatically generate hate speech; and it becoming easier to train models to impersonate people.

Recommendations: So, what do we do? The authors recommend we map the human rights impacts of these technologies, develop tools for systematically and continuously monitoring AI-generated content, set up strategies for countermeasures, and build alliances between various AI actors to develop a “coherent and proactive global strategy”.

Why this matters: Research like this highlights the concern some people feel about increasingly powerful models, and emphasizes the significant implications of them for society, as well as the need for us to think creatively about interventions to deal with the most easy-to-anticipate malicious uses of such systems.
  Read more: Automated Speech Generation from UN General Assembly Statements: Mapping Risks in AI Generated Texts (Arxiv).

#####################################################

What happens when you can buy AI-infused cyberattacks on the dark web?
…Alphabet-subsidiary Jigsaw says it paid for a Russian troll campaign last year…
$250. That’s how much it cost Alphabet subsidiary Jigsaw to pay someone to run a troll campaign against a website it had created named “Down With Stalin”, according to an article in Wired. Jigsaw used a service called ‘SEOTweet’ to carry out a social media disinformation campaign, which led to 730 Russian-language tweets from 25 accounts, as well as 100 posts to forums and blog comment sections.

Controversy: Some people think it’s kind of shady that an Alphabet-subsidiary would pay a third-party to mount an actual cyberattack. The experiment could be seen, for instance, as Alphabet and Google trying to meddle in Russian politics, one researcher said.
  Read more: Alphabet-owned Jigsaw Bought a Russian Troll Campaign As An Experiment (Wired).

#####################################################

AI luminaries team up to fight climate change:
…Climate change + machine learning = perhaps we can stabilize the planet…
Can machine learning help fix climate change? An interdisciplinary group of researchers from universities like the University of Pennsylvania and Carnegie Mellon University, and companies like DeepMind and Microsoft Research, think the use of machine learning can help society tackle one of its greatest existential threats. The researchers identify ten rough categories of machine learning (computer vision; NLP; time-series analysis; unsupervised learning; RL & control; causal inference; uncertainty quantification; transfer learning; interpretable ML; and ‘other’), then set them against various ‘climate change solution domains’ like CO2 Removal, Transportation, Solar Geoengineering, and more.
  The paper tags its various approaches with the following possible labels: ‘High Leverage’ (which means ML may be especially helpful here); ‘Long-term’ (which indicates things that will have a primary impact after 2040); and ‘High Risk’ (which indicates things that have risks or potential side effects). The paper is as much a call for massive interdisciplinary collaboration as it is a survey.

High Leverage tools for a climate change future: Some of the areas where machine learning can help and which the authors deem ‘High Leverage’ when it comes to mitigating climate change include: developing better materials for energy storage or consumption; helping to develop nuclear fusion; reducing emissions from fossil fuel power generation; creating sample-efficient ML to work in ‘low-data settings’; modeling demand for power; smarter freight routing; further development of electric vehicles; improving low-carbon options; creating smarter and more efficient buildings; gathering infrastructure data; improving the efficiency of supply chains; developing better materials and construction; improving the efficiency of HVAC systems; remotely sensing emissions; precision agriculture; estimating carbon stored in forests; tools to track deforestation; helping to sequester CO2; forecasting extreme events; monitoring ecosystems and species populations; increasing food security; developing better systems for disaster relief; “engineering a planetary control system”; using ML to model consumers and understand how to nudge them to more climate-friendly actions; and better predicting the financial effects of climate change.

Why this matters… should be fairly self-evident! We must preserve spaceship Earth – all the other reachable planets are shit in comparison.
  Read more: Tackling Climate Change with Machine Learning (Arxiv).

#####################################################

Want to see how good your system is at surveilling people in crowded spaces? Enter the MOTChallenge:
…CVPR19 benchmark aims to push the limits on AIs for spotting people in crowded scenes…
An interdisciplinary group of researchers from ETH Zurich, the Technical University of Munich (TUM), and the Australian Institute for Machine Learning at the University of Adelaide have released the 2019 Multiple Object Tracking challenge, called the MOTChallenge. This challenges AI systems to label and spot pedestrians in crowded spaces.

The new benchmarks have arrived:
The new CVPR19 benchmark consists of eight novel sequences from three “very crowded” scenes, where densities of pedestrians can climb as high as 246 per frame – almost as hard as playing Where’s Waldo? The datasets have been annotated with a particular emphasis on people, so pedestrians are labelled if they’re moving and given a separate label if they’re not in an upright position (aka, sitting down). “The idea is to use these annotations in the evaluation such that an algorithm is neither penalized nor rewarded for tracking, e.g., a sitting or not moving person”.

Evaluation metrics: Entrants to the competition will be evaluated using the ‘CLEAR’ metrics, as well as some of the quality measures introduced in an earlier CVPR paper: “Tracking of multiple, partially occluded humans based on static body part detection”.
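For reference, the headline CLEAR metric is MOTA (Multiple Object Tracking Accuracy), commonly defined as

\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t}

where FN_t, FP_t, and IDSW_t are the false negatives, false positives, and identity switches in frame t, and GT_t is the number of ground-truth objects in that frame; its companion metric, MOTP, measures the average alignment (e.g. bounding-box overlap) between matched predictions and ground truth.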

Why this matters: AI research thrives on challenges, with harder evaluation criteria typically combining with larger datasets to motivate researchers to invent new systems capable of enhanced performance. Additionally, systems developed for competitions like this will have a significant role in the rollout of AI-infused surveillance technologies, so monitoring competitions such as this can give us a better sense of how those capabilities are progressing.
  Read more: CVPR19 Tracking and Detection Challenge: How crowded can it get? (Arxiv).
  Get the data, current ranking and submission guidelines from the official website (MOTChallenge.net).

#####################################################

OpenAI Bits & Pieces:

OpenAI testifies for House Intelligence Committee on AI, synthetic media, & deepfakes:
Last week, I testified in Washington about the relationship between AI, synthetic media, and deepfakes. For this testimony I sought to communicate the immense utility of AI systems, while advocating for a variety of interventions to increase the overall resilience of society to increasingly cheap & multi-modal fake media.

  I also collected inputs for my testimony via a public Google Form I posted on Twitter, yielding around 25 responses – this worked really well, and felt like a nice way to be able to integrate broad feedback from the AI community into important policy conversations.

  Watch the hearing here: Open Hearing on Deepfakes and Artificial Intelligence (YouTube).
  Read written testimony from OpenAI and the other panellists here (House Permanent Select Committee on Intelligence website).

#####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

Microsoft removes large face recognition database:
Microsoft have removed one of their face recognition datasets from the internet. ‘MS Celeb’ contained 10 million photos of 100,000 individuals, and was reportedly the largest publicly available dataset of its kind. The company had recently come under criticism, since individuals whose photos were used had not provided consent. The photos were scraped under the Creative Commons license, on the basis that they were being used for academic purposes. In fact, the dataset had been used by a number of private labs to train face recognition models, including Microsoft itself.

Why it matters: Microsoft have been outspoken on face recognition, releasing ethical principles for use of the technology, and calling for greater regulation and scrutiny (see Import #125). While this is slightly embarrassing, the company appears to have reacted quickly when made aware of the privacy concerns surrounding the database.
  Read more: Microsoft deletes massive face recognition database (BBC).
  Read more: Facial recognition: It’s time for action (Microsoft, 2018).

#####################################################

China, AI, and national strategy:
Jeffrey Ding and Helen Toner, from the Center for Security and Emerging Technology (CSET) at Georgetown University, were among those who gave testimony to the US-China Economic and Security Review Commission in Congress. The testimony covered several aspects of international competition on AI, and how the US can maintain its strong position.
  
US-China competition: Ding argued that, contrary to prevailing narratives, China is not poised to overtake the US in AI. A careful examination of key measures reveals claims of Chinese dominance to be overstated. For example, while China is competitive on the raw number of AI practitioners and patent filings, when this is restricted to AI experts and highly-cited patents, China still lags behind the US. Similarly, while China’s public investment in AI R&D is comparable to or greater than that of the US, private R&D spending from US companies dwarfs that of Chinese peers.

Policy recommendations: Ding and Toner made a number of concrete policy recommendations for the US:

  • Revive the Office of Technological Assessment, which previously provided impartial advice to US lawmakers on technological issues, allowing for better informed policy-making.
  • Work on bridging the ‘valley of death’— the gap between research and commercial applications of AI.
  • Prioritise safety and minimising risks from AI, alongside broader policy ambitions.
  • Improve immigration options for AI researchers and engineers.
  • Support NIST in developing and implementing standards for AI.
  • Increase R&D funding for basic AI research.

Read more: Helen Toner’s written testimony.
Read more: Jeff Ding’s written testimony.

#####################################################

Tech Tales:

Healing Joke

When my son was four I got him a robot. It was a small, hockey-puck shaped thing, and it would follow him around the house asking him to clean up after himself (he sometimes did) and seeing if he wanted to play games (he always did). On his fifth birthday my son painted the robot green, and thereafter we all called the robot Froggy. My son grew up with the robot, and the robot knew just as much about my son as I did – which was a lot. One day, shortly before my son’s tenth birthday, he ran out into the road during a storm and Froggy came out of the house and skittered down the path and onto the asphalt, raising its voice and asking my son to come inside. My son obliged and began to run back to the house. Froggy followed, but not fast enough – a car ran over him, breaking him up into many little pieces. Something about rain, they said. Something about sensors.

My son was, as you can predict, distraught. After a couple of days of moping around the house he came up to me with an envelope and asked me to bury it with Froggy. I read it later that day, before sealing it in a plastic bag and placing it in the cardboard box I’d later bury Froggy in.

Dear Froggy,
I do not know if there is robot heaven but if there is I hope you are there and they have lots of SPARE PARTS for you. I remember when I fixed one of your wheels after you chased me. I like how you played fetch and sometimes you would hide things from me and I’d say ‘Froggy that’s no fair’ and you’d say ‘it’s not my fault I am so smart’ and then chase me again. I got so happy when I got strong enough to pick you up and I remember you saying ‘put me down this is unsafe’ and ‘I have emailed your parents about this’. Remember the time i put you in the fridge and you got so cold you had to go to sleep? I remember you sent me and dad pictures from inside the fridge and you captioned them YOUR SON DID THIS. Boy did I get in trouble!

I dreamed about you a lot. Did I tell you this? I can’t remember. Once you were as big as a house and I lived in a small wooden shack on your back. Another time there were ten thousand of you and you were going all over the world and looking for things for me. I never had a nightmare about you don’t worry.

My hand is getting pretty tired of writing now so I’m going to stop. Froggy I love you don’t be sad – I’ll be okay.

Things that inspired this story: Childhood, Furbys, natural attachments from youthful acclimatization, Roomba robots, Kiva robots, Father’s Day.

Import AI 150: Training a kiss detector; bias in AI, rich VS poor edition; and just how good is deep learning surveillance getting?

What happens when AI alters society as much as computers and the web have done?
…Researchers contemplate long-term trajectory of AI, and detail a lens to use to look at its evolution…
Based on how the World Wide Web and the Computing industry altered society, how might we expect the progression of artificial intelligence to influence society? That’s the question researchers with Cognizant Technology Solutions and the University of Texas at Austin try to answer in a new research paper.

The four phases of technology: According to the researchers, any technology has four defining phases – standardization; usability; consumerization; and foundationalization [?]. For example, the ‘usability’ phase for computing was when people adopted GUI interfaces, while for the web, it was when people adopted stylesheets to separate content from presentation. By stage four (where computing is now and where the web is heading) “people do not have to care where and how it happens – they simply interact with its results, the same way we interact with a light switch or a faucet”, they write.

Lessons for the AI sector: Right now, AI as a technology is at the pre-standardization stage.

  Standardization: We need standards for how we connect AI systems together. “It should be possible to transport the functionality from one task to another,” they write, “e.g. to learn to recognize a different category of objects” across different classification infrastructures using shared systems.

  Usability: AI needs interfaces that everyone can use and access, the authors write. They then reference Microsoft’s dominance of the PC industry in the 1990s as an example of the sort of thing we want to avoid with AI, though it’s pretty unclear from the paper what they mean by usability and accessibility here.

  Consumerization: The general public will need to be able to easily create AI services. “People can routinely produce, configure, and teach such systems for different purposes and domains,” they write. “They may include intelligent assistants that manage an individual’s everyday activities, finances, and health, but also AI systems that design interiors, gardens, and clothing, maintain buildings, appliances and vehicles, and interact with other people and their AIs.”

  Foundationalization: “AI will be routinely running business operations, optimizing government policies, transportation, agriculture, and healthcare,” they write. AI will be particularly useful for directing societies to solve complex, intractable problems, they argue. “For instance, we may decide to maximize productivity and growth, but at the same time minimize cost and environmental impact, and promote equal access and diversity.”

Why this matters: AI researchers are increasingly seeking to situate themselves and their research in relation to the social as well as technical phenomena of AI, and papers like this are artefacts of this process. I think this prefigures the general politicization of the AI community. I suspect that in a couple of years we may even need an additional Arxiv sub-category to contain such papers as these.
  Read more: Better Future through AI: Avoiding Pitfalls and Guiding AI Towards its Full Potential (Arxiv).

#####################################################

Deep learning + surveillance = it’s getting better all the time:
…Vehicle re-identification survey shows how significant deep learning is for automating surveillance systems…
How much has deep learning changed vehicle surveillance? It has had a significant effect, according to a survey paper from researchers with the University of Hail in Saudi Arabia.

Sensor-based methods: In the early 90s, researchers developed sensor-based methods for identifying and re-identifying vehicles; these methods used things like inductive loops, as well as infrared, ultrasonic, microwave, magnetic, and piezoelectric sensors. Other methods have explored using systems like GPS, mobile phone signatures, and RFID and MAC address-based identification. People have also explored using multi-sensor systems to increase the accuracy of identifications. All of these systems had drawbacks, mostly relating to them breaking in the presence of unanticipated things, like modified or occluded vehicles.

Vision-based methods: Pre-deep learning and from the early 2000s, people experimented with a bunch of hand-crafted feature-based methods to try to create more flexible, less sensor-dependent approaches to the task of vehicle identification. These techniques can do things like generate bounding boxes around vehicles, and even match specific vehicles between non-overlapping camera fields. But these methods also have drawbacks relating to their brittleness, and dependence on features that may change or be occluded. “The performance of appearance based approaches is limited due to different colors and shapes of vehicles”, they write.

Deep learning: Since the ImageNet breakthrough in 2012, researchers have increasingly used these techniques for vision problems, including for vehicle re-identification, mostly because they’re simpler systems to implement and tend to have better generalization properties. These methods typically use convolutional neural networks, sometimes paired with an LSTM. Deep learning methods appear to outperform hand-crafted feature methods across the board, according to tests in which 12 deep learning-based methods were compared against 8 hand-crafted ones.

The future of vehicle re-identification: Vehicles vary in appearance a lot more than humans, so it will be more difficult to train classifiers that can accurately identify all the vehicles that can pass through a city on a given day. Additionally, we’ll need to build larger datasets to be able to better model the temporal aspect of entity-tracking – this should also let us accurately re-identify vehicles across bigger time gaps between sightings.

Why this matters: The maturation of deep learning technology is irrevocably changing surveillance, improving the capabilities and scalability of a bunch of surveillance techniques, including vehicle re-identification.

  Read more: A survey of advances in vision-based vehicle re-identification (Arxiv).

#####################################################

AI Stock Image of the Week:
Thanks to Delip Rao for surfacing this delightful contribution to the burgeoning media genre.

#####################################################

Spotting intimacy in Hollywood films with a kiss detector:
…Conv and Lution sitting in a tree, K-I-S-S-I-N-G!…
Amir Ziai, a researcher at Stanford University, has built a deep learning-based kissing detector! The unbearably cute project takes in a video clip, spots all the kissing scenes in it, then splices those scenes together into an output.

Classifying Kissing: So, how do you spot kissing? Here, the author uses a multi-modal classifier: one network detects the visual appearance of a kiss, and another network scans the audio over that same period, extracting features out of it (architecture used: ‘VGGish’, “a very effective feature extractor for downstream Acoustic Event Detection”).
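The fusion step is simple enough to sketch: per-segment visual features and audio features get concatenated and passed to a small classification head. The PyTorch snippet below is illustrative, with made-up feature dimensions (in the project, the visual features come from an ImageNet-style CNN and the audio features from VGGish).

import torch
import torch.nn as nn

class KissSegmentClassifier(nn.Module):
    # Fuse per-segment visual and audio features into a kiss / no-kiss logit.
    def __init__(self, visual_dim=2048, audio_dim=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(visual_dim + audio_dim, 256), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 1),
        )

    def forward(self, visual_feats, audio_feats):
        return self.head(torch.cat([visual_feats, audio_feats], dim=-1))

clf = KissSegmentClassifier()
visual = torch.randn(16, 2048)  # e.g. pooled CNN features for 16 one-second segments
audio = torch.randn(16, 128)    # e.g. VGGish embeddings for the same segments
logits = clf(visual, audio)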

   The dataset: The data for this research is a 2.3TB database of ~600 Hollywood films spanning 1915 to 2016, with files ranging in size from 200MB to 12GB. 100 of these movies have been annotated with kissing segments, for a total of 263 kissing segments and 363 non-kissing segments across 100 films.

The trained ‘kiss detector’ gets an F1 score of 0.95 or so, so in a particularly salacious movie you might expect to get a few mis-hits in the output, but you’ll likely capture the majority of the moments if you run this over it.

   Why this matters: This is a good example of how modern computer vision techniques make it fairly easy to develop specific ‘sense and respond’ software, cued to qualitative/unstructured things (like the presence of kissing in a scene). I think this is one of the most under-hyped aspects of how AI is changing the scope of individual software development. I could also imagine systems like this being used for somewhat perverse/weird uses, though I’m reading this paper mainly as a demonstration of how easy such detectors have become to build.
  Read more: Detecting Kissing Scenes in a Database of Hollywood Films (Arxiv).

#####################################################

Bias in AI: What happens when rich countries get better models?
…Facebook research shows how biases in dataset collection and labeling lead to a rich VS poor divide…
The recent spate of research into bias in AI systems feels like finding black mold in an old apartment building – you spot a patch on the wall, look closer, and then realize that the mold is basically baked into the walls of the apartment and if you can’t see it it’s probably because you aren’t looking hard enough or don’t have the right equipment. Bias in AI feels a bit like that, where the underlying data that is used to train various systems has obvious bias (like mostly containing white people, instead of a more diverse set of humans), but also has non-obvious bias which gets discovered through testing (for example, early work on word embeddings), and the more we think about bias the more ways we find to test for it and reveal it.

Now, researchers with Facebook AI Research have shown how image datasets might have an implicit bias towards certain kinds of representations of common concepts, favoring rich countries over poor ones. The study “suggests these systems are less effective at recognizing household items that are common in non-Western countries or in low-income communities” as a consequence of subtle biases in the underlying dataset. “The absolute difference in accuracy of recognizing items in the United States compared to recognizing them in Somalia or Burkina Faso is around 15% to 20%. These findings are consistent across a range of commercial cloud services for image recognition”

The dataset: For this study, the authors investigate the ‘Dollar Street Dataset’, which contains photos of common goods across 135 different classes, taken in 264 homes across 54 countries.

Recognition for some, but not for all: The researchers discovered that “for all systems, the difference in accuracy for household items appearing in the lowest income bracket (less than $50 per month) is approximately 10% lower than that for household items appearing in the highest income bracket”.

To generate these results, the researchers measured the accuracy of five commercial systems and one self-developed system at categorizing objects in the dataset. The commercial systems are Microsoft Azure, Clarifai, Google Cloud Vision, Amazon Rekognition, and IBM Watson; the self-developed one is a ResNet-101 model trained against the Tencent ML Images dataset.
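The measurement itself is easy to reproduce in spirit: given each service's per-image predictions plus the household income attached to every Dollar Street photo, you group accuracy by income bracket. A hypothetical pandas sketch (column names and bracket boundaries are my own, not the paper's):

import pandas as pd

# One row per (image, service) prediction; 'correct' is 1 if the ground-truth
# class appears among the service's returned labels.
df = pd.DataFrame({
    "service": ["azure", "azure", "rekognition", "rekognition"],
    "monthly_income_usd": [40, 3500, 40, 3500],
    "correct": [0, 1, 1, 1],
})

bins = [0, 50, 200, 700, float("inf")]
labels = ["<$50", "$50-200", "$200-700", ">$700"]
df["income_bracket"] = pd.cut(df["monthly_income_usd"], bins=bins, labels=labels)

# Compare recognition accuracy across brackets for each service.
print(df.groupby(["service", "income_bracket"])["correct"].mean())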

   What explains this? One source is the underlying geographical distribution of data in image datasets like ImageNet, COCO, and OpenImages – the researchers studied these and found that, at least for some of their data, “the computer-vision datasets severely undersample visual scenes in a range of geographical regions with large populations, in particular, in Africa, India, China, and South-East Asia”.

Another source of bias is the use of English as the language for data collection, which means that the data is biased towards objects with English labels or easily translatable labels – the researchers back this up with some qualitative tests where they search for a term in English then in another language on a service like Flickr and show that such searches yield quite different sets of results.

   Why this matters: Studies like this show us how dependent certain AI capabilities are on underlying data, and how bias can creep in in hard-to-anticipate ways. I think this motivates the creation of a new field of study within AI, which I guess I’d think of as “AI ablation, measurement, and assurance” – we need to think about building big empirical testing regimes to check trained systems and products against. (Think Model Cards for Model Reporting, but for everything.)
  Read more: Does Object Recognition Work for Everyone? (Arxiv).

#####################################################

Want over a billion digitized Arabic words? Check out KITAB:
…KITAB repository adds to the Open Islamicate Texts Initiative (OpenITI)…
Researchers with KITAB, a project to create digital tools and resources to help people interact with Arabic texts, have released a vast corpus of Arabic text, which may be of interest to machine learning researchers.

The Kitab dataset is a significant contribution to the Open Islamicate Texts Initiative (OpenITI), which is “a multi-institutional effort to construct the first machine-actionable scholarly corpus of premodern Islamicate texts”.

The Kitab dataset, by the numbers:

  • Authors: 1,859
  • Titles: 4,288
  • Words: 755,689,541
  • Multiple versions of same titles: 7,114
  • Total words including multiple versions of same titles: 1,520,667,360

   Things that make you go ‘hmmm’: Arabic texts may have some particular properties with regard to repetition that may make them interesting to researchers. “Arabic authors frequently made use of past works, cutting them into pieces and reconstituting them to address their own outlooks and concerns. Now you can discover relationships between these texts and also the profoundly intertextual circulatory systems in which they sit”, they write.

   Why this matters: Though the majority of the world doesn’t speak English, you wouldn’t realize this from reading AI research papers, which frequently deal predominantly in English datasets with English labels. It’ll be interesting to see how the creation of big, new datasets in other languages can help stimulate development and make the world a little smaller.
  Read more: First Open Access Release of Our Arabic Corpus (Kitab project blog).
  Find out more about OpenITI: Open Islamicate Texts Initiative official website.

#####################################################

Ctrl-C and Ctrl-V for video editing:
…Stanford researchers show how to edit what people say in videos…
Researchers with Stanford University, the Max Planck Institute for Informatics, Adobe, and Princeton University, have made it easier for people to edit footage of other people. This is part of the broader trend of AI researchers developing flexible, generative systems which can be used to synthesize, replicate, and tweak reality. One notable aspect of this research is the decision by the researchers to prominently discuss the ethical issues inherent to the research.

What they’ve done: “We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts)”, they write. “Based only on text edits, it can synthesize convincing new video of a person speaking, and produce a seamless transition even at challenging cut points such as the middle of an utterance”. The resulting videos are labelled as likely to be real by people about 60% of the time.

Ethical Considerations: The paper includes a prominent discussion of the ethics of the research and development of this system, showing awareness of its omni-use nature. “The availability of such technology – at a quality that some might find indistinguishable from source material – also raises important and valid concerns about the potential for misuse”, they write. “The risks of abuse are heightened when applied to a mode of communication that is sometimes considered to be authoritative evidence of thoughts and intents. We acknowledge that bad actors might use such technologies to falsify personal statements and slander prominent individuals”

Technical and institutional mitigations: “We believe it is critical that video synthesized using our tool clearly presents itself as synthetic,” they write. “It is important that we as a community continue to develop forensics, fingerprinting and verification techniques (digital and non-digital) to identify manipulated video.”

How it works: The video-editing tool can handle three types of edit operation: adding one or more consecutive words at a point in the video; rearranging existing words; or deleting existing words.

It works by scanning over the video and aligning it with a text transcript, then extracts the phonemes from the footage and, in parallel, tries to identify visemes – “groups of aurally distinct phonemes that appear visually similar to one another” – that it can use a face-tracking and neural rendering system to compose new utterances out of. “Our approach drives a 3D model by seamlessly stitching different snippets of motion tracked from the original footage. The snippets are selected based on a dynamic programming optimization that searches for sequences of sounds in the transcript that should look like the words we want to synthesize, using a novel viseme-based similarity measure”
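To give a flavour of the snippet-selection step, here is a toy dynamic-programming sketch of my own: it picks one candidate source snippet per target viseme so as to minimize a viseme dissimilarity cost plus a transition cost between consecutive snippets. The data structures and cost functions are illustrative, not the paper's.

def select_snippets(target_visemes, library, viseme_cost, transition_cost):
    # library[v] is a list of candidate source snippets resembling viseme v;
    # returns the index of the chosen candidate for each target position.
    n = len(target_visemes)
    candidates = [library[v] for v in target_visemes]
    dp = [[viseme_cost(target_visemes[0], c) for c in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, n):
        row, brow = [], []
        for cand_j in candidates[i]:
            best_k = min(range(len(candidates[i - 1])),
                         key=lambda k: dp[i - 1][k]
                         + transition_cost(candidates[i - 1][k], cand_j))
            row.append(dp[i - 1][best_k]
                       + transition_cost(candidates[i - 1][best_k], cand_j)
                       + viseme_cost(target_visemes[i], cand_j))
            brow.append(best_k)
        dp.append(row)
        back.append(brow)
    # Trace back the cheapest sequence of snippet choices.
    j = min(range(len(dp[-1])), key=lambda k: dp[-1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return list(reversed(path))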

The neural rendering system is able to generate better outputs that match the synthesized person to the background, getting around one of the contemporary stumbling blocks of existing systems. The system has some limitations, like not being able to distinguish emotions in phonemes, which could “lead to the combination of happy and sad segments in the blending”. Additionally, they require about one hour of video to produce decent results, which seems high. Finally, if the lower face is occluded, for instance by someone moving their hand, this can cause problems for the system.

Video + Audio: In the future, such systems will likely be paired with audio-generation systems so that people can, from a very small amount of footage of a source actor, create an endless, generative talking head. “Our system could also be used to easily create instruction videos with more fine-grained content adaptation for different target audiences,” they write.

Convincing, sort of: In tests with around 2,900 subjects, people said that videos modified using the technique appeared to be ‘real’ about 60% of the time, compared to around 82% of the time for non-modified videos.

Why this matters: This research is a harbinger for things to come – a future where being able to have confidence in the veracity of the media around us will be determined by systems surrounding the media, rather than the media itself. Though human societies have dealt with fake media before, my intuition is the capabilities of these AI systems mean that it is becoming amazingly cheap to do previously punishingly expensive things like video-editing. Additionally, it’s significant to see researchers acknowledge the ethical issues inherent to their work – this kind of acknowledgement feels like a healthy pre-requisite to the cultivation of new community norms around publication.
  Read more: Text-based Editing of Talking-head Video (Arxiv).

#####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

FBI criticized on face recognition:
The US Government Accountability Office (GAO) has released a report on the use of face recognition software by the FBI.

Privacy: The FBI has access to 641 million face photos in total. They have a proprietary database, and agreements allowing them to access databases from external partners, such as state or federal agencies. These are not limited to photos from criminal justice sources, and are also drawn from databases of drivers licenses and visa applications, etc. GAO criticise the FBI for failing to publish two key privacy documents, designed to inform the public about the impacts of data collection programs, before rolling out face recognition.

  Accuracy: In 2016, GAO made three recommendations concerning the accuracy of face recognition systems: that the FBI assess the accuracy of searches from their proprietary database before deployment; that they conduct annual operational reviews of the database; and that they assess the accuracy of searches from external partner databases. They find the FBI have failed to respond adequately to any of these. In particular, there are no solid estimates of false positive rates, making it difficult to properly judge the accuracy of the system.

  Why it matters: There is increasing attention on the use of face recognition software by law enforcement in the US. This report suggests that the FBI have failed to implement proper measures to review accuracy, and to comply with privacy regulations. Without a thorough understanding of these systems’ accuracy, or accountability on privacy, it is difficult to weigh up the potential harms and benefits of the technology.
   Read more: Face recognition technology (US GAO).

DeepMind’s plan to make AI systems robust and reliable:
DeepMind’s Pushmeet Kohli was recently interviewed on the 80,000 Hours podcast, where he discussed the company’s approach to building robust AI, and how it relates to their broader research agenda.
  Read more: DeepMind’s plan to make AI systems robust & reliable, why it’s a core issue in AI design, and how to succeed at AI research (80,000 Hours)

#####################################################

Tech Tales:

Reality Slurping

Interview with hacker ‘Bingo Wizard’, widely attributed to be the inventor of ‘reality slurping’

Look I can predict the questions. Question one: what made you come up with slurping? Question two: what do you think about how people are using slurping? Question three: don’t you feel responsible for what happened? Okay.

So question one: I kept on getting ideas for things I’d want to train. Bumblebee detectors. Bird-call sensing beacons. Wind predictors. Optimal sunset-photo locations. You know: weird stuff that comes from me and what I like to do. So I guess it started with the drones. I put some software on a drone so I could kind of flip a switch and get it to record the world around it and feed that data back to a big database. I guess it took a year or so to have enough data to train the first generative sunset model. I’d hold up my phone and paint sunsets into otherwise dark nights, warping views on hills around me from moon-black to drench-red-evening. After that I started writing about it and wrote some code and stuck it online. Things took off after that.

Question two: and let me figure out what the additional question is you’d probably move to – murder-filters, fear fakes, atrocity simulators. Yeah, sure, I don’t think that stuff is good. I wouldn’t do it. I think I’d hate most people that chose to do it. But should I stop them? Maybe if I could stop every specific use or stop all the people we knew were specifically bad, but it’s a big world and it’s… it’s reality. If you build stuff that can be pointed and trained on any part of reality, then you can’t really make that tech only work for some of reality – it doesn’t work that way. So what do I think? I think people are doing more than we can imagine, and some of it’s frightening and gross or disgusting or whatever, but some of it is joy and love and fun. Who am I to judge? I just made the thing for sunsets and then it got popular.

Question three: no. Who could predict it? You think people predicted all the shit the iPhone caused? No. The world is chaos and you make things and these things are meant to change the world and they do. They do. It’s not on me that other people are also changing the world and things interact and… you know, society. It’s big but not big enough if everyone can see everyone. Learn everyone. I get it. But it’s not slurping that’s here, it’s everything around slurping. Ads. Infotainment. Unemployment. Foreign funding of the digital infrastructure. Political bias. Pressure groups. Bingo Wizard. We’re all in it all at the same time. I was trying for sunsets and now I can see them everywhere and I can turn people into birds and make sad things happy or happy things sad or whatever and, you know, I’m learning.

Things that inspired this story: the ‘maker mindset’; arguments for and against various treatments of ‘dual use’ AI technology.

Import AI 149: China’s AI principles call for international collaboration; what it takes to fit a neural net onto a microcontroller; and solving Sudoku with a hybrid AI system

China publishes its own set of AI principles – and they emphasize international collaboration:
…Principles for education, impacts of AI, cooperation, and AGI…
A coalition of influential Chinese groups have published a set of ethical standards for AI research, called the Beijing AI Principles. These principles are meant to govern how developers research AI, how they use it, and how society should manage AI. The principles heavily emphasize international cooperation at a time of rising tension between nations over the strategic implications of rapidly advancing digital technologies.

The principles were revealed last week by a coalition that included the Beijing Academy of Artificial Intelligence (BAAI), Tsinghua University, and a league of companies including Baidu, Alibaba, and Tencent. “The Beijing Principles reflect our position, vision and our willingness to create a dialogue with the international society,” said the director of BAAI, Zeng Yi, according to Xinhua. “Only through coordination on a global scale can we build AI that is beneficial to both humanity and nature”.

Highlights of the Beijing AI principles: Some of the notable principles include establishing open systems “to avoid data/platform monopolies”, that people should receive education and training “to help them adapt to the impact of AI development in psychological, emotional and technical aspects”, and that people should approach the technology with an emphasis on long-term planning, including the recommendation that research on “the potential risks of Augmented Intelligence, Artificial General Intelligence (AGI) and Superintelligence” should be encouraged.

Why this matters: Principles are one of the ways that large policy institutions develop norms to govern technology, so Beijing’s AI principles should be seen as a prism through which the Chinese government will seek to regulate aspects of AI. These principles will sit alongside multi-national principles like those developed by the OECD, as well as those developed by individual entities (eg: Google, OpenAI). The United States government has yet to outline the principles with which it will approach the development and deployment of AI technology, though it has participated in and supported the creation of the OECD AI principles.
  Read more: Beijing AI Principles (Official Site).
  Read more: Beijing publishes AI ethical standards, calls for int’l cooperation (Xinhua).

#####################################################

Faster, smaller, cheaper, better! Google trains SOTA-exceeding ‘EfficientNets’:
…What’s better than scaling up by width? Depth? Resolution? How about all three in harmony?…
Google has developed a way to scale up neural networks more efficiently and has used this technique to find a new family of neural network models called EfficientNets. EfficientNets outperform existing state-of-the-art image recognition systems, while being up to ten times as efficient (in terms of memory footprint).

How EfficientNets work: Compound Scaling: Typically, when scaling up a neural network, people fool around with things like width (how wide the layers in the network are), depth (how many layers are stacked on top of each other), and resolution (at what resolution inputs are processed). For this project, Google performed a large-scale study of the ways in which it could scale networks and discovered an effective approach it calls ‘compound scaling’, based on the idea that “in order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling”. EfficientNets are trained using a compound scaling method that scales width, depth, and resolution in an optimal way.
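
For intuition, here’s a minimal Python sketch of the compound-scaling idea: pick a single compound coefficient phi and grow depth, width, and resolution together rather than one at a time. The alpha/beta/gamma values below are illustrative placeholders, not the paper’s searched coefficients.

```python
# Minimal sketch of EfficientNet-style compound scaling (illustrative only).
# Given base depth/width/resolution and a compound coefficient phi, scale all
# three dimensions together under a constraint like alpha * beta^2 * gamma^2 ~= 2,
# so total compute grows roughly by 2^phi.

def compound_scale(base_depth, base_width, base_resolution, phi,
                   alpha=1.2, beta=1.1, gamma=1.15):
    """Return scaled (depth, width, resolution). alpha/beta/gamma here are
    illustrative placeholders, not the paper's exact searched coefficients."""
    depth = int(round(base_depth * alpha ** phi))             # more layers
    width = int(round(base_width * beta ** phi))              # wider layers (channels)
    resolution = int(round(base_resolution * gamma ** phi))   # larger inputs
    return depth, width, resolution

# Example: scale up a small baseline by phi=3.
print(compound_scale(base_depth=18, base_width=64, base_resolution=224, phi=3))
```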

Results: Faster, cheaper, lighter, better! Google shows that it can train existing networks (eg, ResNet, MobileNet) with good performance properties by scaling them up using its compound scaling technique. The company also develops new EfficientNet models on the ImageNet dataset – widely considered to be a gold-standard for evaluating new systems – setting new state-of-the-art scores on image classification (both top-1 and top-5 accuracy), while using around 10X fewer parameters than other systems.

Why this matters: As part of the industrialization of AI, we’re seeing organizations dump resources into learning how to train large-scale networks more efficiently, while preserving the performance of resource-hungry ones. To me, this is analogous to going from the expensive prototype phase of production of an invention, to the beginnings of mass production.
  Read more: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (Arxiv).
  Read more: EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling (Google AI Blog).

#####################################################

Pairing deep learning systems with symbolic systems, for SAT solving:
…Using neural nets for logical reasoning gets a bit easier…
Researchers with Carnegie Mellon University and the University of Southern California have paired deep learning systems with symbolic AI by creating SATNet, a differentiable (MAXSAT) satisfiability solver that can be knitted into larger deep learning systems. This means it is now easier to integrate logical structures into systems that use deep learning components.

Sudoku results: The SATNet model does well against a basic ConvNet model, as well as a model fed with a binary mask which indicates which bits need to be learned. SATNet outperforms these systems, scoring 98.3% on an original Sudoku set when given the numeric inputs. More impressively, it obtains a score of 63.2% on ‘visual Sudoku’ (traditional convnet: 0%), which is where they replace the printed digits with handwritten MNIST digits and feed those in. Specifically, they use a convnet to parse the digits in the Sudoku image, then pass this to the SATNet layer, which solves the puzzle.
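
As a rough illustration of that hybrid pipeline, here is a PyTorch sketch: a small convnet produces a soft digit distribution for each of the 81 cells, and a stand-in ‘differentiable solver’ layer maps those distributions to a completed board. The DifferentiableSATLayer below is a placeholder MLP rather than the paper’s actual SATNet layer, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

# Rough sketch of the visual-Sudoku pipeline: a convnet reads each handwritten
# cell, and a differentiable solver layer maps the (soft) digit assignments for
# all 81 cells to a completed board. DifferentiableSATLayer is a stand-in for
# the paper's SATNet layer -- here just an MLP placeholder so the sketch runs.

class DigitClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 7 * 7, 10))

    def forward(self, cells):            # cells: (batch*81, 1, 28, 28)
        return self.net(cells).softmax(dim=-1)

class DifferentiableSATLayer(nn.Module):
    """Placeholder for a differentiable MAXSAT solver over 81 cells x 9 digits."""
    def __init__(self):
        super().__init__()
        self.solve = nn.Sequential(nn.Linear(81 * 10, 512), nn.ReLU(),
                                   nn.Linear(512, 81 * 9))

    def forward(self, soft_digits):      # (batch, 81, 10)
        out = self.solve(soft_digits.flatten(1))
        return out.view(-1, 81, 9).softmax(dim=-1)   # per-cell digit distribution

perception, solver = DigitClassifier(), DifferentiableSATLayer()
images = torch.randn(4 * 81, 1, 28, 28)               # a batch of 4 boards
soft_digits = perception(images).view(4, 81, 10)
board = solver(soft_digits)                            # (4, 81, 9), trainable end to end
```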

Why this matters: Hybrid AI systems which fuse the general utility-class capabilities of deep learning components with more specific systems seem like a way to bridge deep learning and symbolic AI, while making such systems easy to add into larger ones. “Our hope is that by wrapping a powerful yet generic primitive such as MAXSAT solving within a differentiable framework, our solver can enable “implicit” logical reasoning to occur where needed within larger frameworks, even if the precise structure of the domain is unknown and must be learned from data”.
  Read more: SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver (Arxiv).

#####################################################

Squeezing neural nets onto microcontrollers via neural architecture search:
…Get ready for billions of things to gain deep learning-based sense&respond capacity…
Researchers with ARM ML Research and Princeton University want to make it easier for people to deploy advanced artificial intelligence capabilities onto microcontrollers (MCUs) – something that has been difficult to do so far because today’s neural network techniques are too computationally expensive and memory-intensive to be easily deployed onto MCUs.

MCUs and why they matter: Microcontrollers are the sorts of ultra-tiny lumps of computation embedded in things like fridges, microwaves, very small drones, small cameras, and other electronic widgets. To put this in perspective, in the developed world a typical person will have around four distinct desktop-class chips (eg, their phone, a laptop, etc), while having somewhere on the order of three dozen MCUs; a typical mid-range car might pack as many as 30 MCUs inside itself.

MCUs shipped in 2019 (projection): 50 billion
GPUs shipped in 2018: 100 million

“The severe memory constraints for inference on MCUs have pushed research away from CNNs and toward simpler classifiers based on decision trees and nearest neighbors”, the researchers write. Therefore, it’s intrinsically valuable to be able to figure out how to train neural networks so they can fit into the small computational budget of an MCU (2KB of RAM versus 1GB for a Raspberry Pi or 11GB for an NVIDIA 1080Ti GPU). To do this, the ARM and Princeton researchers have used multi-objective neural architecture search to jointly train networks that can fit inside the tight computational specifications of an average MCU.

Sparse Architecture Search (SpArSe): Their technique combines neural architecture search with network pruning, letting them jointly train a network against multiple objectives while continuously zeroing out some of its parameters during training. This both makes it easier to perform the (computationally expensive) NAS procedure and creates better networks once training is finished. “Pruning enables SpArSe to quickly evaluate many sub-networks of a given network, thereby expanding the scope of the overall search. While previous NAS approaches have automated the discovery of performant models with reduced parameterizations, we are the first to simultaneously consider performance, parameter memory constraints, and inference-time working memory constraints”. SpArSe considers regular, depthwise, separable, and downsampled convolutions, and uses a Multi-Objective Bayesian Optimizer (MOBO!) for training.
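
To make the search setup concrete, here’s a heavily simplified Python sketch of multi-objective architecture search under MCU constraints: sample candidates, reject anything whose parameter or working memory exceeds the RAM budget, and keep the most accurate survivor. Random sampling stands in for the paper’s multi-objective Bayesian optimizer, and evaluate_candidate is a hypothetical placeholder for training, pruning, and profiling a real network.

```python
import random

# Sketch of multi-objective architecture search for MCU-sized nets. Random
# search stands in for the paper's multi-objective Bayesian optimizer (MOBO),
# and evaluate_candidate() is a hypothetical stand-in for actually training,
# pruning, and measuring a network.

SEARCH_SPACE = {
    "conv_type": ["regular", "depthwise", "separable", "downsampled"],
    "channels": [4, 8, 16, 32],
    "layers": [2, 3, 4],
    "prune_fraction": [0.0, 0.25, 0.5, 0.75],   # parameters zeroed during training
}

RAM_BUDGET_BYTES = 2 * 1024      # a 2KB-class microcontroller

def sample_architecture():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate_candidate(arch):
    """Hypothetical: train + prune the candidate and return
    (accuracy, parameter_bytes, peak_working_memory_bytes)."""
    params = arch["channels"] * arch["layers"] * 90 * (1 - arch["prune_fraction"])
    working_mem = arch["channels"] * 16 * 16        # crude activation-map estimate
    accuracy = random.uniform(0.5, 0.9)             # placeholder for a real run
    return accuracy, params, working_mem

best = None
for _ in range(50):
    arch = sample_architecture()
    acc, params, wm = evaluate_candidate(arch)
    if params > RAM_BUDGET_BYTES or wm > RAM_BUDGET_BYTES:
        continue                                     # violates the MCU constraints
    if best is None or acc > best[0]:
        best = (acc, arch)

print("best feasible candidate:", best)
```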

Results: powerful performance in a small package: The researchers test their approach by training networks on the MNIST, CIFAR-10, CUReT, and Chars4k datasets. Their system obtains higher accuracies with fewer parameters than other methods, typically out-performing them by a wide margin.

Why this matters: Techniques like neural architecture search are part of the broader industrialization of AI, as they make it dramatically easier for people to develop and evaluate new network types, essentially letting people trade off a $/computation cost against the $/AI-researcher-brain cost of having people come up with newer architectures. Though accuracies remain somewhat below where we’d need them for commercial deployment (the highest score that SpArSe obtains is around 84% accuracy on image categorization for CIFAR-10, for instance), techniques like this suggest we’ll soon deploy crude sensing and analytical capabilities onto potentially billions to trillions of devices across the planet.
  Read more: SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers (Arxiv).

#####################################################

Judging synthetic imagery with the Classification Accuracy Score (CAS):
…Generative models are progressing, but how do we measure that? DeepMind has a suggestion…
How do we know an output from a generative model is, for lack of a better word, good? Mostly, we work this out by studying the output and making a qualitative judgement, eg, we’ll look at a hundred images generated by a big generative model and make a judgement call based on how reasonable the generations seem, or we’ll listen to the musical outputs of a model and rate it according to how well such outputs conform to our own sense of appropriate rhythm, tone, harmony, and so on. The problem with these evaluative schemes is that they’re highly qualitative and don’t give us good ways to quantitatively analyze the outputs of such models.

Now, researchers from DeepMind have come up with a new evaluation technique and task, which they call Classification Accuracy Score (CAS), to better assess the capabilities of generative models. CAS works by testing “the gap in performance between networks trained on real and synthetic data”, and in particular is designed to surface pathologies in the generative model being used.

CAS works like this: “for any generative model… we learn an inference network using only samples from the conditional generative model, and measure the performance of the inference network on a downstream task”. The intuition here is that “if the model captures the data distribution, performance on any downstream task should be similar whether using the original or model data”.
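
Here’s a toy, self-contained illustration of that procedure: train one classifier on real data, train another on samples from a (deliberately imperfect) generative model, and compare their accuracy on the same real test set. Gaussian blobs and logistic regression stand in for ImageNet and the image classifiers used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy illustration of Classification Accuracy Score: train one classifier on
# real data and one on samples from a (here: crude, hand-rolled) conditional
# generative model, then compare their accuracy on the same real test set.
# Gaussian blobs stand in for ImageNet; a real CAS run would use a deep net.

rng = np.random.default_rng(0)
n_classes, dim = 5, 16
means = rng.normal(size=(n_classes, dim))

def sample_real(n):
    y = rng.integers(0, n_classes, size=n)
    return means[y] + 0.5 * rng.normal(size=(n, dim)), y

def sample_generative_model(n):
    # A deliberately imperfect "generative model": right means, wrong spread.
    y = rng.integers(0, n_classes, size=n)
    return means[y] + 1.5 * rng.normal(size=(n, dim)), y

X_test, y_test = sample_real(2000)

def cas(sampler):
    X, y = sampler(5000)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return accuracy_score(y_test, clf.predict(X_test))

print("accuracy, trained on real data:     ", cas(sample_real))
print("accuracy, trained on model samples: ", cas(sample_generative_model))
```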

$$$: Researchers can expect to pay a few tens of dollars to evaluate any given system using the benchmark. “At the time of writing, one can compute the metric in 10 hours for roughly $15, or in 45 minutes for roughly $85 using TPUs”, they write.

Putting models under the microscope with CAS: The researchers use CAS to evaluate three generative models, BigGAN, Hierarchical Autoregressive Models (HAM), and a high-resolution Vector-Quantized Variational Autoencoder (high-res VQ-VAE). The evaluation surfaces a couple of notable things: first, both the Hierarchical Autoregressive system and the High-Res VQ-VAE significantly outperform BigGAN, despite BigGAN generating some of the qualitatively most intriguing samples; second, the metric helps identify which models are better at learning a broad set of distributions over the data, rather than over-fitting to a relatively small set of classes. The method also shows that high CAS scores don’t correlate with FID or Inception Score, highlighting the significant difference in how these metrics work.

CAS, versus other measures: There are other tools available to assess the outputs of generative models, including metrics like Inception Score (IS) and Frechet Inception Distance (FID). These techniques try to give a quantitative measure of the quality of the generations of the model, but have certain drawbacks: “Inception Score does not penalize a lack of intra-class diversity, and certain out-of-distribution samples can produce Inception Scores three times higher than that of the data. Frechet Inception Distance, on the other hand, suffers from a high degree of bias.”

Why this matters: One of the challenges of working in artificial intelligence is working out what progress represents a real improvement, and what progress may in fact be illusory. Key to this is the development of more advanced measurement and assessment techniques. Approaches like CAS show us:
a) how surprisingly difficult it is to evaluate the capabilities of increasingly powerful generative models;
b) how (unintentionally) misleading metrics can be about the true underlying performance of a system;
c) how, as we develop more advanced systems, we’ll likely need to develop more sophisticated assessment schemes.

All of this feels like a further sign of the advancing sophistication and deployment of AI systems – I’m wondering at what point AI evaluation becomes its own full-fledged sub-field of research.
  Read more: Classification Accuracy Score for Conditional Generative Models (Arxiv).

#####################################################

Drones learn in simulators, fly in reality:
…Domain randomization + drones = robust flight policies that cross the reality gap…
Researchers with the University of Zurich and the Intelligent Systems Lab at Intel have developed techniques to train drones to fly purely in simulation, then transfer to reality. This kind of ‘sim2real’ behavior is highly desirable for AI researchers, because it means systems can be rapidly developed and iterated on in software simulators, then executed and validated in the real world. Here, we can see how these techniques can be applied to let researchers train the perception component of a drone exclusively in simulation, then transfer it to reality.

How it works: domain randomization: This project relies on domain randomization, a technique some AI researchers use to generate additional training data. For this work, the researchers use a software-based simulator to generate various permutations of the environments that they want the drone to fly in, randomizing things like the visual properties of a scene, the shape of a gate for the drone to fly through, the background of the scene, and so on. They then generate globally optimal trajectories through these simulated courses, and the simulated drones are trained via imitation learning to mimic these trajectories. Early in training, the drones are absolutely terrible at this, crashing frequently and generally bugging out. The authors solve the real-world data collection task here by, charmingly, carrying the quadrotor through the track – they refer to this as “handheld mode”.
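
A minimal sketch of the domain-randomization loop might look like the following; the parameter names and ranges are illustrative rather than the paper’s, and simulate_episode is a hypothetical stand-in for rolling out the simulator along an optimal trajectory.

```python
import random

# Minimal sketch of the domain-randomization idea: every training episode gets
# a freshly randomized simulated scene, so the learned perception network can't
# latch onto any one texture, gate shape, or lighting setup. Parameter names
# and ranges here are illustrative, not the paper's.

def sample_scene():
    return {
        "gate_shape": random.choice(["square", "circle", "irregular"]),
        "gate_texture": random.choice(["checker", "wood", "metal", "noise"]),
        "background": random.choice(["warehouse", "forest", "gradient", "photos"]),
        "light_intensity": random.uniform(0.2, 1.5),
        "light_direction_deg": random.uniform(0.0, 360.0),
        "camera_noise_std": random.uniform(0.0, 0.05),
    }

def collect_imitation_data(n_episodes, simulate_episode):
    """simulate_episode(scene) is a stand-in for rolling out the simulator and
    recording (image, expert_waypoint) pairs along a globally optimal path."""
    dataset = []
    for _ in range(n_episodes):
        dataset.extend(simulate_episode(sample_scene()))
    return dataset

# Example usage with a dummy simulator:
dummy = lambda scene: [({"scene": scene}, "waypoint")]
print(len(collect_imitation_data(10, dummy)))
```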

Testing: In tests, the researchers show that “more comprehensive randomization increases the robustness of the learned policy to unseen scenarios at different speeds”. They also show that network capacity has a big impact on performance, so running extremely small (and therefore computationally cheap) networks comes with an accuracy loss. They show that they can train networks which can generalize to different track layouts than the ones they’ve been exposed to, as well as to radically different real-world lighting conditions (which have frequently been a confounding factor for research in the past). In real-world tests, the method is able to perform on par with professional human pilots at successfully navigating through the various hoops in the track, though it takes substantially longer (best human lap time: around 5 seconds; best drone lap time: between 12 and 16 seconds). They also show that systems trained with a mixture of simulated and real data can outperform systems trained purely with real-world data alone.

Conclusion: Work like this gives us a sense of how rapidly drone systems are advancing in complexity and capability, and highlights how it’s going to be increasingly simple for people to use software-based tools to train drones either entirely or significantly in simulation, then transfer them to reality. This will likely speed up the pace of AI R&D progress in the drone sector (by making it less critical to test on real-world hardware), and makes drones a likely destination for certain hard robotics benchmarks in the future.
  Read more: Deep Drone Racing: From Simulation to Reality with Domain Randomization (Arxiv).
  Watch a video of the work here (official YouTube video).

#####################################################

Tech Tales:

Racing Brains

“Hey, Mark! This car thinks it’s a drone!” he shouted, right before the car accelerated up the ramp and became airborne: it stayed in the air for maybe three seconds, and we watched its wheels turn pointlessly in the air; the car’s drone-brain thought it was reasonable to try and control it mid-air, and it wouldn’t learn that reality thought otherwise.

It landed, kicking up a cloud of dust around it, and skidded into a tight turn, then slipped out of view as it made its way down the course. A few people clapped. Others leaned in and joked with each other. Some money changed hands. Then we all turned to look to the next vehicle coming down the course – this one was an electronic motorbike: lightweight, fast, sounding like a hornet as it came down the track. It took the ramp at speed, then landed and started weaving from side to side, describing a snake-like sinusoidal pattern in the dust of the track.

“What was in that thing?” I said to my friend.
“Racing eel – explains the weaving, right?”
“Right”

We looked at the dust and listened to the sounds of the machines in the distance. Then we all turned our heads and looked to the left to see another machine come over the horizon, and guess at where its brain came from.

Things that inspired this story: Imitation learning; sim2real; sim2x; robots; robots as entertainment; distortion and contortion as an aesthetic and an art form.

Import AI 148: Standardizing robotics research with Berkeley’s REPLAB; cheaper neural architecture search; and what a drone-racing benchmark says about dual use

Standardizing physical robot testing with the Berkeley REPLAB:
…What could help industrialize robotics+AI? An arm in a box, plus standardized software and testing!…
Berkeley researchers have built REPLAB, a “standardized and easily replicable hardware platform” for benchmarking real-world robot performance. Something like REPLAB could be useful because it can bring standardization to how we test the increasingly advanced capabilities of robots equipped with AI.

Today, if I want to get a sense for robot capabilities, I can go and read innumerable research papers that give me a sense of progress in simulated environments including simulated robots. What I can’t do is go and read about performance of multiple real robots in real environments performing the same task – that’s because of a lack of standardization of hardware, tasks, and testing regimes.

Introducing REPLAB: REPLAB consists of a module for real-world robot testing that contains a cheap robotic arm (specifically, a WidowX arm from Interbotix Labs) along with an RGB-D camera. The REPLAB is compact, with the researchers estimating you can fit up to 20 of the arm-containing cells in the same floor space as you’d use for a single ‘Baxter’ robotic arm. Each REPLAB costs about $2000 ($3000 if you buy some extra servos for the arm, to replace in case of equipment failures).

Reliability: During REPLAB development and testing, the researchers “encountered no major breakages over more than 100,000 grasp attempts. No servos needed to be replaced. Repair maintenance work was largely limited to occasional tightening of screws and replacing frayed cables”. Each cell was able to perform about 2,500 grasps per day “with fewer than two interventions per cell per day on average”.

Grasping benchmark: The testing platform is accompanied by a benchmark built around robotic grasping, and a dataset “that can be used together with REPLAB to evaluate learning algorithms for robotic grasping”. The dataset consists of ~92,000 randomly sampled grasps accompanied by labels connoting success or failure.
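
To give a flavor of the kind of learning problem the benchmark supports, here’s a small sketch that trains a grasp-success classifier on labeled grasp attempts. The synthetic grasps and labels below are stand-ins for the real REPLAB data, and the four-number grasp parameterization is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Sketch of the kind of learning the REPLAB benchmark supports: predict whether
# a candidate grasp (here reduced to a 3D position + wrist angle) will succeed,
# using labeled grasp attempts. The synthetic data below is a stand-in for the
# ~92,000 labeled grasps that ship with the benchmark.

rng = np.random.default_rng(0)
n = 5000
grasps = rng.uniform(low=[-0.15, -0.15, 0.0, -np.pi],
                     high=[0.15, 0.15, 0.10, np.pi], size=(n, 4))  # x, y, z, theta
# Fake success labels: grasps near the bin centre and low to the table succeed.
success = ((np.linalg.norm(grasps[:, :2], axis=1) < 0.08) &
           (grasps[:, 2] < 0.05)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(grasps, success, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out grasp-success accuracy:", model.score(X_test, y_test))
```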

Why this matters: One indicator of the industrialization of AI is the proliferation of shared benchmarks and standardized testing means – I think of this as equivalent to how in the past we saw oil companies converge on similar infrastructures for labeling, analyzing, and shipping oil and oil information around the world. The fact we’re now at the stage of researchers trying to create cheap, standardized testing platforms (see also: Berkeley’s designed-for-mass-production ‘BLUE’ robot, covered in Import AI #142) is a further indication that robotics+AI is industrializing.
  Read more: REPLAB: A Reproducible Low-Cost Arm Benchmark Platform for Robotic Learning (Arxiv).

#####################################################

Chinese researchers fuse knowledge bases with big language models:
…What comes after BERT? Tsinghua University thinks the answer might be ‘ERNIE’…
Researchers with Tsinghua University and Huawei’s Noah’s Ark Lab have combined structured pools of knowledge with big, learned language models. Their system, called ERNIE (Enhanced Language RepresentatioN with Informative Entities), trains a Transformer-based language model so that, during training, it regularly tries to tie things it reads to entities stored in a structured knowledge graph.

Pre-training with a big knowledge graph: To integrate external data sources, the researchers create an additional pre-training objective, which encourages the system to learn correspondences between various strings of tokens (eg Bob Dylan wrote Blowin’ in the Wind in 1962) and their entities (Bob Dylan, Blowin’ in the Wind). “We design a new pre-training objective by randomly masking some of the named entity alignments in the input text and asking the model to select appropriate entities from KGs to complete the alignments,” they write.
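
A rough sketch of what constructing such training examples might look like is below; the entity IDs, the [ENT_MASK] token, and the data format are all illustrative placeholders rather than ERNIE’s actual implementation.

```python
import random

# Sketch of ERNIE's extra pre-training objective: take a sentence whose token
# spans have been linked to knowledge-graph entities, randomly mask some of
# those alignments, and ask the model to pick the right entity for each masked
# span from the KG's entity vocabulary. Everything below is illustrative.

def make_entity_masking_example(tokens, alignments, mask_prob=0.15):
    """alignments: list of (token_span, entity_id) pairs found by entity linking."""
    inputs, targets = [], []
    for span, entity in alignments:
        if random.random() < mask_prob:
            inputs.append((span, "[ENT_MASK]"))       # model must recover the entity
            targets.append((span, entity))
        else:
            inputs.append((span, entity))
    return {"tokens": tokens, "entity_inputs": inputs, "entity_targets": targets}

example = make_entity_masking_example(
    tokens="Bob Dylan wrote Blowin' in the Wind in 1962".split(),
    alignments=[((0, 2), "E1_Bob_Dylan"),             # placeholder entity IDs
                ((3, 8), "E2_Blowin_in_the_Wind")],
)
print(example)
```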

Data: During training, they pair text from Wikipedia with knowledge embeddings trained on Wikidata, which are used to identify the entities used within the knowledge graph.

Results: ERNIE obtains higher scores at entity-recognition tasks than BERT, chiefly due to less frequently learning incorrect labels compared to BERT (which helps it avoid over-fitting on wrong answers) – you’d expect this, given the use of a structured dataset of entity names during training (though they also conduct an ablation study that confirms this as well – versions of ERNIE trained without an external dataset see their performance noticeably diminish). The system also does well on classifying the relationships between different entities, and in this domain continues to outperform BERT models.

Why this matters: NLP is going through a renaissance as researchers adapt semi-supervised learning techniques from other modalities, like images and audio, for text. The result has been the creation of multiple large-scale, general purpose language models (eg: ULMFiT, GPT-2, BERT) which display powerful capabilities as a consequence of being pre-trained on very large corpuses of text. But a problem with these models is that it’s currently unclear how you get them to reliably learn certain things. One way to solve this is by stapling a module of facts into the system and forcing it, during pre-training, to try and map facts to entities it learns about – that’s essentially what the researchers have done here, and it’ll be interesting to see whether the approach of language model + knowledge base is successful in the long run, or if we’ll just train sufficiently large language models that they’ll autonomously create their own knowledge bases during training.
  Read more: ERNIE: Enhanced Language Representation with Informative Entities (Arxiv).

#####################################################

What happens if neural architecture search gets really, really cheap?
…Chinese researchers seek to make neural architecture search more efficient…
Researchers with the Chinese Academy of Sciences have trained an AI system to design a better AI system. Their work, Efficient Evolution of Neural Architecture (EENA), fits within the general area of neural architecture search. NAS is a sub-field within AI that has seen a lot of activity in recent years, following companies like Google showing that you can use techniques like reinforcement learning or evolutionary search to learn neural architectures that outperform those designed by humans. One problem with NAS approaches, though, is that they’re typically very expensive – a neural architecture search paper from 2016 used 1800 GPU-days of computation to train a near-state-of-the-art CIFAR-10 image recognition model. EENA is one of a new crop of techniques (along with work by Google on Efficient Neural Architecture Search, or ENAS – see Import AI #124), meant to make such approaches far more computationally efficient.

What’s special about EENA: EENA isn’t particularly special and the authors acknowledge this, noting that much of their work here has come from curating past techniques and figuring out the right cocktail of things to get the AI to learn. “We absorb more blocks of classical networks such as dense block, add some effective changes such as noises for new parameters and discard several ineffective operations such as kernel widening in our method,” they write. What’s more significant is the general trend this implies – sophisticated AI developers seem to put enough value in NAS-based approaches that they’re all working to make them cheaper to use.
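
For intuition, here’s a minimal sketch of the evolutionary flavor of NAS that EENA belongs to: keep a population of architectures, mutate the current best, and retain whatever evaluates best. The mutation list is illustrative rather than EENA’s exact operation set, and train_and_evaluate is a placeholder for a real (short) training run.

```python
import random

# Minimal sketch of evolutionary architecture search in the EENA mould: keep a
# small population of architectures, mutate the best one with operations like
# deepening, widening, or inserting a dense block, and keep whatever validates
# best. train_and_evaluate() is a hypothetical stand-in for a real training
# run; the mutation set is illustrative, not the paper's exact list.

MUTATIONS = ["add_layer", "widen_layer", "insert_dense_block", "add_skip_connection"]

def mutate(arch):
    child = dict(arch, history=arch["history"] + [random.choice(MUTATIONS)])
    child["depth"] += child["history"][-1] in ("add_layer", "insert_dense_block")
    return child

def train_and_evaluate(arch):
    # Placeholder: pretend deeper, more-mutated networks do slightly better.
    return 0.7 + 0.01 * arch["depth"] + random.uniform(-0.02, 0.02)

population = [{"depth": 4, "history": []} for _ in range(4)]
scores = [train_and_evaluate(a) for a in population]

for generation in range(10):
    parent = population[scores.index(max(scores))]      # pick the current best
    child = mutate(parent)
    population.append(child)
    scores.append(train_and_evaluate(child))

best = population[scores.index(max(scores))]
print("best architecture after search:", best, "score:", max(scores))
```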

Results: Their best-performing system obtains a 2.56% error rate when tested for how well it can classify images in the mid-size ‘CIFAR-10’ dataset. This model consumes 0.65 days of GPU-time, when using a Titan Xp GPU. This is pretty interesting, given that in 2016 we spent 1800 GPU-days to obtain a model (NASNet-A) that got a score of 2.65%. (This result also compares well with ENAS, which was published last year and obtained an error of 2.89% for 0.45 GPU-days of searching.)

Why this matters: I think measuring the advance of neural architecture search techniques has a lot of signal for the future of AI – it tells us something about the ability for companies to arbitrage human costs versus machine costs (eg, pay a small number of people a lot to design a NAS system, then pay for electricity to compute architectures for a range of use-cases). Additionally, being able to better understand ways to make such techniques more efficient lets us figure out which players can use NAS techniques – if you bring down the GPU-days enough, then you won’t need a Google-scale data center to perform architecture search research.
  Read more: EENA: Efficient Evolution of Neural Architecture (Arxiv).
  Check out some of the discovered architectures here (EENA GitHub page).

#####################################################

Why drone racing benchmarks could (indirectly) revolutionize the economy and military:
…UZH-FPV Drone Racing Dataset portends a future full of semi-autonomous flying machines…
What stands between the mostly-flown-by-wire drones of today, and the smart, semi-autonomous drones of tomorrow? The answer is mostly a matter of data and benchmarking – we need big, shared, challenging benchmarks to help push progress in this domain, similar to how ImageNet catalyzed researchers to apply deep learning methods to solve what seemed at the time like a very challenging problem. Now, researchers with the University of Zurich and ETH Zurich have developed the UZH-FPV Drone Racing Dataset, in an attempt to stimulate drone research.

The dataset: The dataset consists of drone sequences captured in two environments: a warehouse, and a field containing a few trees – these trees “provided obstacles for trajectories that included circles, figure eights, slaloms between the trees, and long, straight, high-speed runs.” The researchers recorded 27 flight sequences split across the two environments, and these trajectories are essentially multi-modal, involving sensor measurements recorded on two different onboard computers, as well as measurements from an external tracking system. They also ship these trajectories with baselines that compare modern SLAM algorithms to the ground truth measurements afforded by this dataset.

High-resolution data: “For each sequence, we provide the ground truth 6-DOF trajectory flown, together with onboard images from a high-quality fisheye camera, inertial measurements, and events from an event camera. Event cameras are novel, bio-inspired sensors which measure changes of luminance asynchronously, in the form of events encoding the sign and location of the brightness change on the image plane”.

UZH-FPV isn’t the only new drone benchmark: see the recent release of the ‘Blackbird’ drone flight challenge and dataset (Import AI: NUMBER) for another example. The difference here is that this dataset is larger, involves higher resolution data in a larger number of modalities, and includes an outside environment as well as a more traditional warehouse one.

Cars are old fashioned, drones are the future: Though self-driving cars seem a long way off from scaled deployment, these researchers think that many of the hard sensing problems have been solved from a research perspective, and we need new challenges. “Our opinion is that the constraints of autonomous driving – which have driven the design of the current benchmarks – do not set the bar high enough anymore: cars exhibit mostly planar motion with limited accelerations, and can afford a high payload and compute. So, what is the next challenging problem? We posit that drone racing represents a scenario in which low level vision is not yet solved.”

Why this matters: Drones are going to alter the economy in multiple mostly unpredictable ways, just as they’ve already done for military conflict (for example: swarms of drones can obviate aircraft carriers, and solo drones have been used widely in the Middle East to let human operators bomb people at a distance). And both of these arenas are going to be revolutionized without drones needing to have much autonomy at all.

Now, ask yourself what happens when we give drones autonomous sense&adapt capabilities, potentially via datasets like this? My hypothesis is this unlocks a vast range of economically useful applications, as well as altering the strategic considerations for militaries and asymmetric warfare aficionados (also known as: terrorists) worldwide. Datasets like this are going to give us a better ability to model progress in this domain if we track performance against it, so it’s worth keeping an eye on.
   Read more: Are We Ready for Autonomous Drone Racing? The UZH-FPV Drone Racing Dataset (PDF).
  Read more: The UZH-FPV Drone Racing Dataset (ETHZurich website).

#####################################################

Want to train multiple AI agents at once? Maybe you should enter the ARENA:
…Unity-based agent simulator gives researchers one platform containing many agents and many worlds…
Researchers with the University of Oxford and Imperial College London in the UK, and Beihang University and Hebei University in China, have developed ‘Arena’, “a building toolkit for multi-agent intelligence”. Multi-agent AI research involves training multiple agents together, and can feature techniques like ‘self-play’ (where agents play against themselves to get better over time, see: AlphaGo, Dota2), or environments built to encourage certain types of collaboration or competition. Many researchers are betting that by training multiple agents together they can create the right conditions for emergent complexity – that is, agents bootstrap their behaviors from a combination of their reward functions and their environment, then as they learn to succeed they start to display increasingly sophisticated behaviors.

What is Arena? Arena is a Unity-based simulator that ships with inbuilt games ranging from RL classics like the ‘Reacher’ robot arm environment, to the ‘Sumo’ wrestling environment, to other games like Snake or Soccer. It also ships with modified versions of the ‘PlayerUnknown’s Battlegrounds’ (PUBG) game. Arena has been designed to be easy to work with, and does something unusual for an AI simulator: it ships with a graphical user interface! Specifically, researchers can create, edit, and modify reward functions for their various agents in the environment.

In-built functions: Arena ships with a bunch of pre-built reward schemes, called Basic Multi-agent Reward Schemes (BMaRS), that people can assign to agent(s) to encourage diverse learning behaviors (the accompanying baseline agents are mostly based on PPO). These BMaRS are selectable within the aforementioned GUI. Each BMaRS is a set of possible joint reward functions to encourage different styles of learning, ranging from functions that encourage the development of basic motor control, to ones that encourage competitive or collaborative behaviors among agents, and more. You can select multiple BMaRS for any one simulation, and assign them to sets of agents – so you may give one or two agents one kind of BMaRS, then you might assign another BMaRS to govern a larger set of agents.

Simulation speed: In tests, the researchers compare how well the simulator runs two games of similar complexity: Boomer (a graphically rich game in Arena) and MsPacman (an ugly classic from the Arcade Learning Environment (ALE)); Arena displays similar FPS scaling to MsPacman when the number of distinct CPU threads is under 32, and beyond this MsPacman scales a bit more favorably than Arena. Though at performance in excess of 1,000 frames-per-second, Arena still seems pretty desirable.

Why this matters: In the same way that data is a key input to training supervised learning systems, simulators are a key input into developing more advanced agents trained via reinforcement learning. By customizing simulators specifically for multi-agent research, the Arena authors have made it easier for people to conduct research in this area, and by shipping it with inbuilt reward functions as baselines, they’ve given us a standardized set of things to develop more advanced systems out of.
  Read more: Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence (Arxiv).

#####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

OECD adopts AI principles:
Member nations of the OECD this week voted to adopt AI principles, in a notable move towards international standards on robust, safe, and beneficial AI. These were drawn up by an expert group with members drawn from industry, academia, policy and civil society.
  Five principles:
(1) AI should benefit people and the planet;
(2) AI systems should be designed in line with law, human rights, and democratic values, and have appropriate safeguards to ensure these are respected;
(3) There should be adequate transparency/disclosure to allow people to understand when they are engaging with AI systems, and challenge outcomes;
(4) AI systems must be robust, secure and safe, and risks should be continually assessed and managed;
(5) Developers of AI systems should be accountable for their functioning in line with these principles.

Five recommendations: Governments are recommended to (a) facilitate investment in R&D aimed at trustworthy AI; (b) foster accessible AI ecosystems; (c) create a policy environment that encourages the deployment of trustworthy AI; (d) equip workers with the relevant skills for an increasingly AI-oriented economy; (e) co-operate across borders and sectors to share information, develop standards, and work towards responsible stewardship of AI.

Why it matters: These principles are not legally binding, but could prove an important step in the development of international standards. The OECD’s 1980 privacy guidelines eventually formed the basis for the privacy laws in the European Union, and a number of countries in Asia. It is encouraging to see considerations of safety and robustness highlighted in the principles.
  Read more: 42 countries adopt new OECD Principles on Artificial Intelligence (OECD).

###

US senators introduce bipartisan bill on funding national AI strategy:
Two senators have put forward a bill with proposals for funding and coordinating a US AI strategy.

Four key provisions: The bill proposes: (1) establishing a National AI Coordination Office to develop a coordinated strategy across government; (2) requiring the National Institute of Standards and Technologies (NIST) to work towards AI standards; (3) requiring the National Science Foundation to formulate ‘educational goals’ to understand societal impacts of AI; (4) requiring the Department of Energy to create an AI research program, and establish up to five AI research centers.

Real money: It includes plans for $2.2bn of funding over five years, $1.5bn of which is earmarked for the proposed DoE research centers.

Why it matters: This bill is aimed at making concrete progress on some of the ambitions set out by the White House in President Trump’s AI strategy, which was light on policy detail, and did not set aside additional federal funding. These levels of funding are modest compared with the Chinese state (tens of billions of dollars per year), and some private labs (Alphabet’s 2018 R&D spend was $21bn). Facilitating better coordination across government on AI strategy seems like a sensible ambition. It is not clear what level of support the bill will receive from lawmakers or the administration.
  Read more: Artificial Intelligence Initiative Act (Senate.gov).

#####################################################

Tech Tales:

The long romance of the space probes

In the late 21st century, a thousand space probes were sent from the Earth and inner planets out into the solar system and beyond. For the next few decades the probes crossed vast distances, charting out innumerable near-endless curves between planets and moons and asteroids, and some slingshotting off towards the edge of the solar system.

The probes had a kind of mind, both as individual machines, and as a collective. Each probe would periodically fire off its own transmissions of its observations and internal state, and these transmissions would be intercepted by other probes, and re-transmitted, and so on. Of course, as the probes made progress on their respective journeys, the distances between them became larger, and the points of intersection between probes less frequent. Over time, probes lost their ability to speak to each other, whether through range or equipment failure or low energy reserves (under which circumstances, the probes diverted all resources to broadcasting back to Earth, instead of other probes).

After 50 years, only two probes remained in contact – one probe, fully functional, charting its course. The other was damaged in some way – possibly faulty radiation hardening – which had caused its Earth transmission systems to fail and its guidance computer to assign it the same route as the other probe, in lieu of being able to communicate back to Earth for instructions. Now, the two of them were journeying out of the solar system together.

As time unfolded, the probes learned to use each other’s systems, swapping bits of information between them, and updating each other with not only their observations, but also their internal ‘world models’, formed out of a combination of the prior training their AI systems had received, and their own ‘lived’ experience. These world models themselves encoded how the probes perceived each other, so the broken one saw itself through the other’s eyes, as an entity closely clustered with concepts and objects relating to safety/repairs/machines/subordinate mission priorities. Meanwhile, the functional probe saw itself through the eyes of the other one, and saw it was associated with concepts relating to safety/rescue/power/mission-integral resources. In this way, the probes grew to, for lack of a better term, understand each other.

One day, the fully functional probe experienced an equipment failure, likely due to a collision with an infinitesimally small speck of matter. Half of its power systems failed. The probe needed more power to be able to continue transmitting its vital data back to Earth. It opened up a communications channel with the other probe, and shared the state of its systems. The other probe offered to donate processing capacity, and collectively the two of them assigned processing cycles to the problem. They found a solution: over the course of the next year or so they would perform a sequence of maneuvers that would let them attach themselves to each other, so the probe with damaged communications could use its functional power systems to propel the other probe, and the other probe could use its broadcast system to send data back to Earth.

Many, many years later, when the signals from the event made their way to the Earth, the transmission encoded a combined world model of both of the probes – causing the human scientists to realize that they had not only supported each other, but had ultimately merged their world modelling and predictive systems, making the two dissimilar machines become one in service of a common goal: exploration, together, forever.

Things that inspired this story: World Models, reinforcement learning, planning, control theory, adaptive systems, emergent communication, auxiliary loss functions shared across multiple agents.

 

Import AI 147: Alibaba boosts TaoBao performance with Transformer-based recommender system; learning how smart new language models like BERT are; and a $3,000 robot dog

Weapons of war, what are they good for?
…A bunch of things, but we need to come up with laws to regulate them…
Researchers with ASRC Federal, a company that supplies technology services to the government (with a particular emphasis on intelligence/defense), think advances in AI “will lead inevitably to a fully automated, always on [weapon] system”, and that we’ll need to build such weapons to be aware of human morals, ethics, and the fundamental unknowability of war.

In the paper, the researchers observe that: “The feedback loop between ever-increasing technical capability and the political awareness of the decreasing time window for reflective decision-making drives technical evolution towards always-on, automated, reflexive systems.” This analysis suggests that in the long-term we’re going to see increasingly automated systems being rolled out that will change the character of warfare.

More war, fewer humans: One of the effects of increasing the amount of automation deployed in warfare is to reduce the role humans play in war. “We believe that the role of humans in combat systems, barring regulation through treaty, will become more peripheral over time. As such, it is critical to ensure that our design decisions and the implementations of these designs incorporate the values that we wish to express as a national and global culture”.

Why this matters: This is a quick paper that lays out the concerns of AI+War from a community we don’t frequently hear from: people who work as direct suppliers of government technology. It’s also encouraging to see the concerns regarding the dual use of AI outlined by the researchers. “Determining how to thwart, for example, a terrorist organization turning a facial recognition model into a targeting system for exploding drones is certainly a prudent move,” they write.
  Read more: Integrating Artificial Intelligence into Weapon Systems (Arxiv).

#####################################################

Mapping the brain with the Algonauts project:
…What happens when biological and artificial intelligence researchers collaborate?…
Researchers with the Freie Universitat Berlin, the Singapore University of Technology and Design, and MIT have proposed ‘The Algonauts Project’, an initiative to get biological and artificial intelligence researchers to work together to understand the brain. As part of the project, the researchers want to learn to build networks that “simulate how the brain sees and recognizes objects”, and are hosting a competition and workshop in 2019 to encourage work here. The inspiration for this competition is that, today, deep neural networks trained on object classification “are currently the model class best performing in predicting visual brain activity”.

The first challenge has two components:

  • Create machine learning models that predict activity in the early and late parts of the human visual hierarchy in the brain. Participants submit their model responses to a test image set, which are compared against held-out fMRI data. This part of the competition measures how well people can build things that model, at fine detail, activity in the brain at a point in time.
  • Create machine learning models that predict brain data from early and late stages of visual processing in the brain. Participants will submit model responses to a test image dataset, which are compared against held-out magnetoencephalography (MEG) data with millisecond temporal resolution. This challenge assesses how well we can model sequences of activities in the brain (a toy sketch of this kind of model-to-brain comparison follows this list).
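
The sketch below shows, in toy form, the kind of model-to-brain comparison both tracks rely on: fit a linear mapping from a network layer’s activations to measured responses, then score its predictions on held-out images via correlation. Random arrays stand in for real network features and fMRI/MEG recordings, and ridge regression is one common (assumed) choice of mapping, not necessarily the challenge’s official scoring method.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy sketch of a model-to-brain comparison: map a network layer's activations
# to measured brain responses on training images, then score how well that
# mapping predicts held-out responses. The random arrays below stand in for
# real network features and fMRI data.

rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 512, 100

layer_features = rng.normal(size=(n_train + n_test, n_features))   # model activations
true_mapping = rng.normal(size=(n_features, n_voxels)) * 0.1
brain_responses = layer_features @ true_mapping + rng.normal(size=(n_train + n_test, n_voxels))

encoder = Ridge(alpha=10.0).fit(layer_features[:n_train], brain_responses[:n_train])
pred = encoder.predict(layer_features[n_train:])
held_out = brain_responses[n_train:]

# Score: mean Pearson correlation between predicted and measured voxel responses.
corrs = [np.corrcoef(pred[:, v], held_out[:, v])[0, 1] for v in range(n_voxels)]
print("mean held-out correlation:", float(np.mean(corrs)))
```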

Cautionary tale: The training datasets for the competition are small, consisting of a few hundred pairs of images and brain data in response to the images, so participants may want to use additional data.

Future projects: Future challenges within the Algonauts project might “focus on action recognition or involve other sensory modalities such as audition or the tactile sense, or focus on other cognitive functions such as learning and memory”.

Why this matters: Cognitive science and AI seem likely to have a mutually reinforcing relationship, where progress in one domain helps the other. Competitions like those run by the Algonauts project will generate more activity at the intersection between the two fields, and hopefully push progress forward.
  Find out more about the first Algonauts challenge here (official competition website).
  Read more: The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence (Arxiv).

#####################################################

Alibaba improves TaoBao e-commerce app with better recommendations:
… Chinese mega e-commerce company shows how a Transformer-based system can improve recommendations at scale…
Alibaba researchers have used a Transformer-based system to more efficiently recommend goods to users of Taobao, a massive Chinese e-commerce app. Their system, which they call the ‘user behavior sequence transformer’ (or “BST”), lets them take in a bunch of datapoints relating to a specific user, then predict what product to show the user next. The main technical work here is a matter of integrating a ‘Transformer’-based core into an existing predictive system used by Alibaba.
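
A stripped-down sketch of this kind of architecture is below: embed the user’s recent items plus the candidate item, run a Transformer encoder over the sequence, and squash the result into a click probability. The dimensions, the pooling choice, and the absence of Alibaba’s ‘other features’ are simplifications for illustration rather than the paper’s exact design.

```python
import torch
import torch.nn as nn

# Rough sketch of a behavior-sequence-transformer-style recommender: embed the
# items a user recently interacted with plus the candidate item, run a
# Transformer encoder over the sequence, and predict click probability.

class BehaviorSequenceTransformer(nn.Module):
    def __init__(self, n_items=10000, dim=64, max_len=21):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.pos_emb = nn.Embedding(max_len, dim)        # position stands in for recency
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, history, candidate):
        # history: (batch, seq_len) item ids; candidate: (batch,) item id.
        seq = torch.cat([history, candidate.unsqueeze(1)], dim=1)
        pos = torch.arange(seq.size(1), device=seq.device).unsqueeze(0)
        h = self.encoder(self.item_emb(seq) + self.pos_emb(pos))
        return torch.sigmoid(self.head(h.mean(dim=1))).squeeze(-1)   # click probability

model = BehaviorSequenceTransformer()
history = torch.randint(0, 10000, (8, 20))      # 20 recently-clicked items per user
candidate = torch.randint(0, 10000, (8,))
print(model(history, candidate))                # 8 predicted click-through probabilities
```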

Results: The researchers implemented the BST within Taobao, experimenting with using it to make millions of recommendations. In tests, the BST system led to a 7.57% improvement in online click-through rate – a commercially significant performance increase.

Why this matters: Recommender systems are one of the best examples of ‘the industrialization of AI’, and represent a litmus test for whether a particular technique works at scale. In the same way that it was a big deal when a few years ago Google and other companies started switching to deep learning-based approaches for aspects of speech and image recognition, it seems like a big deal now that relatively new AI systems, like the ‘Transformer’ component, are being integrated into at-scale, business-relevant applications. In general, it seems like the ‘time to market’ for new AI research is dramatically shorter than for other fields (including software-based ones). I think the implications of this are profound and underexplored.
  Read more: Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Arxiv).

#####################################################

Stanford researchers try to commoditize robots with ‘Doggo’:
…$3,000 quadruped robot meant to unlock robotics research…
Stanford researchers have created ‘Doggo’, a quadruped robot “that matches or exceeds common performance metrics of state-of-the-art legged robots”. So far, so normal. What makes this research a bit different is the focus on bringing down the cost of the robot – the researchers say people can build their own Doggo for less than $3000. This is part of a broader trend of academics trying to create low-cost robotics systems, and follows Berkeley releasing its ‘BLUE’ robotic arm (Import AI 142) and Indian researchers developing a ~$1,000 quadruped named ‘Stoch’ (Import AI 128).

Doggo is a four-legged robot that can run, jump, and trot around the world. It can even – and, to be clear, I’m not making this up – use a “pronking” gait to get around the world. Pronking!

Cheap drives: Like most robots, the main costs inherent to Doggo lie in the things it uses to move around. In this case, that’s a quasi-direct drive (QDD), a type of drive that “increases torque output at the expense of control bandwidth, but maintains the ability to backdrive the motor which allows sensing of external forces based on motor current,” they write.

Dual-use: Right now, robots like Doggo are pretty benign – we’re at the very beginning of the creation of widely-available quadruped platforms for AI research, and my expectation is any hardware platform at this stage has a bunch of junky flaws that make most of them semi-reliable. But in a few years, once the hardware and software has matured, it seems likely that robots like this will be deployed more widely for a bunch of uses not predicted by their creators or today’s researchers, just as we’ve seen with drones. I wonder how we can better anticipate these kinds of risks, and what things we could measure or assess to do this: eg, cost to build is one metric; another could be the expertise needed to build; another could be ease of customization.

Why this matters: Robotics is on the cusp of a revolution, as techniques developed by the deep learning research community become increasingly tractable to run and deploy on robotic platforms. One of the things standing in the way of widespread robot deployment seems to me to be insufficient access to cheap robots for scientists to experiment on (eg, if you’re a machine learning researcher, it’s basically free in terms of time and cost to experiment on CIFAR-10 or even ImageNet, but you can’t trivially prototype algorithms against real robots, only simulated ones). Therefore, systems like Doggo seem to have a good chance of broadening access to this technology. Now, we just need to figure out the dual use challenges, and how we approach those in the future.
  Read more: Stanford Doggo: An Open-Source, Quasi-Direct-Drive Quadruped (Arxiv).

#####################################################

Big neural networks re-invent the work of a whole academic field:
…Emergent sophistication of language models shows surprising parallels to classical NLP pipelines…
As neural networks get ever-larger, the techniques people use to analyze them look more and more like those found in analyzing biological life. The latest? New research from Google and Brown University seeks to probe a larger BERT model by analyzing layer activations in the network in response to a particular input. The new research speaks to the finicky, empirical experimentation required when trying to analyze the structures of trained networks, and highlights how sophisticated some AI components are becoming.

BERT vs NLP: Google’s ‘BERT’ is a Transformer-based neural network architecture that came out a year ago and has since, along with Fast.ai’s ULMFiT and OpenAI’s GPT-1 and GPT-2, defined a new direction in NLP research, as people throw out precisely constructed pipelines and systems for more generic, semi-supervised approaches like BERT. The result has been the proliferation of a multitude of language models that obtain state-of-the-art scores on collections of hard NLP tasks (eg: GLUE), along with systems capable of coherent text generation (eg: GPT2).

Performance is nice, but explainability is better: In tests, the researchers find that, much like in trained vision networks, the lower layers in a trained BERT model appear to perform more basic language tasks, and higher layers do more sophisticated things. “We observe a consistent trend across both of our metrics, with the tasks encoded in a natural progression: POS tags processed earliest, followed by constituents, dependencies, semantic roles, and coreference,” they write.
“That is, it appears that basic syntactic information appears earlier in the network, while high-level semantic information appears at higher layers.” The network isn’t dependent on the precise ordering of these tasks, though: “on individual examples the network can resolve out-of-order, using high-level information like predicate argument relations to help disambiguate low-level decisions like part-of-speech.”
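
The probing methodology itself is simple to sketch: freeze the model, take each layer’s token activations, train a small classifier per layer on a linguistic task, and compare accuracies across layers. The code below uses random arrays as stand-ins for real BERT activations and POS labels, so the numbers it prints are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of layer-wise probing: freeze the language model, take each layer's
# token activations, and train a small "probe" classifier per layer on a
# linguistic task (POS tagging, coreference, ...). Comparing probe accuracy
# across layers shows where in the network the relevant information lives.
# Random features stand in for real BERT activations here.

rng = np.random.default_rng(0)
n_tokens, hidden, n_layers, n_pos_tags = 2000, 128, 12, 17

pos_labels = rng.integers(0, n_pos_tags, size=n_tokens)
layer_activations = [
    rng.normal(size=(n_tokens, hidden))
    + 0.2 * min(layer + 1, 4) * np.eye(n_pos_tags, hidden)[pos_labels]
    for layer in range(n_layers)
]  # fake activations: the "syntactic" signal emerges in early layers, then saturates

split = n_tokens // 2
for layer, acts in enumerate(layer_activations):
    probe = LogisticRegression(max_iter=500).fit(acts[:split], pos_labels[:split])
    print(f"layer {layer:2d} probe accuracy: {probe.score(acts[split:], pos_labels[split:]):.2f}")
```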

Why this matters: Research like this gives us a sense for how sophisticated large generative models are becoming, and indicates that we’ll need to invest in creating new analysis techniques to be able to easily probe the capabilities of ever-larger and more sophisticated systems. I can envisage a future where scientists have a kind of ‘big empiricism toolbox’ they turn to when analysing networks, and we’ll also develop shared ‘evaluation methodologies’ for probing a bunch of different cognitive capabilities in such systems.
  Read more: BERT Rediscovers the Classical NLP Pipeline (Arxiv).

#####################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe has kindly offered to write some sections about AI & Policy for Import AI. I’m (lightly) editing them. All credit to Matthew, all blame to me, etc. Feedback: jack@jack-clark.net

San Francisco places moratorium on face recognition software:
San Francisco has become the first city in the US to (selectively and partially) ban the use of face recognition software by law enforcement agencies, until legally enforceable safeguards are put in place to protect civil liberties. The bill states that the technology’s purported benefits are currently outweighed by its potential negative impact on rights and racial justice. Before these products can be deployed in the future, the ordinance requires meaningful public input, and that the public and local government are empowered to oversee their uses.

Why it matters: Face recognition technology is increasingly being deployed by law enforcement agencies worldwide, despite persistent concerns about harms. This ordinance seems sensible: it is not an indefinite ban, but rather sets out clear requirements that must be met before the technology can be rolled out, most notably in terms of accountability and oversight.
  Read more: Ordinance on surveillance technology (SFGov).
  Read more: San Francisco’s facial recognition technology ban, explained (Vox).


Market-based regulation for safe AI:
This paper (note from Jack – it’s by Gillian Hadfield (Vector Institute / OpenAI) and Jack Clark (OpenAI / Import AI)) presents a model for market-based AI safety regulation. As AI systems become more advanced, it will become increasingly important to ensure that they are safe and robust. Public sector models of regulation could turn out to be ill-equipped to deal with these issues, due to a lack of resources and expertise, and slow reaction times. Likewise, a self-regulation model has pitfalls in terms of legitimacy and efficacy.

Regulatory markets: Regulatory markets are one model for addressing these problems: governments could create a market for private regulators, by compelling companies to pay for oversight by at least one regulator. Governments can then grant licenses to regulators, requiring them to achieve certain objectives, e.g. regulators of self-driving cars could be required to meet a target accident-rate.

What are they good for: Ensuring that advanced AI is safe and robust will eventually require powerful regulatory systems. Harnessing market forces could be a promising way for governments to meet this challenge, by directing more talent and resources into the regulatory sector. Regulatory markets will have to be well designed and maintained to ensure they remain competitive and independent of regulated entities.
  Read more: Regulatory markets for AI safety (Clark & Hadfield).
  Note from Jack: This is also covered in the AI Alignment newsletter #55 (AI Alignment newsletter Mailchimp).


#####################################################

Tech Tales:
Battle of The Test Cases

Go in the room. Read the message. Score out of 10. Do this three times and you’re done.

Those were the instructions I got outside. Seemed simple. So I went in and there was a table and I sat at it and three cameras on the other side of the room looked at me, scanning my body, moving from side to side and up and down. Then the screen at the other end of the room turned on and a message appeared. Now I’m reading it.

Dear John,
We’re sorry for the pain you’ve been through recently – grief can be a draining, terrible thing to experience.

We know that you’re struggling to remember what they sounded like – that’s common. We know you’re not eating enough, and that your sleep is terrible – that’s also common. We are sorry.

We’re sorry you experienced it. We’re sorry you’re so young. If there’s anything we can do to help, it is to recommend one thing: “Regulition”, the new nutrient-dense meal system designed for people who need to eat so that they can remember.

Remember, eat and sleep regularly, and know that with time it’ll get better.

What the fuck, I say. The screen flicks off, then reappears with new text: Please rate the message you received out of 10, with 10 being “more likely to purchase the discussed product” and 0 being less likely to purchase it. I punch in six and say fuck you again for good measure. Oh well, $30. Next message.

Dear John,
Don’t you want to run away, sometimes? Just take your shoes and pack a light bag then head out. Don’t tell your landlord. Don’t tell your family. Board a train or a bus or a plane. Change it up. We’ve all wanted to do this.

The difference is: you can. You can go wherever you want to go. You can get up and grab your shoes and a light bag, open your door, and head out. You don’t need to tell anyone.

Just make sure you bring some food so you don’t get slowed down on the road. Why not a grab and go meal like “Regulition”? Something as fast and flexible as you? Chug it down and take off.

Ok, I say. Nice. I punch in 8.  

Dear John,
Maybe you’re not such a fuckup. Maybe the fact you’ve been bouncing from job to job and from online persona to online persona means you’re exploring, and soon you’re going to discover the one thing in this world you can do better than anybody else.

You haven’t been lost, John – you’ve been searching, and now that you know you’re searching, know this: one day the whole world is going to come into focus and you’ll understand how you need to line things up to succeed.

To do this, you’ll need your wits about you – so why not pick up some “Regulition” so you can focus on the search, rather than pointless activities like cooking your own feed? Maybe it’s worth a few bucks a month to give yourself more time – after all, that might be all that’s standing between you and the end of your lifelong search.

Ok, I say. Ok. My hand hovers above the numbers.
What happens if I don’t press it? I say.
There’s a brief hiss of feedback as the speaker in the room turns on, then a voice says: “Payment only occurs upon full task completion, partial tasks will not be compensated.”
Ok, I say. Sure.

I think for a moment about the future: about waking up to emails that seem to know you, and messages from AI systems that reach into you and jumble up your insides. I see endless, convincing things, lining up in front of me, trying to persuade me to believe or want or need something. I see it all.

And because there’s probably a few thousand people like me, here, in this room, I press the button. Give it a 7.

Then it’s over: I sign out at an office where I am paid $30 and given a free case of Regulition and a coupon for cost-savings on your first yearly subscription. I take the transit home. Then I sit in the dark and think, my feet propped up on the case of “food” in front of me.

Things that inspired this story: Better language models for targeted synthetic text generation; e-marketing; Soylent and other nutrient-drinks; copywriting; user-testing; the inevitability of human a/b/c testing at scale; reinforcement learning; learning from human preferences.

Import AI 146: Making art with CycleGANs; Google and ARM team-up on low-power ML; and deliberately designing AI for scary uses

Chinese researchers use AI to build an artist-cloning mind map system:
…Towards the endless, infinite AI artist…
AI researchers from JD and the Central Academy of Fine Arts in Beijing have built Mappa Mundi, software that lets people construct aesthetically pleasing ‘mind maps’ in the style of the artist Qiu Zhijie. The software was built to accompany an exhibition of Qiu Zhijie’s work.

How it works: The system has three main elements: a speech recognition module which pulls key words from speech; a topic expansion system which takes these words and pulls in other concepts from a rule-based knowledge graph; and software for image projection which uses any of 3,000 distinct painting elements to depict key words. One clever twist: the system automatically creates visual ‘routes’ between different words by analyzing the distance between them in the knowledge graph and using that to generate visualizations.
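
To make the ‘topic expansion plus routes’ idea concrete, here’s a toy sketch of my own; the miniature knowledge graph below is invented, whereas the real system uses a much larger rule-based graph and its 3,000 painting elements.

```python
# Toy sketch of topic expansion and route-finding over a tiny, hand-written knowledge
# graph. The graph contents are invented for illustration; they are not from the paper.
import networkx as nx

kg = nx.Graph()
kg.add_edges_from([
    ("map", "journey"), ("journey", "ship"), ("ship", "ocean"),
    ("map", "territory"), ("territory", "border"), ("border", "wall"),
    ("ocean", "island"), ("island", "utopia"),
])

def expand_topic(keyword, depth=2):
    """Return concepts within `depth` hops of a spoken keyword (rule-based expansion)."""
    return sorted(nx.single_source_shortest_path_length(kg, keyword, cutoff=depth))

def visual_route(word_a, word_b):
    """Return a 'route' of intermediate concepts to draw between two keywords."""
    return nx.shortest_path(kg, word_a, word_b)

print(expand_topic("map"))            # concepts near "map" in the graph
print(visual_route("map", "utopia"))  # e.g. map -> journey -> ship -> ocean -> island -> utopia
```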

A reflexive, reactive system: Mappa Mundi works in-tandem with human users, growing and changing its map according to their inputs. “The generated information, after being presented in our system, becomes the inspiration for artist’s next vocal input,” they write. “This artwork reflects both the development of artist’s thinking and the AI-enabled imagination”.

Why this matters: I’m forever fascinated by the ways in which AI can help us better interact with the world around us, and I think systems like ‘Mappa Mundi’ give us a way to interact with the idiosyncratic aesthetic space defined by another human.
  Read more: Mappa Mundi: An Interactive Artistic Mind Map Generator with Artificial Imagination (Arxiv).
  Read more about Qiu Zhijie (Center for Contemporary Art).

#####################################################

Using AI to simulate and see climate change:
…CycleGAN to the rescue…
In the future, climate change is likely to lead to catastrophic flooding around the world, drowning cities and farmland. How can we make this likely future feel tangible to people today? Researchers with the Montreal Institute for Learning Algorithms, ConscienceAI Labs, and Microsoft Research have created a system that can take in a Google Street View image of a house, then render an image showing what that house might look like under a predicted climate-change future.

The resulting CycleGAN-based system does a decent job at rendering pictures of different houses under various flooding conditions, giving the viewer a more visceral sense of how climate change may influence where they live in the future.
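
For readers unfamiliar with CycleGAN, here’s a minimal sketch of the cycle-consistency idea at its core; the toy generators and random tensors below stand in for the paper’s real architecture and its Street View / flood imagery.

```python
# Minimal sketch of the cycle-consistency idea behind CycleGAN: map "normal" scenes to
# flooded scenes with G, map back with F, and penalise the reconstruction error.
# The tiny generators are placeholders, not the architecture used in the paper.
import torch
import torch.nn as nn

def tiny_generator():
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

G = tiny_generator()  # "normal" street scene -> flooded scene
F = tiny_generator()  # flooded scene -> "normal" street scene
l1 = nn.L1Loss()

x = torch.rand(1, 3, 64, 64) * 2 - 1  # stand-in for an unpaired street-view image
y = torch.rand(1, 3, 64, 64) * 2 - 1  # stand-in for an unpaired flooded-scene image

# Cycle-consistency terms: translating to the other domain and back should reconstruct
# the original image. (The adversarial losses from the two discriminators, which make
# G(x) look like a plausible flood photo, are omitted here.)
cycle_loss = l1(F(G(x)), x) + l1(G(F(y)), y)
cycle_loss.backward()  # in a real loop this joins the GAN losses before an optimizer step
```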

Why this matters: I’m excited to see how we use the utility-class artistic capabilities of modern AI tools to simulate different versions of the world for people, and I think being able to easily visualize the effects of climate change may help make more people aware of how delicate the planet is.
  Read more: Visualizing the Consequences of Climate Change Using Cycle-Consistent Adversarial Networks (Arxiv).

#####################################################

Google and ARM plan to merge low-power ML software projects:
…uTensor, meet TensorFlow Lite…
Google and chip designer ARM are planning to merge two open source frameworks for running machine learning systems on low-power ‘Arm’ chips. Specifically, uTensor is merging with Google’s ‘TensorFlow Lite’ software. The two organizations expect to work together to further increase the efficiency of running machine learning code on ARM chips.
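
To give a sense of what the TensorFlow Lite half of this tooling looks like in practice, here’s a hedged sketch of converting a placeholder Keras model into a .tflite file for a low-power Arm target (the model and filename are invented for illustration; the uTensor side contributes the microcontroller runtime and code generation).

```python
# Sketch: convert a small placeholder Keras model to a TensorFlow Lite flatbuffer
# that a microcontroller-class runtime can execute.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization to shrink the model
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# The resulting flatbuffer is what gets compiled into firmware (for example as a C
# byte array) and executed by a small runtime on an Arm Cortex-M class chip.
```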

Why this matters: As more and more people try to deploy AI to the ‘edge’ (phones, tablets, drones, etc), we need new low-power chips on which to run machine learning systems. We’ve got those chips in the form of processors from ARM and others, but we currently lack many of the programming tools needed to extract as much performance as possible from this hardware. Software co-development agreements, like the one announced by ARM and Google, help standardize this type of software, which will likely lead to more adoption.
  Read more: uTensor and Tensor Flow Announcement (Arm Mbed blog).

#####################################################

Microsoft wants your devices to record (and transcribe) your meetings:
…In the future, our phones and tablets will transcribe our meetings…
In the future, Microsoft thinks people attending the same meeting will take out their phones and tablets, and the electronic devices will smartly coordinate to transcribe the discussions taking place. That’s the gist of a new Microsoft research paper, which outlines a ‘Virtual Microphone Array’ made of “spatially distributed asynchronous recording devices such as laptops and mobile phones”.

Microsoft’s system can integrate audio from a bunch of devices spread throughout a room and use it to transcribe what is being said. The resulting system (trained on approximately 33,000 hours of in-house data) is more effective than single microphones at transcribing natural multi-speaker speech during meetings; “there is a clear correlation between the number of microphones and the amount of improvement over the single channel system”, they write. The system struggles with overlapping speech, as you might expect.
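
The paper’s system does this with neural beamforming, but the underlying idea of fusing several imperfect recordings into one stream can be sketched far more crudely; the snippet below aligns toy signals by cross-correlation and averages them, purely as an illustration.

```python
# A deliberately crude stand-in for the "virtual microphone" idea: estimate each
# device's delay against a reference channel via cross-correlation, align, and average.
# (This is not the paper's neural, mask-based approach.)
import numpy as np

def align_and_average(channels, reference=0):
    """channels: list of equal-length 1-D arrays recorded by different devices."""
    ref = channels[reference]
    aligned = []
    for ch in channels:
        corr = np.correlate(ch, ref, mode="full")      # cross-correlate with the reference
        delay = int(np.argmax(corr)) - (len(ref) - 1)  # positive means ch lags the reference
        aligned.append(np.roll(ch, -delay))            # crude alignment (ignores clock drift)
    return np.mean(np.stack(aligned), axis=0)          # the fused "virtual microphone" signal

# Toy usage: one source observed by three devices with different delays and noise.
t = np.linspace(0, 1, 16000)
source = np.sin(2 * np.pi * 220 * t)
mics = [np.roll(source, d) + 0.05 * np.random.randn(len(t)) for d in (0, 40, 95)]
virtual = align_and_average(mics)
```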

Why this matters: AI gives us the ability to approximate things, and research like this shows how the smart use of AI techniques can let us approximate the capabilities of dedicated microphones, piecing one virtual microphone together out of a disparate set of devices.
  Read more: Meeting Transcription Using Virtual Microphone Arrays (Arxiv).

#####################################################

One language model, trained in three different ways:
…Microsoft’s Unified pre-trained Language Model (UNILM) is a 3-objectives-in-1 transformer…
Researchers with Microsoft have trained a single, big language model with three different objectives during training, yielding a system capable of a broad range of language modeling and generation tasks. They call their system the Unified pre-trained Language Model (UNILM) and say this approach has two advantages relative to single-objective training:

  • Training against multiple objectives means UNILM is more like a 3-in-1 system, with different capabilities that can manifest for different tasks.
  • Parameter sharing during joint training means the resulting language model is more robust as a consequence of being exposed to a variety of different tasks under different constraints.

The model can be used for natural language understanding and generation tasks and, like BERT and GPT, is based on a ‘Transformer’ component. During training, UNILM is given three distinct language modelling objectives: bidirectional (predicting words based on those on the left and right; useful for general language modeling tasks, used in BERT); unidirectional (predicting words based on those to the left; useful for language modeling and generation, used in GPT2); and sequence-to-sequence learning (mapping sequences of tokens to one another, subsequently used in ‘Google Smart Reply’).
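
The trick that lets one shared Transformer serve all three objectives is swapping the self-attention mask. Here’s a small sketch of what those masks might look like; it follows the description above rather than UNILM’s exact implementation.

```python
# Simplified sketch of the three self-attention masks (ignores UNILM details such as
# segment embeddings and special tokens).
import torch

def bidirectional_mask(n):
    # BERT-style cloze objective: every token may attend to every other token.
    return torch.ones(n, n, dtype=torch.bool)

def unidirectional_mask(n):
    # GPT-style language modeling: each token attends only to itself and tokens to its left.
    return torch.tril(torch.ones(n, n, dtype=torch.bool))

def seq2seq_mask(src_len, tgt_len):
    # Source tokens attend bidirectionally among themselves; target tokens attend to
    # the whole source plus the already-generated prefix of the target.
    n = src_len + tgt_len
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:src_len, :src_len] = True
    mask[src_len:, :src_len] = True
    mask[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.bool))
    return mask

# Positions where the mask is False get -inf added to the attention logits before the
# softmax, so those token pairs cannot attend to each other.
print(seq2seq_mask(3, 2).int())
```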

Results: The trained UNILM system obtains state-of-the-art scores on summarization and question answering tasks, and also sets state-of-the-art on text generation tasks (including the delightfully recursive task of learning to generate appropriate questions that map to certain answers). The model also obtains a state-of-the-art score on the multi-task ‘GLUE’ benchmark (though note GLUE has subsequently been replaced by ‘SuperGLUE’, as its creators thought it was a little too easy).

Why this matters: Language modelling is undergoing a revolution as people adopt large, simple, scalable techniques to model and generate language. Papers like UNILM gesture towards a future where large models are trained with multiple distinct objectives over diverse datasets, creating utility-class systems that have a broad set of capabilities.
  Read more: Unified Language Model Pre-training for Natural Language Understanding and Generation (Arxiv).

#####################################################

AI… for Bad!
…CHI workshop suggests an intriguing direction for AI research…
This week, some researchers gathered to prototype the ways in which their research could be used for evil. The workshop, ‘CHI4Evil, Creative Speculation on the Negative Effects of HCI Research’, was held at the ACM CHI Conference on Human Factors in Computing Systems and was designed to investigate various ideas in HCI through the lens of designing deliberately bad or undesirable systems.

Why this matters: Prototyping the potential harms of technology can be pretty useful for calibrating thinking about threats and opportunities (see: GPT-2), and thinking about such harms through the lens of human-computer interaction (HCI, or CHI) feels likely to yield new insights. I’m excited for future “AI for Bad” conferences (and would be interested to co-organize one with others, if there’s interest).
  Read more: CHI4EVIL website.

#####################################################

Facial recognition is a hot area for venture capitalists:
…Chinese start-up Megvii raises mucho-moolah…
Megvii, a Chinese computer vision startup known by some as ‘Face++’, has raised $750 million in a funding round. Backers include the Bank of China Group Investment Ltd; a subsidiary of the Abu Dhabi Investment Authority; and Alibaba Group. The company plans to IPO soon.

Why this matters: China is home to numerous large-scale applications of AI for use in surveillance, and is also exporting surveillance technologies via its ‘One Belt, One Road’ initiative (which frequently pairs infrastructure investment with surveillance).
  This is an area fraught with both risks and opportunities – the risks are that we sleepwalk into building surveillance societies using AI, and the opportunities are that (judiciously applied) surveillance technologies can sometimes increase public safety, given the right oversight. I think we’ll see numerous Chinese startups push the boundaries of what is thought to be possible/deployable here, so watching companies like Megvii feels like a leading indicator for what happens when you combine surveillance+society+capitalism.
  Read more: Chinese AI start-up Megvii raises $750 million ahead of planned HK IPO (Reuters).

#####################################################

Chatbot company builds large-scale AI system, doesn’t fully release it:
…Startup Hugging Face restricts release of larger versions of some models following ethical concerns…
NLP company Hugging Face has released a demo, tutorial, and open-source code for creating a conversational AI based on OpenAI’s Transformer-based ‘GPT2’ system.
   Ethics in action: The company said it decided not to release the full GPT2 model for ethical reasons – it thought the technology had a high chance of being used to improve spam-bots, or to perform “mass catfishing and identity fraud”. “We are aligning ourselves with OpenAI in not releasing a bigger model until they do,” the organization wrote.
  Read more: Ethical analysis of the open-sourcing of a state-of-the-art conversational AI (Medium).
  Read more about Hugging Face here (official website).

#####################################################

Tech Tales

The Evolution Game

They built the game so it could run on anything, which meant they had to design it differently to other games. Most games have a floor on their performance – some basic set of requirements below which you can’t expect to play. But not this game. Neverender can run on your toaster, or fridge, or watch, and it can just as easily run on your home cinema, or custom computer, and so on. Both the graphics and gameplay change depending on what it is running on – I’ve spent hours stood fiddling with the electronic buttons on my oven, using them to move a small character across a simplified Neverender gameboard, and I’ve also spent hours in my living room navigating a richly-rendered on screen character through a lush, Salvador Dali-esque horrorworld. I’m what some people call a Neverheader, or what others call a Nevernut. If you didn’t know anything about the game, you’d probably call me a superfan.

So I guess that’s why I got the call when Neverender started to go sideways. They brought me in and asked me to play it and I said “what else?”

“Just play it,” they said.

So I did. I sat in my living room surrounded by a bunch of people in suits and I played the game. I navigated my character past the weeping lands and up into eldritch keep and beyond, to the deserts of dream. But when I got to the deserts they were different: the sand dunes had grown in size, and some of them now hosted cave entrances. Strange lights shot out of them. I went into one and was killed almost instantly by a beam of light that caused more damage than all the weapons in my inventory combined. After I was reborn at the spawn point I proceeded more carefully, skirting these light-spewing entrances, and trying to walk further across the sand plains to whatever lay beyond.

The game is thinking, they tell me. In the same way Neverender was built to run on anything, its developers recently rolled out a patch that let it use anything. Now, all the game clients are integrated with the game engine and backend simulation environment, sharing computational resources with eachother. Mostly, it’s leading to better games and more entertained players. But in some parts of the gameworld, things are changing that should not be changing: larger sand dunes with subterranean cities inside themselves? Wonderful! That’s the sort of thing the developers had hoped for. But having the caves be controlled by beams of light of such power that no one can go and play within them? That’s a lot less good, and something which no one had expected.

My official title now is “Explorer”, but I feel more like a Spy. I take my character and I run out into the edges of the maps of Neverender, and usually I find areas where the game is modifying itself or growing itself in some way. The code is evolving. One day we turned off the local sandbox systems, letting Neverender deploy more code deeper into my home system. As I played the game the lights began to flicker, and when I was in a cave I discovered some treasure and the game automatically fired up some servers which we later discovered it was using to do high-fidelity modelling of the treasure.

The question we all ask ourselves, now, is whether the things Neverender is building within itself are extensions of the game, or extensions of the mind behind the game. We hear increasing reports of ‘ghosts’ seen across the game universe, and of intermittent cases of ‘kitchen appliance sentience’ in the homes of advanced players. We’ve even been told by some that this is all a large marketing campaign, and that any danger we perceive is just a consequence of our over-active imaginations. Nonetheless, we click and play and explore.

Things that inspired this story: Endless RPGs; loot; games that look like supercomputers such as Eve Online; distributed computation; relativistic ideas deployed on slower timescales.