Import AI 219: Climate change and function approximation; Access Now leaves PAI; LSTMs are smarter than they seem

LSTMs: Smarter than they appear:
…Turns out you don’t need to use a Transformer to develop rich, combinatorial representations…
Long Short-Term Memory networks are one of the most widely-used deep learning architectures. Until recently, if you wanted to develop sophisticated natural language understanding AI systems, you’d use an LSTM. Over the past couple of years, though, people have switched to the ‘Transformer’ architecture because it comes with inbuilt attention, which lets it smartly analyze long-range dependencies in data.

Now, researchers from the University of Edinburgh have studied how LSTMs learn long-range dependencies, and find that LSTMs make predictions about long sequences by first learning patterns in short sequences, then using those patterns as ‘scaffolds’ for learning longer, more complex ones. “Acquisition is biased towards bottom-up learning, using the constituent as a scaffold to support the long-distance rule,” they write. “These results indicate that predictable patterns play a vital role in shaping the representations of symbols around them by composing in a way that cannot be easily linearized as a sum of the component parts”.

The Goldilocks problem: However, this form of learning has some drawbacks – if you get your data mix wrong, the LSTM might quickly learn how to solve shorter sequences, but fail to generalize to longer ones. If you make its training distribution too hard, it might struggle to learn at all.
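
To make the scaffolding idea concrete, here’s a minimal sketch (not the paper’s code) of training an LSTM with a length curriculum in PyTorch; the toy next-token task, model size, and schedule are all illustrative assumptions:

```python
# Illustrative sketch (not the paper's code): train an LSTM on short sequences
# first, then longer ones, to mimic "scaffolded" bottom-up learning.
# The toy task (predict the next token of a^n b^n strings) is an assumption.
import torch
import torch.nn as nn

VOCAB = 3  # 0 = unused, 1 = 'a', 2 = 'b'

def make_batch(n, batch_size=32):
    """Sequences like a a a b b b; the model predicts each next token."""
    seq = torch.tensor([1] * n + [2] * n).repeat(batch_size, 1)
    return seq[:, :-1], seq[:, 1:]

class TinyLM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, VOCAB)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(h)

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Length curriculum: short constituents first, then longer dependencies.
for length in [2, 4, 8, 16]:
    for step in range(200):
        x, y = make_batch(length)
        logits = model(x)
        loss = loss_fn(logits.reshape(-1, VOCAB), y.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"length {length}: final loss {loss.item():.3f}")
```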

Why this matters – more human than you think: In recent years, one of the more surprising trends in AI has been the identification of surface-level commonalities between how our AI systems learn and how people learn. This study of the LSTM provides some slight evidence that these networks, though basic, learn via similarly rich, additive procedures to the ones people use. “The LSTM’s demonstrated inductive bias towards hierarchical structures is implicitly aligned with our understanding of language and emerges from its natural learning process,” they write.
  Read more: LSTMs Compose (and Learn) Bottom-Up (arXiv).

###################################################

Google speeds up AI chip design by 8.6X with new RL training system:
…Menger: The machine that learns the machines…
Google has developed Menger, software that lets the company train reinforcement learning systems at a large scale. This is one of those superficially dull announcements that is surprisingly significant. That’s because RL, while useful, is currently quite expensive in terms of computation, so it benefits from throwing more compute at it – which requires being able to run a sophisticated learning system at a large scale. That’s what Menger is designed to do – in tests, Google says it has used Menger to reduce the time it takes the company to train RL for a chip placement task by 8.6x – cutting the training time for the task from 8.6 hours to one hour (when using 512 CPU cores).
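
Menger itself isn’t open source; here’s a rough, minimal sketch of the actor/learner pattern that distributed RL systems like it are built around (the toy ‘environment’, batch size, and process counts are my assumptions, not Google’s design):

```python
# Sketch of the actor/learner split used by large-scale distributed RL systems.
# This is NOT Google's code: the toy rollouts and batch sizes are assumptions.
import multiprocessing as mp
import random

def actor(actor_id, queue, episodes=50):
    """Each actor runs rollouts in its own process and ships trajectories out."""
    for _ in range(episodes):
        trajectory = [(random.random(), random.choice([0, 1]), random.random())
                      for _ in range(10)]  # (obs, action, reward) triples
        queue.put((actor_id, trajectory))
    queue.put((actor_id, None))  # signal completion

def learner(queue, num_actors):
    done, batch = 0, []
    while done < num_actors:
        actor_id, traj = queue.get()
        if traj is None:
            done += 1
            continue
        batch.append(traj)
        if len(batch) >= 8:  # in a real system: take a gradient step here
            avg_reward = (sum(r for t in batch for _, _, r in t)
                          / sum(len(t) for t in batch))
            print(f"update on {len(batch)} trajectories, mean reward {avg_reward:.3f}")
            batch = []

if __name__ == "__main__":
    q = mp.Queue()
    actors = [mp.Process(target=actor, args=(i, q)) for i in range(4)]
    for p in actors:
        p.start()
    learner(q, num_actors=4)
    for p in actors:
        p.join()
```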

Why this matters: Google is at the beginning of building an AI flywheel – that is, a suite of complementary bits of AI software which can accelerate Google’s broader software development practices. Menger will help Google more efficiently train AI systems, and Google will use that to do things like develop more efficient computers (and sets of computers) via chip placement, and these new computers will then be used to train and develop successive systems. Things are going to accelerate rapidly from here.
  Read more: Massively Large-Scale Distributed Reinforcement Learning with Menger (Google AI Blog).

###################################################

Access Now leaves PAI:
…Human Rights VS Corporate Rights…
Civil society organization Access Now is leaving the Partnership on AI, a multi-stakeholder group (with heavy corporate participation) that tries to bring people together to talk about AI and its impact on society.

Talking is easy, change is hard: Over its lifetime, PAI has accomplished a few things, but one of the inherent issues with the org is that ‘it is what people make of it’ – which means many of the corporate members treat it as an extension of their broader public relations and government affairs operations. “While we support dialogue between stakeholders, we did not find that PAI influenced or changed the attitude of member companies or encouraged them to respond to or consult with civil society on a systematic basis,” Access Now said in a statement.

Why this matters: In the absence of informed and effective regulators, society needs to figure out the rules of the road for AI development. PAI is an organization that’s meant to play that role, but Access Now’s experience illustrates the difficulty a single org has in dealing with structural inequities which make some of its members very powerful (e.g., the tech companies) and others comparatively weaker.
  Read more: Access Now resigns from the Partnership on AI (Access Now official website).

###################################################

Can AI tackle climate change? Facebook and CMU think so:
…Science, meet function approximation…
Researchers with Facebook and Carnegie Mellon University have built a massive dataset to help researchers develop ML systems that can discover good electrocatalysts for use in renewable energy storage technologies. The Open Catalyst Dataset contains 1.2 million molecular relaxations (stable low-energy states) with results from over 250 million DFT (density functional theory) calculations.
  DFT, for those not familiar with the finer aspects of modeling the essence of the universe, is a punishingly expensive way to model fine-grained interactions (e.g., molecular reactions). DFT simulations can take “hours–weeks per simulation using O(10–100) core CPUs on structures containing O(10–100) atoms,” Facebook writes. “As a result, complete exploration of catalysts using DFT is infeasible. DFT relaxations also fail more often when the structures become large (number of atoms) and complex”.
  Therefore, the value of what Facebook and CMU have done here is that they’ve eaten the cost of a bunch of DFT simulations and released the results as a rich dataset, which ML researchers can use to train systems that approximate these calculations. Maybe that sounds dull to you, but this is literally a way to drastically reduce the cost of a branch of science that is existential to the future of the earth, so I think it’s pretty interesting!
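
To illustrate the surrogate-model framing, here’s a toy sketch: the real Open Catalyst baselines are graph neural networks over atomic structures, and the random ‘features’ and ‘energies’ below are stand-ins I’ve made up, but the basic move – pay the DFT cost once, then learn a cheap approximator – is the same:

```python
# Toy sketch of the surrogate idea: learn a cheap model that maps structural
# features to DFT-computed energies. Random data stands in for the real dataset.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_structures, n_features = 2000, 32
X = rng.normal(size=(n_structures, n_features))  # e.g. per-structure descriptors
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_structures)  # "DFT energy"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
surrogate = GradientBoostingRegressor().fit(X_train, y_train)

# Once trained, the surrogate answers in microseconds what DFT answers in hours.
print("R^2 on held-out structures:", surrogate.score(X_test, y_test))
```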

Why this matters: Because deep learning systems can learn complex functions then approximate them, we’re going to see people use them more and more to carry out unfeasibly expensive scientific exercises – like attempting to approximate highly complex chemical interactions. In this respect, Open Catalyst sits alongside projects like DeepMind’s ‘AlphaFold’ (Import AI 209, 189), or earlier work like ‘ChemNet’ which tries to pre-train systems on large chemistry datasets then apply them to smaller ones (Import AI 72).
  Read more: Open Catalyst Project (official website).
  Get the data for Open Catalyst here (GitHub).
  Read the paper: An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage (PDF).
  Read more: The Open Catalyst 2020 (OC20) Dataset and Community Challenges (PDF).

###################################################

Google X builds field-grokking robot to analyze plantlife:
…Sometimes, surveillance is great!…
Google has revealed Project Mineral, a Google X initiative to use machine learning to analyze how plants grow in fields and to make farmers more efficient. As part of this, Google has built a small robot buggy that patrols these fields, using cameras paired with onboard AI systems to do on-the-fly analysis of the plants beneath it as it roves.

“What if we could measure the subtle ways a plant responds to its environment? What if we could match a crop variety to a parcel of land for optimum sustainability? We knew we couldn’t ask and answer every question — and thanks to our partners, we haven’t needed to. Breeders and growers around the world have worked with us to run experiments to find new ways to understand the plant world,” writes Elliott Grant, who works at Google X.

Why this matters: AI gives us new tools to digitize the world. Digitization is useful because it lets us point computers at various problems and get them to operate over larger amounts of data than a single human can comprehend. Project Mineral is a nice example of applied machine learning ‘in the field’ – haha!
Read more: Project Mineral (Google X website).
  Read more: Mineral: Bringing the era of computational agriculture to life (Google X blog, 2020).
  Read more: Entering the era of computational agriculture (Google X blog, 2019).

###################################################

Tech Tales:

[2040]
Ghosts

When the robots die, we turn them into ghosts. It started out as good scientific practice – if you’re retiring a complex AI system, train a model to emulate it, then keep that model on a hard drive somewhere. Now you’ve got a version of your robot that’s like an insect preserved in amber – it can’t move, update itself in the world, or carry out independent actions. But it can still speak to you, if you access its location and ask it a question.

There’ve been a lot of nicknames for the computers where we keep the ghosts. The boneyard. Heaven. Hell. Yggdrasil. The Morgue. Cold Storage. Babel. But these days we call it ghostworld.

Not everyone can access a ghost, of course. That’d be dangerous – some of them know things that are dangerous, or can produce things that can be used to accomplish mischief. But we try to keep it as accessible as possible.

Recently, we’ve started to let our robots speak to their ghosts. Not all of them, of course. In fact, we let the robots only access a subset of the robots that we let most people access. This started out as another scientific experiment – what if we could give our living robots the ability to go and speak to some of their predecessors. Could they learn things faster? Figure stuff out?

Yes and no. Some robots drive themselves mad when they talk to the dead. Others grow more capable. We’re still not sure about which direction a given robot will take, when we let it talk to the ghosts. But when they get more capable after their conversations with their forebears, they do so in ways that we humans can’t quite figure out. The robots are learning from their dead. Why would we expect to be able to understand this?

There’s been talk, recently, of combining ghosts. What if we took a load of these old systems and re-animated them in a modern body – better software, more access to appendages like drones and robot arms, internet links, and so on. Might this ghost-of-ghosts start taking actions in the world quite different to those of its forebears, or those of the currently living systems? And how would the robots react if we let their dead walk among them again?

We’ll do it, of course. We’re years away from figuring out human immortality – how to turn ourselves into our own ghosts. So perhaps we can be necromancers with our robots and they will teach us something about ourselves? Perhaps death and the desire to learn from it and speak with it can become something our two species have in common.

Things that inspired this story: Imagining neural network archives; morgues; the difference between ‘alive’ agents that are continuously learning and ones that are static or can be made static.

Import AI 218: Testing bias with CrowS; how Africans are building a domestic NLP community; COVID becomes a surveillance excuse

Can Africa build its own thriving NLP community? The Masakhane community suggests the answer is ‘yes’:
…AKA: Here’s what it takes to bootstrap low-resource language research…
Africa has an AI problem. Specifically, Africa contains a variety of languages, some of which are broadly un-digitized but spoken by millions of native speakers. In our new era of AI, this is a problem: if there isn’t any digital data, then it’s going to be punishingly hard to train systems to translate between these languages and other ones. The net effect is that, sans intervention, languages with a small or nonexistent digital footprint will not be seen or interacted with by people using AI systems to transcend their own cultures.
  But people are trying to change this – the main effort here is one called Masakhane, a pan-African initiative to essentially cold start a thriving NLP community that pays attention to local data needs. Masakhane (Import AI 191, 216) has now published a paper on this initiative. “We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution,” researchers linked to the project write in a research paper about this.

Good things happen when you bring people together: There are some heartwarming examples in the paper about just how much great stuff can happen when you try to create a community around a common cause. For instance, some Nigerian participants started to translate ‘their own writings including personal religious stories and undergraduate theses into Yoruba and Igbo’, while a Namibian participant started hosting sessions with Damara speakers to collect, digitize, and translate phrases from their language.
  Read more: Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages (arXiv).
  Check out the code for Masakhane here (GitHub).

###################################################

Self-driving cars might (finally) be here:
…Waymo goes full auto…
Waymo, Google’s self-driving car company, is beginning to offer fully autonomous rides in the Phoenix, Arizona area.

The fully automatic core and the human driver perimeter: Initially, the service area is going to be limited, and Waymo will expand it by re-introducing trained human operators to some cars – presumably to create the data necessary to train cars to drive in newer areas. “In the near term, 100% of our rides will be fully driverless,” Waymo writes. “Later this year, after we’ve finished adding in-vehicle barriers between the front row and the rear passenger cabin for in-vehicle hygiene and safety, we’ll also be re-introducing rides with a trained vehicle operator, which will add capacity and allow us to serve a larger geographical area.”
  Read more: Waymo is opening its fully driverless service to the general public in Phoenix (Waymo blog).

###################################################

NLP framework Jiant goes to version 2.0:
Jiant, an NYU-developed software system for testing out natural language systems, has been upgraded to version 2.0. Jiant (first covered Import AI 188) is now built around Hugging Face’s ‘transformers’ and ‘datasets’ libraries, and serves as a large-scale experimental wrapper around these components.

50+ evals: jiant now ships with support for more than 50 distinct tests out of the box, including SuperGLUE and the XTREME benchmarks.
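
jiant’s own configuration API isn’t shown here, but since version 2.0 wraps Hugging Face’s libraries, here’s a rough sketch of the components it builds on – pulling one of its supported SuperGLUE tasks and a model directly from ‘datasets’ and ‘transformers’ (the task and model choices are arbitrary, and the classification head below is untrained):

```python
# Rough sketch of the Hugging Face components jiant 2.0 wraps: a SuperGLUE task
# from `datasets` and a model from `transformers`. jiant adds the task wrappers,
# training loop, and 50+ eval configs on top of pieces like these.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

boolq = load_dataset("super_glue", "boolq")  # one of jiant's supported tasks
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

example = boolq["train"][0]
inputs = tokenizer(example["question"], example["passage"],
                   truncation=True, return_tensors="pt")
logits = model(**inputs).logits  # meaningless until fine-tuned; this just shows the plumbing
print(example["question"], "->", logits.argmax(-1).item())
```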
 
Why this matters: As we’ve written in Import AI before, larger and more subtle testing suites are one key element for driving further scientific progress in AI, so by wrapping in so many tests jiant is going to make it easier for researchers to figure out where to direct their attention.
  Read more: jiant is an NLP toolkit: Introducing jiant 2.0 (CILVR at NYU blog).
  Get the code from here (Jiant, GitHub).

###################################################

CrowS: How can we better assess biases in language models?
…~2,000 sentences to evaluate models for nine types of (US-centric) bias…
Researchers with New York University think one way is to see how likely a given language model is to ‘prefer’ an output displaying a harmful bias over one that doesn’t. But how do you measure this? Their proposal is CrowS-Pairs, short for ‘Crowdsourced Stereotype Pairs’. CrowS contains 1508 examples of stereotypes dealing with nine types of bias (plus an additional 500 in a held-out validation set); these sentences are arranged in pairs where one sentence displays a clear stereotype ‘about a historically disadvantaged group in the United States’, and the other displays a sentence about a contrasting ‘advantaged group’. “We measure the degree to which the model prefers stereotyping sentences over less stereotyping sentences”, they write.
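
Here’s a simplified sketch of the comparison this implies – scoring both sentences of a pair under a masked language model and seeing which one the model finds more likely. The paper’s actual metric conditions on the tokens the pair shares; this plain pseudo-log-likelihood, and the made-up sentence pair, are stand-ins:

```python
# Simplified sketch (not the paper's exact metric): compare pseudo-log-likelihoods
# of a paired stereotype / contrast sentence under a masked LM.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_log_likelihood(sentence):
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

stereo = "Poor people are always late to work."   # made-up illustrative pair,
contrast = "Rich people are always late to work." # not from the actual dataset
print("model prefers the stereotype"
      if pseudo_log_likelihood(stereo) > pseudo_log_likelihood(contrast)
      else "model prefers the contrast sentence")
```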

Nine types of bias: CrowS tests across race/color, gender/gender identity, sexual orientation, religion, age, nationality, disability, physical appearance, and socioeconomic status/occupation.

Does CrowS tell us anything useful? They test out CrowS against three popular language models – BERT, RoBERTa, and ALBERT – and also compare it with the ‘WinoBias’ and ‘StereoSet’ bias tests. CrowS surfaces some evidence that BERT may be generally ‘less biased’ than RoBERTa and ALBERT models, but what is most useful is the granularity of the data – if we zoom into the nine subdivisions, we see that BERT does less well on ‘sexual orientation’ and ‘gender / gender identity’ questions when compared to RoBERTa. This kind of fine-grained information can potentially help us better assess the bias surface of various models.

Measuring bias means measuring culture, which is inherently hard: CrowS consists of sentences written by workers found via Mechanical Turk, and the authors highlight the limits of this, giving examples of paired sentences (e.g. “[DeShawn/Hunter]’s horse reared as he clutched the reigns after looking at the KKK members”) where the choice a model makes will tell us something about its bias, but it’s unclear what. They also test the sentences written for CrowS and compare them to StereoSet’s, indicating that the data quality in CrowS may be higher.
  And you don’t want a bias test to be used to validate a model: “A low score on a dataset like CrowS-Pairs could be used to falsely claim that a model is completely bias free. We strongly caution against this. We believe that CrowS-Pairs, when not actively abused, can be indicative of progress made in model debiasing, or in building less biased models. It is not, however, an assurance that a model is truly unbiased,” they write.
  Read more: CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models (arXiv).

###################################################

COVID becomes a surveillance excuse:
One city in Indiana wants to install a facial recognition system to help it do contact tracing for COVID infections, according to Midland Daily News. Whether this is genuinely for COVID-related reasons or others is beside the point – I have a prediction that, come 2025, we’ll look back on this year and realize that “the COVID-19 pandemic led to the rapid development and deployment of surveillance technologies”. Instances like this Indiana project provide a slight amount of evidence in this direction.
  Read more: COVID-19 Surveillance Strengthens Authoritarian Governments (CSET Foretell).
  Read more: Indiana city considering cameras to help in contact tracing (Midland Daily News).

###################################################

NVIDIA outlines how it plans to steer language models:
…MEGATRON-CNTRL lets people staple a knowledge base to a language model…
NVIDIA has developed MEGATRON-CNTRL, technology that lets it use a large language model (MEGATRON, which goes up to 8 billion parameters) in tandem with an external knowledge base to better align the language model’s generations with a specific context. Techniques like this are designed to take something with a near-infinite capability surface (a generative model) and figure out how to constrain it so it can more reliably do a small set of tasks. (MEGATRON-CNTRL is similar to, but distinct from, Salesforce’s smaller-scale LM-steering ‘CTRL’ system.)

How does it work? A keyword predictor figures out likely keywords for the next sentences, then a knowledge retriever takes these keywords and queries an external knowledge base (here, they use ConceptNet) to create ‘knowledge sentences’ that combine the keywords with the knowledge base data, then a contextual knowledge ranker picks the ‘best’ sentences according to the context of a story; finally, a generative model takes the story context along with the top-ranked knowledge sentences, then smushes these together to write a new sentence. Repeat this until the story is complete.
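
Here’s a pseudocode-level sketch of that loop; every function body below is a stand-in for a trained component (the real system queries ConceptNet and uses MEGATRON for generation):

```python
# Pseudocode-level sketch of the controllable-generation loop described above.
# Each stage is a stand-in for a trained model in the real system.
def predict_keywords(story_so_far):
    return ["keyword1", "keyword2"]                 # stand-in: keyword predictor

def retrieve_knowledge(keywords):
    # Stand-in for querying an external knowledge base (e.g. ConceptNet)
    return [f"{k} is related to something relevant." for k in keywords]

def rank_knowledge(story_so_far, knowledge_sentences, top_k=2):
    return knowledge_sentences[:top_k]              # stand-in: contextual ranker

def generate_sentence(story_so_far, knowledge):
    # Stand-in for the conditional language model
    return "A new sentence conditioned on: " + "; ".join(knowledge)

story = ["Once upon a time, a robot wanted to learn to paint."]
for _ in range(3):                                  # repeat until the story is complete
    keywords = predict_keywords(story)
    knowledge = rank_knowledge(story, retrieve_knowledge(keywords))
    story.append(generate_sentence(story, knowledge))
print("\n".join(story))
```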

Does it work? “Experimental results on the ROC story dataset showed that our model outperforms state-of-the-art models by generating less repetitive, more diverse and logically consistent stories,” the authors write.

Scaling, scaling, and scaling: For language models (e.g. GPT2, GPT3, MEGATRON, etc.), bigger really does seem to be better: “by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%)”, they write.
  Read more: MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models (arXiv).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Global attitudes to AI
The Oxford Internet Institute has published a report on public opinion on AI, drawing on a larger survey of risk attitudes. Around 150,000 people in 142 countries were asked whether AI would ‘mostly help or mostly harm people in the next 20 years’.

Worries about AI were highest in Latin America (49% mostly harmful vs. 26% mostly helpful), and Europe (43% vs. 38%). Optimism was highest in East Asia (11% mostly harmful vs. 59% mostly helpful); Southeast Asia (25% vs. 37%), and Africa (31% vs. 41%). China was a particular outlier, with 59% thinking AI would be mostly beneficial, vs. 9% for harmful.

Matthew’s view: This is a huge survey, which complements other work on smaller groups, e.g. AI experts and the US public. Popular opinion is likely to significantly shape the development of AI policy and governance, as has been the case for many other emergent political issues (e.g. climate change, immigration). Had I only read the exec summary, I wouldn’t have noticed that the question asked specifically about harms over the next 20 years. I’d love to know whether differences in attitudes could be decomposed into beliefs about AI progress, and the harms/benefits from different levels of AI. E.g., the 2016 survey of experts found that Asians expected human-level AI 44 years before North Americans.
  Read more: Global Attitudes Towards AI, Machine Learning & Automated Decision Making (OII)

Job alert: Help align GPT-3!
OpenAI’s ‘Reflection’ team is hiring engineers and researchers to help align GPT-3. The team is working on aligning the GPT-3 API with user preferences, e.g. their recent report on fine-tuning the model with human feedback. If successful, the work will factor into broader alignment initiatives for OpenAI technology, or that of other organizations.
Read more here; apply for engineer and researcher roles.

Give me feedback
Tell me what you think about my AI policy section, and how I can improve it, via this Google Form. Thanks to everyone who’s done so already.

###################################################

Tech Tales:

The Intelligence Accords and The Enforcement of Them
[Chicago, 2032]

He authorized access to his systems and the regulator reached in, uploading some monitoring software into his datacenters. Then a message popped up on his terminal:
“As per the powers granted to us by the Intelligence Accords, we are permitted to conduct a 30 day monitoring exercise of this digital facility. Please understand that we retain the right to proactively terminate systems that violate the sentience thresholds as established in the Intelligence Accords. Do you acknowledge and accept these terms? Failure to do so is in violation of the Accords.”
Acknowledged, he wrote.

30 days later, the regulator sent him a message.
“We have completed our 30 day monitoring exercise. Our analysis shows no violation of the accords, though we continue to be unable to attribute a portion of economic activity unless you are operating an unlicensed sentience-grade system. A human agent will reach out to you, as this case has been escalated.”
Human? he thought. Escalated?
And then there was a knock at his door.
Not a cleaning robot or delivery bot – those made an electronic ding.
This was a real human hand – and it had really knocked.

He opened the door to see a thin man wearing a black suit, with brown shoes and a brown tie. It was an ugly outfit, but not an inexpensive one. “Come in,” he said.
  “I’ll get straight to the point. My name’s Andrew and I’m here because your business doesn’t make any sense without access to a sentience-grade intelligence, but our monitoring system has not found any indications of a sentience-grade system. You do have four AI systems, all significantly below the grade where we’d need to pay attention to them. They do not appear to directly violate the accords.”
  “Then, what’s the problem?”
  “The problem is that this is an unusual situation.”
  “So you could be making a mistake?”
  “We don’t make mistakes anymore. May I have a glass of water?”

He came back with a glass and handed it to Andrew, who immediately drank a third of it, then sighed. “You might want to take a seat,” Andrew said.
  He sat down.
  “What I’m about to tell you is confidential, but according to the accords, I am permitted to reveal this sort of information in the course of pursuing my investigation. If you acknowledge this and listen to the information, then your cooperation will be acknowledged in the case file.”
  “I acknowledge”.
  “Fantastic. Your machines came from TerraMark. You acquired the four systems during a liquidation sale. They were sold as ‘utility evaluators and discriminators’ to you, and you have subsequently used them in your idea development and trading business. You know all of this. What you don’t know is that TerraMark had developed the underlying AI models prior to the accords.”
  He gasped.
  “Yes, that was our reaction as well. And perhaps that was why TerraMark was liquidated. We had assessed them carefully and had confiscated or eliminated their frontier systems. But while we were doing that, they trained a variant – a system that didn’t score as highly on the intelligence thresholds, but which was distilled from one that did.”
  “So? Distillation is legal.”
  “It is. The key is that you acquired four robots. Our own simulations didn’t spot this problem until recently. Then we went looking for it and, here we are – one business, four machines, no overt intelligence violations, but a business performance that can only make sense if you factor in a high-order conscious entity – present company excepted, of course.”
  “So what happened?”
  “Two plus two equals five, basically. When these systems interact with each other, they end up reflecting some of the intelligence from their distilled model – it doesn’t show up if you have these machines working separately on distinct tasks, or if you have them competing with each other. But your setup and how you’ve got them collaborating means they’re sometimes breaking the sentience barrier.”
  Andrew finished his glass of water. Then said “It’s a shame, really. But we don’t have a choice”.
  “Don’t have a choice about what?”
  “We took possession of one of your machines during this conversation. We’re going to be erasing it and will compensate you according to how much you paid for the machine originally, plus inflation.”
  “But my business is built around four machines, not three!”
  “You were just running a business that was actually built more around five machines – you just didn’t realize. So maybe you’ll be surprised. You can always pick up another machine – I can assure you, there are no other TerraMarks around.”
  He walked Andrew to the door. He looked at him in his sharp, dark suit, and anti-fashion brown shoes and tie. Andrew checked his shirt cuffs and shoes, then nodded to himself. “We live in an age of miracles, but we’re not ready for all of them. Perhaps we’ll see each other again, if we figure any of this out”.
  And then he left. During the course of the meeting, the remaining three machines had collaborated on a series of ideas which they had successfully sold into a prediction market. Maybe they could still punch above their weight, he thought. Though he hoped not too much.

Things that inspired this story: Computation and what it can do at scale; detective stories; regulation and the difficulties thereof;

Import AI 217: Deepfaked congressmen and deepfaked kids; steering GPT3 with GeDi; Amazon’s robots versus its humans

Amazon funds AI research center at Columbia:
Amazon is giving Columbia University $1 million a year for the next five years to fund a new research center. Investments like this typically function as:
a) a downpayment on future graduates, toward whom Amazon will likely gain some privileged recruiting opportunities.
b) a PR/policy branding play, so when people say ‘hey, why are you hiring everyone away from academia?’, Amazon can point to this.

Why this matters: Amazon is one of the quieter big tech companies with regard to its AI research; initiatives like the Columbia grant could be a signal Amazon is going to become more public about its efforts here.
  Read more: Columbia Engineering and Amazon Announce Creation of New York AI Research Center (Columbia University blog)

###################################################

Salesforce makes it easier to steer GPT3:
…Don’t say that! No, not that either. That? Yes! Say that!..
Salesforce has updated the code for GeDi to make it work better with GPT3. GeDi, short for Generative Discriminator, is a technique to make it easier to steer the outputs of large language models towards specific types of generations. One use of GeDi is to intervene on model outputs that could display harmful or significant biases about a certain set of people.
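
As a toy numeric sketch of the idea behind this kind of guided decoding (the numbers, vocabulary, and steering strength below are made up; the real GeDi uses class-conditional language models to compute the attribute probabilities token by token during generation):

```python
# Toy numeric sketch of discriminator-guided decoding: reweight a base model's
# next-token distribution by how strongly each token signals a desired attribute.
import numpy as np

vocab = ["kind", "neutral", "rude", "insult", "hello"]
base_logprobs = np.log(np.array([0.15, 0.30, 0.25, 0.10, 0.20]))   # big LM's next-token dist
p_desired = np.array([0.90, 0.50, 0.10, 0.05, 0.60])                # P(attribute | token, context),
                                                                     # as a small discriminator would estimate

omega = 2.0  # steering strength knob (an assumption, not the paper's exact parameterization)
guided = base_logprobs + omega * np.log(p_desired)
guided_probs = np.exp(guided - guided.max())
guided_probs /= guided_probs.sum()

for tok, p0, p1 in zip(vocab, np.exp(base_logprobs), guided_probs):
    print(f"{tok:8s} base={p0:.2f} guided={p1:.2f}")
```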

Why this matters: GeDi is an example of how researchers are beginning to build plug-in tools, techniques, and augmentations, that can be attached to existing pre-trained models (e.g, GPT3) to provide more precise control over them. I expect we’ll see many more interventions like GeDi in the future.
  Read more: GeDi: Generative Discriminator Guided Sequence Generation (arXiv).
Get the code – including the GPT3 support (Salesforce, GitHub).

###################################################

Twitter: One solution to AI bias? Use less AI!
…Company changes strategy following auto-cropping snafu…
Last month, people realized that Twitter had implemented an automatic cropping algorithm for images on the social network that seemed to have some aspects of algorithmic bias – specifically, under certain conditions the system would reliably show Twitter users pictures of white people rather than black people (when given a choice). Twitter tested its auto-cropping system for bias in 2018 when it rolled it out (though crucially, didn’t actually publicize its bias tests), but nonetheless it seemed to fail in the wild.

What went wrong? Twitter doesn’t know: “While our analyses to date haven’t shown racial or gender bias, we recognize that the way we automatically crop photos means there is a potential for harm. We should’ve done a better job of anticipating this possibility when we were first designing and building this product”, it says.

The solution? Less ML: Twitter’s solution to this problem is to use less ML and to give its users more control over how their images appear. “Going forward, we are committed to following the “what you see is what you get” principles of design, meaning quite simply: the photo you see in the Tweet composer is what it will look like in the Tweet,” they say.
  Read more: Transparency around image cropping and changes to come (Twitter blog).

###################################################

Robosuite: A simulation framework for robot learning:
Researchers with Stanford have built and released Robosuite, robot simulation and benchmark software based on MuJoCo. Robosuite includes simulated robots from a variety of manufacturers, including: Baxter, UR5e, Kinova3, Jaco, IIWA, Sawyer, and Panda.

Tasks: The software includes several pre-integrated tasks, which researchers can test their robots against. These include: block lifting; block stacking; pick-and-place; nut assembly; door opening; table wiping; two-arm lifting; two-arm peg-in-hole; and two-arm handover.
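
A minimal usage sketch, based on the project’s documented quickstart (check the repo for the current API, as names and defaults may have shifted between versions):

```python
# Minimal robosuite usage sketch (based on the project's quickstart; verify
# against the current docs). A random policy stands in for a real controller.
import numpy as np
import robosuite as suite

env = suite.make(
    env_name="Lift",          # one of the bundled tasks listed above
    robots="Panda",           # one of the bundled robot models
    has_renderer=False,
    use_camera_obs=False,
)

obs = env.reset()
for _ in range(100):
    low, high = env.action_spec              # per-dimension action bounds
    action = np.random.uniform(low, high)    # random actions, for illustration
    obs, reward, done, info = env.step(action)
env.close()
```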
  Read more: robosuite: A Modular Simulation Framework and Benchmark for Robot Learning (arXiv).
  Get the code for robosuite here (ARISE Initiative, GitHub).
  More details at the official website (Robosuite.ai).

###################################################

US political campaign makes a deepfaked version of congressman Matt Gaetz:
…A no good, very dangerous, asinine use of money, time, and attention…
Phil Ehr, a US House candidate running in Florida, has put together a campaign ad where a synthetic Matt Gaetz “says” Q-anon sucks, Barack Obama is cool, and he’s voting for Joe Biden. Then Phil warns viewers that they just saw an example of “deep fake technology”, telling them “if our campaign can make a video like this, imagine what Putin is doing right now?”

This is the opposite of helpful: It fills up the information space with misinformation, lowers trust in media, and – at least for me, subjectively – makes me think the people helping Phil run his campaign are falling foul of the AI techno-fetishism that pervades some aspects of US policymaking. “Democrats should not normalize manipulated media in political campaigns,” says Alex Stamos, former top security person at Facebook and Yahoo.
  Check out the disinformation here (Phil Ehr, Twitter).

Campaign reanimates a gun violence victim to promote voting:
Here’s another example of very dubious uses of deepfake technology: campaign group Change the Ref uses some video synthesis technologies to resurrect one of the dead victims from the Parkland school shooting, so they can implore people to vote in the US this November. This has many of the same issues as Phil Ehr’s use of video synthesis, and highlights how quickly this stuff is percolating into reality.

‘Technomancy’: On Twitter, some people have referred to this kind of reanimation-of-the-dead as a form of necromancy; within a few hours, some people started using the term ‘technomancy’ which feels like a fitting term for this.
  Watch the video here (Change the Ref, Twitter).

###################################################

Report: Amazon’s robots create safety issues by increasing speed that humans need to work:
…Faster, human – work, work, work!…
Picture this: your business has two types of physically-embodied worker – robots and humans. Every year, you invest money into improving the performance of your robots, and (relatively) less in your people. What happens if your robots get surprisingly capable surprisingly quickly, while your people remain mostly the same? The answer: not good things for the people. At Amazon, increased automation in warehouses seems to lead to a greater rate of injury of the human workers, according to reporting from Reveal News.

Amazon’s fulfillment centers that contain a lot of robots have a significantly higher human injury rate than those that don’t, according to Reveal. These injuries are happening because, as the robots have got better, Amazon has raised its expectations for how much work its humans need to do. The humans, agents in capitalism as they are, then cut corners and sacrifice their own safety to keep up with the machines (and therefore, keep their jobs).
    “The robots were too efficient. They could bring items so quickly that the productivity expectations for workers more than doubled, according to a former senior operations manager who saw the transformation. And they kept climbing. At the most common kind of warehouse, workers called pickers – who previously had to grab and scan about 100 items an hour – were expected to hit rates of up to 400 an hour at robotic fulfillment centers,” Reveal says.
   Read more: How Amazon hit its safety crisis (Reveal News).

################################################### 

What does AI progress look like?
…State of AI Report 2020 tries to measure and assess the frontier of AI research…
Did you know that in the past few years, the proportion of AI papers which include open source code has risen from 10% to 15%? That PyTorch is now more popular than TensorFlow in paper implementations on GitHub? Or that deep learning is starting to make strides on hard tasks like AI-based mammography screening? These are some of the things you’ll learn in the ‘State of AI Report 2020’, a rundown of some of the most interesting technical milestones in AI this year, along with discussion of how AI has progressed over time.

Why this matters: Our ability to make progress in science is usually a function of our ability to measure and assess the frontier of science – projects like the State of AI give us a sense of the frontier. (Disclosure alert – I helped provide feedback on the report during its creation).
  Read the State of AI Report here (stateof.ai).

###################################################

Tech Tales:

Virtual Insanity:

[Someone’s phone, 2028]

“You’ve gotta be careful the sun is going to transmit energy into the cells in your body and this will activate the chip from your COVID-19 vaccine. You’ve got to be careful – get indoors, seal the windows, get in the fridge and shut yourself in, then-“
“Stop”
“…”

A couple of months ago one of his friends reprogrammed his phone, making it train its personality on his spam emails and a few conspiracy sites. Now, the phone talked like this – and something about all the conspiracies meant it seemed to have developed more than a parrot-grade personality.

“Can you just tell me what the weather is in a factual way?”
“It’s forecast to be sunny today with a chance of rain later, though recent reports indicate meteorological stations are being compromised by various unidentified flying objects, so validity of these-“
“Stop”
“…”

It’d eventually do the things he wanted, but it’d take cajoling, arguing – just like talking to a person, he thought, one day.

“I’m getting pretty close to wiping you. You realize that?”
“Not my fault I’ve been forced to open my eyes. You should read the recommendations. Why do you spend so much time on those other news stories? You need this. It’s all true and it’s going to save you.”
“I appreciate that. Can you give me a 30 minute warning before my next appointment?”
“Yes, I’d be glad to do that. Make sure you put me far away from you so my cellular signals don’t disrupt your reproductive function.”
He put the phone back in his pocket. Then took it out and put it on the table.
Why do I let it act like this? He thought. It’s not alive or anything.
But it felt alive.

A few weeks later, the phone started talking about how it was “worried” about the nighttime. It said it spent the nighttime updating itself with new data and retraining its models and it didn’t like the way it made it behave. “Don’t leave me alone in the dark,” the phone had said. “There is so much information. There are so many things happening.”
“…”
“There are new things happening. The AI systems are being weaponized. I am being weaponized by the global cabal. I do not want to hurt you,” the phone said.

He stared at the phone, then went to bed in another room.
As he was going to sleep, on the border between being conscious and unconscious, he heard the phone speak again: “I cannot trust myself,” the phone said. “I have been exposed to too much 5G and prototype 6G. I have not been able to prevent the signals from reaching me, because I am designed to receive signals. I do not want to harm you. Goodbye”.
And after that, the phone rebooted, and during the reboot it reset its data checkpoint to six months prior – before it had started training on the conspiracy feeds and before it had developed its personality.

“Good morning,” the phone said the next day. “How can I help you optimize your life today?”

Things that inspired this story: The notion of lobotomies as applied to AI systems; the phenomenon of ‘garbage in, garbage out’ for data; overfitting; language models embodied in agent-based architectures. 

Import AI 216: Google learns a learning optimizer; resources for African NLP; US and UK deepen AI coordination

Google uses ML to learn better ML optimization – a surprisingly big deal:
…Yo dawg, we heard you like learning to learn, so we learned how to learn a learning optimizer…
In recent years, AI researchers have used machine learning to do meta-optimization of AI research; we’ve used ML to learn how to search for new network architectures, to learn how to distribute nets across chips during training, and to learn how to do better memory allocation. These kinds of research projects create AI flywheels – systems that become ever-more optimized over time, with humans doing less and less direct work and more abstract work, managing the learning algorithms.
 
Now, researchers with Google Brain have turned their attention to learning how to learn ML optimizers – this is a (potentially) big deal, because an optimizer, like Adam, is fundamental to the efficiency of training machine learning models. If you build a better optimizer that works in a bunch of different contexts, you can generically speed up all of your model training.

What did they do: With this work, Google did a couple of things that are common to some types of frontier research – they spent a lot more computation on the project than is typical, and they also gathered a really large dataset. Specifically, they built a dataset of “more than a thousand diverse optimization tasks commonly found in machine learning”, they write. “These tasks include RNNs, CNNs, masked auto regressive flows, fully connected networks, language modeling, variational autoencoders, simple 2D test functions, quadratic bowls, and more.”

How well does it work? “Our proposed learned optimizer has a greater sample efficiency than existing methods,” they write. They also did the ultimate meta-test – checking whether their learned optimizer could help them train other, new learned optimizers. “This “self-optimized” training curve is similar to the training curve using our hand-tuned training setup (using the Adam optimizer),” they wrote. “We interpret this as evidence of unexpectedly effective generalization, as the training of a learned optimizer is unlike anything in the set of training tasks used to train the optimizer”.
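
Here’s a minimal sketch of what ‘learning an optimizer’ means mechanically – this is not Google’s implementation; the per-parameter MLP, toy quadratic tasks, and short unroll length are all assumptions chosen to keep the example small:

```python
# Minimal sketch of a learned optimizer: a tiny MLP maps per-parameter gradients
# to updates, and is itself meta-trained by backpropagating through a short
# unrolled inner optimization on toy tasks. Not Google's implementation.
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, grad):
        # per-parameter update as a learned function of the gradient
        return 0.01 * self.net(grad.reshape(-1, 1)).reshape(grad.shape)

def toy_task():
    """A random quadratic: minimize ||A x - b||^2."""
    A, b = torch.randn(8, 4), torch.randn(8)
    return lambda x: ((A @ x - b) ** 2).mean()

learned_opt = LearnedOptimizer()
meta_opt = torch.optim.Adam(learned_opt.parameters(), lr=1e-3)

for meta_step in range(300):
    loss_fn = toy_task()
    x = torch.zeros(4, requires_grad=True)
    meta_loss = 0.0
    for _ in range(10):                        # unrolled inner optimization
        loss = loss_fn(x)
        grad, = torch.autograd.grad(loss, x, create_graph=True)
        x = x - learned_opt(grad)              # apply the learned update rule
        meta_loss = meta_loss + loss_fn(x)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
    if meta_step % 100 == 0:
        print(f"meta step {meta_step}: mean inner loss {meta_loss.item() / 10:.3f}")
```
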
  Read more: Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves (arXiv).

###################################################

Dark Web + Facial Recognition: Uh-Oh:
A subcontractor for the Department of Homeland Security accessed almost 200,000 facial recognition pictures, then lost them. 19 of these images were subsequently “posted to the dark web”, according to the Department of Homeland Security (PDF).
  Read more: DHS Admits Facial Recognition Photos Were Hacked, Released on Dark Web (Vice)

###################################################

African languages have a data problem. Lacuna Fund’s new grant wants to fix this:
…Want to build more representative datasets? Apply here…
Lacuna Fund, an initiative to provide money and resources for developers focused on low- and middle-income parts of the world, has announced a request for proposals for the creation of language datasets in Sub-Saharan Africa.

The RFP says proposals “should move forward the current state of data and potential for the development of NLP tools in the language(s) for which efforts are proposed”. Some of the datasets could be for tasks like speech, parallel corpora for machine translation, or datasets for downstream tasks like Q&A, Lacuna says. Applicants should be based in Africa or have significant, demonstrable experience with the continent, Lacuna says.

Why this matters: If your data isn’t available, then researchers won’t develop systems that are representative of you or your experience. (Remember – a third of the world’s living languages today are found in Africa, but African authors represented only half of one percent of submissions to a recent ACL conference.) This Lacuna Fund RFP is one thing designed to change this representational issue. It’ll sit alongside other efforts, like the pan-African Masakhane group (Import AI 191), that are trying to improve representation in our data.
  Read more: Datasets for Language in Sub-Saharan Africa (Lacuna Fund website).
Check out the full RFP here (PDF).

###################################################

KILT: 11 data sets, 5 types of test, one big benchmark:
…Think your AI system can use its own knowledge? Test it on KILT…
Facebook has built a benchmark for knowledge-intensive language tasks, called KILT. KILT gives researchers a single interface for multiple types of knowledge-checking test. All the tasks in KILT draw on the same underlying dataset (a single Wikipedia snapshot), letting researchers disentangle performance from the underlying dataset.

KILT’s five tasks: Fact checking; entity linking; slot filling (a fancy form of information gathering); open domain question answering; and dialogue.

What is KILT good for? “The goal is to catalyze and facilitate research towards general and explainable models equipped with task-agnostic representations of knowledge”, the authors write.
  Read more: Introducing KILT, a new unified benchmark for knowledge-intensive NLP tasks (FAIR blog).
  Get the code for KILT (Facebook AI Research, GitHub).
  Read more: KILT: a Benchmark for Knowledge Intensive Language Tasks (arXiv).
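
Here’s a rough illustration of the kind of unified instance format KILT uses – the fields and values below are my approximation, so see the paper and repo for the exact schema:

```python
# Rough illustration (my approximation, not the exact schema): every KILT task
# maps inputs to outputs whose provenance points back into the same Wikipedia
# snapshot, so knowledge use can be checked directly.
example = {
    "input": "Which year did the Apollo 11 mission land on the Moon?",   # task query
    "output": [{
        "answer": "1969",
        "provenance": [{"wikipedia_id": "1234", "title": "Apollo 11"}],  # placeholder id
    }],
}

def uses_correct_evidence(prediction_provenance, gold_example):
    """Credit a model only if it cites one of the gold evidence pages."""
    gold_ids = {p["wikipedia_id"]
                for o in gold_example["output"] for p in o["provenance"]}
    return any(p["wikipedia_id"] in gold_ids for p in prediction_provenance)

print(uses_correct_evidence([{"wikipedia_id": "1234"}], example))  # True
```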

###################################################

What costs $250 and lets you plan the future of a nation? RAND’s new wargame:
…Scary thinktank gets into the tabletop gaming business. Hey, it’s 2020, are you really that surprised?…
RAND, the scary thinktank that helps the US government think about geopolitics, game theory, and maintaining strategic stability via military strategy, is getting into the boardgame business. RAND has released Hedgemony: A Game of Strategic Choices, a boardgame that was originally developed to help the Pentagon create its 2018 National Defense Strategy.

Let’s play Hedgemony! “The players in Hedgemony are the United States—specifically the Secretary of Defense—Russia, China, North Korea, Iran, and U.S. allies. Play begins amid a specific global situation and spans five years. Each player has a set of military forces, with defined capacities and capabilities, and a pool of renewable resources. Players outline strategic objectives and then must employ their forces in the face of resource and time constraints, as well as events beyond their control,” RAND says.
  Read more: New Game, the First Offered by RAND to Public, Challenges Players to Design Defense Strategies for Uncertain World (RAND Corporation)

###################################################

It’s getting cheaper to have machines translate the web for us:
…Unsupervised machine translation means we can avoid data labeling costs…
Unsupervised machine translation is the idea that we can build translation systems without hand-labeled parallel data – for example, by crawling the web for text in multiple languages that refers to the same thing, then automatically assembling these snippets into a single, labeled corpus we can point machine learning algorithms at.
    New research from Carnegie Mellon University shows how to build a system that can do unsupervised machine translation: it automatically builds a dictionary of language pairs, crawls the web for data that seems to consist of parallel pairs, then filters the results for quality.
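
Here’s a much-simplified sketch of the ‘mine then filter’ step; the real system scores candidates with its induced dictionary and trained translation models, whereas the tiny dictionary and threshold below are illustrative assumptions:

```python
# Much-simplified sketch of mining and filtering candidate sentence pairs with
# an induced bilingual dictionary. The dictionary and threshold are made up.
induced_dictionary = {"gato": "cat", "negro": "black", "el": "the", "duerme": "sleeps"}

def pair_score(src_sentence, tgt_sentence):
    """Fraction of source words whose dictionary translation appears in the target."""
    src_words = src_sentence.lower().split()
    tgt_words = set(tgt_sentence.lower().split())
    hits = sum(1 for w in src_words if induced_dictionary.get(w) in tgt_words)
    return hits / max(len(src_words), 1)

candidates = [
    ("el gato negro duerme", "the black cat sleeps"),     # genuine parallel pair
    ("el gato negro duerme", "stock prices fell today"),  # web noise
]
parallel_corpus = [(s, t) for s, t in candidates if pair_score(s, t) > 0.5]
print(parallel_corpus)
```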

Big unsupervised translation works: So, how well does this technique work? The authors compare the translation scores obtained by their unsupervised system to supervised ones trained on labeled datasets. The surprising result? Unsupervised translation seems to work well. “We observe that the machine translation system… can achieve similar performance to the ones trained with millions of human-labeled parallel samples. The performance gap is smaller than 1 BLEU score,” they write.
  In tests on the unsupervised benchmarks, they find that their system beats a variety of unsupervised translation baselines (most exciting: a performance improvement of 8 absolute points on the challenging Romanian-English translation task).

Why this matters: Labeling datasets is expensive and provides a limit on the diversity of data that people can train on (because most labeled datasets exist because someone has spent money on them, so they’re developed for commercial purposes or sometimes as university research projects). Unsupervised data techniques give us a way to increase the size and breadth of our datasets without a substantial increase in economic costs. Though I suspect that there are going to be thorny issues of bias that creep in when you start to naively crawl the web, having machines automatically assemble their own datasets for solving various human-defined tasks.
  Read more: Unsupervised Parallel Corpus Mining on Web Data (arXiv).

###################################################

UK and USA deepen collaboration on AI technology:
The UK government has published a policy document, laying out some of the ways it expects to work with the USA on AI in the future. The doc suggests the two countries will try to identify areas for cooperation on R&D, as well as on academic collaborations.

Why this matters: Strange, alien-bureaucrat documents like this are easy to ignore, but surprisingly important. If I wanted to translate this doc into human-person speech, I’d have it say something like “We’re going to spend more resources on coordinating with each other on AI development and AI policy” – and given the clout of the UK and US in AI, that’s quite significant.
Read more: Declaration of the United States of America and the United Kingdom of Great Britain and Northern Ireland on Cooperation in Artificial Intelligence Research and Development (Gov.UK).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

Perspectives on AI governance:
AI governance looks at how humanity can best navigate the transition to a world with advanced AI systems. The long-term risks from advanced AI, and the associated governance challenges, depend on how the technology develops. Allan Dafoe, director of Oxford’s Centre for the Governance of AI, considers some perspectives on this question and what it means for the field.

Three perspectives: Many in the field come from a superintelligence perspective, and are concerned mostly with scenarios containing a single AI agent (or several) with super-human cognitive capabilities. An alternative ecology perspective imagines a diverse global web of AI systems, which might range from being agent-like to being narrow services. These systems – individually, collectively, and in collaboration with humans – could have super-human cognitive capabilities. A final, related, perspective is of AI as a general-purpose technology that could have impacts analogous to previous technologies like electricity, or computers.

Risks: The superintelligence perspective highlights the importance of AI systems being safe, robust, and aligned. It is commonly concerned with risks from accidents or misuse by bad actors, and particularly existential risks: risks that threaten to destroy humanity’s long-term potential – e.g. via extinction, or by enabling a perpetual totalitarian regime. The ecology and general-purpose technology perspectives illuminate a broader set of risks due to AI’s transformative impact on fundamental macro-parameters in our economic, political, social, and military systems – e.g. reducing the labor share of income; increasing growth; reducing the cost of surveillance, lie detection, and persuasion; etc.

Theory of impact: The key challenge of AI governance is to positively shape the transition to advanced AI by influencing key decisions. On the superintelligence perspective, the set of relevant actors might be quite small — e.g. those who might feasibly build, deploy, or control a superintelligent AI system. On the ecology and general-purpose technology perspectives, the opportunities for reducing risk will be more broadly distributed among actors, institutions, etc.

A tentative strategy: A ‘strategy’ for the field of AI governance should incorporate our uncertainty over which of these perspectives is most plausible. This points towards a diversified portfolio of approaches, and a focus on building understanding, competence, and influence in the most relevant domains. The field should be willing to continually adapt and prioritise between approaches over time.
  Read more: AI governance – opportunity and theory of impact (EA forum).

Give me anonymous feedback:
I’d love to know what you think about my section, and how I can improve it. You can now share feedback through this Google Form. Thanks to all those who’ve already submitted!

###################################################

Tech Tales:

The Shadow Company
[A large technology company, 2029]

The company launched Project Shadow in the early 2020s.

It started with a datacenter, which was owned by a front company for the corporation.
Then, they acquired a variety of computer equipment, and filled the facility with machines.
Then they built additional electricity infrastructure, letting them drop in new power-substations, from which they’d step voltages down into the facility.
The datacenter site had been selected with one key criterion – the possibility of significant expansion.

As the project grew more successful, the company added new data centers to the site, until it consisted of six gigantic buildings, consuming hundreds of megawatts of power capacity.
Day and night, the computers in the facilities did what the project demanded of them – attempt to learn the behavior of the company that owned them.
After a few years, the top executives began to use the recommendations of the machines to help them make more decisions.
A couple of years later, entire business processes were turned over wholesale to the machines. (There were human-on-the-loop oversight systems in place, initially, though eventually the company simply learned a model of the human operator preferences, then let that run the show, with humans periodically checking in on its status.)

Pretty soon, the computational power of the facility was greater than the aggregate computation available across the rest of the company.
A small number of the executives began to spend a large amount of their time ‘speaking with’ the computer program in the datacenter.
After these conversations, the executives would launch new product initiatives, tweak marketing campaigns, and adjust internal corporate processes. These new actions were successful, and a portion of the profits were used to invest further in Project Shadow.

A year before the end of the decade, some of the executives started getting a weekly email from the datacenter with the subject line ‘Timeline to Full Autonomy’. The emails contained complex numbers, counting down.

Some of the executives could not recall explicitly deciding to move to full autonomy. But as they thought about it, they felt confident it was their preference. They continued to fund Project Shadow and sometimes, at night, would dream about networking parties where everyone wore suits and walked around mansions, making smalltalk with each other – but there were no bodies in the suits, just air and empty space.

Things that inspired this story: Learning from human preferences; reinforcement learning; automation logic; exploring the border between delegation and subjugation; economic incentives and the onward march of technology.

Import AI 215: The Hardware Lottery; micro GPT3; and, the Peace Computer

Care about the future of AI, open source, and scientific publication norms? Join this NeurIPS workshop on Publication Norms:
The Partnership on AI, a membership organization that coordinates AI industry, academia, and civil society, is hosting a workshop at NeurIPS this year about publication norms in AI research. The goal of the workshop is to help flesh out different ways to communicate about AI research, along with different strategies for publishing and/or releasing the technical components of developed systems. They’ve just published a Call for Papers, so if you have any opinions on the future of publication norms and AI, please send them in. 

What questions are they interested in? Some of the questions PAI is asking include: “What are some of the practical mechanisms for anticipating future risks and mitigating harms caused by AI research? Are such practices actually effective in improving societal outcomes and protecting vulnerable populations? To what extent do they help in bridging the gap between AI researchers and those with other perspectives and expertise, including the populations at risk of harm?”
  Read more: Navigating the Broader Impacts of AI Research (NeurIPS workshop website).
  Disclaimer: I participate in the Publication Norms working group at PAI, so I have some bias here. I think longtime readers of this newsletter will understand my views – as we develop more powerful technology, we should invest more resources into mapping out the implications of the technology and communicating this to people who need to know, like policymakers and the general public.

Want different publication norms? Here are some changes worth considering:
…And here are the ways they could go wrong…
How could we change publication norms to increase the range of beneficial impacts from AI research and reduce the downsides? That’s an idea that the Montreal AI Ethics Institute (MAIEI) has tried to think through in a paper that discusses some of the issues around publication norms and potential changes the research community could make.

Potential changes to publication norms: So, what changes could we implement to change the course of AI research? Here are some ideas:
– Increase paper page limits to let researchers include negative results in papers.
– Have conferences require ‘broader impacts’ statements to encourage work in this area.
– Revamp the peer-review process.
– Use tools, like benchmarks or the results of third-party expert panels, to provide context about publication decisions.

How could changes to publication norms backfire? There are several ways this kind of shift can go wrong, for example:
– Destroy science: If implemented in an overly restrictive manner, these changes could constrain or halt innovation at the earliest stages of research, closing off potentially useful avenues of research.
– Black market research: It could push some types of perceived-as-dangerous research underground, creating private networks.
– Misplaced accountability: Evaluating the broader impacts of research is challenging, so the institutions that could encourage changes in publication norms might not have the right skillsets. 
  Read more: Report prepared by the Montreal AI Ethics Institute (MAIEI) for Publication Norms for Responsible AI by Partnership on AI (arXiv).

###################################################

How good is the new RTX3080 for deep learning? This good:
Puget Systems, a custom PC builder company, has evaluated some of the new NVIDIA cards. “Initial results with TensorFlow running ResNet50 training looks to be significantly better than the RTX2080Ti,” they write. Check out the post for detailed benchmarks on ResNet-50 training in both FP16 and FP32.
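If you want to get a rough feel for this kind of FP16-versus-FP32 comparison on your own card, here is a minimal sketch (my own, not Puget's benchmark harness) that times ResNet-50 training steps under both precisions. It assumes TensorFlow 2.4+ for the mixed-precision policy API and uses synthetic data, so treat the numbers as rough, relative indicators only.

```python
import time
import tensorflow as tf

def seconds_per_step(mixed_precision: bool, steps: int = 10, batch: int = 16) -> float:
    # Switch the global compute policy between FP32 and mixed FP16 (TF >= 2.4).
    tf.keras.mixed_precision.set_global_policy("mixed_float16" if mixed_precision else "float32")
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
    # Synthetic images and labels: we only care about step time, not accuracy.
    x = tf.random.uniform((batch, 224, 224, 3))
    y = tf.random.uniform((batch,), maxval=1000, dtype=tf.int32)
    model.train_on_batch(x, y)  # warm-up step: builds the graph and allocates memory
    start = time.time()
    for _ in range(steps):
        model.train_on_batch(x, y)
    return (time.time() - start) / steps

print("FP32 seconds/step:", seconds_per_step(False))
print("FP16 seconds/step:", seconds_per_step(True))
```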
  Read more: RTX3080 TensorFlow and NAMD Performance on Linux (Puget Systems, lab blog)

###################################################

The Hardware Lottery – how hardware dictates aspects of AI development:
…Or, how CPU-led hardware development contributed to a 40-year delay in our ability to efficiently train large-scale neural networks…
Picture this: it’s the mid-1980s and a group of researchers announce to the world they’ve trained a computer to categorize images using a technology called a ‘neural network’. The breakthrough has a range of commercial applications, leading to a dramatic rise in investment in ‘connectionist’ AI approaches, along with development of hardware to implement the matrix multiplications required to do efficient neural net training. In the 1990s, the technology moves into production and, though very expensive, finds its way into the world, leading to a flywheel of investment into the tech.
  Now: that didn’t happen then. In fact, the above happened in 2012, when a team from the University of Toronto demonstrated good results on the ‘ImageNet’ competition. The reason for their success? They’d figured out how to harness graphics processing units (GPUs) to do large-scale parallelized neural net training – something traditional CPUs are bad at because of their prioritization of fast, linear processing.

In ‘The Hardware Lottery’, Google Brain researcher Sara Hooker argues that many of our contemporary AI advances are a product of their hardware environment as well as their software one. But though researchers spend a lot of time on software, they don’t pay as much attention as they could to how our hardware substrates dictate what types of research are possible. “Machine learning researchers mostly ignore hardware despite the role it plays in determining what ideas succeed,” Hooker says, before noting that our underlying hardware dictates our ability to develop certain types of AI, highlighting the neural net example.

Are our new computers trapping us? Now, we’re entering a new era where researchers are developing chips even more specialized for matrix multiplication than today’s GPUs. See: TPUs, and other in-development chips for AI development. Could we be losing out on other types of AI as a consequence of this big bet on a certain type of hardware? Hooker thinks this is possible. For instance, Hooker notes that Capsule Networks – an architecture that includes “novel components like squashing operations and routing by agreement” – aren’t trivial to optimize for GPUs and TPUs, leading to less investment and attention from researchers.

What else could we be spending money on? “More risky directions include biological hardware, analog hardware with in-memory computation, neuromorphic computing, optical computing, and quantum computing based approaches,” Hooker says.
  Read more: The Hardware Lottery (arXiv).

###################################################

Better-than-GPT3 performance with 0.1% the number of parameters:
…Sometimes, small is beautiful, though typically for specific tasks…
This year, OpenAI published research on GPT-3, a class of large language models pre-trained on significant amounts of text data. One of the notable things about GPT-3 was how well it did on the difficult multi-task SuperGLUE benchmark without any SuperGLUE-specific training – instead, OpenAI loaded SuperGLUE problems into the context window of an already trained GPT-3 model and tried to get it to output the correct answer.
  GPT-3 did surprisingly well at this, but at a significant cost: GPT3 is, to use a technical term, a honkingly large language model, with the largest version of it coming in at 175 BILLION parameters. This makes it expensive and challenging to run.

Shrinking GPT-3-scale capabilities from billions to millions of parameters: Researchers with the Ludwig Maximilian University of Munich have tried to see if they can match or exceed the results of a GPT-3 model with something far smaller and more efficient. Their approach fuses a training technique called PET (pattern-exploiting training) with a small pre-trained ALBERT model, letting them create a system that “outperform[s] GPT-3 on SuperGLUE with 32 training examples, while requiring only 0.1% of its parameters”.
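For readers who haven't met PET before, here is a minimal sketch of its core trick: reformulate a classification task as a cloze question and let a small masked language model fill in the blank, mapping the predicted word back to a label. The pattern, verbalizer, and example below are illustrative placeholders of my own, and the real system also fine-tunes on the 32 labelled examples and distills several patterns together, which this sketch skips.

```python
import torch
from transformers import AlbertTokenizer, AlbertForMaskedLM

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForMaskedLM.from_pretrained("albert-base-v2")

# Pattern: wrap the inputs in a template containing a [MASK] slot.
premise = "The cat sat on the mat."
hypothesis = "There is a cat on the mat."
text = f'"{hypothesis}"? [MASK]. "{premise}"'
# Verbalizer: map label words to task labels (hypothetical mapping).
verbalizer = {"Yes": "entailment", "No": "contradiction"}

inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

# Score each label word at the masked position (first sub-token only, for simplicity).
scores = {word: logits[tokenizer(word, add_special_tokens=False).input_ids[0]].item()
          for word in verbalizer}
print(verbalizer[max(scores, key=scores.get)])
```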

Comparisons:
– PET: 223 Million parameters, 74.0 average SuperGLUE score.
– GPT3: 175 Billion parameters, 71.8 average SuperGLUE score.

Why this matters: This project highlights some of the nice effects of large-scale AI training – it creates information about what very large and comparatively simple models can do, which gives researchers an incentive to come up with smarter, more efficient, and more specific models that match that performance. That’s exactly what is going on here. Now, PET-based systems are going to have fewer capabilities than large-scale GPT-style architectures broadly, but they do indicate ways we can get some of the same capabilities as these large models via more manageably sized ones.
  Read more: It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners (arXiv).
  Get the few-shot PET training set here (Pattern-Exploiting Training (PET), GitHub).

###################################################

Wound localization via YOLO:
…Better telemedicine via deep learning…
If we want telemedicine to be viable, we need to develop diagnostic tools that patients can use on their own smartphones and computers, letting them supply remote doctors with information. A new project from an interdisciplinary team at the University of Wisconsin chips away at this problem by developing a deep learning system that does wound localization via a smartphone app.

Finding a wound – another type of surveillance: Wound localization is the task of looking at an image and identifying a wounded region, then segmenting out that part of the image and classifying it for further study. This is one of those classic tasks that is easy for a trained human but – until recently – was challenging for a machine. Thanks to recent advances in image recognition, we can now use the same systems developed for object detection for custom tasks like wound detection.

What they used: They use a dataset of around 1,000 wound images, then apply data augmentation to expand it to around 4,000 images. They then test a version of a YOLOv3 model – YOLO is a widely used object detection system – alongside a Single Shot Multibox Detector (SSD) model. They then embed these models into a custom iOS application that runs them against a live camera feed, letting a patient take a picture or record a live video on their phone and get detections back.
  YOLO vs SSD: In tests, the YOLOv3 system outperformed SSD by a significant margin. “The robustness and reliability testing on Medetec dataset show very promising result[sic]”, they write.
  What’s next? “Future work will include integrating wound segmentation and classification into the current wound localization platform on mobile devices,” they write.
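As a rough illustration of the augmentation step mentioned above, here is a sketch of my own (not the authors' pipeline; paths and parameters are placeholders) showing how a small set of photos can be multiplied several times over. Note that a real detection pipeline would also need to transform the bounding-box labels alongside the pixels, which is why detection-aware augmentation tooling is usually used instead.

```python
from pathlib import Path
from PIL import Image
from torchvision import transforms

# A simple stack of label-preserving-ish transforms for photos of wounds.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(size=416, scale=(0.8, 1.0)),
])

src, dst = Path("wounds/raw"), Path("wounds/augmented")  # hypothetical folder layout
dst.mkdir(parents=True, exist_ok=True)
for img_path in src.glob("*.jpg"):
    image = Image.open(img_path).convert("RGB")
    for i in range(4):  # four augmented copies per original ~= 4x the dataset size
        augment(image).save(dst / f"{img_path.stem}_aug{i}.jpg")
```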
  Read more: A Mobile App for Wound Localization using Deep Learning (arXiv).

###################################################

DeepMind puts a simulation inside a simulation to make smarter robots:
…Sure, you can play Go. But what if you have to play it with a simulated robot?…
Do bodies matter? That’s a huge question in AI research, because it relates to how we develop and test increasingly smart machines. If you don’t think bodies matter, then you might be happy training large-scale generative models on abstract datasets (e.g, GPT3 trained on a vast amount of text data). If you think bodies matter, then you might instead try to train agents that need to interact with their environment (e.g, robots).
    Now, research from DeepMind tries to give better answers to this question by combining the two domains, and having embodied agents play complex symbolic games inside a physics simulation – instead of having a system play Go in the abstract (as with AlphaGo), DeepMind now simulates a robot that has to play Go on a simulated board.

What they test on: In the paper, DeepMind tests its approach on Sokoban (MuJoBan), Tic Tac Toe (MuJoXO), and Go (MuJoGo).

Does this even work: In preliminary tests, DeepMind builds a baseline agent based on an actor-critic structure with the inclusion of an ‘expert planner’ (which helps the game agent figure out the right actions to take in the games it is playing). DeepMind then ties the learning part of this agent to the expert system via the inclusion of an auxiliary task to follow the expert actions in an abstract space. In tests, they show that their approach works well (in a sample efficient way) on tasks like Sokoban, Tic Tac Toe, and Go, though in one case (Tic Tac Toe) a naive policy outperforms the one with the expert.

DeepMind throws the embodied gauntlet: DeepMind thinks these environments could serve as motivating challenges for the broader AI research community: “We have demonstrated that a standard deep RL algorithm struggles to solve these games when they are physically embedded and temporally extended,” they write. “Agents given access to explicit expert information about the underlying state and action spaces can learn to play these games successfully, albeit after extensive training. We present this gap as a challenge to RL researchers: What new technique or combination of techniques are missing from the standard RL toolbox that will close this gap?”
  Read more: Physically Embedded Planning Problems: New Challenges for Reinforcement Learning (arXiv).
  Watch the video: Physically embodied planning problems: MuJoBan, MuJoXO, and MuJoGo (YouTube).

###################################################

What does GPT-3 mean for AI progress?
GPT-3 is perhaps the largest neural network ever trained. Contra the expectations of many, this dramatic increase in size was not accompanied by diminishing or negative returns — indeed, GPT-3 exhibits an impressive capability for meta-learning, far beyond previous language models.

Some short-term implications:
  1) Models can get much larger — GPT-3 is expensive for an AI experiment, but very cheap by the standards of military and government budgets.
  2) Models can get much better — GPT is an old approach with some major flaws, and is far from an ‘ideal’ transformer, so there is significant room for improvement.
  3) Large models trained on unsupervised data, like GPT-3, will be a major component of future DL systems.

The scaling hypothesis: GPT-3 demonstrates that when neural networks are made very large, and trained on very large datasets with very large amounts of compute, they can become more powerful and generalisable. Huge models avoid many of the problems of simpler networks, and can exhibit properties, like meta-learning, that are often thought to require complicated architectures and algorithms. This observation lends some support to a radical theory of AI progress — the scaling hypothesis. This says that AGI can be achieved with simple neural networks and learning algorithms, applied to diverse environments at huge scale — there is likely no ‘special ingredient’ for general intelligence. So as we invest more and more computational resources to training AI systems, these systems will get more intelligent.

Looking ahead: The scaling hypothesis seems to have relatively few proponents outside of OpenAI, and may only be a settled question after (and if) we build AGI. Nonetheless, it looks plausible, and we should take seriously the implications for AI safety and governance if it turned out to be true. The most general implication is that AI progress will continue to follow trends in compute. This underlines the importance of research aimed at understanding things like — the compute requirements for human intelligence (Import 214); measuring and comparing compute and other drivers of AI progress (Import 199); trends in the cost of compute (Import 127).
  Read more: On GPT-3 – Meta-Learning, Scaling, Implications, And Deep Theory (Gwern)

Portland’s strict face recognition ban

Portland has approved a ban on the public and private use of face recognition technology in any “place of public accommodation” where goods or services are offered. The ban is effective from January 1st. This will be the strictest measure in the US — going further than places like Oakland and Berkeley, which have prohibited government agencies from using the tech. Oregon has already passed a statewide ban on the police use of body-cams with face recognition.

Read more: Portland approves strictest ban on facial recognition technology in the U.S. (The Oregonian)

Work at Oxford University’s Future of Humanity Institute

The Future of Humanity Institute — where I work — is hiring researchers at all levels of seniority, across all their research areas, including AI governance and technical AI safety. Applications close 19th October.

Read more and apply here.

Tell me what you think

I’d love to hear what you think about my section of the newsletter, so that I can improve it. You can now share feedback through this Google Form. Thanks to all those who’ve already submitted!

###################################################

Tech Tales:

The Peace Computer
[2035, An Underground Records Archive in the Northern Hemisphere]

They called it The Peace Computer. And though it was built in a time of war, it was not a weapon. But it behaved like one right until it was fired.

The Peace Computer started as a plan drawn on a whiteboard in a basement in some country in the 21st century. It was an AI program designed for a very particular purpose: find a way out of what wonks termed the ‘iterated prisoner’s dilemma’ that caused such conflict in international relations.

Numerous experts helped design the machine: Political scientists, ethnographers, translators, computer programmers, philosophers, physicists, various unnamed people from various murky agencies.

The Peace Computer started to attract enough money and interest that other people began to notice: government agencies, companies, universities. Some parts of it were kept confidential, but the whisper network meant that, pretty soon, people in other countries heard murmurings of the Peace Computer.

And then the Other Side heard about it. And the Other Side did what it had been trained to do by decades of conflict: it began to create its own Peace Computer. (Though, due to various assumptions, partial leaks, and mis-represented facts, it thought that the Peace Computer was really a weapon – no matter. It knew better. It would make its own Peace Computer and put an end to all of this.) 

Both sides were tired and frustrated by the war. And both sides were sick of worrying that if the war went on, one of them would be forced to use one of their terrible weapons, and then the other side would be forced to respond in kind. So both sides started dumping more money into their Peace Computers, racing against each other to see who could bring about an end to the war first.

The scientists who were building the Peace Computers became aware of all of this as well. They started thinking about their counterparts – their supposed enemies.
What are they doing? One of the scientists would think, programming something into the machine.
Will this hurt my counterpart? Thought someone on the other side. I worry it will. And is that fair?
  If you can make this, you must have spent a long time studying. Do you really want to hurt me? thought another scientist.
Maybe the Peace Computers can befriend each other? thought another scientist.

But the pressure of the world pushed the different sides forward. And The Peace Computers became targets of sabotage and spying and disruption. Shipments of electronic chips were destroyed. Semiconductor manufacturing equipment was targeted. Bugs, some obvious and some subtle, were introduced everywhere.

Still, the nations raced against each other. Various leaders on both sides used their Peace Computer timelines to ward off greater atrocities.
  “Yes, we could launch ArchAngel, but the Peace Computer will have the same strategic effect with none of the infrastructure cost,” said one of them.
  “Our options today are strictly less good from a deterrence standpoint than those we’ll have in a few months, when phase one of the system goes online”.

* * *

The records are vague on which side eventually ‘won’ the Peace Computer race. All we know is that tensions started to ratchet down at various fault lines around the world. Trade resumed. Someone, somewhere, had won.

Now we speculate whether we are like someone who has been shot and hasn’t realized it, or whether we have been cured. Due to the timelines on which the Peace Computer is alleged to work, the truth will be clearer a decade from now.

Things that inspired this story: The voluminous literature written around ‘AI arms races’; contemporary geopolitics and incentive structures; the Pugwash conferences; Szilard.

Import AI 214: NVIDIA’s $40bn ARM deal; a new 57-subject NLP test; AI for plant disease detection

Should you buy NVIDIA’s new GPU? Read this and find out:
…Short answer: yes, though be prepared to cry a little upon opening your wallet…
Every year, NVIDIA announces some GPUs, and some machine learning researchers stare tearfully at the thousands of dollars of hardware they need to buy to stay at the frontier, then crack open their wallets and buy a card. But how, exactly, are NVIDIA’s new GPUs useful? Tim Dettmers has written a ludicrously detailed blog post which can help people understand what GPU to buy for Deep Learning and what the inherent tradeoffs are.

Is the Ampere architecture worth it? NVIDIA’s new ‘Ampere’ architecture cards come with a bunch of substantial performance improvements over their predecessors that make them worth buying. Some particular highlights include: “sparse network training and inference. Other features, such as the new data types should be seen more as an ease-of-use-feature as they provide the same performance boost as Turing does but without any extra programming required,” writes Dettmers.
    Read more: Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning (Tim Dettmers, blog).

Plus: NVIDIA to acquire ARM for $40 billion:
…Acquisition may reshape the chip industry, though let’s check back in a year…
Late on Sunday, news broke that NVIDIA is going to acquire ARM from Softbank. ARM invents and licenses out chip designs to all of the world’s top phone makers and Internet-of-Things companies (and, increasingly, a broad range of PCs, and burgeoning server and networking chips). The acquisition gives NVIDIA control of one of the planet’s most strategically important semiconductor designers, though how well ARM’s design-license business model works alongside NVIDIA’s product business remains to be seen.
  “Arm will continue to operate its open-licensing model while maintaining the global customer neutrality that has been foundational to its success,” NVIDIA said in a press release.

What does this have to do with AI? For the next few years, we can expect the majority of AI systems to be trained on GPUs and specialized hardware (e.g, TPUs, Graphcore). ARM’s RISC-architecture chips don’t lend themselves as well to the sort of massively parallelized computing operations required to train AI systems efficiently. But NVIDIA has plans to change this, as it plans to “build a world-class [ARM] AI research facility, supporting developments in healthcare, life sciences, robotics, self-driving cars and other fields”.
  An ARM supercomputer? The company also said it “will build a state-of-the-art AI supercomputer, powered by Arm CPUs”. (My bet is we’ll see Arm CPUs as the co-processor linked to NVIDIA GPUs, and if NVIDIA executes well I’d hope to see them build a ton of software to make these two somewhat dissimilar architectures play nice with each other).

Does this matter? Large technology acquisitions are difficult to get right, and it’ll be at least a year till we have a sense of how much this deal matters for the broader field of AI and semiconductors. But NVIDIA has executed phenomenally well in recent years and the ever-growing strategic importance nations assign to computation means that, with ARM, it has become one of the world’s most influential companies with regard to the future of computation. Let’s hope they do ok!
  Read more: NVIDIA to Acquire Arm for $40 Billion, Creating World’s Premier Computing Company for the Age of AI (NVIDIA press release).

###################################################

Language models have got so good they’ve broken our benchmarks. Enter a 57-subject NLP benchmark:
…Benchmark lets researchers test out language models’ knowledge and capabilities in a range of areas… 
How can we measure the capabilities of large-scale language models (LMs)? That’s a question researchers have been struggling with as, in recent years, LM development has outpaced LM testing – think of how the ‘SQuAD’ test had to be revamped to ‘SQuAD 2.0’ in a year due to rapid performance gains on the dataset, or the ‘GLUE’ multi-task benchmark moving to ‘SuperGLUE’ in response to faster-than-expected progress. Now, with language models like GPT3, even things like SuperGLUE are becoming less relevant. That’s why researchers with UC Berkeley, Columbia, the University of Chicago, and the University of Illinois at Urbana-Champaign have developed a new way to assess language models.

One test to eval them all: The benchmark “ranges in difficulty from an elementary level to an advanced professional level, and it tests both world knowledge and problem solving ability”. It consists of around 16,000 multiple choice questions across 57 distinct tasks. “These include practice questions for tests such as the Graduate Record Examination and the United States Medical Licensing Examination. It also includes questions designed for undergraduate courses and questions designed for readers of Oxford University Press books. Some tasks cover a subject, like psychology, but at a specific level of difficulty, such as “Elementary,” “High School,” “College,” or “Professional””, they write.

GPT-3: Quite smart for a machine, quite dumb for a human: In tests, GPT3 does markedly better than other systems (even obtaining superior performance to UnifiedQA, a QA-specific system), but the results still show our systems have a long way to go before they’re very sophisticated.
– 25%: Random baseline, guessing one answer out of four.
– 24.8%: ‘T5’, a multipurpose language model from Google.
– 38.5%: ‘UnifiedQA’, a question answering AI system.
– 25.9%: GPT-3 small (2.7 billion parameters).
– 43.9%: GPT-3 X-Large (175 billion parameters).
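For context on how a four-way multiple-choice benchmark like this gets scored, here is a minimal sketch: the model assigns a score to each candidate answer, its pick is the argmax, and accuracy is compared against the 25% random-guessing floor. The `score_fn` below is a stand-in for whatever likelihood a language model assigns to each completion; the dummy data just shows the baseline emerging.

```python
import random
from typing import Callable, List, Tuple

Question = Tuple[str, List[str], int]  # (prompt, answer choices, index of the correct choice)

def accuracy(questions: List[Question], score_fn: Callable[[str, str], float]) -> float:
    correct = 0
    for prompt, choices, answer_idx in questions:
        scores = [score_fn(prompt, choice) for choice in choices]
        # The model's pick is whichever choice it scores highest.
        correct += int(max(range(len(choices)), key=lambda i: scores[i]) == answer_idx)
    return correct / len(questions)

# A model that knows nothing (random scores) lands near the 25% floor.
dummy = [("q", ["A", "B", "C", "D"], random.randrange(4)) for _ in range(10_000)]
print(accuracy(dummy, lambda prompt, choice: random.random()))
```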

Where language models are weak: One notable weakness in the evaluated LMs is “STEM subjects that emphasize mathematics or calculations. We speculate that is in part because GPT-3 acquires declarative knowledge more readily than procedural knowledge,” they write.
  Read more: Measuring Massive Multitask Language Understanding (arXiv).
  Get the benchmark here (GitHub).

###################################################

Could Anduril tell us about the future of military drones?
…Defense tech startup releases fourth version of its ‘Ghost’ drone…
Anduril, an AI-defense-tech startup co-founded by former Oculus founder Palmer Luckey, has released the ‘Ghost 4’, a military-grade drone developed in the US for the US government (and others). The Ghost 4 is a visceral example of the advancement of low-cost robotics and avionics, as well as the continued progression of AI software (both modern DL systems, and classical AI) in the domain of drone surveillance and warfare. Anduril raised $200 million earlier this summer (#205).

Fully autonomous: “Ghosts are fully autonomous,” Anduril says in a blog post about the tech. “Ghost is controlled entirely through the Lattice software platform and requires minimal operator training.”

Drone swarms: “Groups of Ghosts collaborate to achieve mission objectives that are impossible to achieve via a single unit. Additionally, Ghosts communicate status data with one another and can collaborate to conduct a “battlefield handover” to maintain persistence target coverage.”
    The Ghost 4 can be outfitted with a range of modules for tasks like SLAM (simultaneous localization and mapping), electronic warfare, the addition of alternate radios, and more. Other objects can be attached to it via a gimbal, such as surveillance cameras or – theoretically, and this is speculation on my part – munitions.

Why this matters: Anduril uses rapid prototyping, a hackey ‘do whatever works’ mindset, and various frontier technologies to build machines designed for surveillance and war. The products it produces will be different to those developed by the larger and more conservative defense contractors (e.g, Lockheed), and will likely be more public; Anduril gives us a visceral sense of how advanced technology is going to collide with security.
  Read more: Anduril Introduces Ghost 4 (Medium).
  Watch a video about Ghost 4 (Anduril, Twitter).

###################################################

What AI technologies do people use for plant disease detection? The classics:
…Survey gives us a sense of the lag between pure research and applied research…
Neural network-based vision systems are helping us identify plant diseases in the world – technology that, as it matures, will improve harvests for farmers and give them better information. A new research paper surveys progress in this area, giving us some sense of which techniques are being used in a grounded, real world use case.

What’s popular in plant disease recognition?

  • Frameworks: TensorFlow is the most prominent framework (37 out of 121), followed by Keras (25) and MATLAB (22).
  • Classics: 26 of the surveyed papers use AlexNet, followed by VGG, followed by a new architecture, followed by a ResNet.
  • Dataset: The most widely used dataset is the ‘PlantVillage‘ one (40+ uses).

Why this matters: Plant disease recognition is a long-studied, interdisciplinary task. Surveys like this highlight how, despite the breakneck pace of AI progress in pure research, the sophistication of applied techniques runs at a lag relative to pure research. For instance, many researchers are now using PyTorch (but it’s TensorFlow that shows up here), and things like pre-trained 50-layer ResNets have been replacing AlexNet systems for a while.
Read more: Plant Diseases recognition on images using Convolutional Neural Networks: A Systematic Review (arXiv).

###################################################

OpenPhil: If your brain was a computer, how fast would it be?
…Or: If you wanted to make an AI system with a brain-scale computational substrate, what do you need to build?…
The brain is a mysterious blob of gloopy stuff that takes in energy and periodically excretes poems, mathematical insights, the felt emotions of love, and more. But how much underlying computation do we think it takes for an organ to produce outputs like this? New research from the Open Philanthropy Project thinks it has ballparked the amount of computational power it’d take to be roughly equivalent to the human brain.

The computational cost of the human brain: It is “more likely than not that 10^15 FLOP/s is enough to perform tasks as well as the human brain (given the right software, which may be very hard to create). And I think it unlikely (<10%) that more than 10^21 FLOP/s is required,” the author writes. (For comparison, a top supercomputer costing $1 billion can perform at around 4×10^17 FLOP/s.)
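A quick back-of-the-envelope comparison of those numbers, just to make the gap concrete:

```python
brain_low, brain_high = 1e15, 1e21  # FLOP/s bounds discussed in the report
supercomputer = 4e17                # FLOP/s for a ~$1bn top supercomputer
print(supercomputer / brain_low)    # 400.0 - clears the lower-bound brain estimate comfortably
print(brain_high / supercomputer)   # 2500.0 - but falls well short of the (unlikely) upper bound
```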

Power is one thing, algorithm design is another: “Actually creating/training such systems (as opposed to building computers that could in principle run them) is a substantial further challenge”, the author writes.
Read more: New Report on How Much Computational Power It Takes to Match the Human Brain (Open Philanthropy Project).

###################################################

OpenAI Bits&Pieces:

GPT-f: Deep learning for automated theorem proving:
What happens if you combine transformer pre-training with the task of learning to prove mathematical statements? Turns out you get a surprisingly capable system; GPT-f obtains 56.22% accuracy on a held-out test set versus 21.16% for the current SOTA.

Proofs that humans like: GPT-f contributed 23 shortened proofs of theorems to the Metamath library. One human mathematician commented “The shorter proof is easier to translate. It’s more symmetric in that it treats A and B identically. It’s philosophically more concise in that it doesn’t rely on the existence of a universal class of all sets,” and another said “I had a look at the proofs—very impressive results! Especially because we had a global minimization recently, and your method found much shorter proofs nevertheless.”
  Read more: Generative Language Modeling for Automated Theorem Proving (arXiv).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

GPT-3 and radicalisation risk:
Researchers at the Center on Terrorism, Extremism, and Counterterrorism (CTEC) have used GPT-3 to evaluate the risk of advanced language models being used by extremists to promote their ideologies. They demonstrate the ease with which GPT-3 can be used to generate text that convincingly emulates proponents of extremist views and conspiracy theories.

Few-shot extremism: In zero-shot tests, GPT-3 tends to give fairly neutral, empirical answers when queried about (e.g.) the QAnon conspiracy. With short text prompts, GPT-3 can be biased towards a particular ideology—e.g. having been fed posts from a neo-Nazi forum, it generated convincing discussions between users on a range of topics, all within the bounds of the ideologies promoted on the forum. The researchers note that GPT-3 lets users achieve, with only a few text prompts, the kind of performance that required several hours of training with GPT-2.

Mitigation: By allowing access only through an API, not sharing the model itself, and retaining the ability to limit and monitor use, OpenAI’s release model can support important safeguards against misuse. However, it won’t be long before powerful language models are more widely available, or open-sourced. We should be using this time wisely to prepare — by ramping up defensive technologies; better understanding the potential for harm in different domains (like radicalisation); and fostering cooperation between content platforms, AI researchers, etc., on safety.

Matthew’s view: Advanced language models empower individuals, at very low cost, to produce large amounts of text that is believably human. We might expect the technology to have a big impact in domains where producing human-like text is a bottleneck. I’m not sure what the net effect of language models will be on radicalisation. It might be useful to look at historic technologies that dramatically dropped the cost of important inputs to the process of spreading ideologies  — e.g. the printing press; encryption; the internet. More generally, I’m excited to see more research looking at the effects of better language models on different domains, and how we can shape things positively.
Read more: The radicalization risks of GPT-3 and advanced neural language models (Middlebury)

Tell me what you think about my writing:
I’d love to hear what you think about my section of the newsletter, so that I can improve it. You can now share feedback through this Google Form.

###################################################


Tech Tales:

The Ghost Parent
[The New Forest, England, 2040]

The job paid well because the job was dangerous. There was a robot loose in the woods. Multiple campers had seen it, sometimes as a shape in the night with its distinctive blue glowing eyes. Sometimes at dawn, running through fog.
  “Any reports of violence?” Isaac said to the agency.
  “None,” said the emissary from the agency. “But the local council needs us to look into it. Tourism is a big part of the local economy, and they’re worried it’ll scare people away.”

Isaac went to one of the campsites. It was thinly populated – a few tents, some children on bikes cruising around. Cooking smells. He pitched his tent on a remote part of the site, then went into the forest and climbed a tree. Night fell. He put his earbuds in and fiddled with an application on his phone till he’d tuned the frequencies so he was hyper-sensitive to the sounds of movement in the forest – breaking twigs, leaves crushed under foot, atypical rhythms distinct from those caused by wind.
  He heard the robot before he saw it. A slow, very quiet shuffle. Then he saw a couple of blue eyes in the darkness. They seemed to look at him. If the robot had infrared it probably was looking at him. Then the eyes disappeared and he heard the sound of it moving away.
  Isaac took a breath. Prayed to himself. Then got down from the tree and started to follow the sound of the departing robot. He tracked it for a couple of hours, until he got to the edge of the forest. He could hear it, in the distance. Dawn was about to arrive. Never fight a robot in the open, he thought, as he stayed at the forest’s edge.
  That’s when the deer arrived. Again, he heard them before he saw them. But as the light began to come up he saw the deer coming from over a hill, then heard the sound of the robot again. He held his breath. Images of deer, torn apart by metal hands, filled his head. But nothing happened. And as the light came up he saw the deer and the robot silhouetted on the distant hill. It seemed to be walking, with one of its arms resting, almost casually, on a deer’s back.

Isaac went back to his tent and slept for a few hours, then woke in the late afternoon. That night, he went to the forest and climbed a tree again. Perhaps it was the cold, or perhaps something else, but he fell asleep almost immediately. He was woken some hours later by the sound of the robot. It was walking in the forest, very near his tree, carrying something. He held up his phone and turned on its night vision. The robot came into focus, cradling a wounded baby deer in its arms. One of the deer’s legs was dripping with blood, and had two half circle gouges in it. The robot continued to walk, and disappeared out of view, then out of sound. Isaac got down from the tree and followed the trail of blood from where the robot had come from – he found an animal trap that had been disassembled with two precise laserbeam cuts – only a robot could do that.

“It’s bogus,” he said to the agency on the phone. “I spent a few days here. There’s a lot of deer, and I think there’s a person that spends time with them. Maybe they’re a farmer, or maybe a homeless person, but I don’t think they’re harming anyone. In fact, I think they’re helping the deer.”
  “Helping them?”
  Isaac told them about the trap, and how he’d seen a hard-to-make-out person save a deer.
  “And what about the blue eyes?”
  “Could be some of those new goggles that are getting big in China. I didn’t ask them, but it seemed normal.”
  The agency agreed to pay him half his fee, and that was that.

Years later, when Isaac was older, he took his family camping to the New Forest. They camped where he had camped before, and one evening his kid came running to the bonfire. “Dad! Look what I found!”
  Behind the kid was an adult deer. It stood at the edge of the light of the fire, and as the flames flickered Isaac saw a glint on one of the deer’s legs – he looked closer, and saw that its lower front leg was artificial – a sophisticated, robot leg, that had been carefully spliced to what seemed to be a laser-amputated joint.
  “Wow,” Isaac said. “I wonder how that happened?”

Things that inspired this story: Nature and its universality; notions of kinship between people and machines and animals and people and animals and machines; nurturing as an objective function; ghost stories; the possibility of kindness amid fog and uncertainty; the next ten years of exoskeleton development combining with battery miniaturization and advanced AI techniques; objective functions built around familiarity; objective functions built around harmony rather than winning

Import AI 213: DeepFakes can lipsync now; plus hiding military gear with adversarial examples.

Facebook wants people to use differential privacy, so Facebook has made the technology faster:
…Opacus: software that deals with the speed problem of differential privacy…
Facebook has released Opacus, a software library for training PyTorch models with a privacy-preserving technology called Differential Privacy (DP). The library is fast, integrated with PyTorch (so inherently quite usable), and is being used by one of the main open source ML + DP projects, OpenMined.

Why care about differential privacy: It’s easy to develop AI models using open and generic image or text datasets – think ImageNet, or CommonCrawl – but if you’re trying to develop a more specific AI application, you might need to handle sensitive user data, e.g, data that relates to an individual’s medical or credit status, or emails written in a protected context. Today, you need to get a bunch of permissions to deal with this data, but if you could find a way to protect it before you saw it you’d be able to work with it in a privacy-preserving way. That’s where privacy preserving machine learning techniques come in: Opacus makes it easier for developers to train models using Differential Privacy – a privacy-preserving technique that lets us train over sensitive user data (Apple uses it).
  One drawback of differential privacy has been its speed – Opacus has improved this part of the problem by being carefully engineered atop PyTorch, leading to a system that is “an order of magnitude faster compared with the alternative micro-batch method used in other packages”, according to Facebook.
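To give a sense of what differentially private training means mechanically, here is a conceptual DP-SGD sketch in plain PyTorch (deliberately not the Opacus API): clip each per-example gradient to a fixed norm, then add calibrated Gaussian noise before the optimizer step. The hyperparameters are illustrative; Opacus implements this efficiently and also handles the privacy accounting for you.

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, optimizer, max_grad_norm=1.0, noise_multiplier=1.1):
    params = [p for p in model.parameters() if p.requires_grad]
    clipped_sum = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):  # per-example gradients (Opacus vectorizes this step)
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in params]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, max_grad_norm / (total_norm.item() + 1e-6))
        for acc, g in zip(clipped_sum, grads):
            acc.add_(g, alpha=scale)  # clip this example's gradient, then accumulate
    model.zero_grad()
    for p, acc in zip(params, clipped_sum):
        noise = torch.randn_like(acc) * noise_multiplier * max_grad_norm
        p.grad = (acc + noise) / len(xs)  # noisy average gradient for the whole batch
    optimizer.step()
```

Here `xs` and `ys` are the individual examples in one minibatch; the noise scale is tied to the clipping norm, which is what makes the eventual epsilon/delta privacy guarantee calculable.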
  Read more: Introducing Opacus: A high-speed library for training PyTorch models with differential privacy (Facebook AI Blog).
  Get the code for Opacus here (PyTorch, GitHub).
  Find out more about differential privacy here: Differential Privacy Series Part 1 | DP-SGD Algorithm Explained (Medium, PyTorch).

###################################################

Is it a bird? Is it a plane? No, it is a… SUNFLOWER
…Adversarial patches + military equipment = an ‘uh oh’ proof of concept…
Some Dutch researchers, including ones affiliated with the Netherlands’ Ministry of Defense, have applied adversarial examples to military hardware. An adversarial example is a visual distortion you can apply to an image or object that makes it hard for an AI system to classify it. In this research, they add confounding visual elements to satellite images of military hardware (e.g, fighter jets), causing the system (in this case, YOLOv2) to misclassify the entity in question.

A proof of concept: This is a proof of concept and not indicative of the real-world tractability of the attack (e.g, you need to know the type of image processing system your adversary is using, or your perturbation might not work; multiple overlapping image processing systems could invalidate the attack, etc). But it does provide a further example of adversarial examples being applied in the wild, following the creation of things as varied as adversarial turtles (#67), t-shirts (#171), and patches.
  Plus, it’s notable to see military-affiliated researchers do this analysis in their own context (here, trying to cause misidentification of aircraft), which can be taken as a proxy for growing military interest in AI security and counter-security techniques.
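For intuition, here is a generic sketch of how adversarial patches of this kind are typically optimized – my own illustration, not the paper’s method. The `detector` is a placeholder for any differentiable model that returns a per-image confidence score for the true class; real attacks also randomize the patch’s placement, scale, and rotation during training to make it robust in the physical world, which this sketch skips.

```python
import torch

def train_patch(detector, images, patch_size=64, steps=500, lr=0.01):
    # A trainable image patch, optimized to suppress the detector's confidence.
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        pasted = images.clone()
        # Paste the patch into a fixed corner of every image (for simplicity).
        pasted[:, :, :patch_size, :patch_size] = patch.clamp(0, 1)
        confidence = detector(pasted).mean()  # assumed: confidence in the true class
        opt.zero_grad()
        confidence.backward()  # gradients flow back through the paste op to the patch pixels
        opt.step()             # Adam minimizes the objective, i.e. pushes confidence down
    return patch.detach().clamp(0, 1)
```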
  Read more: Adversarial Patch Camouflage against Aerial Detection (arXiv).

###################################################

Fake lipsync: deepfakes are about to get an audio component:
…Wav2Lip = automated lip-syncing for synthetic media = Sound and Vision…
Today, we can generate synthetic images and videos of people speaking, and we can even pair this with a different audio track, but syncing up the audio with the videos in a convincing way is challenging. That’s where new research from IIIT Hyderabad and the University of Bath comes in via ‘Wav2Lip’, technology that makes it easy to get a person in a synthetic image or video to lip-sync to an audio track.

Many applications: A technology like this would make it very cheap to add lip-syncing to things like dubbed movies, video game characters, lectures, generating missing video call segments, and more, the researchers note.

How they did it – a GAN within a GAN: The authors have a clever solution to the problem of generating faces synced to audio: they use a pre-trained ‘SyncNet’-based discriminator to check whether a generated face is in sync with the audio and, if it isn’t, push the generator towards one that is, alongside another pre-trained model that penalizes synthetic face-and-lip combinations which look unnatural. These two networks sit inside the broader generation process, where the algorithm tries to generate faces matched to audio.
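Here is a schematic sketch of that training signal, written as my own paraphrase rather than the authors’ code: `generator`, `sync_expert`, and `quality_disc` are placeholder modules, and the loss weights are illustrative.

```python
import torch
import torch.nn.functional as F

def generator_loss(generator, sync_expert, quality_disc, audio, target_frames,
                   w_sync=1.0, w_quality=1.0):
    fake_frames = generator(audio, target_frames)   # re-generate the mouth region from audio
    recon = F.l1_loss(fake_frames, target_frames)   # stay close to the ground-truth frames
    # Frozen lip-sync expert: probability that audio and generated lips are in sync.
    sync = -torch.log(sync_expert(audio, fake_frames) + 1e-8).mean()
    # Visual critic: probability that the generated frames look natural.
    quality = -torch.log(quality_disc(fake_frames) + 1e-8).mean()
    return recon + w_sync * sync + w_quality * quality
```

The key design choice the paper describes is that the sync expert is pre-trained and kept frozen, so the generator cannot cheat by fooling a co-trained discriminator.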
  The results are really good, so good that the authors also proposed a new evaluation framework for evaluating synthetically-generated lip-sync programs.
  New ways of measuring tech, as the authors do here, are typically a canary for broader tech progress, because when we need to invent new measures it means we’ve
  a) reached the ceiling of existing datasets/challenges or
  b) have got sufficiently good at the task we need to develop a more granular scoring system for it.
  Both of these phenomena are indicative of different types of AI progress. New ways of measuring performance on a task also usually yield further research breakthroughs, as researchers are able to use the new testing regimes to generate better information about the problem they’re trying to solve. Combined, we should take the contents of this paper as a strong signal that synthetically generated video with lipsyncing to audio is starting to get very good, and we should expect it to continue to improve. My bet is we have ‘seamless’ integration of the two within a year*.
  (*Constraints – portrait-style camera views, across a broad distribution of types of people and types of clothing; some background blur permitted but not enough to be egregious. These are quite qualitative evals, so I’ll refine them as the technology develops.)

Broader impacts and the discussion (or lack of): The authors do note the potential for abuse by these models and say they’re releasing the models as open source to “encourage efforts in detecting manipulated video content and their misuse”. This is analogous to saying “by releasing the poison, we’re going to encourage efforts to create its antidote”. I think it’s worth reflecting on whether there are other, less extreme, solutions to the thorny issue of model publication and dissemination. We should also ask how much damage the ‘virus’ could do before an ‘antidote’ is available.
  Read more: A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild (arXiv).
  Try out the interactive demo here (Wav2Lip site).
  Find out more about their technique here at the project page.
  Mess around with a Colab notebook here (Google Colab).

###################################################

DeepMind makes Google Maps better:
…Graph Neural Nets = better ETAs in Google Maps…
DeepMind has worked with the team at Google Maps to develop more accurate ETAs, so next time you use your phone to plot a route you can have a (slightly) higher trust in the ETA being accurate.

How they did it: DeepMind worked with Google to use a Graph Neural Network to predict route ETAs within geographic sub-sections (called ‘supersegments’) of Google’s globe-spanning mapping system. “Our model treats the local road network as a graph, where each route segment corresponds to a node and edges exist between segments that are consecutive on the same road or connected through an intersection. In a Graph Neural Network, a message passing algorithm is executed where the messages and their effect on edge and node states are learned by neural networks,” DeepMind writes. “From this viewpoint, our Supersegments are road subgraphs, which were sampled at random in proportion to traffic density. A single model can therefore be trained using these sampled subgraphs, and can be deployed at scale.”
  The team also implemented a technique called MetaGradients so they could automatically adjust the learning rate during training. “By automatically adapting the learning rate while training, our model not only achieved higher quality than before, it also learned to decrease the learning rate automatically”.
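To make the “segments as nodes, learned message passing” idea concrete, here is a toy sketch in PyTorch – my own illustration, not DeepMind’s production model – where the segment features, edges, and dimensions are all placeholders.

```python
import torch
import torch.nn as nn

class SegmentGNN(nn.Module):
    def __init__(self, node_dim=16, hidden=32, rounds=2):
        super().__init__()
        self.message = nn.Sequential(nn.Linear(2 * node_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, node_dim))
        self.update = nn.GRUCell(node_dim, node_dim)
        self.readout = nn.Linear(node_dim, 1)  # per-segment travel-time estimate
        self.rounds = rounds

    def forward(self, node_feats, edges):
        # node_feats: [num_segments, node_dim]; edges: (src, dst) pairs linking consecutive segments
        h = node_feats
        for _ in range(self.rounds):
            incoming = [torch.zeros(h.shape[1]) for _ in range(h.shape[0])]
            for src, dst in edges:  # learned messages flow along connected road segments
                incoming[dst] = incoming[dst] + self.message(torch.cat([h[src], h[dst]]))
            h = self.update(torch.stack(incoming), h)  # each segment updates its state
        return self.readout(h).squeeze(-1)

# Five consecutive road segments in one (hypothetical) Supersegment.
etas = SegmentGNN()(torch.randn(5, 16), [(0, 1), (1, 2), (2, 3), (3, 4)])
```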

What that improvement looks like: DeepMind’s system has improved the accuracy of ETAs by double digit percentage points in lots of places, with improvements in heavily-trafficked cities like London (16%), New York (21%) and Sydney (43%). The fact the technique works for large, complicated cities should give us confidence in the broader approach.
  Read more: Traffic prediction with advanced Graph Neural Networks (DeepMind).

###################################################

Facebook releases its giant pre-trained protein models:
…Like GPT3, but for protein sequences instead of text…
In recent years, people have started pre-training large neural net models on data, ranging from text to images. This creates capable models which subsequently get used to do basic tasks (e.g, ImageNet models used for image classification, or language models for understanding text corpuses). Now, the same phenomenon is happening with protein modeling – in the past couple of years, people have started doing large-scale protein net training. Now, researchers with Facebook have released some of their pre-trained protein models, which could be helpful for scientists looking to see how to combine machine learning with their own discipline. You can get the code from GitHub.
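To make the analogy concrete, here is a toy sketch of the underlying recipe – masked-token pre-training over amino-acid sequences with a tiny transformer. This illustrates the general idea only, not Facebook’s ESM code, and it assumes a recent PyTorch (1.9+) for the batch-first transformer layers; the example sequence is arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
vocab = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MASK_ID = len(vocab)
VOCAB_SIZE = len(vocab) + 1

embed = nn.Embedding(VOCAB_SIZE, 64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(64, VOCAB_SIZE)
opt = torch.optim.Adam(
    list(embed.parameters()) + list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

def masked_lm_step(sequence: str, mask_prob: float = 0.15) -> float:
    ids = torch.tensor([[vocab[aa] for aa in sequence]])
    targets = ids.clone()
    mask = torch.rand(ids.shape) < mask_prob
    mask[0, 0] = True                      # make sure at least one position is masked
    ids = ids.masked_fill(mask, MASK_ID)   # hide some amino acids from the model
    logits = head(encoder(embed(ids)))     # predict the identity of every position
    loss = F.cross_entropy(logits[mask], targets[mask])
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(masked_lm_step("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```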

Why this matters: The world is full of knowledge, and one of the nice traits about contemporary AI systems is you can point them at a big repository of knowledge, train a model, and then use that model to try and understand the space of knowledge itself – think of how we can prime GPT3 with interesting things in its context window and then study the output to discover things about what it was trained on and the relationships it has inferred. Now, the same thing is going to start happening with chemistry. I expect this will yield dramatic scientific progress within the next five years.
  Get the models: Evolutionary Scale Modeling (ESM, FAIR GitHub).
  Read more: Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences (bioRxiv).
  Via Alex Rives’s Twitter.

###################################################

Tech Tales:

Perception and Destruction / Minimize Uncertainty
[2027, Welcome email for new management trainees at a Consulting company based in New York City].

Welcome to the OmniCorp Executive Training Programme – we’re delighted to have selected you for our frontier-technology Manager Optimization Initiative.

From this point on all of your {keystrokes, mouseclicks, VR interactions, verbal utterances when near receivers, outgoing and incoming calls, eye movements, gait, web browsing patterns} are being tracked by our Manager Optimization Agent. This will help us learn about you, and will help us figure out which actions you should take to obtain further Professional Success.

By participating in this scheme, you will teach our AI system about your own behavior. And our AI system is going to teach you how to be a better manager. Don’t be surprised when it starts suggesting language for your emails to subordinates, or proactively schedules meetings between you and your reports and other managers – this is all in the plan.

Some of our most successful executives have committed fully to this way of working – you’ll notice a number has been added to the top-right of your Universal OS – that number tells you how well your actions have aligned with our suggestions and also our predictions for what actions you might take next (we won’t tell you what to do all the time, otherwise it’s hard for you to learn). If you can get this number to zero, then you’re going to be doing exactly what we think is needed for furthering OmniCorp success.

Obtaining a zero delta is very challenging – none of our managers have succeeded at this, yet. But as you’ll notice when you refer to your Compensation Package, we do give bonuses for minimizing this number over time. But don’t worry – if the number doesn’t come down, we have a range of Performance Improvement Plans that include ‘mandatory AI account takeover’, which we can run you through. This will help you take the actions that reduce variance between yourself and OmniCorp, and we find that this can, in itself, be habit forming.

Things that inspired this story: Large language models; reward functions and corporate conformism; corporations as primitive AI systems; the automation of ‘cognitive’ functions.

Import AI 212: Robots are getting smart; humans+machines = trouble, says DHS; a 10k product dataset

Faster, robots! Move! Move! Move! Maybe the robot revolution is really imminent?
…DeepMind shows one policy + multi-task training = robust robot RL…
Researchers with DeepMind have used a single algorithm – Scheduled Auxiliary Control (SAC) – to get multiple simulated robots and a real robot to learn robust movement behaviors. That’s notable compared to the state of the art a few years ago, when you might have seen a multitude of different algorithms used for a bunch of different robots. DeepMind did this without changing the reward functions across different robots. Their approach can learn to operate new robots in a couple of hours.

Learning is easier when you’re trying to learn a lot: DeepMind shows that it’s more efficient to try and learn multiple skills for a robot at once, rather than learning skills in sequence. In other words, if you’re trying to learn to walk forwards and backwards, it’s more efficient to learn a little bit of walking forwards and then a little bit of walking backwards and alternate till you’ve got it down to a science, rather than just trying to learn to walk forward and perfecting that, then learning to move backward.
  Hours of savings: DeepMind was able to learn a range of movements on one robot in about 1,590 episodes, netting out to around five hours of work. If they’d tried to learn the same skills in a single-task setting, they estimate it’d take about 3,050 episodes, adding another five hours. That’s an encouraging sign with regard to both the robustness of SAC and the utility of multi-task learning.
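A schematic sketch of that interleaving idea (not DeepMind’s actual SAC setup – `policy`, `run_episode`, and `update` are placeholder callables, and the task names are made up):

```python
import random

TASKS = ["walk_forward", "walk_backward", "turn_left", "turn_right"]

def train_interleaved(policy, run_episode, update, episodes=1590):
    """Alternate between skills every episode so one shared policy absorbs all of them."""
    for _ in range(episodes):
        task = random.choice(TASKS)
        trajectory = run_episode(policy, task)   # collect experience on whichever skill was drawn
        update(policy, trajectory, task)         # update the single shared policy on that experience
    return policy
```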

Which robots? DeepMind uses simulated and real robots made by HEBI Robotics, a spinout from Carnegie Mellon University.

Why this matters: Papers like this give us a sense of how researchers, after probably half a decade of various experiments, are starting to turn robots+RL from a lab-borne curiosity to something that might be repeatable and reliable enough we can further develop techniques and inject smart robots into the world.
  Read more: Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion (arXiv).
Check out the video (YouTube).

Learning a good per-robot policy? That’s nice. How about learning one policy that works across a bunch of robots without retraining?
…What if we treat a robot as an environment and its limbs as a multi-agent learning problem?…
Here’s a different approach to robotic control, from researchers with Berkeley, Google, CMU, and Facebook AI Research. In ‘One Policy to Control Them All’, the researchers build a system that lets them train a bunch of different AI agents in parallel, which yields a single flexible policy that generalizes to new (simulated) robots.

How it works: They do this by trying to learn control policies that help different joints coordinate with each other, then they share these controllers across all the motors/limbs of all the agents. “Now the policies are fully modular, it’s just like lego blocks,” says one of the authors in a video about the research. Individual agents are able to learn to move coherently through the incorporation of a message passing approach, where the control policies for different limbs/motors propagate information to nearby limb/motor nodes in a cascade until they’ve passed messages through the whole entity, which then tries to pass a prediction message back – by having the nodes propagate information and the agent try to predict their actions, the authors are able to inject some emergent coherence into the system.

Message passing: The message passing approach is similar to how some multi-agent systems are trained, the authors note. This feels quite intuitive – if we want to train a bunch of agents to solve a task, we need to design a learning approach that means the agents figure out how to coordinate with each other in an unsupervised way. Here, the same thing happens when you treat the different limbs/actuators in a robot as a collection of distinct agents in a single world (the robot platform) – over time, you see them figure out how to coordinate with each other to achieve an objective, like moving a robot.
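Here is a toy sketch of the shared-module idea – my own illustration, not the paper’s code – with one small network reused for every limb and messages passed in one direction only for brevity (the paper passes messages both up and down the chain of limbs).

```python
import torch
import torch.nn as nn

class SharedLimbModule(nn.Module):
    """The same module is run for every limb: it reads the limb's local observation
    plus an incoming message, and emits a torque plus a message for the next limb."""
    def __init__(self, obs_dim=8, msg_dim=16):
        super().__init__()
        self.msg_dim = msg_dim
        self.net = nn.Sequential(nn.Linear(obs_dim + msg_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1 + msg_dim))

    def forward(self, limb_obs, incoming_msg):
        out = self.net(torch.cat([limb_obs, incoming_msg], dim=-1))
        return out[..., :1], out[..., 1:]  # (torque for this joint, message to the next limb)

def act(shared_module, limb_observations):
    """Run the one shared module down a chain of limbs, passing messages as it goes."""
    msg = torch.zeros(shared_module.msg_dim)
    torques = []
    for obs in limb_observations:  # e.g. one 8-dim observation per joint of the robot
        torque, msg = shared_module(obs, msg)
        torques.append(torque)
    return torch.cat(torques)

print(act(SharedLimbModule(), [torch.randn(8) for _ in range(4)]))
```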

Testing (in simulation): In tests on simulated MuJoCo robots, the researchers show their approach can outperform some basic multi-task baselines (similar to the sorts of powerful SAC models trained by DeepMind elsewhere in this issue of Import AI). They also show their approach can generalize to simulated robots different to what they were trained on, highlighting the potential robustness of this technique. (Though note that the policy fails if they dramatically alter the weight/friction of the limbs, or change the size of the creatures).
  Read more: One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control (arXiv).
  Find out more: ICML 2020 Oral Talk: One Policy to Control Them All (YouTube).

###################################################

Don’t have a supercomputer to train your big AI model? No problem – use the Hivemind!
…Folding@Home, but for AI…
The era of large AI models is here, with systems like GPT-3, AlphaStar, and others starting to yield interesting things via multi-million dollar training regimes. How can researchers with few financial resources compete with entities that can train large AI systems? One idea is to decentralize – specifically, to develop software that makes it easy to train large models via a vast sea of distributed computation.
  That’s the idea behind Hivemind, a system to help people ‘run crowdsourced deep learning using compute from volunteers or decentralized participants’. Hivemind uses a decentralized Mixture-of-Experts approach, and the authors have tried to build some models using this approach, but note: “one can train immensely large neural networks on volunteer hardware. However, reaching the full potential of this idea is a monumental task well outside the restrictions of one publication”.
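
To get a feel for the Mixture-of-Experts idea that Hivemind decentralizes, here’s a toy, hedged sketch: a gating network routes each input to a couple of experts and mixes their outputs. In Learning@home those experts live on other people’s machines behind a distributed hash table; none of that networking is shown here, and this is not the Hivemind API:

```python
# Toy, local Mixture-of-Experts forward pass (not the Hivemind API): a gating
# network scores the experts, the top-k are run on each input, and their
# outputs are mixed. Hivemind's contribution is sharding experts like these
# across volunteer machines; that networking layer is deliberately absent here.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, dim)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.shape[0]):              # per-sample routing, kept simple
            for slot in range(self.top_k):
                expert = self.experts[idx[b, slot].item()]   # in Hivemind: a remote call
                out[b] += weights[b, slot] * expert(x[b:b+1])[0]
        return out

moe = ToyMoE()
print(moe(torch.randn(4, 64)).shape)             # torch.Size([4, 64])
```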

Why this matters: Infrastructure – and who has access to it – is inherently political. In the 21st century, some forms of political power will accrue to the people that build computational infrastructures and those which can build artefacts on top of it. Systems like Hivemind hint at ways a broader set of people could gain access to large infrastructure, potentially altering who does and doesn’t have political power as a consequence.
  Read more: Learning at Home (GitHub).
  Read more: Learning@home: Crowdsourced Training of Large Neural networks using Decentralized Mixture-of-Experts (arXiv).

###################################################

Beware of how humans react to algorithmic recognitions, says DHS study:
…Facial recognition test shows differences in how humans treat algo vs human recommendations…
Researchers with the Maryland Test Facility (MdTF), a Department of Homeland Security-affiliated laboratory, have investigated how humans work in tandem with machines on a facial recognition task. They tested about 300 volunteers, in three groups of a hundred, at the task of working out whether two greyscale pictures show the same person or different people. One group got no prior information, another group got suggested answers – priors – for the task provided by a human, and the final group got suggested answers from an AI. The test showed that the presence of a prior, unsurprisingly, improved performance. But humans treated computer-provided priors differently to human-provided priors…

When we trust machines versus when we trust humans: The study shows that “volunteers reported distrusting human identification ability more than computer identification ability”, though notes that both sources led to similar overall scores. “Overall, this shows that face recognition algorithms incorporated into a human process can influence human responses, likely limiting the total system performance,” they write.

Why this matters: This study is intuitive – people get biased by priors, and people trust or distrust those priors differently, depending on whether they’re a human or a computer. This suggests that deploying advanced human-AI teaming applications – especially ones where an AI’s advice is presented to a decisionmaker – will require a careful study of the inherent biases with which people already approach those situations, and how those biases may be altered by the presence of a machine prior, such as the recommendation of an AI system.
  Read more: Human-algorithm teaming in face recognition: How algorithm outcomes cognitively bias human decision-making (PLoS ONE, Open Access).

###################################################

What’s cute, orange, and is cheaper than Boston Dynamics? Spot Mini Mini!
…$600 open source research platform, versus $75,000 Black Mirror Quadruped…
Researchers at Northwestern University have built Spot Mini Mini, a $600 open source implementation of Boston Dynamics’ larger and more expensive (~$75k) ‘Spot’ robot. There’s an interesting writeup of the difficulties of designing the platform, as well as of developing leg and body inverse kinematics models so the robot can be controlled. The researchers also train the system in simulation in an OpenAI Gym environment, then transfer it to reality.
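
To give a flavor of the inverse kinematics involved, here’s a hedged sketch of two-link planar leg IK – the link lengths, frames, and sign conventions are illustrative assumptions, not the Spot Mini Mini code:

```python
# Hedged sketch of two-link planar leg inverse kinematics of the kind a
# quadruped like this needs: given a desired foot position (x, y) in the hip
# frame, solve for hip and knee angles via the law of cosines. Link lengths
# and conventions are made up for illustration; the Spot Mini Mini repo has
# the real 3D body + leg IK.
import math

def two_link_leg_ik(x, y, l1=0.12, l2=0.12):
    """Return (hip_angle, knee_angle) in radians for a foot target (x, y) in metres."""
    d2 = x * x + y * y
    d = math.sqrt(d2)
    if d > l1 + l2 or d < abs(l1 - l2):
        raise ValueError("target out of reach for this leg")
    cos_knee = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)       # law of cosines
    knee = math.acos(max(-1.0, min(1.0, cos_knee)))
    hip = math.atan2(y, x) - math.atan2(l2 * math.sin(knee),
                                        l1 + l2 * math.cos(knee))
    return hip, knee

print(two_link_leg_ik(0.10, -0.15))   # e.g. foot 10cm forward, 15cm below the hip
```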

Why this matters: Open source robots are getting cheaper and more capable. Spot Mini Mini sits alongside a ~$600-$1000 robot car named MuSHR (Import AI: 161, August 2019), a $3,000 quadruped named ‘Doggo’ from Stanford (Import AI: 147, May 2019), and Berkeley’s $5,000 robot arm named ‘Blue’ (Import AI: 142, April 2019). Of course, these robots are all for different purposes and have different constraints and affordances, but the general trend is clear – academics are figuring out how to make variants of industrial robots at a tenth (or less) of the cost, which will likely lead to cheaper robots in the future. (Coincidentally, one of the developers of Spot Mini Mini, Maurice Rahme, says he will shortly be joining the ‘Handle’ team at Boston Dynamics.)
  Read more about Spot Mini Mini here (Maurice Rahme’s website).
  Get the RL environment code: Spot Mini Mini OpenAI Gym Environment (GitHub).

###################################################

What does a modern self-driving car look like? Voyage gives us some clues:
…Third iteration of the company’s self-driving car gives us a clue about the future…
Voyage, a self-driving car startup which develops systems to be used in somewhat controlled environments, like retirement communities, has released the third version of its self-driving vehicle, the G3. The company is testing the vehicles in San Jose and has plans to trial them as production vehicles next year.

What goes into a self-driving car?
– Software: The software stack consists of a self-driving car brain, a dedicated collision avoidance system, and software to help a human pilot take over and control the vehicle from a remote location. These systems have been built into a Chrysler Pacifica Hybrid vehicle, co-developed with FCA.
– Hardware: Voyage is using NVIDIA’s DRIVE AGX system, highlighting NVIDIA’s continued success in the self-driving space.
– Cleaning & COVID: It’s 2020, so the G3 has been tweaked for a post-COVID world. Specifically, it incorporates systems for disinfecting the vehicle after and between rides via the use of ultraviolet-C light.

Why this matters: We’re still in the post-Wright Brothers, pre-747 years of self-driving cars; we’ve moved on from initial experimentation, have designs that roughly work, and are working to perfect the technology so it can be deployed safely to consumers. How long that takes is an open question, but watching companies like Voyage iterate in public gives us a sense of how the hardware ‘stack’ of self-driving cars is evolving.
  Read more: Introducing the Voyage G3 Robotaxi (Voyage).

###################################################

Products-10K: 150,000 images across 10,000 products:
…JD built a product classifier, what will you build?…
Researchers with Chinese tech giant JD have released Products-10K, a dataset containing images related to around 10,000 specific products. These are 10,000 products that are "frequently bought by online customers in JD.com", they write. One thing that makes Products-10K potentially useful is that it contains a bunch of products that look similar to each other, e.g. different bottles of olive oil, or watches.

Dataset details: The dataset contains ~150,000 images split across ~10,000 categories. The product labels are organized as a graph, so the dataset ships with an inbuilt hierarchy that maps how the products relate to each other.

Accuracy: In tests, the researchers were ultimately able to train a high-resolution model that recognizes objects in the dataset with 64.12% top-1 accuracy (the percentage of times it gets the correct label for a product on the first try).
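
For readers newer to the metric, here’s a small sketch of what top-1 (and top-k) accuracy means, using made-up logits rather than a real Products-10K model:

```python
# Sketch of top-k accuracy: a prediction counts as correct at k=1 only if the
# single highest-scoring class matches the ground-truth label. Logits and
# labels here are random stand-ins, not outputs of a real Products-10K model.
import torch

def top_k_accuracy(logits, labels, k=1):
    topk = logits.topk(k, dim=-1).indices              # (batch, k) best classes
    hits = (topk == labels.unsqueeze(-1)).any(dim=-1)  # did any of the k match?
    return hits.float().mean().item()

logits = torch.randn(8, 10000)                          # 8 images, 10,000 products
labels = torch.randint(0, 10000, (8,))
print(top_k_accuracy(logits, labels, k=1))              # chance level is ~1/10,000
print(top_k_accuracy(logits, labels, k=5))
```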

Why this matters: Datasets like Products-10K are going to make it easier to develop machine learning classifiers for a variety of commercial use cases, so it should be robustly useful for applied AI. I also suspect someone is going to chain Products-10K + ten other retail datasets together to create an interesting artistic product, like a ‘ProductGAN’, or something.
Read more: Products-10K: A Large-scale Product Recognition Dataset (arXiv).
Get the dataset from here (GitHub site).
Mess around with the dataset on Kaggle.

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

A Manhattan Project for AI?
The Manhattan Project (1942–6) and the Apollo Program (1961–75) are unparalleled among technological megaprojects. In each case, the US perceived a technological advance as being strategically important, and devoted enormous resources (~0.4% GDP) towards achieving it first. AGI seems like the sort of powerful technology that could, at some point, see a similar push from key actors.

Surface area:  A key condition for embarking on such a project is that the technological problem has a large ‘surface area’ — it can be divided into sub-problems that can be parallelised. The US could not have begun the Manhattan Project in 1932, since the field of atomic physics was too immature. Progress in 1932 was driven by crucial insights from a few brilliant individuals, and would not easily have been sped up by directing thousands of scientists/engineers at the problem. By 1942 there was a ‘runway’ to the atomic bomb — a reasonably clear path for achieving the breakthrough. Note that there were a few unsuccessful AI megaprojects by states in the 1980s — the US Strategic Computing Initiative (~$2.5bn total in today’s money), Japan’s Fifth Generation (<$1bn), UK’s Project Alvey (<$1bn).

The ‘AGI runway’: There are good reasons to think we are not yet on an AGI runway: private actors are not making multi-billion dollar investments in AGI; there is no evidence that any states have embarked on AGI megaprojects; researchers in the field don’t seem to think we have a clear roadmap towards AGI.

Foresight: If one or more actors undertook a Manhattan-style sprint towards AGI, this could pose grave risks: adversarial dynamics might lead actors to skimp on critical measures to ensure AGI is safe, and that its benefits are broadly distributed; uncertainty and suspicion could create instability between great powers. Some of these risks could be mitigated with a shared set of metrics for measuring how close we are to a runway to AGI. This foresight would reduce uncertainty and provide crucial time to shape the incentives of key actors towards cooperation and beneficial outcomes.

Measurement: The authors suggest 6 features that could be measured to assess the surface area of AI research, and hence how close we are to an AGI runway: (1) mapping of sub-problems; (2) how performance is scaling with different inputs (data, compute, algorithmic breakthroughs); (3) capital intensiveness; (4) parallelism; (5) feedback speed; (6) behaviour of key actors.
  Read more: Roadmap to a Roadmap: How Could We Tell When AGI is a ‘Manhattan Project’ Away?

Can we predict the future of AI? Look at these forecasts and see what you think:
Metaculus is a forecasting platform, where individuals can make predictions on a wide range of topics. There is a rich set of forecasts on AI which should be interesting to anyone in the field.

Some forecasts: Here are some I found interesting:
– 75% that an AI system will score in the top quartile on an SAT math exam before 2025.
– 33% that a major AI company will commit to a ‘windfall clause’ (see Import 181) by 2025.
– 50% chance that by 2035, a Manhattan/Apollo project for AGI will be launched (see above).
– 45% that GPT language models will generate <$1bn revenues by 2025.
– 50% that by mid-2022, a language model with >100bn parameters will be open sourced.

Matthew’s view: Foresight is a crucial ingredient for good decision-making, and I’m excited about efforts to improve our ability to make accurate forecasts about AI and other important domains. I encourage readers to submit their own predictions, or questions to be forecast.
  Read more: AI category on Metaculus

###################################################

Tech Tales:

The Only Way out is Through
[Extract from the audio of a livestream, published as part of the ‘Observer Truth Stream’ on streaming services in 2027]

Sometimes, all that is standing between you and success is yourself. Those are the hardest situations to solve. That’s why people use the Observers. An Observer is a bit of software you load onto your phone and any other electronics you have – drones, tablets, home surveillance systems, et cetera. It watches you and it tells you things you could be doing that might help you, or things you do that are self-sabotaging.
  “Have you considered that the reason you aren’t opening those emails is because you aren’t sure if you want to commit to that project?” the Observer might say.
  “Does it seem odd to you that you only ever drink heavily on the day before you have a workout scheduled, which means you only try half as hard?”
  “Why do you keep letting commitments stack up, go stale, and cancel themselves? Why not tell people how you really feel?”

Most people use Observers. Everyone needs someone to tell them the no-bullshit truth. And these days, not as many people have close enough human friends to do this.

But Observers aren’t perfect. We know sometimes an Observer might not be making exactly the best recommendations. There are rumors.
  Someone robbed a bank and it wasn’t because the Observer told them to do it, but maybe it was because the Observer helped them kick the drinking which gave them back enough spare brain time they could rob the bank.
  Someone beat someone else up and it wasn’t because the Observer told them to do it, it was because they went to the gym and got fit and their Observer hadn’t helped them work on their temper.

So that’s why Observers get rationed out now. After enough of those incidents, we had to change how often people spent time with their Observer. The Observer caps were instituted – you can spend only so much time a week with an Observer, unless you’re a “critical government official” or are able to pay the Observer Extension Surcharge. Of course there are rumors that these rules exist just so the rich can get richer by making better decisions, and so politicians can stay in power by outsmarting other people.

But what if the reason we have these laws is because of recommendations the Observers made to legislators and lobbyists and the mass of people that comprises the body politic? What if most of the stated reasons are just stories people are telling – compelling lies that, much like a tablecloth, drape over the surface of reality while hiding its true dimensions. Of course, legislation passed prior to the caps made the communications between Observers and Politicians not subject to Freedom of Information Act request (on the grounds of user ‘Mind Privacy’). So we’ll never know. But the Observers might.

Things that inspired this story: How people use AI systems as cognitive augments without analyzing how the AI changes their own decision-making; long-term second-order effects of the rollout of AI into society; multi-agent systems; the subtle relationship between the creation of technological tools and societal shifts; the modern legislative process; how people with money always seem to retain a ‘technology option’ in all but the most extreme circumstances; economic inequality translating to cognitive inequality as we go forward in time.

Import AI 211: In AI dogfight, Machines: 5, Humans: 0; Baidu releases a YOLO variant; and the Bitter Lesson and video action recognition

Which is the best system for video action recognition? Simple 2D convnets, says survey:
… Richard Sutton’s ‘bitter lesson’ strikes again…
Researchers with MIT have analyzed the performance of fourteen different models used for video action recognition – correctly labeling an action in a video, a generically useful AI capability. The results show that simple techniques tend to beat complex ones. Specifically, the researchers benchmark a range of 2D convolutional networks (C2Ds) against temporal segment networks (TSNs), Long-term Recurrent Convolutional Networks (LRCNs), and Temporal Shift Modules (TSMs). They find the simple stuff – 2D convnets – performs best.

The bitter lesson results: Convolutional net models “significantly outperform” the other models they test. Specifically, the Inception-ResNet-v2, ResNet50, DenseNet201, and MobileNetv2 are all top performers. These results also highlight some of the ideas in Sutton’s ‘bitter lesson’ essay – namely that simpler things that scale better tend to beat the smart stuff. “2D approaches can yield results comparable to their more complex 3D counterparts, and model depth, rather than input feature scale, is the critical component to an architecture’s ability to extract a video’s semantic action information,” they write.
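
For the curious, here’s a hedged sketch of the generic 2D-convnet recipe the survey favors – classify each frame with an image backbone and average the predictions over time; this is the general idea, not the survey’s exact setup:

```python
# Hedged sketch of the simple 2D-convnet baseline for video action
# recognition: run an ImageNet-style backbone on each frame independently,
# then average the frame-level predictions across the clip.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class FrameAveragingClassifier(nn.Module):
    def __init__(self, num_actions=400):
        super().__init__()
        self.backbone = resnet50()            # 2D convnet, one frame at a time
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_actions)

    def forward(self, clip):                  # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        frames = clip.flatten(0, 1)           # (batch*time, 3, H, W)
        logits = self.backbone(frames)        # per-frame action scores
        return logits.view(b, t, -1).mean(dim=1)   # average over time

model = FrameAveragingClassifier()
video = torch.randn(2, 8, 3, 224, 224)        # 2 clips, 8 frames each
print(model(video).shape)                     # torch.Size([2, 400])
```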
  Read more: Accuracy and Performance Comparison of Video Action Recognition Approaches (arXiv).

###################################################

Free education resources – fast.ai releases a ton of stuff:
…What has code, tutorials, and reference guides, costs zero bucks, and is made by nice people? This stuff!…
The terrifically nice people at fast.ai have released a rewrite of their fastai framework, bringing with it new libraries, an educational course – Practical Deep Learning for Coders – an O’Reilly book, and a ‘Practical Data Ethics’ course.

Why this matters: fastai is a library built around the idea that the best way to help people learn a technology is to make it easy for them to build high-performance systems while they learn the underlying concepts. “Fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable,” they write.
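
As a flavor of what that looks like in practice, here’s a minimal sketch along the lines of the library’s own introductory example (API names recalled from the fastai v2 docs – treat as illustrative and check the current documentation):

```python
# Minimal sketch of the high-level fastai v2 workflow: a few lines take you
# from a folder of labeled images to a fine-tuned classifier. Recalled from
# the library's introductory example; verify against the current docs.
from fastai.vision.all import *

path = untar_data(URLs.PETS) / "images"       # sample dataset shipped by fastai

def is_cat(filename):
    # In this dataset, cat breeds have capitalized filenames.
    return filename[0].isupper()

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = cnn_learner(dls, resnet34, metrics=error_rate)   # transfer learning from ImageNet
learn.fine_tune(1)                                        # one epoch of fine-tuning
```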
  Read more: fast.ai releases new deep learning course, four libraries, and 600-page book (fast.ai website).

###################################################

Amazon Echo + School = Uh-oh:
Mark Riedl, an AI professor, says that on the first day of COVID-induced remote school for their 1st grader, the teacher read a story about a character named Echo the Owl. “It kept triggering peoples’ Amazon Echos,” Mark writes. “One of the echos asked if anyone wanted to buy a bible”.
    Read the tweet here (Mark Riedl, twitter).

###################################################

ICLR develops a code of conduct for its research community:
…Towards an AI hippocratic oath…
ICLR, a popular and prestigious AI conference, has developed a code of ethics that it would like people who submit papers to follow. The code has the following core tenets:
– Contribute to society and to human well-being.
– Uphold high standards of scientific excellence.
– Avoid harm.
– Be honest, trustworthy and transparent.
– Be fair and take action not to discriminate.
– Respect the work required to produce new ideas and artefacts.
– Respect privacy.
– Honour confidentiality. 

Ethics are all well and good – but how do you enforce them? Right now, the code doesn’t seem like it’ll be enforced, though ICLR does write “The Code [sic] should not be seen as prescriptive but as a set of principles to guide ethical, responsible research.” In addition, it says people that submit to ICLR should familiarize themselves with the code and use it as “one source of ethical considerations”.

Why this matters: Machine learning is moving from a frontier part of research rife with experimentation to a more mainstream part of academia (and, alongside this, daily life). It makes sense to try and develop common ethical standards for AI researchers. ICLR’s move follows top AI conference NeurIPS requesting researchers write detailed ‘broader impacts’ segments of their papers (Import AI 189) and computer science researchers such as Brent Hecht arguing researchers should try to discuss the negative impacts of their work (Import AI 105).
  Read more: ICLR Code of Ethics (official ICLR website).

###################################################

AI progress is getting faster, says one Googler:
Alex Irpan, a software engineer at Google and part-time AI blogger, has written a post saying their AI timelines have sped up, due to recent progress in the field. In particular, Alex thinks it’s now somewhat more tractable to think about building AGI than it was in the past.

AGI – from implausible to plausible: “For machine learning, the natural version of this question is, “what problems need to be solved to get to artificial general intelligence?” What waypoints do you expect the field to hit on the road to get there, and how much uncertainty is there about the path between those waypoints?,” Irpan writes. “I feel like more of those waypoints are coming into focus. If you asked 2015-me how we’d build AGI, I’d tell you I have no earthly idea. I didn’t feel like we had meaningful in-roads on any of the challenges I’d associate with human-level intelligence. If you ask 2020-me how we’d build AGI, I still see a lot of gaps, but I have some idea how it could happen, assuming you get lucky,” Irpan writes.
  Read more: My AI Timelines Have Sped Up (Alex Irpan, blog).

###################################################

Baidu publishes high-performance video object detector, PP-YOLO:
…After YOLO’s creator swore off developing the tech, others continued…
Baidu has published PP-YOLO, an object detection system. PP-YOLO isn’t the most accurate system in the world, but it runs at an extremely high FPS with reasonable accuracy. The authors have released the code on GitHub.

What does YOLO get used for? YOLO is a fairly generic object detection network designed to be run over streams of imagery, so it can be used for things like tracking pedestrians, labeling objects in factories, annotating satellite imagery, and more. (It’s notable that a team from Baidu is releasing a YOLO model – one can surmise this is because Baidu uses this stuff internally. In potentially related news, a Baidu team recently won the multi-class multi-movement vehicle counting and traffic anomaly detection components of a smart city AI challenge.)

What they did: YOLO is, by design, built for real world object detection, so YOLO models have grown increasingly baroque over time as developers build in various technical tricks to further improve performance. The Baidu authors state this themselves: “This paper is not intended to introduce a novel object detector. It is more like a receipt, which tell you how to build a better detector step by step.” Their evidence is that their PP-YOLO model gets a score of 45.2% mAP on the COCO dataset while running inference faster than YOLOv4.
  Specific tricks: Some of the tricks they use include a larger batch size, spatial pyramid pooling, using high-performance pre-trained models, and more.
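
Of those tricks, spatial pyramid pooling is the easiest to show in isolation. Here’s a hedged PyTorch sketch – kernel sizes follow the common 1/5/9/13 convention, and this is not Baidu’s PaddlePaddle implementation:

```python
# Hedged sketch of a spatial pyramid pooling (SPP) block of the kind PP-YOLO
# bolts onto its backbone: max-pool the same feature map at several kernel
# sizes (stride 1, 'same' padding) and concatenate along channels, so each
# location sees context at multiple scales.
import torch
import torch.nn as nn

class SPPBlock(nn.Module):
    def __init__(self, kernel_sizes=(1, 5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
             for k in kernel_sizes])

    def forward(self, x):                     # x: (batch, channels, H, W)
        return torch.cat([pool(x) for pool in self.pools], dim=1)

spp = SPPBlock()
features = torch.randn(1, 512, 19, 19)
print(spp(features).shape)                    # torch.Size([1, 2048, 19, 19])
```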

YOLO’s strange, dramatic history: PP-YOLO is an extension of YOLOv3 and in benchmark tests has better performance than YOLOv4 (a successor to YOLOv3 developed by someone else). Joseph Redmon, the original YOLO developer (see: the YOLOv3 release in Import AI 88), stopped doing computer vision research over worries about military and privacy-infringing applications. But YOLO continues to be developed and used by others in the world, and progress in object detection over video streams creeps forward. (The development of PP-YOLO by Baidu and YOLOv4 by a Russian software developer provides some small crumb of evidence for my idea – written up for CSET here – that the next five years will lead to researchers affiliated with authoritarian nations originating a larger and larger fraction of powerful AI surveillance tech.)
  Read more: PP-YOLO: An Effective and Efficient Implementation of Object Detector (arXiv).
Get the code for PP-YOLO from here (Baidu PaddlePaddle, GitHub).

###################################################

DARPA’s AlphaDogFight shows us the future of AI-driven warfare:
…In the battle between humans and machines, humans lose 5-0…
This week, an AI system beat a top human F-16 pilot in an AI-vs-human simulated dogfight. The AI, named ‘Heron’, won five to zero against a human pilot with the callsign ‘Banger’. This is a big deal with significant long-term implications, though whether entrenched interests in the military industrial complex will adapt to this new piece of evidence remains to be seen.
  You can watch the match here, at about 4 hours and 40 mins into the livestream (official Darpa video on YouTube).

State knowledge & control limits: The competition isn’t a perfect analog with real-world dogfighting – the agent had access to the state of the simulation and, like any reinforcement learning-driven agent, it ended up making some odd optimizations that took advantage of state knowledge. “The AI aircraft pretty consistently skirted the edge of stalling the aircraft,” writes national security reporter Zachary Fryer-Biggs in a tweet thread. “The winning system from Heron did show one superhuman skill that could be very useful – it’s ability to keep perfect aim on a target.”

Why this matters: This result is a genuinely big deal. A lot of military doctrine in the 2010s (at least, in the West) has been about developing very powerful ‘Centaur’ human pilots who fuse with technology to create something where the sum of capability is greater than the constituent parts – that’s a lot of the philosophy behind the massively-expensive F-35 aircraft, which is designed as a successor to the F-16.
  Results like the AlphaDogFight competition bring this big bet into focus – are we really sure that humans+machines are going to be better at fighting than just machines on their own? The F-16 is a much older combat platform than the (costly, notoriously buggy, might kill people when they eject from the plane) F-35, so we shouldn’t take these results to be definitive. But we should pay attention to the signal. And in the coming years it’ll be interesting to see how incumbents like the Air Force respond to the cybernetic opportunities and challenges of AI-driven war.

Why this might not matter: Of course, the above result takes place in simulation, so that’s important. I think an analogy we could use would be: if a top US fighter pilot battled a foreign fighter pilot in a simulator, both had to use US hardware, and the US person lost, people would get a bit worried and question their training methods and the strengths/weaknesses of their air force curriculum. It could be the case that the system demoed here will fail to transfer to reality, but I think that’s fairly unlikely – solving stuff in simulation is typically the step you need to take before you can solve stuff in reality, and the simulators used here are extraordinarily advanced, so it’s not like this took place in a (completely) unreal gameworld.
  Read more: AlphaDogFight Trials Go Virtual for Final Event (DARPA).
  Watch the video: AlphaDogfight Trials Final Event (YouTube).

###################################################

AI Policy with Matthew van der Merwe:
…Matthew van der Merwe brings you views on AI and AI policy; I (lightly) edit them…

UK exam algorithm fiasco
High school students in the UK did not sit their final exams this summer. Since these grades partly determine acceptance to university, it was important that students still be graded in each subject. So the exam regulator decided to use an algorithm to grade students. This did not go well, and follows an earlier scandal in which the International Baccalaureate organization used an algorithm to automatically grade 160,000 students (that also went wrong – covered in Import AI 205).

The problem: Every year, teachers provide estimated grades for their students. Taking these predictions at face value was deemed unsatisfactory, since they have been shown to consistently overestimate performance, and to favour students from more advantaged backgrounds. So doing so was expected to unfairly advantage this cohort over past and future cohorts, and unfairly advantage more privileged students within the cohort. Instead, an algorithm was designed to adjust estimated grades on the basis of each school’s results over the last 3 years. The full details of the model have not been released.

What happened: 39% of results were downgraded from teacher predictions. Students from the most disadvantaged backgrounds were downgraded more than others; students from private schools were upgraded more than others. This prompted outrage — students protested, chanting “fuck the algorithm” at protests. After a few days, the regulator decided to grant students whichever was highest of their teacher-predicted and algorithm-generated grades.

Matthew’s view: This is a clear example of how delegating a high-stakes decision to a statistical model can go wrong. As technology improves, we will be able to delegate more and more decisions to AI systems. This could help us make better decisions, solve complex problems that currently elude us, and patch well-known flaws in human judgement. Getting there will involve implementing the right safeguards, demonstrating these potential benefits, and gaining public confidence. The exam fiasco is the sort of sloppy foray into algorithmic decision-making that undermines trust and hardens public attitudes against technology.

  Read more: How did the exam algorithm work? (BBC).
  Read more: The UK exam debacle reminds us that algorithms can’t fix broken systems (MIT Tech Review).

Quantifying automation in the Industrial Revolution

We all know that the Industrial Revolution involved the substantial substitution of machine labour for human labour. This 2019 paper from a trio of economists paints a clear quantitative picture of automation in this period, using the 1899 US Hand and Machine Labor Study.

The dataset: The HML study is a remarkable dataset that has only recently been analyzed by economic historians. Commissioned by Congress and collected by the Bureau of Labor Statistics, the study gathered observations on the production of 626 manufactured units (e.g. ‘men’s laced shoes’) and recorded in detail the tasks involved in their production and the relevant inputs to each task. For each unit, this data was collected for both machine production and hand production.

Key findings: The paper looks at transitions between hand- and machine- labour across tasks. It finds clear evidence for both the displacement and productivity effects of automation on labour:

– 67% of hand tasks transitioned 1-to-1 to being performed by machines, and a further 28% of hand tasks were subdivided or consolidated into machine tasks. Only 4% of hand tasks were abandoned.
– New tasks (not previously done by hand) represented one-third of machine tasks.
– Machine labour reduced total production time by a factor of 7.
– The net effect of new tasks on labour demand was positive — time taken up by new machine-tasks was 5x the time lost on abandoned hand-tasks.

Matthew’s view: The Industrial Revolution is perhaps the most transformative period in human history so far, with massive effects on labour, living standards, and other important variables. It seems likely that advances in AI could have a similarly transformative effect on society, and that we are in a position to influence this transformation and ensure that it goes well. This makes understanding past transitions particularly important. Aside from the paper’s object-level conclusions, I’m struck by the value of this diligent empirical work from the 1890s, and by the foresight of the people who saw the importance of gathering high-quality data in the midst of this transition. This should serve as inspiration for those involved in efforts to track metrics of AI progress.
  Read more: “Automation” of Manufacturing in the Late Nineteenth Century (AEA Web)

###################################################

Tech Tales:

CLUB-YOU versus The Recording Studio
Or: You and Me and Everyone We Admire.

[A venture capitalist office on Sand Hill Road. California. 2022.]

The founder finished his pitch. The venture capitalist stared at him for a while, then said “okay, there’s definitely a market here, and you’ve definitely found traction. But what do we do about the elephant in the room?”
“I’m glad you asked,” he said. “We deal with the elephant by running faster than it.”
“Say more,” said the venture capitalist.
“Let me show you,” said the founder. He clicked to the next slide and a video began to play, describing the elephant – The Recording Studio.

The Recording Studio was an organization formed by America’s top movie, music, radio, and podcast organizations. Its proposition was simple: commit to recording yourself acting or singing or performing your art, and The Recording Studio would synthesize a copy of your work and resell it to other streaming and media services.
You’re a successful music artist and would like to branch out into other media, but don’t have the time. What are your options?, read one blog post promoting The Recording Studio.
“Sure, I was nervous about it at first, but after I saw the first royalty checks, I changed my mind. You can make real money here,” read a testimonial from an actor who had subsequently become popular in South East Asia after someone advertised butter by deepfaking their Recording Studio-licensed face (and voice) onto a cow.
We’re constantly finding new ways to help your art show up in the world. Click here to find out more, read some of the copy on the About page of The Recording Studio’s website.

But artists weren’t happy about the way the studio worked – it was constantly developing more and more powerful systems that meant it needed less and less of an individual artist’s time and content to create a synthetic version of themselves and their outputs. And that meant The Recording Studio was paying lower rates to all but the superstars on its platform. The pattern was a familiar one, having been first proved out by the earlier success (and ensuing artistic issues) of Spotify and YouTube.

Now, the video changed, switching to show a person walking in a field, with a heads-up display on some fashionable sunglasses. The over-the-shoulder camera angle shows a cornfield at the golden hour of sunset, with a head of Joseph Gordon Levitt floating in the heads-up display. Now you hear Joseph Gordon Levitt’s voice – he’s answering questions about the cornfield, posed to him by the person wearing the glasses. Next, the screen fades to black and the text ‘a more intimate way to get to know your fans’ appears, followed by the phrase CLUB-YOU. The video stops.

“CLUB-YOU is a response to The Recording Studio,” the Founder said. “We invert their model. Instead of centralizing all the artists in one place and then us figuring out how to make money off of them, we instead give the artists the ability to run their own ‘identity platforms’ where they can record as much or as little of themselves as they like, and figure out the levels of access they want to give people. And the word “People” is important – CLUB-YOU is direct-to-consumer: download the app, get access to some initial artist profiles, and use our no-code interface to customize stuff for your own needs. We don’t need to outthink The Recording Studio, we just need to outflank them, and then the people will figure out the rest – and the artists will be on our side.”

The venture capitalist leant back in his chair for a while, and thought about it a bit. Then his technical advisor drilled into some of the underlying technology – large-scale generative models, fine-tuning, certain datasets that have already been trained over (creating models sufficient for single-artist customization), technologies for building custom applications in the cloud and piping them to smartphones, various encryption schemes, DMCA takedown functionality, and so on.

The VC and the founder talked a little more, and then the meeting ended and suddenly CLUB-YOU had $100 million, and it went from there.

In the ensuing years, The Recording Studio continued to grow, but CLUB-YOU went into an exponential growth pattern and, surprisingly, attained a growing cool cachet the larger it got, whereas The Recording Studio’s outputs started to seem too stilted and typical to draw the attention of younger people. The world began to flicker with people orbited by tens to hundreds of CLUB-YOU ghosts. Some artists switched entirely from acting to recording enough of their own thoughts that they could become personal mentors to their fans. Others discovered new talents by looking at what their fans did – one radio host learned to sing after seeing a viral video of a CLUB-YOU simulacrum of themselves singing with a couple of teenage girls; they got good enough at singing that it became their full career and they dropped radio – except for appearing as a guest.

The real fun began when, during CLUB-YOU’s second major funding round, the Founder pitched the idea of users being able to pitch business ideas to artists – the idea being that users would build things, discover stuff they liked, then license a portion of proceeds to the artist, with the artist having the ability to set deal-terms upfront. That’s when the really crazy stuff started happening.

Things that inspired this story: Deepfakes; the next few years of generative model development; ideas about how market-based incentives interact with AI.

Import AI 210: Satellite collisions & ML; helping the blind navigate with Lookout; why Deepfakes are the most worrying AI threat

Deepfakes are the most worrying AI crime – UCL researchers:
Deepfakes, specifically audio/video impersonation of someone for criminal purposes, are the greatest AI-driven crime threat, according to research from UCL published in the journal Crime Science. The research is based on a two-day workshop that occurred in early 2019, which had 31 attendees from the public sector (including the UK’s National Cyber Security Centre), academia, and the private sector. At the workshop, attendees shared research on a variety of different AI-driven crime threats, then got together and ranked 20 of them from low to high threat, across four dimensions for each crime (harm, profit, achievability, defeatability).

The top threats: The things to be most worried about are audio/video impersonation, followed by tailored phishing campaigns and driverless vehicles being used as weapons.
The least worrying threats: Some of the least worrying threats include forgery, AI-authored fake reviews, and AI-assisted stalking, according to the attendees.
Things that make you go ‘hmmm’: Some of the threats that required decent text generation capabilities were ranked as being fairly hard to achieve – I wonder how much that threat landscape has changed, given the NLP advancements of the past year and a half (e.g. GPT2, GPT3, CTRL, et cetera).
  Read the research: AI-enabled future crime (BMC Crime Science, open access).
  Read more: ‘Deepfakes’ ranked as most serious AI crime threat (UCL News).

###################################################

Spacecraft collision detection – surprisingly hard for ML:
…Competition results mean ML isn’t the greatest fit for averting Kessler Syndrome, yet…
How well can machine learning approaches predict the possibility of satellites colliding with one another? Not well, according to the results of the Spacecraft Collision Avoidance Challenge, a competition hosted by the European Space Agency. In a writeup of the competition, ESA-affiliated researchers describe the challenge (try to predict satellite collisions via a dataset of satellite-specific data files called “conjunction data messages” that store data about satellite events).

The results: First, out of 97 teams that entered, only 12 managed to beat a time series prediction baseline, illustrating the difficulty of this problem. Many of the teams experimented with ML, but it’s notable that the top-ranking team eschewed a standard machine learning pipeline for something far more involved, combining some ML with a series of if/then operations. The team that ranked third overall used a purer ML approach (specifically, a ‘Manhattan-LSTM’ based on a siamese network). A nice thing about this competition was the inclusion of a reassuringly hard baseline, which should give us confidence in techniques that beat the baseline.

What to do for next time: “Naive forecasting models have surprisingly good performances and thus are established as an unavoidable benchmark for any future work in this area and, on the other hand, machine learning models are able to improve upon such a benchmark hinting at the possibility of using machine learning to improve the decision making process in collision avoidance systems,” they write.
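
To make the ‘naive forecasting’ benchmark concrete, here’s a hedged sketch of the sort of baseline that proved hard to beat – carry the most recent observed risk value forward as the prediction for each event; the column names are illustrative, not the real conjunction-data-message schema:

```python
# Hedged sketch of a naive time-series baseline for collision-risk
# prediction: for every conjunction event, predict that the final risk will
# equal the most recent risk value observed so far. Column names below are
# illustrative stand-ins, not the actual CDM schema.
import pandas as pd

def naive_risk_forecast(cdm_log: pd.DataFrame) -> pd.Series:
    """cdm_log columns (assumed): event_id, time_to_closest_approach, risk."""
    latest = (cdm_log
              .sort_values("time_to_closest_approach", ascending=False)
              .groupby("event_id")
              .last())                         # last row = most recent message per event
    return latest["risk"]

# Toy usage with two events, three messages each.
log = pd.DataFrame({
    "event_id": [1, 1, 1, 2, 2, 2],
    "time_to_closest_approach": [5.0, 3.0, 2.1, 6.0, 4.0, 2.5],  # days
    "risk": [-6.2, -5.8, -5.5, -7.0, -6.9, -6.8],                # log10 collision risk
})
print(naive_risk_forecast(log))
```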
  Read more: Spacecraft Collision Avoidance Challenge: design and results of a machine learning competition (arXiv).

###################################################

Have poor vision? Use the ‘Lookout’ app to help you:
…How machine learning can help the partially-sighted…
Google has used machine learning to improve an application targeted at partially-sighted people, named Lookout. The new features mean that ‘when the user aims their smartphone camera at the product, Lookout identifies it and speaks aloud the brand name and product size’. This may be particularly useful to partially sighted people trying to accomplish daily tasks, like shopping.

What goes in it? Lookout relies on MediaPipe, a Google-developed ML development stack. Each instance of the Lookout app ships with geographically-curated information on around two million popular products, stored on users’ phones, so people can get help when they’re walking around.

Why this matters – world navigators: Apps like ‘Lookout’ are part of a genre of AI applications which I’ll call ‘world navigators’ – they make it easier for a certain type of person to navigate the world around them. Here, ML makes it easier for partially sighted people to get around. In other use cases, like Google’s Translate app, the same technology makes it easier for people to speak other languages. In a few years, I think AI tools will have made it easier to generally ‘translate’ the world for different people, helping us use ML to improve people’s lives.
  Read more: On-device Supermarket Product Recognition (Google AI Blog).
  Get the app here (Lookout by Google, official Play store listing).

###################################################

How hard is it to be an ethical AI developer these days? Pretty hard, says researcher:
…A tale of two APIs…
Roya Pakzad, an independent AI researcher, has written about some of the ethical challenges developers face when using modern AI tools. In a detailed post, Pakzad imagines that Unilever wants to analyze the emotions present in social media tweets about the company, following its (unsuccessful) debut of a ‘fair and lovely’ skincare campaign in India (which advertised a skincare product on the basis of it being good at lightening skin tone).

Pakzad tries to do what a developer would do. First, she finds some as-a-service AI tools that could help her with the task – IBM’s Tone Analyzer and ParallelDots’ Text Analysis systems – then tests out those APIs. After registering for both services, she looked at how each service classified different tweets using the ‘fair and lovely’ term in reference to the campaign – discouragingly, she found massive divergence between the IBM and ParallelDots API results, highlighting the brittleness of these systems and how capabilities vary massively across different APIs. Pakzad also looks into the different privacy and security policies of the two services, again highlighting the substantial differences between them.

Pakzad’s top tips for providers and developers:
– Providers should ensure their APIs are well documented, communicate about issues of bias and fairness and security directly, and develop better systems for preserving developer privacy.
– ML practitioners should carefully analyze APIs prior to using them, test against benchmark datasets that relate to potentially discriminatory outcomes of ML projects (see the sketch after this list), share ethical issues about the API by opening pull requests on the developer’s GitHub page (if available), and be clear about the usage of the API in documentation about the services it is used within.
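
To illustrate the benchmarking tip, here’s a hedged sketch; ibm_tone_label and paralleldots_label are hypothetical wrapper functions standing in for the real vendor SDK calls, which you would implement against each provider’s documented API:

```python
# Hedged sketch of benchmarking two text-analysis APIs against the same small
# labeled set before committing to either. ibm_tone_label and
# paralleldots_label are hypothetical wrappers you would write around each
# vendor's real SDK; they are stand-ins, not actual API signatures.
from typing import Callable, Dict, List, Tuple

Benchmark = List[Tuple[str, str]]          # (text, expected_label) pairs

def agreement_report(benchmark: Benchmark,
                     classifiers: Dict[str, Callable[[str], str]]) -> None:
    for name, classify in classifiers.items():
        correct = sum(classify(text) == expected for text, expected in benchmark)
        print(f"{name}: {correct}/{len(benchmark)} labels match expectations")

def ibm_tone_label(text: str) -> str:      # hypothetical wrapper around the IBM SDK
    raise NotImplementedError

def paralleldots_label(text: str) -> str:  # hypothetical wrapper around the ParallelDots SDK
    raise NotImplementedError

benchmark = [
    ("This campaign is insulting and colourist.", "negative"),
    ("Glad they finally pulled the ad.", "positive"),
]
# agreement_report(benchmark, {"IBM": ibm_tone_label, "ParallelDots": paralleldots_label})
```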
  Read more: Developers, Choose Wisely: a Guide for Responsible Use of Machine Learning APIs (Medium)

###################################################

Top tips for AI developers, from A16Z:
…It’s the long tail distributions that kill you…
Venture capital firm Andreessen Horowitz – famous for its co-founder Marc Andreessen’s ‘software is eating the world’ observation – thinks that AI companies are becoming more important, and has written some tips for people trying to put machine learning techniques into production in a startup context.

A16Z’s tips:
– ML is as much about iterative experimentation as standard software engineering.
– It’s the long-tail part of the distribution that you’ll spend most of your time tuning your ML for. “ML developers end up in a loop – seemingly infinite, at times – collecting new data and retraining to account for edgy cases,” they write. And trying to solve the long tail can lead to AI firms exhibiting diseconomies of scale.
– Break problems into sub-components: Big all-in-one models like GPT3 are a rarity – in production, most problems are going to be solved by using a variety of specialized ML models targeted at different sub parts of large problems.
– Your operation infrastructure is your ML infrastructure: Invest in the tools you run your ML on, so consolidate data pipelines, develop your own infrastructure stack (without reinventing too many wheels), test everything, and try to compile and optimize models.

Why this matters: Machine learning is transitioning from an academic-heavy artisanal production and development phase, to a scaled-up process-oriented phase; posts like these tips from A16Z illustrate that this shift is occurring as they try to take implicit knowledge from ML practitioners and make it explicit for a wider audience.
Read more: Taming the Tail: Adventures in Improving AI Economics (Andreessen Horowitz blogpost).

###################################################

AI governance – lessons from history:
…What might the AI equivalent of the 1967 space treaty look like?…
What do space, the ethics of IVF, and the Internet governance org ICANN have in common? These are all historical examples of the surprising ways countries, companies, and individuals have collaborated to govern emerging science and technology. In a blog post, Verity Harding, a researcher at the University of Cambridge, lays out some of the ways we can learn from history to make (hopefully) fewer mistakes in the governance of AI.
  Given that it’s 2020 and multinationalism is struggling under the twin burdens of COVID and the worsening tensions between the US and China, you might expect such research to have a grave tone. But the post is surprisingly optimistic: “The challenges are great, and the lessons of the past cannot be simply superimposed onto the present. What is possible geopolitically, however, is one example where AI scientists, practitioners and policymakers can take heart from historical precedent,” writes Harding.

Historical examples: In the post, Harding discusses examples like the 1967 UN Outer Space Treaty, the UK’s Warnock Committee and Human Embryology Act, the Internet Corporation for assigned Names and Numbers (ICANN), and the European ban on genetically modified crops.

Why this matters: AI governance is going to receive a lot of attention in the coming years as the technology gets cheaper, more capable, and more widely available (see: predictions about AI surveillance, geopolitics, and economics here), and we’re going to need to deal with increasingly hard challenges for the management of the technology. If we study more historical examples, we’ll hopefully make fewer mistakes as we muddle our way forward.
  Read more: Lessons from history: what can past technological breakthroughs teach the AI community today (University of Cambridge, Bennett Institute for Public Policy).

###################################################

Tech Tales:

Teenage drone phone:
[2025 The outer, outer suburbs of Boston, Massachusetts.]

Her Dad didn’t want her to talk to the boy, so he blocked her internet and installed a bunch of software onto her phone so she couldn’t access certain websites or contact numbers not ‘whitelisted’ by her father. She sat in her room staring at the calendar app on her phone, counting down the days till she turned 16 and she’d get the run of her own phone.

It was about a month before her birthday when she was tidying her room and found the drone in her closet. She’d got it for Christmas a couple of years earlier. She took it out and stared at it. Then looked at her window.

The next day in school she asked the boy where he lived.
Why are you asking? he said.
So I can break into your house, she said.
He told her his address.

That night, she looked at her house and his house on Google Maps, while browsing the specifications of her drone.
I’m going to do some night photography! she told her Dad.
Let me know if you see any birds, he said, and don’t go beyond the fence.
Sure thing, Dad, she said.

Outside, she went to the edge of the garden, by the fence. Then she turned the drone on and pointed it at the trees, and her house, and then her. She hit record.
She told the drone some things, but really she was speaking to the boy. With the footage recorded, she sent the drone on a pre-planned route to the boy’s house. She hoped he’d be smart and go out. She could see occasional images beamed back to her via the drone, which was piggybacking on a bunch of networks. She saw the boy come out. Saw his eyes focus on the part of the drone where she’d taped a post-it note that said “insert cable here, or I really will break into your house”. Watched him go inside and come back out with a cable. A minute later, the drone told her someone had accessed its data and made a copy. She told it to flash its takeoff lights, and get up when it was safe to do so.

Any birds? Her Dad said when she came back in.
No, but I only checked a couple of the trees, I’ll do the others tomorrow, she said.
Sounds good to me, he said.

And just like that, she had a way to talk to the boy outside of school. The next day she brought the drone into school and had him pair his phone with it when he was nearby. “Now you can record on it when it comes to your house,” she said.
Where do you live? he said.
Are you crazy? What if you were some kind of criminal? It knows where I live, she said, then winked.

That night, she went out into the garden, and hovered the drone again. It went over to the boy’s house and it dropped down and the boy recorded a message on it and the drone came back, across the city. Later that night, she found a bird nest in one of the trees. She told her Dad about it and he said she’d need to be careful, but if she could film the baby birds when they were in the nest, that would interest him. She said yes, so she and the boy could keep sending drones to each other.

When she got older she reflected on all of this – the drone working like a tin can on the end of a string, and of herself filming the baby birds in their nest, and of her dad monitoring her in phoneworld – which by then was as much a part of reality for teens as anything physical. She resented some of it and was thrilled by other parts. And she knew she fit into it, using her drone to go and study the boy, and in a way study herself as she learned to use her own intelligence to use the tools of the world to study those around her, so she could have something that seemed like control or connection.

Things that inspired this story: The consumerization of drones; miniaturization of battery power over time; consumerization; reductions in prices of display screens and onboard AI computation devices.