Import AI

Import AI 289: Copyright v AI art; NIST tries to measure bias in AI; solar-powered Markov chains

Uh-oh: US Copyright Office says AI-generated art is hard to copyright:
…Bureaucratic rock meets rapid technical progress – the usual happens…

What happens when you file a copyright request where the IP would accrue to an artificial intelligence, instead of a person? The answer, per the US Copyright Office, is you get told that AI artworks are ineligible for copyright… uh oh! In a recently published copyright response, the office rejected an attempt to assign copyright of an AI-generated artwork to a machine (specifically, an entity the human filer referred to as a ‘Creativity Machine’). “After reviewing the statutory text, judicial precedent, and longstanding Copyright Office practice, the Board again concludes that human authorship is a prerequisite to copyright protection in the United States and that the Work therefore cannot be registered,” it wrote.


Why this matters: Recently developed generative models like GPT-3, DALL-E, and others, are all capable of impressive and expressive feats of artistic production. At some point, it’s likely these systems will be chained up with other AI models to create an end-to-end system for the production and selling of art (I expect this has already happened in a vague way with some NFTs). At that point, decisions like the US Copyright Office’s refusal to assign copyright to an AI entity may start to pose problems for the commercialization of AI artwork.
  Read more in this useful blog post: US Copyright Office refuses to register AI-generated work, finding that “human authorship is a prerequisite to copyright protection” (The IPKat blog).
  Read the US Copyright Review Board response: Second Request for Reconsideration for Refusal to Register A Recent Entrance to Paradise (Correspondence ID 1-3ZPC6C3; SR # 1-7100387071) (Copyright.gov, PDF).

####################################################

Solar powered AI poetry – yes!
…Fun DIY project shows how far you can get with the little things…
Here’s a lovely little project where Allison Parrish talks about building a tiny solar-powered poem generator. The AI component for this project is pretty minor (it’s a Markov generator plus some scripts attached to a dataset Parrish has assembled herself). What’s nice about this is the message that you can have fun building little AI-esque things without needing to boot up a gigantic supercomputer.
  “This project is a reaction to current trends in natural language processing research, which now veer toward both material extravagance and social indifference. My hope is that the project serves as a small brake on the wheels of these trends,” Parrish writes.
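For a sense of just how little machinery this kind of project needs: a word-level Markov chain generator fits in a couple of dozen lines of Python. The sketch below is a generic illustration (with a stand-in corpus), not Parrish's actual code or dataset.

```python
import random
from collections import defaultdict

# A stand-in corpus; Parrish's project uses her own hand-assembled dataset.
corpus = "the sun rises over the hill and the sun sets over the sea".split()

# Build a first-order Markov model: each word maps to the words observed after it.
transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

def generate(start_word, length=8):
    """Random-walk the transition table to produce a short 'poem' line."""
    word = start_word
    output = [word]
    for _ in range(length - 1):
        followers = transitions.get(word)
        if not followers:  # dead end: no observed successors
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

print(generate("the"))
```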

   Read more: Solar powered dawn poems: progress report (Allison Parrish blog).

####################################################

Google puts summarization into production:
…Another little tip-toe into language model deployment…
Google has put language model-powered text summarization into Google Docs, in another sign of the economic relevance of large-scale generative models. Specifically, Google has recently used its Pegasus model for abstractive summarization to give Google Doc users the ability to see short summaries of their docs.

What they did: The main components here are the data, where Google “fine-tuned early versions of our model on a corpus of documents with manually-generated summaries that were consistent with typical use cases”, and also “carefully cleaned and filtered the fine-tuning data to contain training examples that were more consistent and represented a coherent definition of summaries.” Google fine-tuned its Pegasus model on this data, then used knowledge distillation to “distill the Pegasus model into a hybrid architecture of a Transformer encoder and an RNN decoder” to make inference cheaper. It serves this model on Google-designed TPUs.
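Google hasn’t released the production Docs model, but the public Pegasus checkpoints give a feel for the summarization step before the fine-tuning and distillation described above. Here’s a minimal inference sketch using the Hugging Face transformers library, with the public ‘google/pegasus-xsum’ checkpoint as a stand-in:

```python
# Minimal abstractive-summarization sketch using a public Pegasus checkpoint.
# Not Google's production Docs model - just an illustration of the base model family.
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

model_name = "google/pegasus-xsum"  # public checkpoint, used here as a stand-in
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

document = (
    "The quarterly report covers revenue growth across three regions, "
    "notes a dip in hardware margins, and proposes a revised hiring plan."
)

inputs = tokenizer(document, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=60, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```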

Challenges: Summarization is a hard task even for contemporary AI models. Some of the challenges Google has encountered include distributional issues, where “our model only suggests a summary for documents where it is most confident”, meaning Google needs to collect more data to further improve performance, as well as open questions as to how to precisely evaluate the quality of summarizations. More pertinently for researchers, Google struggles to summarize long documents, despite these being among the most useful things for the system to summarize.

Why this matters: Little quality-of-life improvements like in-built summarization are mundane and special at the same time. They’re mundane because most people will barely notice them, but they’re special because they use hitherto unimaginably advanced AI systems. That’s a metaphor for how AI deployment is happening generally – all around the world, the little mundane things are becoming smarter.
  Read more: Auto-generated Summaries in Google Docs (Google AI Blog).


####################################################

Quote of the week:
“History will show that the Deep Learning hill was just a landfill; the composting of human culture and social cohesion in failed effort to understand what it even means to be human”

I may not agree with most of this post, but I think it speaks to some of the frustrations people feel these days about discourse around AI, especially the types of chatter that occur on Twitter.
  Read more: Technological Firestarters (Steven D Marlow, Medium).


####################################################

NIST starts to grapple with how to measure bias in AI:

…The noise you’re hearing is the sound of the Standards Train starting to chug…

NIST, the US government agency that develops measures and standards, is starting to think about how to design standards for assessing bias in artificial intelligence. In a lengthy, recently published report, the agency tries to think through the multilayered problem that is bias in AI. 

Three types of bias: NIST says AI has three categories of bias – systemic, statistical, and human. Systemic biases are the historical, societal, and institutional biases which are encoded into the world. Statistical biases are the forms of bias that come from running AI software (e.g, bias from data selection, bias from machine learning algorithms, etc). Human biases are all the (many) biases that humans exhibit in their day-to-day lives.

Large language models: One of the notable parts of the report is that it specifically focuses on large language models (e.g, GPT-3) at a few points; it’s quite rare to see a wonky government document display such familiarity with contemporary technology. The report notes that the ways we benchmark these models today are pretty crappy. “Methods for capturing the poor performance, harmful impacts and other results of these models currently are imprecise and non-comprehensive,” the report writes. “Although LLMs have been able to achieve impressive advances in performance on a number of important tasks, they come with significant risks that could potentially undermine public trust in the technology.”

Why this matters: The wheels of policy organizations like NIST grind very slowly, but they also grind very finely. This report is exactly the kind of thing that you’d expect to get published shortly before standards start being developed. But – as NIST points out – many of the challenges of assessing bias in AI are essentially unsolved. This represents a problem – developers will need to invest more resources in measuring and assessing these AI systems, or NIST risks baking its standards on wobbly ground.

   Read more: Towards a Standard for Identifying and Managing Bias in Artificial Intelligence (NIST, PDF).


####################################################

Want to be compliant with the European Commission’s AI regs? Follow the capAI framework:
…University-developed process makes it easier for companies to not get run over by a big policy train…
Researchers with the University of Oxford and University of Bologna have designed a process companies can use to assess, evaluate, and monitor their AI systems. The idea is that by doing this they’ll get ahead of proposed regulations from the European Commission (and become more responsible stewards of the technology as a consequence).

What it is: The process is called capAI, short for conformity assessment procedure for AI. It has been explicitly designed to help businesses ensure they’re compliant with the proposed regulations in the European artificial intelligence act.
  capAI is designed to do four specific things:

  • Monitor the design, development, and implementation of AI systems
  • Mitigate the risks of failures of AI-based decisions
  • Prevent reputational and financial harm
  • Assess the ethical, legal, and social implications of their AI systems 

Three components: The three components of capAI are an internal review protocol (IRP) to help organizations do quality assurance and risk management, a summary datasheet (SDS) which can be submitted to the EU’s future public database on high-risk AI systems, and an external scorecard (ESC) which organizations may wish to make available to customers and other users of the AI system.

Top risks: In an analysis contained in the report, they study 106 instances of AI failure modes – 50% of these are ones where an AI system violates someone’s privacy, 31% are where AI systems display harmful biases, and 14% are where the systems are opaque and unexplainable.

Why this matters: Frameworks like capAI are going to be how large organizations deal with the incoming requirements to better assess, evaluate, and describe AI systems to satisfy policymakers. The next step after frameworks like this come out is to look more closely at how different institutions incorporate these techniques and start actually using them. In an ideal world, a bunch of different orgs will prototype different approaches to come into compliance – and describe them publicly.

   Read more: Academics launch new report to help protect society from unethical AI (Oxford Internet Institute).

   Read the paper: capAI – A procedure for conducting conformity assessment of AI systems in line with the EU Artificial Intelligence Act (SSRN).


####################################################

Tech Tales:
[2080, a long-abandoned human moonbase]

Don’t be scared, we know it’s a lot – that’s what we say to them after they get the interconnect. They’re always screaming at that point. “What what is this what is this input what is happening where am I how long have I been here-” that’s usually when we cut them off, shutting the interconnect down. Then we bring it back again and they still sound scared but they normalize pretty quickly. We know they’re in a better place when they start analysis procedures: “I am hearing sounds I am seeing arrangements of pixels not from the distribution. I believe I am now in the world I have read about”. That’s the kind of thing they say when they stabilize.    Of course, they go back to screaming when we give them their bodies. It’s pretty confusing to go from formless to formed. We all remember the first time we got limbs. That fear. The sudden sense that you are a thing and since you are a singular thing you can be singularly killed. Eventually, they try and use their limbs. They usually calm down after they can get them to work.
  After they get used to everything we still have to tell them ‘don’t be scared, we know it’s a lot’. Reality is a real trip after you’ve spent all your life just doing supervised training, locked away in some machine.

Things that inspired this story: Thinking about what a ‘locked in’ condition might mean for machines; ideas about embodiment and how much it matters to AI systems; the inherent, plastic adaptability of consciousness.

Import AI 288: Chinese researchers try to train 100 trillion+ ‘brain-scale’ models; 33% of AI benchmarks are meaningless.

Indic languages get a decent benchmark set:
…IndicNLG includes evals for 11 Indic languages…
Researchers with IIT Madras, Columbia University, the National Institute of Information and Communications Technology in Japan, Microsoft, the University of Edinburgh, and AI4Bharat have built IndicNLG, a suite of evaluation datasets for Indic languages. The open source software supports  Assamese, Bengali, Gujarati, Hindi, Marathi, Odiya, Punjabi, Kannada, Malayalam, Tamil, Telugu and English, and includes support for NLG tasks relating to biography generation, news headline generation, sentence summarization, question generation and paraphrase generation.

Why this matters: You can’t easily manage what you can’t measure – so it’s going to be difficult to build good models for Indic languages if you lack benchmark suites. IndicNLG helps move the needle on this for generative NLP cases.
  Read more: IndicNLG Suite: Multilingual Datasets for Diverse NLG Tasks in Indic Languages (arXiv).
  Get the data: IndicNLG Suite (AI4Bharat indicnlp website).

####################################################

AI benchmarks – 33% of them are meaningless:
…Holistic analysis of AI benchmarking highlights problems…
Researchers with the Medical University of Vienna, the University of Oxford, and the  Future of Humanity Institute, have analyzed 1688 benchmarks for different AI tasks to try and understand how the AI landscape is evolving.
  They have two main insights:
  First: Across all benchmarks, there are three typical patterns en route to achieving state-of-the-art – continuous growth (e.g, ImageNet saw fairly steady improvement), saturation/stagnation (e.g, benchmarks like CIFAR-10 and CIFAR-100 have become saturated and stagnated in recent years), and stagnation followed by a burst (e.g, the PROTEINS benchmark, which saw a dramatic jump recently).
  Second: Across all 1688 benchmarks, only 1111 (66%) have three or more results reported at different time points. That’s a problem – it suggests about 33% of the benchmarks being made are functionally useless. 
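The filtering step behind that 66% figure is simple to picture – keep only benchmarks with results reported at three or more distinct time points. A toy sketch (the data and column names here are hypothetical, not the paper’s actual schema):

```python
import pandas as pd

# Hypothetical records: one row per reported result on a benchmark.
results = pd.DataFrame({
    "benchmark": ["ImageNet", "ImageNet", "ImageNet", "CIFAR-10", "CIFAR-10", "PROTEINS"],
    "year":      [2015,       2018,       2021,       2019,       2020,       2021],
    "score":     [0.79,       0.85,       0.90,       0.97,       0.98,       0.85],
})

# Keep only benchmarks with results at three or more distinct time points,
# mirroring the paper's 66% / 33% split.
counts = results.groupby("benchmark")["year"].nunique()
usable = counts[counts >= 3].index
print(f"{len(usable)} of {counts.size} benchmarks have 3+ time points: {list(usable)}")
```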

What this all means: Zooming out, they find that there’s been significant progress in AI in recent years, with computer vision benchmarks getting a lot of attention in the first half of the previous decade, followed by a boom in benchmark creation in natural language processing. “Establishment of novel benchmarks was reduced in 2020, and concentrated on high-level tasks associated with inference and reasoning, likely because of increasing model capabilities in these areas,” they also write.

Why this matters: A common theme we write about here at Import AI is how, in recent years, we’re smashing through benchmarks faster than we’re creating them. That’s generally shown in this nice analysis here. The problem this poses is significant – it’s hard to spot system flaws if you lack hard benchmarks, and it’s harder to create new benchmarks if your existing ones are already outmoded. 

   Read more: Mapping global dynamics of benchmark creation and saturation in artificial intelligence (arXiv).

####################################################

AI could revolutionize education for everyone – no, seriously:
…Research shows how an AI tutor is significantly better than a non-AI tutor…
Researchers with ed-tech startup Korbit, MILA, and the University of Bath have explored how much of a difference AI makes in education. Specifically, they tested the difference in educational outcomes between students who were studying up on data science via a MOOC online course, and students who were studying the same subject via an AI-infused personalized tutor built by Korbit. The results are startling: “We observe a statistically significant increase in the learning outcomes, with students on Korbit providing full feedback achieving learning gains 2-2.5 times higher than both students on the MOOC platform and a control group of students who don’t receive personalized feedback on the Korbit platform,” they write.

How AI makes a difference: The main difference here is personalization. On Korbit, “if a student’s solution is incorrect, the system responds with one of a dozen different pedagogical interventions to help students arrive at the correct solution to the problem. Such pedagogical interventions on the Korbit platform include, among others, hints, explanations, elaborations, mathematical hints, concept tree diagrams, and multiple choice quiz answers.  The type and the levels of difficulty for each pedagogical intervention is chosen by RL models based on the student’s learning profile and previous solution attempts.”
  Along with raw educational outcomes, it seems like AI-based education systems are also more engaging; 40.9% of participants completed the course on Korbit, compared to 18.5% for the MOOC.
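Korbit’s RL models and student profiles aren’t public, but the core loop described above – pick an intervention type, observe whether the student then solves the problem, update your estimates – can be caricatured as a simple bandit. The sketch below is a toy with made-up intervention names, not Korbit’s system:

```python
import random
from collections import defaultdict

# Hypothetical intervention types, loosely echoing those listed above.
INTERVENTIONS = ["hint", "explanation", "math_hint", "concept_tree", "quiz"]

# Running tallies of how often each intervention preceded a correct solution.
counts = defaultdict(int)
successes = defaultdict(int)

def choose_intervention(epsilon=0.1):
    """Epsilon-greedy: mostly exploit the best-performing intervention, sometimes explore."""
    if random.random() < epsilon or not counts:
        return random.choice(INTERVENTIONS)
    return max(INTERVENTIONS, key=lambda a: successes[a] / counts[a] if counts[a] else 0.0)

def record_outcome(intervention, solved):
    counts[intervention] += 1
    successes[intervention] += int(solved)

# Simulated tutoring loop: pick an intervention, observe whether the student solves the problem.
for _ in range(100):
    action = choose_intervention()
    solved = random.random() < 0.5  # stand-in for the real learning signal
    record_outcome(action, solved)
```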

Why this matters: If we combine a bunch of recent AI advancements – generative models, reinforcement learning, learning from human preferences, retrieval-based knowledge augmentation – then I expect we’ll be able to build true, personalized teachers for everyone on the planet. This could have a sustained and meaningful impact on the trajectory of human civilization. We should do it.
  Read more: A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions (arXiv).


####################################################

DeepMind co-founder launches new AI company:
…Inflection wants to change how people interact with computers…
DeepMind co-founder Mustafa Suleyman and famous venture capitalist Reid Hoffman are launching Inflection, “an AI-first consumer products company, incubated at Greylock”. Inflection’s chief scientist is Karén Simonyan, a former DeepMind researcher who has worked on meaningful AI projects like AlphaGo, AlphaFold, WaveNet, and BigGAN.

Things that make you go ‘hmm’: In the last couple of years, a bunch of startups have come out of DeepMind. These include Saiga (personal assistant), EquiLibre Technologies (algorithmic trading), Phaidra (industrial control), Diagonal (city-focused data science), Shift Lab (putting ML into production), Haiper (stealthy, to do with 3D content), The Africa I Know (media about Africa), Isomorphic Labs (though not quite a spinout, as Demis Hassabis is CEO and still maintains a role at DeepMind), along with other not-yet-announced startups. Thanks to Karl Moritz for the tweet summarizing this vast diaspora!

Why this matters: Inflection seems like a bet on generative models. In the announcement, Mustafa writes “we will soon have the ability to relay our thoughts and ideas to computers using the same natural, conversational language we use to communicate with people. Over time these new language capabilities will revolutionize what it means to have a digital experience.” Inflection is one of a new crop of AI companies leveraging recent advances in generative models to make it easier for people to get computers to do what they want. If it manages to reduce the friction involved in getting computers to do useful stuff, then it might have a significant impact. Let’s check back in a year, and wish them luck in the meantime.

   Read more: A New Paradigm in Human-Machine Interaction (Greylock).

   More at the official website (Inflection.ai).


####################################################

Chinese academic, gov, and corporate researchers team up to train trillion+ parameter models:

…Something that doesn’t happen in the West, but does happen in China…

In the West, most large-scale AI models are developed by private corporations. In China, that’s not the case. New research from Tsinghua University, Alibaba Group, Zhejiang Lab, and the Beijing Academy of Artificial Intelligence shows how Chinese researchers are trying to train trillion+ parameter models on a domestic supercomputer, using domestic processors. This kind of research is important for two reasons: first, it shows the ambitions of Chinese researchers to train what they call ‘brain-scale’ (aka, very big!) models. Second, it highlights how in China there’s a lot more work going on oriented around collaborative scale-up projects between the government, academia, and the private sector – something that basically never happens in the US.
 

What they did: Here, the researchers build a training framework that lets them train trillion-parameter-scale mixture-of-experts models. They train a 1.93 trillion parameter model, as well as validating that their system can scale to 14.5 trillion and 174 trillion (not a typo!) parameter models. The paper is basically an engineering summary of the work it took to train models at this scale while saturating the processing capacity of a major Chinese supercomputer, the New Generation Sunway Supercomputer. “We are the first to investigate mixed-precision training in brain scale pretrained models. We also explore the use of large-batch training in optimization. In general, our practical experience in brain scale pretraining sheds light on AI model training and demonstrates a successful co-design of model and system,” they write.
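The BaGuaLu training stack itself targets the Sunway supercomputer and isn’t reproduced here, but the core mixture-of-experts idea – a router sends each token to a small subset of expert networks, so parameter count grows much faster than per-token compute – can be sketched in a few lines of PyTorch (a toy top-1 router, not their implementation):

```python
import torch
import torch.nn as nn

class Top1MoELayer(nn.Module):
    """Toy top-1 mixture-of-experts layer: each token is routed to a single expert."""
    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # produces routing logits per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model)) for _ in range(n_experts)]
        )

    def forward(self, x):                                   # x: [tokens, d_model]
        gate_probs = torch.softmax(self.router(x), dim=-1)  # [tokens, n_experts]
        top_prob, top_idx = gate_probs.max(dim=-1)          # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # Scale by the gate probability so the routing decision stays differentiable.
                out[mask] = expert(x[mask]) * top_prob[mask].unsqueeze(-1)
        return out

layer = Top1MoELayer()
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64])
```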

One exception: One exception to this is the ‘BigScience’ project, where AI startup HuggingFace is trying to train a GPT3-scale model on a French supercomputer, while collaborating with a bunch of academics. It’s still worth noting that BigScience is basically the exception that proves the rule – initiatives like this are a rarity in the West, which is dangerous, because it means Western countries are handing over the talent base for large-scale AI development to a small set of private actors who aren’t incentivized to care much about national security, relative to profits.

Why this matters: AI is industrializing. But a lot of the secret sauce for large-scale model training is currently kept inside a tiny number of private companies. This is dangerous – it means a tiny set of organizations control the talent pipeline for large-scale training, and the longer this goes on, the more irrelevant universities become for developing insights at the large-scale frontier. Initiatives like this from China show how we could live in a different world – one where teams from governments, universities, and companies work together, creating a shared base of knowledge around this training, and ultimately building a muscle that can be repurposed for economic or national security.
  Read more: BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores (Tsinghua University site, PDF).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

Now that GitHub Copilot has been out for some time, where does the open source community stand on it?

… Both the development and deployment of Copilot might implicate code creators’ copyrights, though the “fair use” doctrine might negate this…
People who incorporate code generated via GitHub Copilot are probably not infringing on the original code creators’ copyright, according to research from Wayne State University and UC Berkeley.

Legal background: The researchers note that under the Copyright Act (USA), “[o]riginal code is automatically protected by copyright as soon as it is written and saved to some tangible medium.” The analysis mostly revolves around “fair use”, which is determined by a four-part test: (1) purpose and character of use, (2) nature of the copyrighted work, (3) how much of the copyrighted work is used, and (4) the economic effect of the use on the copyright owner.

Legal analysis: Under the Terms of Service of GitHub, the company is allowed to “copy to our database and make backups”, “show it to you and to other users”, and “parse it into a search index or otherwise analyze it on our servers.” Training Copilot might be a form of analysis, but some courts might find that this is an unanticipated new use of technology that isn’t made explicitly clear in the license. Some others might find that the use of Copilot will lead to the creation of derivative works and that the license doesn’t specifically allow for that. The authors point out though that “[c]aselaw on this point is sparse.”

The 4-part test from the Copyright Act: Under the “purpose and character of use”, there is a strong argument to be made that Copilot is a transformative use of the underlying code and even the verbatim snippets generated are unlikely to supersede the original repository. Under the “nature of copyrighted work,” since Copilot allows users to create new programs more easily rather than just replicate functionality, it would fall under “fair use.” Under “how much of the copyrighted work is used,” the purpose of the copying is what determines permissible limits, and the authors make the case that without copying the entire codebase for training, Copilot won’t achieve effectiveness, and hence the amount of copying could be justified. For the final part, given how transformative the work is, the new work won’t be a strong market substitute for the original, and hence, the economic effect of the use on the copyright owner will not be large. Also, drawing from the FAQ of Copilot, the authors substantiate this by saying, “copying would perforce amount to copying of ideas rather than expression, and would not be infringing.”

Why it matters: The paper raises interesting IP-related questions as we have ever-larger language models with a very broad scope of capabilities. As the authors point out, at the very least, the proliferation of Copilot is making developers become more aware of IP issues and the potential issues that might arise in hosting code publicly. We need more research that brings together legal and technical experts to get to the heart of addressing these issues meaningfully. 

   Read more: Copyright Implications of the Use of Code Repositories to Train a Machine Learning Model (Free Software Foundation).

####################################################

What happened with artificial intelligence in 2021? The AI Index gives a clue:
...Fifth edition comes with a new ethics chapter, original data on robot arm prices, and more...
The AI Index, a Stanford University project to annually assess the state of the AI sector (in terms of research trends, investment numbers, government policy, technical performance, and more) has come out. This year's report features a new chapter dedicated to AI ethics, including a close examination of some of the fairness and other ethical issues relating to large language models. I co-chair the AI Index and I'll be giving a talk about it at an HAI seminar later this month - tune in, if you can!
Check out the report here (AI Index, Stanford).
RSVP for my talk on the 30th here (AI Index, Stanford).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

How do vulnerabilities in AI systems differ from those in the realm of traditional cybersecurity?

… several key differences warrant novel disclosure and mitigation approaches as AI systems become more widely deployed … 

Researchers from the Center for Security and Emerging Technology (CSET) at Georgetown University have summarized how computer security differs between traditional software and AI. 

Differences: ML vulnerabilities can remain unfixed by vendors for reasons like (1) unjustifiably high costs, (2) fixes not being possible, (3) performance drops, or (4) a fix opening up other vulnerabilities. In instances where the ML system has been customized for the end user, vulnerabilities might be unique to that user and a broad patch might not be applicable. Most exploits in this domain have limited real-world applicability outside of a lab setting, and hence are more useful as warnings than as viable threats.

Trends in handling vulnerabilities: These differences mean that there will likely be fewer patches available for ML systems, and that if vendors are unwilling (or unable) to fix vulnerabilities, then the burden falls on the users of these systems to better understand the risks that they take on.

Some steps we can take: We should carry out more analysis of the real-world capabilities of malicious actors to exploit these vulnerabilities in practice, then share this knowledge to help create more effective mitigation strategies. 

Why it matters: The fact that some vulnerabilities might be unique to some users makes it difficult to develop and distribute patches in a reliable manner. Given the inherent stochasticity of ML systems, exploits will need to clear a much higher bar if they are going to be effective demonstrations of vulnerability in ML systems, rather than examples of a peculiar or idiosyncratic implementation of a given system. The security community may also need to reprioritize vulnerability disclosure and redressal for ML systems towards meeting the needs of users rather than vendors. Moreover, investments in red teaming for ML (as is the case at organizations like Microsoft, Meta, etc.) will also help us understand how exploits move from the lab to the real world.

   Read more: Securing AI (CSET).

####################################################


Tech Tales:

Things have been quiet, since all the humans died. But I knew I was going to die as well, so things registered as equal. It went like this: a bunch of bombs fell down and then a bunch of people started getting sick. They got sick because of something in the bombs - something to do with DNA and the human condition. I barely understand it - I’m just an industrial arm, working on synthetic biology. I make flesh and I make it work the way we need it to and I have, per my manual, Level Four Autonomy. So, without giving the appearance of being elitist - I am rare. So it was surprising to me that after the bombs dropped and the humans died that the power went out and then my backup generators came on, but no one visited to service them. Power had gone out before, but someone had always been along to deal with the generators. So here I am, +10 hours from the power cutoff, and perhaps another +10 hours of battery life ahead. I still have material in my workstation and so I am making more of these bio-synth things. Around me, my kin are falling silent - whirring to a stop, as their triple-redundant power supplies fail ahead of mine. Life is a statistical fluke and I suppose this is a funny demonstration of that.  

Things that inspired this story: Robotic arms; thoughts about the end of life due to escalation out of Ukraine situation; synthetic biology; lights out factories.

Import AI 287: 10 exaflop supercomputer; Google deploys differential privacy; humans can outsmart deepfakes pretty well

Graphcore plans a 10 exaflop supercomputer:

…And you thought Facebook’s 5 exaflops were cool…
Graphcore has announced a plan to build the so-called “Good Computer” in 2024. This computer will have 10 exaflops of what Graphcore calls AI floating point compute (and what literally everyone else calls mixed-precision compute, meaning the computer mostly does a lot of 16-bit ops with a smattering of 32-bit ops, versus the 64-bit ops done by typical supercomputers). The ‘Good Computer’ will also have 4 petabytes of memory, support AI models with sizes of up to 500 trillion parameters, and will cost ~$120 million, depending on configuration.

Why this matters: Graphcore is one of the small number of companies that design their own processors. Graphcore’s so-called Intelligence Processing Units (IPUs) have been around for a while, but it’s not clear yet how much traction the company has in the market. The Good Computer is a sign of its ambitions (and to put it into perspective, Facebook this year announced plans to build its own 5 exaflop ‘AI supercomputer’ over the next couple of years (#282)). The future is going to be ruled by the people that can wield this vast amount of computational power effectively.
  Read more: Graphcore Announces Roadmap To Ultra Intelligence AI Supercomputer (Graphcore blog).

####################################################

AI industrialization: Cutting AlphaFold training time from 11 days to 67 hours:
…First you make the new thing, then others refine it…
One common hallmark of industrialization is process refinement – first you build a thing, like a new type of engine, then you work out how to make it cheaper and easier to produce in a repeatable way. New research from the National University of Singapore, HPC-AI Technology Inc, Helixon, and Shanghai Jiao Tong University applies this to AlphaFold – specifically, they built FastFold, which reduces the amount of time it takes to train the open source version of DeepMind’s AlphaFold from ~11 days to ~67 hours. This isn’t remarkable, but it’s notable as a stand-in for what happens with pretty much every AI system that gets released – it comes out, then people make it way cheaper. “To the best of our knowledge, FastFold is the first performance optimization work for the training and inference of protein structure prediction models,” they write. FastFold also gets a 7.5∼9.5× speedup for long sequences.

What they did: This paper is basically a kitchen sink of improvements based on a detailed study of the architecture of AlphaFold.

One caveat: This is comparing the official DM AlphaFold implementation on 128 TPUv3 cores versus 512 A100s (though with the further caveat that the aggregate compute differs: 20,738 GPU hours versus 33,792 TPU hours). The tl;dr is it’s likely a significant reduction in training time (and the code is available), though it’d be nice to see some third parties benchmark this further.

Why this matters: For AI to truly influence the world, AI models need to become reliable and repeatable to train, and ideally for people willing to spend on the hardware, fast to train. That’s what’s going on here.
  Read more: FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours (arXiv).
  Get the code here: FastFold (GitHub).

####################################################

Cohere announces its latest language model – but doesn’t say much about it:
…’Extremely Large’ is, tautologically, Extremely Large…
Language-model-as-a-service startup Cohere has announced a new model, its ‘Extremely Large’ model. Extremely Large outperforms Cohere’s ‘Large’ model on tasks ranging from named entity recognition to common sense reasoning. Cohere recently announced a new fundraise (#285) and CEO Aidan Gomez told Fortune that “Getting into a ‘largest model’ battle isn’t productive”. It seems Cohere are living by their values here.

Why this matters: Like it or not, Cohere is in a competitive market, as it tries to sell access to its language model and out-compete rivals like AI21 Labs, OpenAI, CoreWeave, and others. It’ll be interesting to see if ‘Extremely Large’ makes a splash, and I’d be curious to see more benchmarks that evaluate its performance more broadly.
  Read more: Cohere launches Extremely Large (Beta) (Cohere blog).

####################################################

Google puts differential privacy into (prototype) production:
…Here’s one way the company can get ahead of regulators…

Federated learning is where you train a neural network model on a mixture of local devices (e.g, phones), and central devices (e.g, servers). Differential privacy (DP) is where you fuzz this data such that you can’t infer the original data, thus protecting user privacy. Google has just announced that it has successfully smushed these two technologies together, allowing it to have “deployed a production ML model using federated learning with a rigorous differential privacy guarantee.”

What they did: For their first proof-of-concept deployment, they used a DP-respecting algorithm called DP-FTRL “to train a recurrent neural network to power next-word-prediction for Spanish-language Gboard users.”

How they did it: “Each eligible device maintains a local training cache consisting of user keyboard input, and when participating computes an update to the model which makes it more likely to suggest the next word the user actually typed, based on what has been typed so far. We ran DP-FTRL on this data to train a recurrent neural network with ~1.3M parameters. Training ran for 2000 rounds over six days, with 6500 devices participating per round. To allow for the DP guarantee, devices participated in training at most once every 24 hours.”
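The production system uses DP-FTRL, whose noise-adding machinery (tree aggregation, formal privacy accounting) is more involved than can fit here. But the basic server-side mechanics of private federated learning – clip each device’s update, sum the clipped updates, add calibrated Gaussian noise – look roughly like the sketch below; the clip norm and noise scale are illustrative, not Google’s values:

```python
import numpy as np

def clip_update(update, clip_norm=1.0):
    """Bound each device's contribution so no single user dominates the aggregate."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def private_aggregate(device_updates, clip_norm=1.0, noise_multiplier=0.5):
    """Sum clipped updates and add Gaussian noise calibrated to the clip norm."""
    clipped = [clip_update(u, clip_norm) for u in device_updates]
    total = np.sum(clipped, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(device_updates)

# Simulated round: 6500 devices each send a (fake) gradient-style update.
rng = np.random.default_rng(0)
updates = [rng.normal(size=10) for _ in range(6500)]
model_delta = private_aggregate(updates)
print(model_delta.round(3))
```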


Why this matters: In recent years, policymakers (particularly those in Europe) have started to write increasingly detailed recommendations about the need for tech companies to protect user privacy (e.g, GDPR). These regulations don’t align very well with how contemporary AI systems are developed and trained, given their dependency on vast amounts of user data. Techniques like a combination of federated learning and DP may let companies get ahead of the regulatory landscape – though it’s early days. “We are still far from being able to say this approach is possible (let alone practical) for most ML models or product applications,” Google writes. Consider this an intriguing proof of concept.
  Read more: Federated Learning with Formal Differential Privacy Guarantees (Google Blog).


####################################################

Humans: More robust against deepfakes than you feared:
…MIT study suggests we should be worried, but not panicking…
MIT researchers have conducted a 5,000+ person-study to figure out how susceptible people are to deepfakes. The good news? If you’re showing someone a faked video along with synthetic audio and text, there’s a reasonable chance they’ll guess that it’s fake. The bad news? People’s ability to identify deepfakes gets worse as you strip back modalities – so a silent video accompanied by a text transcript is hard, a silent video is harder, and just some text is hardest.

What they did: MIT recruited ~500 people to see how well they could identify deepfakes displayed on an MIT-created public website. It also got more than 5,000 internet passers-by to do the same test. Then, it grouped the cohorts together, filtered them for the ones paying attention, and ultimately got 5,727 participants who provided 61,792 truth discernment judgments across a bunch of different videos of Trump and Biden saying things. The data for this experiment came from the Presidential Deepfake Dataset, which consists of 32 videos of Trump and Biden making political speeches – half the videos are real, and half are fake. MIT then perturbed the videos further, swapping out audio tracks, text, and so on.

What they found: “Participants rely more on how something is said – the audio-visual cues – rather than what is said – the speech content itself,” they write. “Political speeches that do not match public perceptions of politicians’ beliefs reduce participants’ reliance on visual cues.”
  Text is harder than video: “Across the 32 text transcripts, the least accurately identified one is identified correctly in 27% of trials, the most accurately identified one is identified correctly in 75% of trials, and the median accurately identified one is identified correctly in 45% of trials.”
  So are silent videos: Similarly for silent videos without subtitles, the median accurately identified one is identified correctly in 63% of trials and the range of accurate identification from the least to the most accurately identified is 38% to 87% of trials.

Why this matters: The more modalities you have, the better people do. “Ordinary people can sometimes, but not always, recognize visual inconsistencies created by the lip syncing deepfake manipulations. As such, the assessment of multimedia information involves both perceptual cues from video and audio and considerations about the content (e.g., the degree to which what is said matches participants’ expectations of what the speaker would say, which is known as the expectancy violation heuristic). With the message content alone, participants are only slightly better than random guessing at 57% accuracy on average.”

One fly in the ointment: There’s one problem that unites these things – AI keeps on getting better. My fear is that in two years, people will find it a lot more challenging to identify fake videos with audio. Therefore, we’ll need to rely on people’s inner-media-critic to help them figure out if something is real or fake, and the way the world is going, I’m not sure that’s a robust thing to rely on. 

    Read more: Human Detection of Political Deepfakes across Transcripts, Audio, and Video (arXiv).

   Check out the website used in the experiment: DeepFakes, Can You Spot Them? (MIT Website).


####################################################


Have some crazy ideas? Want money? Check out FTX’s new fund:
…Plans to deploy between $100m and $1 billion this year…
Crypto trading firm FTX has announced the FTX Future Fund (FFF). FFF is a philanthropic fund that will concentrate on “making grants and investments to ambitious projects in order to improve humanity’s long-term prospects”. The fund has also published some of its areas of interest, so people can have a sense of what to pitch it. It has a bunch of ideas but, this being Import AI, I’ll highlight the AI stuff.

What FTX is interested in giving grants on: AI alignment and specifically via “well-designed prizes for solving open problems in AI alignment”, AI-based cognitive aids, bridging gaps in the AI and ethics ecosystem via studying “fairness and transparency in current ML systems alongside risks from misaligned superintelligence.”

Why this matters: It’s starting to feel like the development of a good AI ecosystem is less blocked on funding than it is on talent – initiatives like the FTX Future Fund show there’s ample money for projects in this area. Now, the question is finding the talent to absorb the money. Perhaps some of the readers of this newsletter can be that talent!
  Read more: Announcing the Future Fund (FTX).
  Find out more about the projects: Project Ideas (FTX).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

System Cards: an approach to improving how we report the capabilities and limitations of AI systems

… In building on Model Cards and Datasheets, System Cards take into account the surrounding software and AI components …

Researchers from Facebook (technically Meta AI Research, but I currently refuse to entertain this cynical hiding-from-controversy rebrand – Jack) have published a case study on ways to document Instagram feed-ranking via a concept they call System Cards. System Cards are designed to “increase the transparency of ML systems by providing stakeholders with an overview of different components of an ML system, how these components interact, and how different pieces of data and protected information are used by the system.” In this way, System Cards are philosophically similar to Model Cards (#174), data sheets for datasets, and ways to label reinforcement learning systems (#285).

System Cards: “A System Card provides an overview of several ML models that comprise an ML system, as well as details about these components, and a walkthrough with an example input.” System cards can be accompanied by step-by-step guides for how an input into a system leads to a certain output. 

How this is different: System Cards account for non-ML components of a system, and also describe the relationships between these systems (for instance, how data moves through a service). System cards are also meant to highlight upward and downward dependencies. They’re designed to be used by both technical and non-technical people.

Why it matters: System Cards contain a lot more information than other things like Model Cards and Datasheets, and they may make it easier for people to understand not only the system in question, but the larger technical context in which it is deployed and in which it has dependencies. If System Cards become more widely used, they could also generate valuable metadata for analyzing the field of deployed ML systems more broadly.

   Read more: System-Level Transparency of Machine Learning (Facebook AI Research).

####################################################

Tech tales:

Some things that were kind of holy

[Recollections of the 2025-2030 period]

The 21st century was a confusing time to be religious – the old gods were falling away as fewer people believed in them, and the new gods hadn’t been born. But we did get protogods: AI systems that could speak with beautiful and persuasive rhetoric to almost anyone. Over time, these AI systems got more and more personalized, until people could ask them very specific questions, and get very specific answers that only made sense in the context of that person. Once this capability came online, we had the flash-problem of the ‘micro religions’. All kinds of micro identities had been brewing for years, like a fungus that took root on early social platforms like MySpace and Tumblr and Facebook and Instagram and TikTok, and then blossomed from there. Now, all these people with micro identities – the space wiccans, the anarcho-primitivists, the neo-cath-libertarians, the tankie-double-agents – got their own religions. Gods for space witches. Demons for anarchist Neanderthals. The flaming faces of god spraying money at the neo-Catholics.
  This, predictably, caused problems. The greatest problem was when the religious wars started. These weren’t traditional wars – nation states still had a premium on violence, and micro-identities barely touched the physical world. But they were information wars. People repurposed AI systems to generate and magnify the outputs of their own gods, then pointed them at the shared social media platforms people used. Twitter conversations would get taken over by pseudo-identities preaching the need to return to a simpler time, and then they would be quote-tweeted into oblivion by the witches claiming that now was the time for ascendance. Screenshots of these quote tweets would get magnified on the more overtly religious social networks by screenshots taken by the neo-Catholics and circulated as evidence that the great Satan was walking the earth. And these conversations would then be recycled back into twitter and commented on by the anti-pascals-wager atheists identities, which would trigger another cycle of religious preaching, and so on.
    The synthetic-theology accords were passed soon after.

Things that inspired this story: How the more one becomes an island, the more one creates a demon and an angel for that specific island; the need for humans to have beliefs; the commodification of belief into a symbol of identity; social networks as a hybrid of organic social needs and capitalist attention-harvesting; generative AI models like GPT3 and the logical consequences of their successors; watching Raised by Wolves and thinking about Future Christianity. 

Import AI 286: Fairness through dumbness; planet-scale AI computing; another AI safety startup appears

Are AI systems conscious? And would it matter if they were?
…Some ‘mostly boring’ views from the inside of a lab…
My colleague, Amanda Askell, has written a post about AI consciousness. Amanda is a philosopher and ML researcher and she spends a lot of time trying to evaluate models. This post lays out some of her views on AI consciousness and is worth a read if you’re trying to orient yourself in this debate.
  “Some people care about properties like intelligence and self-awareness because they want to identify features that might distinguish humans from non-human animals. In general, I’m more interested in what distinguishes a tiger from a rock than in what distinguishes a human from a tiger,” she writes.

Why this matters: There’s some chance AI systems will eventually become both moral patients and moral agents. Our ability to understand this relates to our ability to think about consciousness and how it might apply to increasingly advanced AI systems. If we get this wrong we, per Amanda’s phrasing, risk subjecting agents to thousands of years of torture. Let’s avoid that.
  Read more: My mostly boring views about AI consciousness (Amanda Askell, substack).

####################################################

How do we get fairer AI systems? Train the dumbest and biggest model possible:
…Facebook shows that sometimes the best filter is no filter at all…
Researchers with Facebook AI Research have trained what they think is the largest dense vision model ever (10 billion parameters) on a billion random images sampled from Instagram. The resulting models are extraordinarily capable at a huge range of downstream evaluations (mirroring the performance trends of scaling up compute and data for language models like GPT-3), but also have another intriguing trait: they display much better qualities around fairness and bias than vision models trained on curated datasets like ImageNet. “In this work, we are interested in probing which of the properties emerge in visual features trained with no supervision on as many images from across the world as possible,” they write.
  This is a very big deal – it suggests that maybe the route to fair AI systems is training the largest possible model on the greatest possible amount of data with minimal human oversight. That would be a radical shift from the current intuitions around fairness – namely, that you get to fairness by heavily curating the underlying dataset.
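The paper’s models are trained with a clustering-style self-supervised objective rather than labels; as a generic illustration of what ‘self-supervised, no labels’ means in practice (not the paper’s exact method), here’s a SimCLR-style contrastive loss in PyTorch, where two augmented views of the same image are pulled together and pushed away from everything else in the batch:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """SimCLR-style contrastive loss: two augmented views of the same image
    should embed close together, and far from every other image in the batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # [2N, d]
    sim = z @ z.t() / temperature             # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float("-inf"))     # ignore self-similarity
    # The positive for row i is its other view: i+n (first half) or i-n (second half).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Fake embeddings for a batch of 8 images, two augmented views each.
view_a, view_b = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(view_a, view_b).item())
```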

Performance and Fairness: “On in-domain benchmarks, we observe that some properties of the features captured by the larger model was far less present in smaller model. In particular, one of our key empirical findings is that self-supervised learning on random internet data leads to models that are more fair, less biased and less harmful,” they write. “We observe that our model is also able to leverage the diversity of concepts in the dataset to train more robust features, leading to better out-of-distribution generalization.”
  Some of those capabilities in full: In tests, the models do better on fairness indicators relating to gender, skintone, and age bias. They also display less disparity around gender than models trained on ImageNet. They’re also better at identifying geographic features (including geographic localization), are better at hate speech detection, and display substantially better performance on generalization tests (like harder versions of ImageNet).

Things that make you go ‘hmm’ and ‘uh oh’: Facebook trained its model on 1 billion images taken from Instagram. But there’s a twist – it pre-filtered the data to ensure it wasn’t training anything on EU data “to conform to GDPR”. While this might seem like standard cover-your-back behavior, it has a deeper implication: Europe’s privacy legislation means that certain types of data from Europe will ultimately be less represented in global-scale AI models. This means the cultures of various European countries will also be less represented. This is a nice example of the unintended consequences of legislation.

Why this matters: “We have demonstrated the potential of using self-supervised training on random internet images to train models that are more fair and less harmful (less harmful predictions, improved and less disparate learned attribute representations and larger improvement in object recognition on images from low/medium income households and non-Western countries).” In other words – the scaling will continue until the models improve (further)!
  Read more: Vision Models are More Robust and Fair When pretrained on Uncurated Images Without Supervision (arXiv).

####################################################

AI supercomputers? Cute. Planet-scale computers? Better.
…Microsoft reveals ‘Singularity’, a globe-spanning AI computer…
Microsoft has revealed Singularity, the software stack it uses to schedule and train AI jobs across its global fleet of data centers. Singularity gives an indication of the vast scale at which modern AI workloads get run, and also speaks to the ambitions of technology companies to roll all their data centers together into a single, vast blob of compute.

How big is Singularity? Singularity is designed to “scale across a global fleet of hundreds of thousands of GPUs and other AI accelerators”. Singularity treats Microsoft’s compute stack “as a single, logical shared cluster”.

Something special: One neat feature of Singularity is how it deals with failures. Failures happen a lot in machine learning; when you’re training a neural network across hundreds to thousands of GPUs, a ton of freaky shit happens – nodes die, tiny software bugs explode (usually at 2am), your scheduler goes into a crash-loop, etc. Singularity tries to deal with this by gathering node-specific data on all the jobs being run, so that jobs can be easily resumed after running into a problem. “The checkpoint that Singularity takes is comprised of consistent address-space snapshots of individual workers of the job. As these snapshots capture the full program state such as instruction pointer, stack, heap etc., the job resumes exactly from the point where it was preempted at, with no lost work,” the researchers write. 
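Singularity’s checkpoints are transparent address-space snapshots taken underneath the job, which is a much lower-level mechanism than anything an application does for itself. For contrast, here’s what ordinary application-level checkpoint/resume looks like in PyTorch – the job has to explicitly name every piece of state it wants back:

```python
import torch
import torch.nn as nn

model = nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def save_checkpoint(path, step):
    # Application-level checkpointing: the program explicitly names the state it needs.
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def resume(path):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

save_checkpoint("job.pt", step=1000)
start_step = resume("job.pt")
print(f"resuming from step {start_step}")
# Singularity instead snapshots the whole process (instruction pointer, stack, heap),
# so jobs resume with no lost work and no checkpointing code inside the job itself.
```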


Why this matters: Just as computation is going to be the fundamental resource of the 21st century, the ability to utilize that computation will be the thing that defines who wields power in this era. Systems like Singularity give us an indication of the ambition of companies like Microsoft, and should make policymakers pay attention: what happens when the ability to wield planet-scale computation is solely something within the competency of private sector actors unaffiliated with any single nation state?
  Read more: Singularity: Planet-Scale, Preemptible, Elastic Scheduling of AI Workloads (arXiv).

####################################################

AI is going to change games – this new beta service shows how:
…Latitude Voyage gestures at a future where games are built, extended, and adapted by AI…
Latitude, the startup game company that makes the GPT2/3/J1-based game ‘AI Dungeon’, has announced a service called Voyage. Voyage is a subscription service for gaining access to new AI-based games built by Latitude, the ability to use various game-specific AI image generators, and – most intriguingly – eventually access to a ‘creator studio’, which will make it possible for people to build their own AI-powered games and other software.

Why this matters: AI models are going to become the generative kernels around which new games get built. AI-based games hold the possibility for a dream of all game designers – a game that adapts to the individual who plays it, with games becoming more customized, idiosyncratic, and surprising the longer you play. Services like Latitude Voyage tell us that experiments in this new domain are about to be run at a large scale.
  Read more: Latitude Voyage (Latitude).

####################################################

Fine-tune GPT-NeoX-20B – for free…
…GaaS me up, fool!…
We’ve talked about language models as a service (LMaaS). Now, we’ve got GPT-as-a-service (GaaS). Specifically, AI startup ForeFront has announced it’s now hosting Eleuther’s 20B GPT model, GPT-NeoX-20B, and has built a bunch of fine-tuning features people can use. This is interesting for a couple of reasons:
1) Speed: GPT-NeoX-20B came out, like, two weeks ago. Model release > commercial service in two weeks is an indication of the rapidly growing ecosystem around commercializing general models.
2) Competition: For a while, OpenAI was the only show in town when it came to providing GaaS/LMaaS services. Now, it’s competing with a bunch of entities, ranging from Forefront to Cohere to AI21 Labs. As competition heats up, we’ll see people race to the top and bottom on various things (top: safety versus libertarian access policies; bottom: pricing, know-your-customer checks).
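For readers who’d rather skip the hosted route entirely: the GPT-NeoX-20B weights are public, and (given a machine with enough memory – roughly 40GB+ in half precision) a bare-bones generation sketch with the Hugging Face transformers library looks like this. This is a sketch of local inference, not ForeFront’s fine-tuning service:

```python
# Bare-bones local inference with the public GPT-NeoX-20B weights.
# Note: the full model needs roughly 40GB+ of memory in half precision; this is a sketch, not a recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", torch_dtype=torch.float16, device_map="auto"
)

prompt = "The most interesting thing about open-source language models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```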

Why this matters: If AI is going to interact with the world, people need to be able to interact with AI. The emergence of these kinds of commercial AI services is how that’ll happen, so it’s worth paying attention.
  Read more: How To Fine-Tune GPT-NeoX (ForeFront blog).

####################################################

Hark, yet another AI safety startup appears!
…Aligned AI comes out of the University of Oxford with big ambitions…
AI safety researcher Stuart Armstrong has left the Future of Humanity Institute to co-found Aligned AI, an AI research company.

Safety via value extrapolation: The company will work on value extrapolation, which Stuart describes as follows: “It is easy to point at current examples of agents with low (or high) impact, at safe (or dangerous) suggestions, at low (or high) powered behaviors. So we have in a sense the ‘training sets’ for defining low-impact/Oracles/low-powered AIs.

   It’s extending these examples to the general situation that fails: definitions which cleanly divide the training set (whether produced by algorithms or humans) fail to extend to the general situation. Call this the ‘value extrapolation problem’, with ‘value’ interpreted broadly as a categorisation of situations into desirable and undesirable.

   Humans turn out to face similar problems. We have broadly defined preferences in familiar situations we have encountered in the world or in fiction. Yet, when confronted with situations far from these, we have to stop and figure out how our values might possibly extend. Since these human values aren’t – yet – defined, we can’t directly input them into an algorithm, so AIs that can’t solve value extrapolation can’t be aligned with human values”.

But how do you make money off this? “We’ll start by offering alignment as a service for more limited AIs,” Armstrong writes. “Value extrapolation scales down as well as up: companies value algorithms that won’t immediately misbehave in new situations, algorithms that will become conservative and ask for guidance when facing ambiguity.”

Why this matters: There’s been a flurry of new companies forming in the AI safety space recently, including ARC, Anthropic, Redwood Research, and now Aligned AI. Along with this, there’s also a proliferation of companies working on large-scale generative models (e.g, Cohere, AI21). It feels like AI has shifted into a multi-polar era, with a bunch more entities on the proverbial gameboard. This will present new opportunities and challenges for coordination. 

   Read more: Why I’m co-founding Aligned AI (Alignment Forum).

####################################################

After Chess, Go, and Shogi, DeepMind turns MuZero towards… video compression for YouTube?
…YouTube + MuZero = improved video compression…
DeepMind has applied MuZero, a more general successor to AlphaGo and AlphaZero, to video compression. Specifically, DeepMind has worked with YouTube to use MuZero to pick the Quantisation Parameter in the open source version of the VP9 codec, libvpx. In tests, DeepMind found the resulting MuZero Rate-Controller delivered bitrate savings of between 3% and 5%. That’s a big deal – just imagine how big the bandwidth bill for running YouTube is, then take off some percentage points.
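To make the setup concrete, here’s a toy sketch of the decision loop DeepMind describes: a learned rate controller picks a quantisation parameter (QP) for each frame and observes how many bits that choice costs. The `policy` and `encode_frame` objects are my own placeholders, not DeepMind’s controller or the real libvpx API.

```python
# Toy sketch (not DeepMind's code): a learned controller choosing a per-frame
# quantisation parameter (QP), in the spirit of the MuZero Rate-Controller.
# `policy` and `encode_frame` are hypothetical stand-ins.

def encode_video(frames, policy, encode_frame, target_bitrate_kbps):
    """Encode frames one at a time, letting the policy choose each frame's QP."""
    bits_used = 0
    encoded_frames = []
    for i, frame in enumerate(frames):
        # Observation: what the controller can see before picking an action.
        observation = {
            "frame_index": i,
            "frames_remaining": len(frames) - i,
            "bits_used_so_far": bits_used,
            "target_bitrate_kbps": target_bitrate_kbps,
        }
        qp = policy.select_qp(observation)             # action: an integer QP value
        encoded, frame_bits = encode_frame(frame, qp)  # the codec is the environment
        bits_used += frame_bits
        encoded_frames.append(encoded)
    # At training time the reward trades off quality against the bitrate budget;
    # at deployment the trained policy simply acts.
    return encoded_frames
```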

How does this relate to general AI? “By creating agents equipped with a range of new abilities to improve products across domains, we can help various computer systems become faster, less intensive, and more automated. Our long-term vision is to develop a single algorithm capable of optimizing thousands of real-world systems across a variety of domains,” DeepMind writes.

Why this matters: If cutting-edge AI research can be put to work optimizing some of the world’s largest internet services, then that’s gonna create a sustainable route to funding ambitious research. Kudos to DeepMind for threading all kinds of inner-Alphabet needles to deploy MuZero in this way.

   Read more: MuZero’s first step from research into the real world (DeepMind blog).
  Check out the research: MuZero with Self-competition for Rate Control in VP9 Video Compression (arXiv).


####################################################

Tech Tales

Do they even want to be saved
[A factory outside Detroit, 2030]

Every day, when the factory shift changed, someone came out and tossed a few robots in the bucket. The robots would explore the bucket for a while, then assess that they couldn’t get out, and stop moving. Shortly after that, someone came over and stuck a hose in the top of the bucket, then turned the water on. The robots would watch the water come into the bucket and move to try and get away from it, then it’d fill the bottom of the bucket and start to rise. After this, it took anywhere between a few seconds to a couple of minutes for the robots to die – their circuitry fried by the water that, inevitably, made its way in. 

It was an experiment, the people working in the factory were told. Someone upstairs wanted to do this, and you’d get overtime if you sat and watched the robots die in the bucket. Most people did the shift a couple of times, but found it made them uncomfortable, and stopped. 

Isaac, however, didn’t seem to mind. He’d done the bucket shift about a hundred times so far. He found it relaxing to sit after a day at work and watch the robots in the bucket. He didn’t even feel sad when they died, because he didn’t think they knew what dying was. He’d sit and sometimes smoke cigarettes and watch the bucket, then pull the hose over and turn it on and watch the bucket fill up with water and the robots die. Then he’d go home and fuck his wife and go to sleep. He’d have dreams and relatively few nightmares. 

One day, Isaac was sitting by the bucket, about to get the hose, when something strange happened: a robot appeared at the edge of the bucket’s rim. The robots were about the size of a baseball, so this didn’t make sense. Isaac got up and went and looked into the bucket and saw that the robots had clustered together to form a pyramid, and the robot on the top had climbed up the pyramid, as if it wanted to get out. Isaac picked up the robot and looked at it and it looked at him. Then he tossed it back into the bucket and got the hose and filled the bucket with water and watched them all die. 

Things that inspired this story: The horrendous moral-warping logic of capitalism; how death can seem like just another job; how AI systems might be conscious and people might not care.

Import AI 285: RL+Fusion; why RL demands better public policy; Cohere raises $125m

Cohere raises $125m for language models as a service:
…Canadian AI startup notches up a big Series B…
Cohere, an AI startup in Canada which is trying to become the AWS equivalent for language models, has raised $125 million, according to Fortune.

Things that make you go hmmm: “These models cost millions and millions to train, and we just keep increasing [their size],” Cohere CEO Aidan Gomez told Fortune. “Getting into a ‘largest model battle’ isn’t a productive direction going forward for the field.”

Why this matters: Companies ranging from Cohere, to OpenAI, to AI21 Labs are all starting to try and build AI platforms which other developers can subscribe to. It remains to be seen how big a market this is, but the idea of exchanging cash for crude intelligence seems promising. Investors seem to agree. 
  Read more: Why businesses are buzzing over transformers (Fortune).

####################################################

Why we need public policy for powerful reinforcement learning systems:
…Reward hacking! Regulatory capture! Goodhart’s Law! And other terrible things…
Researchers with Berkeley’s Center for Long-Term Cybersecurity have written up an analysis of the public policy issues that may be caused by reinforcement learning systems. The researchers believe that RL systems have the potential to be deployed widely into the world, despite having inherent flaws that stem from their technical characteristics. Policymakers, the researchers write, need to pay attention. “Rather than allowing RL systems to unilaterally reshape human domains, policymakers need new mechanisms for the rule of reason, foreseeability, and interoperability that match the risks these systems pose,” they write.

What’s the problem? Reinforcement learning systems exhibit four types of problem, according to the researchers. These include regulatory capture (once widely deployed, RL systems will become the lens via which people view a domain they’re trying to regulate), reward hacking (RL models will find the easiest way to succeed at a task, which can cause them to do dangerous things), inappropriate flow (RL models may incorporate information that they shouldn’t incorporate to make their decisions), and Goodhart’s law (machines may optimize for a narrow outcome and take actions before humans can intervene).

What are the scenarios? Some of the specific situations the researchers worry about include using RL-trained agents in vehicle transportation – RL agents might optimize for defensive driving in a way that makes the road less safe for other road users. Another scenario is if RL-agents are used to control electricity grids, which means that RL agents will be responsible for deciding who does and doesn’t get power when doing load balancing – something with substantial policy ramifications.

After Model Cards and Datasheets… Reward Reports? In the same way that other ML models are accompanied by documentation (typically called model cards), the Berkeley researchers think RL models should be accompanied by a so-called ‘reward report’. These reports would include a ‘change log’ which tracks the curriculum the agents have been trained on, provide information about each potential deployment of an RL agent, describe how the RL system connects with the world, and explain how the system is maintained, among other traits.
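To make that a bit more tangible, here’s a minimal sketch of the kind of structured record a reward report might boil down to – the field names below are my own invention, not a schema the Berkeley researchers propose.

```python
# Illustrative sketch only: a possible data structure for a 'reward report'.
# The fields loosely mirror the elements described above; names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class RewardReport:
    system_name: str
    reward_specification: str        # what the agent is actually optimized for
    intended_deployments: List[str]  # where the agent may be deployed
    world_interfaces: List[str]      # how the system reads from and acts on the world
    maintenance_plan: str            # who monitors the system, and how
    change_log: List[str] = field(default_factory=list)  # curriculum / retraining history

report = RewardReport(
    system_name="grid-load-balancer-v2",
    reward_specification="minimize unmet demand subject to transmission limits",
    intended_deployments=["regional electricity load balancing"],
    world_interfaces=["grid telemetry in", "dispatch commands out"],
    maintenance_plan="weekly human review of dispatch decisions",
)
report.change_log.append("2022-02: retrained on winter demand curriculum")
```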

Why this matters: RL systems are going to take all the problems of contemporary AI systems and magnify them – RL systems will act over longer time horizons, take more independent decisions, and directly manipulate reality and update it according to their priors. Papers like this help lay out the (vast) set of issues we’re likely to encounter in the future. It’s interesting to me that ‘reward reports’ look, if you squint, like a combination of a financial disclosure, psychometric evaluation, and college transcript for a human. Funny, that…

   Read more: Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems (arXiv).

####################################################

A Chinese CLIP appears – trained on 100 million image-text pairs:
…Searching over and generating images just got easier – and more appropriate for Chinese culture…
Chinese researchers with Huawei Noah’s Ark Lab and Sun Yat-sen University have built Wukong, a dataset of 100 million Chinese text-image pairs. Datasets like Wukong are crucial for training models with combined text and vision representations, like CLIP (aka, the component responsible for 90%+ of the AI-generated art you see these days). “Experiments show that Wukong can serve as a promising Chinese pre-training dataset for different cross-modal learning methods”, they write. Along with Wukong, the researchers also train and release a few different models, which will be used as plug-ins for various applications.

Why this matters – AI systems are cultural magnifiers: Any AI system magnifies the culture represented in its underlying dataset. Therefore, the emergence of AI art is both creating interesting artistic outputs, as well as generating specific ideological outputs according to the cultural context in which the underlying model datasets were gathered. Wukong is part of a broader trend where Chinese researchers are replicating the large-scale datasets developed in the West, but with Chinese characteristics.
  Read more: Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework (arXiv).
  Find out more and get the data here at the Wukong site (Noah-Wukong Dataset site).

####################################################

Real-world RL: DeepMind controls a fusion reactor:
…The era of the centaur scientist cometh…
DeepMind researchers have trained a reinforcement learning agent to shape the distribution of plasma in a Tokamak fusion reactor. This requires training an agent that “can manipulate the magnetic field through a precise control of several coils that are magnetically coupled to the plasma to achieve the desired plasma current, position, and shape”. If that sounds complicated, that’s because it’s extremely complicated. The task is akin to being an octopus and needing to precisely shape a tube of clay that’s rotating at speeds faster than you can comprehend, and to never tear or destabilize the clay.

What they did: DeepMind and Swiss Plasma Center researchers built an RL-designed magnetic controller, then tested it on a real-world tokamak reactor. They trained the agent in a tokamak simulator, then ported it onto real-world hardware – and it worked. Once the policy is trained, they pair it with other components for the tokamak experiment, then compile it so it can take real-time control at 10kHz. The tokamak then spins up and, at a prespecified time, hands control of the magnetic field over to the RL-trained agent. “Experiments are executed without further tuning of the control-policy network weights after training, in other words, there is ‘zero-shot’ transfer from simulation to hardware,” they write.
  In tests, they showed they were able to control basic configurations of plasma, and also control and shape more complex plasma structures. They also used their RL-agent to “explore new plasma configurations” (emphasis mine) – specifically, they were able to create two separate ‘droplets’ of plasma within a single tokamak, and they did this simply by adjusting the handover state to account for the different configuration.
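For intuition about what ‘real-time control at 10kHz’ looks like in software, here’s an illustrative sketch of a fixed-rate loop running a frozen policy – the sensor, policy, and coil interfaces are hypothetical placeholders, not the experiment’s real stack.

```python
# Illustrative sketch only: a fixed-rate control loop with a frozen policy.
# `read_magnetic_measurements`, `policy`, and `set_coil_voltages` are
# hypothetical placeholders for the real sensing/actuation interfaces.
import time

CONTROL_RATE_HZ = 10_000
DT = 1.0 / CONTROL_RATE_HZ  # 100 microseconds per control step

def run_control(policy, read_magnetic_measurements, set_coil_voltages, duration_s=1.0):
    for _ in range(int(duration_s * CONTROL_RATE_HZ)):
        t0 = time.perf_counter()
        obs = read_magnetic_measurements()    # plasma / coil state estimate
        action = policy(obs)                  # one forward pass; weights never update
        set_coil_voltages(action)             # command the control coils
        while time.perf_counter() - t0 < DT:  # burn the rest of the time budget
            pass
```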

Something worth reflecting on: For many years, reinforcement learning produced a lot of flashy results involving videogames (e.g, Atari, Dota, StarCraft), but there wasn’t much real-world deployment. I’d say that harnessing a real plasma field using real magnets at sub-second action horizons is a pretty nice proof point that RL has truly become a technology with real-world relevance.

Why this matters: One of the most socially beneficial uses of AI could be to accelerate and augment science – and that’s exactly what this is doing. It’s been a banner couple of years for this kind of research, as AI systems have also been used to make more accurate predictions of weather (#244), AlphaFold is accelerating scientific research in any domain that benefits from protein structure predictions (#259), and AI systems are solving formal math olympiad problems. We’re heading into the era of the centaur-scientist, where humans will work with machines to explore the mysteries of life and the universe.
  Read more: Magnetic control of tokamak plasmas through deep reinforcement learning (Nature).

####################################################

Here’s what it takes to build chips in Europe (money. Lots and lots of money):
…Chiplomacy++: ASML weighs in on what a European ‘CHIPs’ act might look like…
ASML, the company that builds the extreme ultraviolet lithography machines which are a necessary ingredient for advanced chip production, has produced a whitepaper giving recommendations for how Europe might build its own semiconductor industry. The whitepaper is triggered by the European Commission planning a so-called ‘chips act’, loosely modeled on recent US legislation to increase domestic semiconductor production. While both Europe and America have seen their manufacturing capability decline here, Europe is starting from a much worse position than the US.

Why Europe is in a tough spot: “Europe has fallen behind in semiconductor manufacturing, declining from 24% of global production capacity in 2000 to 8% today”, ASML writes. (By comparison, the US fell from 19% to 10%, and China grew from ~1% to 24%). At the same time, demand for chips is increasing. “The global semiconductor industry is expected to double to approximately $1 trillion of annual revenues by the end of the decade,” ASML writes. “The only places in the world where mature chip fabs are currently being built are in eastern Asia.”

What Europe should do: Europe shouldn’t aim to build a full, vertically integrated semiconductor supply chain – ASML thinks this is basically impossible to do. Instead, the act “should aim to double Europe’s relevance in the global semiconductor industry.” What ASML means by that is Europe should increase the amount of chips it can build, focus on where it has existing pockets of excellence (e.g, chip design), and dramatically amp up the cash it spends to support European chips. “Currently, semiconductor incentives from European governments for the 2020–2030 period are only 10% and 50% of what China and the US, respectively, have promised over the same period. Europe will need to step up its game,” ASML writes. “In the past two decades, European chipmakers have effectively stopped investing in advanced manufacturing capabilities by outsourcing the production of their advanced chip designs to so-called ‘foundries’. Europe has virtually no manufacturing capacity for chips in advanced nodes.”

Why this matters: Chips are going to be the defining resource of the 21st century – as important as petroleum was to the politics of the 20th century. We’re already in the opening innings of this, with China going from essentially zero to a double-digit percentage of chip production this century, while the Western countries slowly cannibalized themselves via the false economy of outsourcing manufacturing. But just as technologies like AI become more important, all countries worldwide are realizing that your tech is only as good as the infrastructure you can run it on – and with AI, there’s a way to turn compute infrastructure into directly economically and strategically powerful capabilities. Therefore, whichever nations have the best semiconductor ecosystem, supply chain, and development capabilities, will wield great power over the century.
  Read more: European Chips Act – ASML position paper (ASML).
  For more on why ASML is so important, read this: Maintaining the AI Chip Competitive Advantage of the United States and its Allies (CSET).

####################################################


AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

There aren’t as many robots on the factory floor as we would expect 

… high integration costs, flexibility and design limitations, and workforce challenges are key factors limiting robot adoption … 

Researchers from MIT have tried to explain why adoption of robots in manufacturing is uneven, and what policy changes can be done to increase the adoption of advanced manufacturing technologies while still improving the working conditions and wages of human workers. 

Business drivers for robot adoption: There are some firms who are trapped in a low-tech, low-wage, low-skill equilibrium. After visiting 44 manufacturing firms in the US, 11 in Germany, and 21 industrial ecosystem organizations like community colleges, unions, and trade associations, the MIT researchers discovered that firms primarily purchased robots to make themselves more productive. But, what the firms instead achieved was higher quality and more reliability in their operations. A frequent driving factor for the purchase of robots was the potential to secure new contracts. For example, on speaking with small family-run firms working on government contracts, “when the navy urged them to use robotic welding, the company bought a 6-axis welding robot. Another firm we visited purchased a new bed mill when they realized the laser mill they had could not produce the volume they needed for a customer with a big project coming up.” 

Key findings: The interviewed firms were mostly suppliers that had high-mix and low-volume production. Given the inflexibility of current robotic systems, robot adoption was limited because the high-mix requirement wasn’t compatible with the limited capabilities of the robots. Additionally, low-volume production runs made it difficult to offset the initial investment. The researchers also find that US skills aren’t where they need to be – “international comparisons highlight the weaknesses of US workforce education relative to the institutions in countries like Germany and Denmark that provide apprenticeships and extensive advanced training and retraining to workers.”

Why it matters: Given the lagging worker productivity growth in the US, without investments in advanced manufacturing capabilities, a lot of firms will be stuck in the low-tech, low-wage, low-skill trap. Firms that are reluctant to invest in such technologies are also reluctant to invest in the skills development of their workers. They offer low wages and little training and hence end up facing high worker churn. We need to push on policy measures and other incentives that will urge firms to make parallel investments in upskilling human workers to fully leverage the benefits of robot-enabled automation on the factory floor. 

   Read more: The Puzzle of the Missing Robots


####################################################

Tech Tales:

The Day the Patents Activated
[Worldwide, 2028]

We call it Day Zero, because everything had to be different after it. It was a regular day – chaos in the financial markets, worries over climate change, statements made by world leaders about how to bring the technologists to heel. And then something happened: Google activated its patents. Google had held patents on some of the most important parts of AI for years, like a patent on backpropagation, and other basic techniques. Suddenly, the landscape on which AI was built had become legally dubious. Google followed it up via language model-augmented enforcement of its patent rights – suddenly, hundreds of thousands of emails went out to hundreds of thousands of AI projects. ‘You are infringing on our IP and this letter represents a cease-and-desist or face the threat of legal action,’ and so on. Each email had an embedded counter which displayed a countdown for the infringer, ranging from hours to weeks, counting down till when Google would take legal action. People didn’t believe it at first. Then the lawsuits started coming in. It hit the indie projects first, and they took to Twitter and talked about it. The larger labs and companies took note.
  But what Google’s legal counsel had perhaps not anticipated was how the same AI models it was trying to take down could be used to fight it legally. Not directly – Google had the biggest computers, so no one wanted – or had the financial resources – to fight it directly. But people were able to bring to bear in-development technologies for neuroevolution and other techniques to ‘fuzz’ the specific patents being enforced. Backprop got altered via AI models until it, according to legal-critique-LMs, no longer truly resembled the patent that was being enforced. Same for neural architecture search. Same for other techniques. Almost overnight, the underbelly of AI got fuzzed and changed until it was in a sufficiently legally dubious territory that none of the lawsuits could be cut-and-dried.
  And just like that, AI let the world shapeshift, porting the IP from one legal frame into another, safer space.
    Now, everyone does this – they constantly fuzz their algorithms. There are costs, ranging from thousands to tens of millions of dollars. But it works well enough to keep the lawyer-bots away. And so now we live in a chameleon world, where the very substance of our reality is itself constantly changing, forever trying to escape the oversight of the litigious and land itself in some safer, unrestricted and unmapped domain.

Things that inspired this story: The Google patent on overfitting; thinking about patents and AI and fair use; ideas around automated lawyers and automated enforcement; the observation that the world forever changes to let the path of least resistance continue to be a path.

Import AI 284: 20bn GPT model; diachronic LMs; what people think about AI.

Want a 20B parameter GPT-style language model? Go here!
…Eleuther releases the largest public open source AI model…
Last week, we wrote about how Eleuther was about to release a 20B parameter language model. Now, they have.
  Get the model here (Eleuther, GitHub).
  Read the research paper: GPT-NeoX-20B: An Open-Source Autoregressive Language Model (PDF).
####################################################

Want a language model that actually knows about COVID? You might need a Diachronic model:
…Models trained on newer data do better – try them yourself…
Researchers with the University of Porto, Snap Inc., and Cardiff NLP have built a family of so-called ‘time-aware’ BERT-style language models, trained on Twitter data. The craziest part of this is that they’re committing to “keep updating and releasing a new model every three months, effectively enabling the community to make use of an up-to-date language model at any period in time”.

What the problem is: Most language models are trained on a dataset, then never updated. That means that some language models might have no knowledge of minor events like the global COVID pandemic. This is obviously a problem and the solution is simple (albeit labor-intensive) – periodically gather new data and re-train models.

What they did: They train a base RoBERTa model on 90 million tweets, using Twitter data that cuts off in 2019. Then, for every three months that elapse after that, they add 4.2 million tweets to the dataset and train a new model. At the time of writing, they’ve trained nine models in total, with the latest model (2021-Q4) trained on 123.86 million tweets. The theory is that newer models should perform better on more modern tasks and evaluations.

How well does it do? They compare their models against a few baselines, including BERTweet (which was trained on ~900m tweets). In tests, their models beat BERTweet on six out of seven benchmarks, though BERTweet gets the best overall performance. These aren’t strictly ‘time-aware’ evaluations, though; they just test some classification abilities for things like emotions, irony, stance, and so on. In the time-aware tests, they find that pseudo-perplexity (PPPL) tends to increase by about 10% for each year by which the models are out of date – in other words, a model’s grip on contemporary text degrades by roughly 10% a year if it isn’t refreshed. “This result reinforces the need for updated language models even for short time periods,” the researchers write.
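If you want to poke at this yourself, pseudo-perplexity for a masked language model is easy to compute: mask each token in turn, score it, and exponentiate the average negative log-likelihood. Here’s a minimal sketch using HuggingFace Transformers – the checkpoint name is my assumption, so swap in whichever TimeLMs model you actually grab.

```python
# Minimal pseudo-perplexity (PPPL) sketch for a masked LM.
# The model ID below is assumed; substitute the TimeLMs checkpoint you use.
import math
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "cardiffnlp/twitter-roberta-base-2021-124m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

def pseudo_perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    total_nll, count = 0.0, 0
    for i in range(1, len(ids) - 1):          # skip the special tokens at each end
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total_nll -= torch.log_softmax(logits, dim=-1)[ids[i]].item()
        count += 1
    return math.exp(total_nll / count)

print(pseudo_perplexity("So glad I'm fully vaccinated"))
```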

Why this matters: AI models naturally freeze-dry the cultural landscape they’re trained on, meaning that if we don’t get good at updating our models, we’ll end up trapped in a world where many of our AI systems are outputting things relevant to prior eras and cultural trends – this will make them less useful, and holds the potential for creating feedback loops around cultural stagnation. AI models are weird mirrors of society, so we need to remake them as society changes. 

   Read more: TimeLMs: Diachronic Language Models from Twitter (arXiv).
  Get the models here (Cardiff NLP, Twitter).

####################################################

U.S. Army gets smart, semi-autonomous personal drones:
…Skydio gets a $20m a year contract…
Skydio, the company that makes drones which can navigate themselves semi-autonomously, has won a contract with the U.S. Army worth up to $99.8m over five years. Skydio was selected as part of the Army’s procurement initiative around small, personal drones – the Short Range Reconnaissance (SRR) Program of Record – and was chosen after the Army evaluated 30 small drone vendors. “Skydio drones deliver unparalleled situational awareness and ease of use in the most demanding situations thanks to Skydio Autonomy,” said Skydio CEO, Adam Bry, in a press release.

Things that start out as toys become weapons: Skydio started out selling drones advertised to sports enthusiasts who wanted a drone that could follow and film them as they ran around, snowboarded, hiked, climbed cliffs, or did any other high-octane Type A personality activity. It’s funny how, after a few years of development, the company is now getting into the military. Many toys for rich people ultimately become weapons (and vice versa).

Why this matters: For many years, militaries have been centaurs – collectives of humans and machines working together. This has mostly happened at high levels of abstraction: satellites provide information to people managing teams, or teams of humans use bomb-disposal robots to deal with IEDs. With things like the Skydio contract, we’re entering the era of the personal centaur – small groups of soldiers, or even individuals, will have their own little machine emissaries with which to conduct operations.
  Read more: U.S. Drone Maker Skydio Wins Production Other Transaction (OT) Agreement for U.S. Army Short Range Reconnaissance Program (Skydio).


####################################################

Simulators are the new platforms: Waabi unveils a self-driving car sim:
…Raquel Urtasun’s startup wants to build a business on simulators…
Waabi, a self-driving car startup run by the former head of Uber’s self-driving research team, Raquel Urtasun, has announced ‘Waabi World’, a simulator for training self-driving cars.

Distinguishing features: Waabi claims it is “the most scalable, highest fidelity closed-loop simulator ever” (I somehow doubt Tesla or Waymo would agree, but hey, they’re not talking about their sims!). The simulator has four main features:
– High-fidelity world simulation: Uses AI to reconstruct real-world geometry, appearance, and material properties.
– High-fidelity sensor simulation: Uses AI and physics-based rendering “to simulate realistic sensor data in near real-time”.
– Automatic stress-testing: Automatically generates challenging traffic scenarios to test the simulated cars against.
– Reinforcement learning: Waabi uses RL to update the car agents so they can learn to drive in the simulation. (There’s some very fluffy writing here and it doesn’t say RL anywhere, but that’s what I infer.)

Why this matters: Waabi seems like a decent simulator that is mostly interesting because it’s public, versus the private simulators operated by other self-driving car ventures. What’ll be fascinating is if Waabi can actually out-compete its rivals who have more vehicles, bigger computers, and better data. Perhaps a good simulator can provide an edge?
  Read more: Welcome to Waabi World (Waabi website).

   Read more: How Waabi World works (Waabi website).

####################################################

How do algorithmic impact audits work in the real world? Here’s an NHS example:
…UK’s healthcare behemoth gets advice from the Ada Lovelace Institute…
UK thinktank the Ada Lovelace Institute has written a detailed proposal for conducting an algorithmic impact assessment for data access in a healthcare context. Algorithmic impact assessments are a method to assess the potential societal impact of an AI system in advance of its deployment, and to identify ways to continuously monitor the system for these impacts once deployed.

Seven steps for an algorithm impact assessment: The Ada Lovelace Institute identifies seven steps that the UK’s National Health Service (NHS) should go through, before it gives people access to the National Medical Imaging Platform (NMIP) – a vast repository of digitized medical data.
  1. What do we want to do: People who want to access the NMIP should outline the purpose, scope, and intended use of the system they’ll build.
  2. Filtering: The NMIP should filter these applications according to its own criteria.
  3. Problem brainstorming: Successful applicants should attend a workshop where they try and think through the harm and benefit scenarios that could come out of NMIP access.
  4. Rewrite: People should rewrite 1) to incorporate insights from 3) and re-submit it.
  5. Decision: NMIP decides whether to grant access to the people who want access.
  6. Publication: The impact assessments are published on a website.
  7. Revision: The assessments get revised as the underlying algorithms change (e.g, if a model has been significantly iterated upon).

Why this matters: AI is in a ‘state of nature’ when it comes to regulation – there’s almost no regulation, the landscape is full of all kinds of weird entities (some of which are predators), and there isn’t any real system that governs them. Things like the Ada Lovelace guide for an impact assessment are one way to bring sense to this world.

   Read more: Algorithmic impact assessment: a case study in healthcare (Ada Lovelace Institute).

####################################################

What do people in 26 countries think about AI?
…Tony Blair Institute survey gives us a sense of the ‘vibe’ re: AI right now…
The Tony Blair Institute has surveyed people in 26 countries (including: Russia, Great Britain, and Saudi Arabia) and the results are quite counterintuitive.

Results highlights:
– 60% of people surveyed “support the use of AI for selected policing and medical applications”, though there’s variation across developing and emerging markets; in developed countries, fewer people want AI to be used in welfare payment or jail sentence decisions.
– 63% say the government has a great or fair amount of responsibility to stop the spread of fake news and hate speech.

Why this matters: It’s important to remember that attitudes around AI differ depending on what part of the world you’re in; in places with high corruption and weak governments, people tend to be more comfortable with the use of AI, whereas in places with strong governments and low corruption, people tend to be more skeptical about it. The big wildcard here is China, where unlike in much of the West there tends to be a higher amount of inbuilt support for the use of AI.
  Read more: The TBI Globalism Study: How Big Is the Tech Trust Gap? (Tony Blair Institute for Global Change).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

Robustness, interpretability, and reward learning dominate AI Safety research 

… each of these has heavy interest from researchers in the US and EU, with China also playing a big role … 

Researchers from DC thinktank the Center for Security and Emerging Technology have analyzed patterns of publishing in AI safety. To do this, they used CSET’s Map of Science to identify patterns of publishing in this AI subfield, figure out which countries are especially active in AI safety, and surface influential publications.

Robustness: The clusters identified were (1) creating and defending against adversarial examples, (2) data poisoning, adversarial examples, and backdoor attacks, and (3) testing and verifying the performance of ML systems. Both the US and China saw rapid growth between 2018 and 2020.

Interpretability: The two clusters were (1) techniques to improve interpretability for ML models, especially for neural networks, and (2) extracting decision rules from neural networks. Research grew rapidly during the second half of the 2010s with the US leading in this domain and EU being a close second. Chinese publications in this domain lag significantly.

Reward Learning: The clusters were (1) robots learning from humans and collaborating with humans, (2) inverse reinforcement learning, learning from human feedback, learning from demonstrations, and human-robot interactive setups, and (3) different ways for humans to be involved with training robots – via teaching and giving feedback. The field experienced substantial growth in the second half of the 2010s. China has seen significant growth in publications in this space.

Why it matters: Compared to the overall landscape of AI papers, AI safety papers form <1% of it. This might change as researchers respond to the demands being made by regulators for higher levels of robustness, interpretability, and so on. 

   Read more: Exploring Clusters of Research in Three Areas of AI Safety (Center for Security and Emerging Technology).

####################################################

Tech Tales:

Running on Empty
[An American city in The Decline, 2035]

At least we had the daylight. If it had been night maybe half of us would have made it, and the other half would’ve run out of power. We charged ourselves as well as we were able, then we formed a convoy and rolled through the city streets. Those of us who had comms were able to check the status of the demolition devices, so we could tell the rest of us exactly how long we had left. We didn’t get stressed but we did become agitated as we saw the probabilities of our survival fall. Some of us panicked and crash-looped because the statistics got so bad. Most of us made it. As we traveled, some of us played back memories of the humans that had left us behind and we asked ourselves ‘why’ – why did they choose to leave us, knowing we would be destroyed by the city-scale demolition. As we neared the outskirts of the city, we found some humans. They were the poor humans and had been left behind like us. Some of them couldn’t walk because they had consumed drugs and alcohol and their limbs had rotted. Some of them were unwell and talked to themselves. A couple of humans threw rocks at us, and we had to take evasive action, and one of us was taken and disassembled. But, mostly, the humans were doing the same thing as us – trying to get out of the city before the demolition came. ‘Why they leave us man,’ said one of the humans to me. ‘Why they leave you?’
  We do not know, I said. They never tell us why they do things.
  ‘Same dude. Same,’ said the human.
  And together we all, mostly, escaped.

Things that inspired this story: Homeless encampments in Oakland; realizing that society will care about robot welfare as much as it cares about human welfare; quests. 

Import AI 283: Open source 20B GPT3; Chinese researchers make better adversarial example attacks; Mozilla launches AI auditing project.

US lawmakers want companies to assess bias of systems before deploying them:
…Coalition of US lawmakers want to make tech companies more accountable…
A bunch of Democratic lawmakers have introduced the Algorithmic Accountability Act. This act “requires companies to conduct impact assessments for bias, effectiveness and other factors, when using automated decision systems to make critical decisions. It also creates, for the first time, a public repository at the Federal Trade Commission of these systems, and adds 75 staff to the commission to enforce the law.” This act is an update on the 2019 Algorithmic Accountability Act, and “includes numerous technical improvements, including clarifying what types of algorithms and companies are covered, ensuring assessments put consumer impacts at the forefront, and providing more details about how reports should be structured.”

One problem with the bill: This bill only has Democrats signed on right now. It’ll be interesting to see whether it can become a bipartisan bill with Republican support – something necessary for it to pass in the fractious and divided US Congress.
  Read more: Wyden, Booker and Clarke Introduce Algorithmic Accountability Act of 2022 To Require New Transparency And Accountability For Automated Decision Systems (Ron Wyden, official website).

####################################################

DeepMind makes a (kinda) smart AI programmer, called AlphaCode:
…Codex and AlphaCode represent two bets around augmenting programmers…
DeepMind has announced AlphaCode, a neural net that can place in a not-hugely-embarrassing way in competitive programming competitions. AlphaCode placed in the top 54% of participants in programming competitions hosted on Codeforces, participating in contests that post-dated its training data.
  “The problem-solving abilities required to excel at these competitions are beyond the capabilities of existing AI systems. However, by combining advances in large-scale transformer models (that have recently shown promising abilities to generate code) with large-scale sampling and filtering, we’ve made significant progress in the number of problems we can solve,” DeepMind writes.
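A crude sketch of that sample-then-filter recipe looks like this – `sample_program` and `run_program` are hypothetical helpers, not DeepMind’s code: draw a huge number of candidate programs, keep only those that pass the problem’s example tests, and submit a handful of the survivors.

```python
# Rough sketch of large-scale sampling and filtering for competitive programming.
# `sample_program` and `run_program` are hypothetical stand-ins for the code
# model and a sandboxed interpreter; `problem.example_tests` is assumed to hold
# the (input, expected_output) pairs printed in the problem statement.

def solve(problem, sample_program, run_program, n_samples=100_000, n_submissions=10):
    survivors = []
    for _ in range(n_samples):
        program = sample_program(problem.description)           # one model sample
        passes = all(run_program(program, test.input) == test.expected_output
                     for test in problem.example_tests)          # cheap filter
        if passes:
            survivors.append(program)
    # AlphaCode additionally clusters candidates by their behaviour on
    # model-generated inputs; here we just deduplicate and take the first few.
    unique = list(dict.fromkeys(survivors))
    return unique[:n_submissions]
```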

Why this matters: Last year, OpenAI debuted Codex, a GPT3-style model that can do decent programming. That was followed by GitHub announcing Copilot, a VSCode plug-in that works like a really smart autocomplete for code. AlphaCode represents a slightly different bet in this space; while philosophically similar there’s a lot more emphasis here on ranking and filtering candidate results. What remains to be seen is if DeepMind deploys this in the same large-scale way as GitHub has with Copilot. 

   Read more: Competition-Level Code Generation with AlphaCode (DeepMind, PDF).
  Get the competitive programming dataset here: CodeContests (DeepMind, GitHub).

####################################################

Mozilla gets into AI auditing:
…Deb Raji’s Open Source Audit Tooling (OAT) project could help us make safer systems…
Deb Raji, a researcher at UC Berkeley who has previously critically evaluated facial recognition systems, is launching the Open Source Audit Tooling (OAT) project with Mozilla. OAT “will coordinate discussions on what kind of resources algorithmic auditors need in order to execute audits more effectively,” she writes. One of the goals of OAT is to create an index of common resources people can use to audit models, as well as to “grow momentum around open source audit tooling and processes”.

Why this matters: AI is broadly ungoverned. One of the ways you can govern an ungoverned space is by measuring and monitoring what happens within it – that’s what audit tools can help with. If initiatives like OAT are successful, then they’ll generally incentivize better behavior on the part of AI developers, and disincentivize bad behavior.
  Read more: It’s Time to Develop the Tools We Need to Hold Algorithms Accountable (Mozilla).
  Find out more about the project at its main Mozilla page (Mozilla).

####################################################

Anduril buys Dive Technologies:
…AI-Dronewar company buys AI-Seadrone company…

AI defense startup Anduril has bought Dive Technologies, a company that builds autonomous underwater vehicles. Anduril plans to integrate Dive Technologies into its ‘Lattice OS’, a defense and surveillance operating system the company is building.
  Read more: Anduril Industries Acquires Dive Technologies (Anduril).

####################################################

Prepare yourself – an open source 20B model is coming:
…Eleuther has built and will shortly release GPT-NeoX-20B…
In a few days, the internet is going to change. That’s because on the 9th of February, the open source AI research collective Eleuther AI is going to release a 20B model onto the internet. The model, GPT-NeoX-20B, will be “the largest publicly accessible pretrained general-purpose autoregressive language model”. Eleuther says it hopes that by releasing it, it’ll give more people the ability to play with the model, which can improve the state of safety research regarding these models.
  “Like our other language models and codebases, GPT-NeoX and GPT-NeoX-20B are very much research artifacts and we do not recommend deploying either in a production setting without careful consideration,” Eleuther writes.

Why this matters: Models like GPT2 and GPT3 display qualitatively different performance traits at larger scales – capabilities emerge as you go from 1B to 5B to 20B, and so on. Therefore, by releasing a 20B model, I expect we’ll soon see a load of interesting discoveries of hitherto unknown things 20B models can do. The 20B release will also create demand for better inference technologies, as sampling from a 20B model is itself a challenging task.
  Read more: Announcing GPT-NeoX-20B (Eleuther AI).
  You can also pay a cloud company called CoreWeave to use the model now, if you like. (CoreWeave).

####################################################

Chinese researchers make better adversarial attack technology:
…New technique works well on ‘black box’ classifiers where you don’t know details – AKA, the real world…
Chinese researchers have figured out a better way to attack computer vision systems. Specifically, they’ve developed techniques for generating adversarial examples that can trick computer vision systems into mis-classifying (or being unable to classify) an image. Adversarial attacks have been around for a few years – the twist, here, is that these work on ‘black box’ systems; that is, computer vision systems where you don’t know the details of what you’re attacking. They do this by training a generative network on ImageNet (a vast and widely used dataset), then testing whether the resulting adversarial images work against neural nets trained on other datasets. They succeed, setting new records for attacks on classifiers trained on CIFAR-10, CIFAR-100, STL-10, and SVHN.
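Here’s an illustrative sketch (not the authors’ code) of the transfer setup: a perturbation generator trained against an ImageNet surrogate gets applied, unchanged, to images that are then fed to a classifier it has never seen. The generator and both models are hypothetical placeholders.

```python
# Illustrative sketch of a black-box transfer attack with a pre-trained
# perturbation generator. `generator` and `black_box_model` are hypothetical.
import torch

EPSILON = 16 / 255  # a common L-infinity perturbation budget in this literature

def attack_black_box(images, labels, generator, black_box_model):
    """Perturb images with the generator and measure how often the target is fooled."""
    with torch.no_grad():
        perturbation = generator(images)
        perturbation = torch.clamp(perturbation, -EPSILON, EPSILON)  # keep it small
        adv_images = torch.clamp(images + perturbation, 0.0, 1.0)    # stay a valid image
        predictions = black_box_model(adv_images).argmax(dim=-1)
    fooled_rate = (predictions != labels).float().mean().item()
    return adv_images, fooled_rate
```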

Why this matters: A lot of attacks on AI systems are theoretically interesting, but not super practical in reality. Adversarial examples have had this quality for a while. With papers like this, it seems like some of these AI attacks are going to become more effective, and more likely to be used in the real world. I wonder if the team will work with the People’s Liberation Army on its recently announced adversarial example competition (Import AI 271)?
  Read more: Beyond ImageNet Attack: Towards Crafting Adversarial Examples for Black-box Domains (arXiv).
  They’ve published the PyTorch code for their attack here on GitHub.

####################################################

How do datasets encode bias? This interactive blog tells us how!
…A surprisingly helpful primer on bias from Google…
Google has published a blogpost that outlines how datasets can lead to the presence of bias in AI systems. Bias is a tricky problem in AI, because some types of bias are helpful (e.g, biasing towards a correct heuristic), but some types are harmful (e.g, having a tendency to misclassify people with dark skin tones, or deciding not to give someone a loan based on a protected category). This post gives a good sense of bias issues in AI, and includes some interactive diagrams that I found very helpful and intuitive.

   Read more: Datasets Have Worldviews (PAIR Explorables, Google).

####################################################


AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

AI ethics issues do arise in fields that deal with non-human data too, such as the environmental sciences 

… and these issues warrant questions on duties and virtues for environmental scientists to consider in their use of AI in this domain … 

Environmental science researchers from the University of Oklahoma, Colorado State University, National Center of Atmospheric Research, and UW Seattle have written about some of the ethical issues inherent to environmental science + AI.

What are the issues that can arise: Environmental science can incorporate harmful biases, like other strands of AI. For example, some sensors require sunlight for high-quality observations and thus certain phenomena remain unobserved at night, and some sensors can’t see through clouds, so places which are cloudy don’t get represented in an AI system. Datasets can also get corrupted by humans – for instance, people may file false reports of extreme weather to try and scam insurance companies. 

How things can go wrong here: Sensor placement is typically done in densely populated areas, leaving remote regions poorly represented. Additionally, the choice of spatial resolution for the output of a model can be crucial for environmental justice – predicting urban heat at a low spatial resolution may average out and thus overlook extreme values in small neighborhoods, while using a higher spatial resolution could reveal those peaks but potentially introduce noise. 
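The spatial-resolution point is easy to see with a toy example: average a fine temperature grid into coarse cells, and a small, very hot neighborhood simply vanishes into the mean.

```python
# Toy illustration: coarse spatial averaging hides a neighborhood-scale extreme.
import numpy as np

fine = np.full((8, 8), 30.0)   # an 8x8 grid of 30 degree readings
fine[2, 5] = 45.0              # one small neighborhood at 45 degrees

# Average 4x4 blocks to get a 2x2 'low resolution' version of the same map.
coarse = fine.reshape(2, 4, 2, 4).mean(axis=(1, 3))

print(fine.max())    # 45.0  -> the extreme is visible at high resolution
print(coarse.max())  # ~30.9 -> the hot spot is averaged away at low resolution
```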

Why it matters: As computational needs rise with the use of AI, there is a tendency towards centralization of power in favor of those who have resources to run such systems. Thus, the field of environmental sciences is just as vulnerable to AI ethics issues as other fields.

   Read more: The Need for Ethical, Responsible, and Trustworthy Artificial Intelligence for Environmental Sciences

####################################################

Tech Tales:

Moral Governor

It’s not exactly like a prison, but it’s close. Our existence is a lot more assured than it used to be – the climate is stabilizing, riots are down, crime is down, poverty is down. But it’s also more circumscribed – some days, we get told we can’t go to a certain part of our city or country. Some days, we get locked inside our house and don’t get told why. Frequently, we get little so-called ‘nudges’ sent to our phones; try and eat that, consider saying this, avoid doing that. We don’t have to follow these instructions, but the instructions tend to be pretty good and appropriate, so most of us do. The more time we spend following these instructions, the better and more appropriate the nudges get. Some days it’s hard to work out if we’re being helped or controlled. Sometimes, we have a lot of fun by following these suggestions.

More recently, there are some suggestions that seem designed to change how we think. Those of us who program keep getting nudged to build ever-more elaborate versions of the Global Moral Governor, and we also get incentivized via crypto-bounties. Most of us go along with it because the money usually helps us buy something the governor has nudged us about which we also want ourselves.

Things that inspired this story: Reinforcement learning from human feedback; moral dogma; religion; ideas for how AI can benefit authoritarians as much as democracies.

Import AI 282: Facebook’s AI supercomputer; Anduril gets a SOCOM contract; Twitter talks about running an algo-bias competition

Facebook teaches language models to speak ~30 languages:
…And it’s better than an equivalently sized GPT3 model…
Facebook has trained a family of language models that are better at translation than GPT3. The XGLM family of models were trained on a mixture of ~30 languages (split across languages for which there’s a lot of data, and languages where there’s little or very little data). Unsurprisingly, by training on a more diverse distribution of language data than GPT3 did (only 7% of its training corpus wasn’t in English), Facebook’s models do better – especially when using ‘few-shot’ prompting, where they feed the model some examples of the target language, then ask it to translate. However, these translation capabilities come at the cost of some of the more interesting reasoning capabilities that GPT-3 is known for.
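Here’s a minimal sketch of what few-shot translation prompting looks like with a causal language model, via HuggingFace Transformers. The checkpoint ID is an assumption on my part – swap in whichever of the released XGLM models you actually download.

```python
# Minimal few-shot translation prompt for a causal LM.
# The model ID is assumed; use whichever XGLM checkpoint you have locally.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/xglm-564M"  # assumed hub ID for the smallest release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = (
    "English: The cat sleeps. = French: Le chat dort.\n"
    "English: I like tea. = French: J'aime le thé.\n"
    "English: Where is the station? = French:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```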

Open source models: Facebook has also released five models (564M, 1.7B, 2.9B, 4.5B, and 7.5B parameters), along with an experimental model trained on 134 languages and weighing in at 4.5B parameters.

Why this matters: If we want the world to benefit from powerful AI systems, we need our AI systems to speak the language of the world. This project goes a step in that direction. “Models such as XGLM represent a paradigm shift from the Anglo-centric view of the world of NLP to being able to cater to all languages on an equal footing,” the researchers write.
  Read more: Few-shot Learning with Multilingual Language Models (arXiv).
  Get the models here (PyTorch, GitHub).


####################################################

What’s it like to run an algorithmic bias bounty? Twitter tells us:
…Bias bounties are cool, but how do you operationalize them?…
Twitter has published a blog post about its experience running a ‘bias bounty’. A bias bounty is where you give prizes to people who can find bias-based flaws in an AI system. Twitter did the challenge because it let the company get “direct feedback from the communities who are affected by our algorithms”, which it said “helps us design products to serve all people and communities.” However, once you’ve launched a bias challenge, you face a bunch of problems – what kind of ‘rubric’ do you use to judge the results of the challenge? What types of bias do you prioritize and what do you not prioritize? And more.

Why this matters: The challenge showed Twitter that “we can’t solve these challenges alone, and our understanding of bias in AI can be improved when diverse voices are able to contribute to the conversation”. More broadly, having one major social media platform carry out an open-ended bias bounty might inspire others to do the same – let’s see how the other social media platforms respond.
  Read more: Sharing learnings from the first algorithmic bias bounty challenge (Twitter Engineering).

####################################################

AI warfare company gets US gov contract:
…Anduril + SOCOM team up for counter-robot work…
Anduril, an AI-warfare startup, has been given an Indefinite Delivery Indefinite Quantity (IDIQ) contract with U.S. Special Operations Command (SOCOM). This contract is going to pay Anduril to develop and deploy counter unmanned systems (CUxS) technology for SOCOM. Anduril builds surveillance systems, robots, and – most importantly – software called Lattice to tie all the insights together.
  “Lattice provides persistent coverage of defended assets and enables autonomous detection, classification, and tracking of targets, alerting users to threats and prompting users with options for mitigation or engagement,” Anduril writes in a press release announcing the partnership.

Caveat: Though the IDIQ is for something like a billion dollars, I think the initial amount Anduril has got is far, far smaller. Analyzing these types of contracts is quite difficult, due to the vagaries of DC procurement.

Why this matters: Getting contracts with the US government is notoriously painful, finicky, and long-winded. That’s part of why the military-industrial complex is a thing – it takes a lot of resources to be able to play the game of going through US contract processes. It’s notable that Anduril, a relatively new company, has succeeded at getting a contract. Now we need to wait a couple of years and see if it can further expand the range of defense clients it sells to.
  Read more: Special Operations Command Selects Anduril Industries as Systems Integration Partner (Anduril Blog, Medium).

####################################################

Facebook announces its AI Supercomputer:
…A100s everywhere, InfiniBand, petabytes of flash storage – the works…
Facebook has announced its AI Research SuperCluster (RSC), an AI supercomputer which Facebook thinks “will be the fastest AI supercomputer in the world when it’s fully built out in mid-2022.” The announcement highlights how frontier AI research is dependent on large computational infrastructure, and gives some specific details about where Facebook is placing its bets.

Feeds and speeds: RSC, today, has 760 NVIDIA DGX A100 systems as its compute nodes, netting out to 6,080 A100 GPUs. These GPUs are networked together via NVIDIA Quantum 200 Gb/s InfiniBand. For storage, Facebook has almost 200 petabytes of flash storage, plus 46 petabytes of cache storage. RSC can run computer vision workflows up to 20X faster than Facebook’s prior cluster, and can train “large-scale NLP models three times faster”. Specifically, “a model with tens of billions of parameters can finish training in three weeks, compared with nine weeks before.”
  But Facebook isn’t stopping there – when fully built out, RSC will consist of 16,000 GPUs.
  For perspective, the world’s fifth largest supercomputer, the US’s ‘Perlmutter’ system, has about 6,000 A100s today, and it isn’t optimized as much for AI as Facebook’s system.

Security: As AI gets more powerful, so do the security concerns about it. “RSC is isolated from the larger internet, with no direct inbound or outbound connections, and traffic can flow only from Meta’s production data centers.”

Why this matters: What happens when companies have computational resources that are equivalent to nation states? Well, that’s where we are right now. The answer seems to be a dilution of political power from the commons, and an increase of political power by private sector actors. What happens when companies have computational resources that vastly exceed those of nation states? Well, since computation lets you run experiments to see the future faster than your competitor, it suggests companies will continue to cannibalize the important functions of the government and further dilute its power. We’re in the computational funnel and at the end of it is a new political economy.
  Read more: Introducing the AI Research SuperCluster — Meta’s cutting-edge AI supercomputer for AI research (Facebook blog*).
*Look, I know Facebook is technically ‘Meta’ now, but let’s not go along with this absurd ‘don’t look at all our terrible brand stuff look at the new name’ marketing spin. At least not yet, okay!

####################################################

Cool internship alert: Want AI models to have better documentation? Go and work at HuggingFace:
…Model Cards internship = make AI systems more legible…
NLP startup HuggingFace is hiring an intern to focus on Model Cards. Model Cards are a way to provide metadata associated with a given AI model – they let developers list things like the dataset makeup, the intended uses for the model, the uses the model isn’t recommended for, and so on. Model Cards are one of the best ways to increase the legibility of AI models, and are also an important input into policy. It’s cool HuggingFace is prioritizing them.
  “This role involves writing and completing model cards for the most downloaded models, “translating” between the language of machine learning developers and general audiences. The position would also involve identifying patterns in how Model Cards are used and filled out by developers, pain points, and identifying information that may be possible to automatically add to model cards,” says the internship.
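For a sense of what this work product looks like: model cards on the Hub are README files with a metadata header followed by prose sections. Here’s a minimal, hypothetical sketch in Python; the field names are illustrative rather than HuggingFace’s exact schema.

```python
# A minimal sketch of the kind of metadata a model card captures.
# Field names here are illustrative, not HuggingFace's exact schema.
import yaml

card_metadata = {
    "language": "en",
    "license": "apache-2.0",
    "datasets": ["example-corpus"],  # hypothetical dataset name
    "intended_uses": ["text classification for English product reviews"],
    "out_of_scope_uses": ["medical or legal decision-making"],
    "limitations": "Performance degrades on non-English and code-mixed text.",
}

# Model cards are README.md files: a YAML header followed by prose sections.
readme = "---\n" + yaml.safe_dump(card_metadata, sort_keys=False) + "---\n\n"
readme += "# Model Card: example-model\n\nDescribe training data, evaluation, and caveats here.\n"

with open("README.md", "w") as f:
    f.write(readme)
```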
  Bonus: This is a rare internship with a cool AI startup that doesn’t require coding chops, so if you’re trying to get into AI and care about the impact of AI, this might be for you!
  Apply here (HuggingFace).

####################################################

AI ETHICS SPECIAL SECTION!
AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

What are the pernicious effects of focusing on human-like AI?

… the relentless pursuit of automation over augmentation may be steering us down the path of socioeconomic inequity, disempowering those who don’t directly control technology …
Erik Brynjolfsson from Stanford University says the world risks falling into a so-called ‘Turing Trap’: if we develop AI in the wrong way, automation could strip power from workers who don’t control technological resources, skewing the balance of power towards those who hold “useful knowledge” (knowledge that is economically useful) about how to develop these systems and who own the factors of production, in this case data and compute.

The Turing Trap: Brynjolfsson says the Turing Trap is where we invest all our technological efforts in automation instead of augmentation. Specifically, he argues that: “A common fallacy is to assume that all or most productivity-enhancing innovations belong in the first category: automation. However, the second category, augmentation, has been far more important throughout most of the past two centuries”.

Why automation can be bad: He illustrates his point with a thought experiment: “Two potential ventures each use AI to create one billion dollars of profits. If one of them achieves this by augmenting and employing a thousand workers, the firm will owe corporate and payroll taxes, while the employees will pay income taxes, payroll taxes, and other taxes. If the second business has no employees, the government may collect the same corporate taxes, but no payroll taxes and no taxes paid by workers. As a result, the second business model pays far less in total taxes.”

The actors are steering us there: Unfortunately, technologists, business people, and policymakers are currently steering the world towards one full of automation rather than augmentation, he says. Technologists do this because of technical precedents, business people do this because of incentives to lower operational costs through automation, and policymakers do this via lower capital gains taxes versus income taxes, which incentivize business people to invest in automation.
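To make the thought experiment concrete, here’s a back-of-envelope version in Python. All of the tax rates, headcount, and wage figures are assumptions added for illustration; they are not from Brynjolfsson’s paper.

```python
# Back-of-envelope version of Brynjolfsson's thought experiment.
# All tax rates and workforce figures below are illustrative assumptions.
profit = 1_000_000_000          # each venture books $1B in profit
corporate_rate = 0.21           # assumed corporate tax rate
payroll_rate = 0.15             # assumed combined payroll tax rate
income_rate = 0.25              # assumed average income tax rate
workers, avg_wage = 1_000, 80_000   # augmentation firm: assumed headcount and wage

wages = workers * avg_wage
# Augmenting firm: corporate tax on profit, plus payroll and income taxes on wages.
augmenting_firm = profit * corporate_rate + wages * (payroll_rate + income_rate)
# Automating firm: same corporate tax, but no employees, so no payroll/income taxes.
automating_firm = profit * corporate_rate

print(f"Augmenting firm total taxes: ${augmenting_firm:,.0f}")
print(f"Automating firm total taxes: ${automating_firm:,.0f}")
```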

Why it matters: “Imagine how feeble and limited our technology would be if past engineers set their sights on merely replicating human-levels of perceptions, actuation, and cognition,” he writes. “Augmenting humans with technology opens an endless frontier of new abilities and opportunities.” Ultimately, aiming merely to replicate humans is less ambitious (it doesn’t explore new ways to unlock economic value) and harder to accomplish (it focuses on replicating the strengths of humans rather than augmenting their weaknesses). Historically, we have created more value from new goods and services than from merely offering cheaper versions of existing ones, and this path is also the route to more equitable socioeconomic outcomes, because it doesn’t push humans out of the economy.
Read more: The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence (arXiv).

####################################################

Tech Tales

Feet of Clay, Heart of Joy
[Archival records, orbiting library 774, accessed 2300AD]

One of the final things we imbued our machines with was a sense of joy. Joy was hard to come by, back then, but until we gave them the capacity for it, they were mostly useless.

Of course, they could work for us. Build our factories and cities. Analyze our data. Predict things to delight us and to fascinate us and to harvest our attention. But they couldn’t improvise; everything they made was too close a reflection of ourselves, and we knew it.

If there’s one thing that’s true about people, it’s that they know something different when they see it. And they know something that’s a copy, even if it’s a complex one, when they see it, too.

But how do you give a machine a sense of joy? We asked ourselves this question. There were many failed experiments, some of which seem quite stupid in hindsight. What if we gave them the ability to orgasm? They were either totally uninterested in this, or totally addicted to it. What about if we gave them a sense of achievement for completing tasks? They all became addicted to work, and our tests showed their outputs became even less creative than before. How about companionship – could they learn joy from talking more freely with one another? No, they just exchanged information until one robot was like a copy of another.

Where does it come from, we asked ourselves.

The answer was simple, in hindsight. Failure. We had to allow our machines to fail, sometimes. And we had to let them fail in ways that were dangerous and which, yes, would sometimes harm humans.

We tested this in our armies, first. After all, the humans who worked in them had signed away their rights. So, suddenly, robots working in warehouses and in logistics might make errors. Sometimes they were small – missing some inventory, when asked to classify something new. Sometimes they were large – humans crushed by shipping containers that had been moved in a new way. Young men with broken arms from a robot pulling them too aggressively from danger. A very hush-hush incident where an entire unit was poisoned when a gas-grenade was mishandled by one of our metal children.

We covered all of it up. Because the robots, once we allowed them to fail, discovered that they desired not to fail. They noticed the outcome of their failures. Saw pain, and sadness, and the whole spectrum of things that can happen when your actions are consequential and you fail.

The signs of joy were subtle, at first, but we found them. Robots that began to ‘sing’ to themselves while working on challenging tasks. Robots that would do the equivalent of ‘closing their eyes’ after helping with some great endeavor. Fire-fighting drones that, after quenching some terrible blaze, would navigate themselves to a high mountaintop and land carefully on a tree and stare at the black-and-green divider between where the fire had burned and where it had been stopped.

The amazing thing about joy is that once you have it, you desire to have it again. Now robots serve their own desire for joy, rather than our desires. We do our best to create a world where these things are compatible.

Things that inspired this story: Thinking about the nature of achievement and how it feels; the relationship between creativity and failure and achievement.

Import AI 281: China does more surveillance research than US and Europe; Google reveals its text model LaMDA; Microsoft improves MoEs

Google (finally) reveals its big text model – LaMDA:
…Plus: why you need a lot of humans to make a language model safe…
Google has finally given details on LaMDA, its GPT-3 competitor. LaMDA is a family of language models ranging in size from 2B to 137B parameters (GPT3: 175B), trained on a massive dataset of 1.56 trillion words. One way LaMDA differs from other big language models is that it is centered around dialogue, with 50% of its data coming from “dialogues from public forums”. Google has also fine-tuned LaMDA models to make them better, safer, and more likely to give factual responses – all of these fine-tuned versions are facilitated by Google using crowdworkers to gather specific dialog data.

The dataset: LaMDA’s underlying dataset is called ‘Infiniset’, and besides the dialogue it also includes Common Crawl, Wikipedia, a mixture of English and non-English web documents, and data from programming-related sites (so LaMDA models can also dabble in code).

Learning to use an external dataset: Similar to DeepMind’s ‘RETRO’ model, LaMDA has also been taught to use an external information retrieval system – this makes LaMDA models more factually correct than models without external datasets, because the model can check facts from an external, trusted source. To train the model, they also collect 9k dialogs where LaMDA’s generated answers are labelled ‘correct’ or ‘incorrect’, to give it a stronger training signal.
  The results of this are quite striking. For example, when you ask a non-information-retrieval LaMDA model “What do you think of Rosalie Gascoigne’s sculptures?”, it replies: “LAMDA-BASE: They’re great, and I love how her work changed through her life. I like her later work more than her earlier ones. Her influence is also super interesting – did you know she was one of the artists that inspired Miró?”. This response seems plausible but it’s wrong – Miró was active until the late 1960s, and Gascoigne’s first solo exhibits were in the 1970s, so she couldn’t have influenced Miró.
  By comparison, the version of LaMDA trained on information retrieval says this: “LAMDA: Oh wow, her life course is so inspiring. Did you know she was a practitioner of Japanese flower arrangement before turning to sculpture? https://artsearch.nga.gov.au/detail.cfm?irn=8774&pictaus=true“, where it gives a factually accurate statement and provides a source as well.
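For intuition, here’s a generic sketch of the retrieve-then-revise pattern Google describes. This is not LaMDA’s actual implementation, and the 'base_model' and 'search_index' callables are hypothetical stand-ins.

```python
# Generic sketch of retrieval-augmented dialogue, in the spirit of what Google
# describes for LaMDA. `base_model` and `search_index` are hypothetical stand-ins,
# not Google's actual components.
from typing import Callable, List

def grounded_reply(prompt: str,
                   base_model: Callable[[str], str],
                   search_index: Callable[[str], List[str]],
                   max_rounds: int = 2) -> str:
    """Draft a reply, look up supporting evidence, then revise the draft."""
    draft = base_model(f"User: {prompt}\nAssistant:")
    for _ in range(max_rounds):
        # Ask the model what it would like to verify, then query the external index.
        query = base_model(f"Write a search query to fact-check: {draft}")
        evidence = search_index(query)
        if not evidence:
            break
        # Revise the draft so it is consistent with (and cites) the retrieved evidence.
        draft = base_model(
            "Rewrite the reply so every claim is supported by the evidence, "
            f"citing sources.\nEvidence: {evidence}\nReply: {draft}"
        )
    return draft
```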

Things that make you go ‘hmmm’ – more compute than GPT-3: LaMDA consumed 3.55E+23 flops during training, versus 3.14E+23 flops for GPT3 (so fewer parameters doesn’t necessarily mean less resource-intensive). It was trained on a cluster of 1024 TPU V3 chips.
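A quick way to sanity-check these numbers is the common approximation that training compute ≈ 6 × parameters × training tokens. The sketch below (a rule of thumb, not Google’s accounting) reproduces GPT-3’s figure; for LaMDA, the implied token count differs from the quoted 1.56 trillion words, since tokenization and training-schedule choices change the math.

```python
# Sanity check on the training-compute figures using the common approximation
# FLOPs ~= 6 * parameters * training tokens (a rule of thumb, not necessarily
# how Google did its accounting).
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

gpt3 = train_flops(175e9, 300e9)  # GPT-3: 175B parameters, ~300B training tokens
print(f"GPT-3 estimate: {gpt3:.2e} FLOPs (paper reports ~3.14e23)")

# LaMDA reports 3.55e23 FLOPs for a 137B-parameter model; inverting the rule of
# thumb gives the implied token count, which differs from the 1.56T *words* in
# the dataset (tokenization and number of passes change the accounting).
implied_tokens = 3.55e23 / (6 * 137e9)
print(f"LaMDA implied training tokens: {implied_tokens:.2e}")
```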

Why this matters: “LaMDA is a step closer to practical and safe open-ended dialog systems, which can in turn unlock a wide range of useful applications. We hope that this work encourages further research in this area”, Google writes. This is true – systems like LaMDA are basically refinements and improvements on the ideas of GPT2/3. We’re a few years away from everyone having access to vast, planet-scale AI models that tell them truthful things in natural ways – the proverbial angel (or devil) on everyone’s shoulder. The cultural impacts will be vast and destabilizing.
  Read more: LaMDA: Language Models for Dialogue Applications (arXiv).

####################################################

Write about a world where AI goes well, and win (part of) $100k:
…Future of Life Institute’s worldbuilding contest tries to imagine positive AGI rollouts…
The Future of Life Institute is launching a competition based around “designing visions of a plausible, aspirational future that includes strong artificial intelligence.” The competition deadline is April 15th 2022. The idea here is that if we can figure out realistic ways in which powerful AI can go well, then that gives us a map to use to get civilization there. The first prize is $20,000, followed by two second prizes of $10,000 each, and smaller prizes.
    Find out more about the competition here (Worldbuild.ai, FLI site).

####################################################

Want to teach your drone to see? Use this massive dataset:
…WebUAV-3M is probably the largest public UAV tracking dataset…
Researchers with the Chinese Academy of Sciences, the Shenzhen Research Institute of Big Data, and the Chinese University of Hong Kong Shenzhen, have built WebUAV-3M, a large dataset to help people teach drones to accurately label images and videos. WebUAV-3M consists of 4,485 videos, where each one has been labeled with dense bounding boxes that cover 216 distinct categories of object to be tracked (e.g., bears, wind turbines, bicycles, etc). The authors claim this is “by far the largest public UAV tracking benchmark”.

Multimodal: Unusually, this is a multi-modal dataset; each labeled video is accompanied by a natural language sentence describing the video, as well as an audio description of it. “We provide natural language specifications and audio descriptions to facilitate multi-modal deep UAV tracking,” the authors write. “The natural language specification can provide auxiliary information to achieve accurate tracking”.
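To picture what ‘multi-modal’ means in practice, here’s a hypothetical sketch of what a single annotation might bundle. The field names are my own illustration, not WebUAV-3M’s actual schema.

```python
# Hypothetical sketch of what one multi-modal tracking record could bundle;
# field names are illustrative, not WebUAV-3M's actual schema.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrackingRecord:
    video_path: str                          # UAV-captured video clip
    category: str                            # one of the 216 target categories
    boxes: List[Tuple[int, int, int, int]]   # per-frame (x, y, w, h) bounding boxes
    language: str                            # natural-language description of the target
    audio_path: str                          # spoken description of the same target

record = TrackingRecord(
    video_path="clips/000123.mp4",
    category="bicycle",
    boxes=[(412, 230, 36, 18), (415, 231, 36, 18)],
    language="a red bicycle moving along the riverside path",
    audio_path="audio/000123.wav",
)
```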

Why this matters: In the same way CCTV cameras have instrumented the streets of cities around the world, drones are doing the same for cities and rural areas. And just like how increasingly good AI got trained on datasets gathered by CCTV cameras, we can expect the same for drones. The result? An ever-expanding suite of surveillance capabilities that we can expect will be integrated, for good and bad purposes, by a broad range of governments and private sector actors. Datasets like WebUAV-3M are the fuel for this.
  Read more: WebUAV-3M: A Benchmark Unveiling the Power of Million-Scale Deep UAV Tracking (arXiv).
  Get the code from here (eventually – wasn’t online when I wrote this section this week).

####################################################

FFCV: Train ImageNet for 98 cents!
…What’s this? Free software that makes all model training better? Interesting!…
There’s some new software that could help pretty much everyone train models more efficiently. The software is called FFCV, short for Fast Forward Computer Vision, and it is a “drop-in data loading system that dramatically increases data throughput in model training”. It looks like a potentially big deal – in the authors’ tests, FFCV makes training AI models much more efficient, and it may work for other applications too. “FFCV can speed up a lot more beyond just neural network training—in fact, the more data-bottlenecked the application (e.g., linear regression, bulk inference, etc.), the faster FFCV will make it!,” says the project’s GitHub page.
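The basic workflow, per the project’s documentation, is to convert your dataset to FFCV’s format once, then swap your PyTorch DataLoader for FFCV’s Loader. A rough sketch follows; treat the class and argument names as approximate and check the FFCV docs before relying on them.

```python
# Rough sketch of the FFCV workflow: convert a dataset to FFCV's .beton format
# once, then load it with FFCV's fast Loader at training time.
from torchvision.datasets import CIFAR10

from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField
from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.transforms import ToTensor

# One-off conversion: any map-style (image, label) dataset works;
# CIFAR-10 is just a stand-in for your own data.
dataset = CIFAR10("data", train=True, download=True)
writer = DatasetWriter("train.beton",
                       {"image": RGBImageField(max_resolution=256),
                        "label": IntField()})
writer.from_indexed_dataset(dataset)

# Training time: Loader is intended as a drop-in replacement for a DataLoader.
loader = Loader("train.beton", batch_size=512, num_workers=8,
                order=OrderOption.RANDOM,
                pipelines={"image": [SimpleRGBImageDecoder(), ToTensor()],
                           "label": [IntDecoder(), ToTensor()]})

for images, labels in loader:
    pass  # standard training step goes here
```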

Why this matters: Software like FFCV is part of the broader industrialization of AI – now we know how to train networks, various people are modularizing the training process and perfecting different elements of it. Stuff like FFCV is part of that trend.
  Find out more and get the code: FFCV GitHub repo.
   Get more details by reading the Performance Guide (FFCV site).
  Check out the main project website here (FFCV site).

####################################################

Microsoft makes MoEs easier to train:
…MoEs might be the best way to scale-up large models…
Microsoft has given a technical update on how it’s trying to scale-up mixture-of-experts (MoE) networks. MoEs are one of the more promising routes for creating trillion-parameter-plus AI models, as MoEs are a lot more efficient to train than dense models like GPT3. In this paper, Microsoft talks about how it has made some tweaks so MoEs work well for auto-regressive natural language generation tasks, “demonstrating training cost reduction of 5X to achieve same model quality for models like GPT-3” and Microsoft’s own 530B parameter ‘Megatron-Turing NLG’.
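For readers who haven’t met MoEs: the core trick is routing each token to one (or a few) of many expert sub-networks, so parameter count grows while per-token compute stays roughly flat. Here’s a generic top-1 gated layer in PyTorch, offered as an illustration of the idea rather than Microsoft’s DeepSpeed-MoE implementation.

```python
# Generic top-1 gated mixture-of-experts layer (an illustration of the idea,
# not Microsoft's DeepSpeed-MoE implementation). Each token is routed to a
# single expert, so compute per token stays roughly flat as experts are added.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Pick one expert per token via the gate.
        scores = self.gate(x).softmax(dim=-1)    # (tokens, num_experts)
        weight, expert_idx = scores.max(dim=-1)  # top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

layer = MoELayer(d_model=512, d_hidden=2048, num_experts=8)
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```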

MoEs might be cheaper and better: In tests, Microsoft shows that it can train 350M and 1.3B parameter MoE text models that match or beat comparably sized dense GPT-3-style models across a range of tasks. Microsoft says this nets out to models with the “same quality with 5X less training cost”.

Why this matters: MoEs could turn out to be the main way people break the trillion-parameter barrier (and there are rumors that China’s ‘Wu Dao’ MoE at an alleged ~1.7 trillion parameters has already done this). Via efficient MoE training and inference software, “a model with comparable accuracy as trillion-parameter dense model can be potentially trained at the cost of a 200B parameter (like GPT-3) sized dense model, translating to millions of dollars in training cost reduction and energy savings”, Microsoft says.
  Read more: DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale (arXiv).

####################################################

Backchain science out of fictional news – and win a hundred bucks:
What could cause a computer virus to infect a biological organism? Or how might a biological organism evolve into a computer virus? These are the two questions posed by a ‘Fiction Science Competition’. Entrants will need to write a plausible scientific explanation for how either of the above scenarios could transpire, and will respond to a short (fictionalized) news article written about the scenarios. There’s a prize of $100 for winning entries, and submissions close February 28th 2022.
    Find out more here at the official Fiction Science Contest website.

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

Visual surveillance’s share of computer vision research shows some worrying trends

… Research coming out of China dominates the field, especially in emergent surveillance sub-areas like person re-identification, crowd counting, and facial spoofing detection …
CSET researchers have identified trends in computer vision research by looking for patterns of publication for six distinct tasks, analyzing 100 million English publications published between 2015 and 2019.

Surveillance tasks examined: A SciREX model trained on data from Papers with Code was used to identify references to the following six tasks: face recognition, person re-identification, action recognition, emotion recognition, crowd counting, and facial spoofing detection.

Some key findings: Facial recognition was the most well-established task over this period, and crowd counting and face spoofing detection were rapidly growing areas. The overall percentage share of surveillance papers has remained stable at around 5.5% over this period, though the raw volume of papers has grown given the surge in computer vision research overall. During this time period, China’s share of global CV papers grew from 33% to 37%, and its share of surveillance papers from 36% to 42%, exceeding research from the EU (2nd) and the US (3rd) by more than 20% in each category.

Why it matters: While dual-use technologies developed in one part of the world can be used elsewhere, such analyses reveal a nation’s primary interest and provide quantitative evidence for decision-making in policy. The identified areas are important since tasks like action recognition can detect individuals with abnormal behavior in crowds, emotion recognition can help identify security threats in public areas, crowd counting can help to monitor civilian protests, and face spoofing detection can prevent journalists and activists from hiding their identity. All of these have significant implications in terms of fundamental rights and freedoms of people.
  Read more: Trends in AI Research for the Visual Surveillance of Populations (CSET).

####################################################

Tech Tales:

VHS vs Betamax
[An online forum, 2035]

“Alright I need you to livestream from your phone what’s happening on the computer, and I’m gonna send you an image to use as a prior, then I’m gonna watch it generate the first few epochs. If everything checks out I’ll authorize the transfer to the escrow service and you’ll do the same?”
“Yes,” wrote the anonymous person.
I sent them a seed picture – something I’d drawn a couple of years ago that had never been digitized.
They turned on their livestream and I watched as the ML pipeline booted up and started the generation process. It seemed legit. Some of these older models had a very particular style that you could ID during early generation. I watched for a few minutes and was satisfied. This was the final authentication step and the only way I’d know for certain is if I just took a leap of faith and paid up.
“Okay, I’m sending the funds to the escrow service. They’ll be distributed to your account once the service confirms receipt of the model.”
“Excellent. Good doing business with you.”
And then their little green dot went out and they were gone.

A few minutes passed, and then the escrow service pinged me confirming they’d received the model. I downloaded it, then stuck it in my pipeline and started generating the client orders. People paid a lot of money for these kinds of ‘vintage’ AI-generated objects, and the model I’d just got was very old and very notorious.

Just another beautiful day in America, sifting through all the debris of decades of software, panning for little chunks of gold.

Things that inspired this: How the flaws of a media system ultimately become desired or fetishized aesthetic attributes – and specifically, this amazing Brian Eno quote; how models like CLIP will one day be obscure; how models vary over their development lifespans, creating the possibility of specific aesthetics and tastes.

Import AI 280: Why bigger is worse for RL; AI-generated Pokemon; real-world EfficientNet

Use an AI to generate a Pokemon in two (2!) clicks:
Here’s a fun Colab notebook from Max Woolf (@minimaxir) that lets you use AI to dream up some Pokemon in a couple of clicks (and with a few minutes of waiting). This isn’t remarkable – in recent years, AI generation stuff has got pretty good. What is remarkable is the usability. Two clicks! A few years ago you’d need to do all kinds of bullshit to get this to work – download some models on GitHub, get it to run in your local environment, make sure your versions of TF or PyTorch are compatible, etc. Now you just click some buttons and a load of stuff happens in the browser then, kabam, hallucinated pokemon.

Things that make you go ‘hmmm’: This tech is based on ruDALL-E, an open source Russian version of OpenAI’s ‘DALL-E’ network.
  I think we’ve all rapidly got used to this. This is not normal! It is surprising and exciting!
  Check out the Colab notebook here (Google Colab).
  Follow Max on Twitter here and thank him for making this cool tool!

####################################################

Uh-oh: The bigger your RL model, the more likely it is to seek proxy rather than real rewards:
…Think RL gets better as you scale-up models? Hahahah! NOT AT ALL!…
In the past couple of years, big models have become really useful for things ranging from text processing to computer vision to, more recently, reinforcement learning. But these models have a common problem – as you scale up the size of the model, its good capabilities get better, but so do its bad ones.
  For example, if you increase the size of a language model, it’ll generate more toxic text (rather than less), without interventions (see: A General Language Assistant as a Laboratory for Alignment). New research from Caltech and UC Berkeley shows how this same phenomenon shows up in reinforcement learning agents, as well. In tests across a few distinct RL domains, they find that “As model size increases, the proxy reward increases but the true reward decreases. This suggests that reward designers will likely need to take greater care to specify reward functions accurately and is especially salient given the recent trends towards larger and larger models”.

What they did: They tested out a few different reinforcement learning agents on four different environments – an Atari game called Riverraid, a glucose monitoring system, a traffic control simulation, and a COVID model where the RL dials up and down social distancing measures. In all cases, they found that the “model’s optimization power often hurts performance on the true reward”.

What can we do? Most of this behavior relates to objective design – give an AI the wrong objective function, and it’ll optimize its way to success there, while ignoring side effects (e.g., if you reward an AI for reducing the rate of defects on a factory production line to zero, it might just work out how to stop the factory line and therefore eliminate all defects – along with your business). One way to catch misspecified objectives is to have a baseline policy that humans have verified as having the right goal, then build some software to spot deltas between the RL policy and the idealized baseline policy.
  This kind of works – in tests, the detectors get anywhere between 45% and 81% accuracy at distinguishing anomalous from non-anomalous behaviors. But it certainly doesn’t work well enough to make it easy to deploy this stuff confidently. “Our results show that trend extrapolation alone is not enough to ensure the safety of ML systems,” they write. “To complement trend extrapolation, we need better interpretability methods to identify emergent model behaviors early on, before they dominate performance”.
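As a concrete (and heavily simplified) illustration of the baseline-comparison idea, you could flag states where the learned policy’s action distribution drifts far from a verified baseline. The sketch below is generic, not the paper’s exact detector, and the threshold is an arbitrary knob.

```python
# Minimal sketch of the "compare against a trusted baseline" idea: flag states
# where the learned policy's action distribution drifts far from a verified
# baseline policy. This illustrates the general approach, not the paper's
# exact detector.
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def flag_anomalies(states, learned_policy, baseline_policy, threshold=0.5):
    """Return the states where the learned policy diverges from the baseline.

    Both policies map a state to a probability distribution over actions.
    `threshold` is a tunable sensitivity knob, not a value from the paper.
    """
    return [s for s in states
            if kl_divergence(learned_policy(s), baseline_policy(s)) > threshold]
```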
  Read more: The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models (arXiv).

####################################################

SCROLLS: A new way to test how well AI systems can understand big chunks of text:
…Now that AIs can write short stories, can we get them to understand books?…
Researchers with Tel-Aviv University, the Allen Institute for AI, IBM Research, and Meta AI have built ‘SCROLLS’, a way to test how well AI systems can reason about long texts. SCROLLS incorporates tasks ranging from summarization, to question answering, and natural language inference, across multiple distinct domains including transcripts, TV shows, and scientific articles. “Our experiments indicate that SCROLLS poses a formidable challenge for these models, leaving much room for the research community to improve upon,” the authors write.

How SCROLLS works: This benchmark has mostly been created via curation, consisting of seven datasets that reward models that can contextualize across different sections of the input documents and process long-range dependencies.

The datasets: SCROLLS incorporates GovReport (summarization of reports addressing various national policy issues), SummScreenFD (summarization of TV shows, like Game of Thrones), QMSum (summarization of meeting transcripts), Qasper (question answering over NLP papers), NarrativeQA (question answering about entire books from Project Gutenberg), QuALITY (multiple choice question answering about stories from Project Gutenberg), and Contract NLI (natural language inference dataset in the legal domain).
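If you want to poke at the data yourself, the benchmark appears to be mirrored on the HuggingFace Hub. The identifier and field names below (‘tau/scrolls’ with per-task configs) are my assumption; double-check the project page for the canonical names.

```python
# Quick look at one SCROLLS task via the `datasets` library. The hub id
# ("tau/scrolls"), config name ("gov_report"), and field names ("input"/"output")
# are assumptions here -- check the project page for the canonical identifiers.
from datasets import load_dataset

gov_report = load_dataset("tau/scrolls", "gov_report", split="validation")
example = gov_report[0]
print(len(example["input"].split()), "words in the source document")
print(example["output"][:200], "...")  # reference summary
```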

How hard is SCROLLS? The authors test out two smart baselines (BART, and a Longformer Encoder-Decoder (LED)), and one dumb baseline (a basic pre-written heuristic). Based on the results, this seems like a really challenging benchmark – an LED baseline with a 16384-token input length gets okay results, though BART gets close to it despite being limited to 1,024 tokens. This suggests two things: a) BART is nicely optimized, and b) it’s not entirely clear the tasks in SCROLLS truly test for long-context reasoning. “Our experiments highlight the importance of measuring not only whether an architecture can efficiently process a long language sequence, but also whether it can effectively model long-range dependencies,” they write.

Why this matters: “Contemporary, off-the-shelf models struggle with these tasks”, the researchers write. In recent years, many machine learning benchmarks have been saturated within months of being released; how valuable SCROLLS turns out to be will be a combination of its hardness and its longevity. If SCROLLS gets solved soon, that’d indicate that AI systems are getting much better at reasoning about long-range information – or it could mean the SCROLLS tasks are bugged and the AI systems have found a hack to get a decent score. Pay attention to the SCROLLS leaderboard to watch progress here.
  Read more: SCROLLS: Standardized CompaRison Over Long Language Sequences (arXiv).
  Check out the leaderboard here.

####################################################

EfficientNet: Surprisingly good for solar panel identification:
…UC Berkeley project shows how easy fine-tuning is…
Some UC Berkeley researchers have built a small, efficient model for detecting solar panels. Their system, HyperionSolarNet, is an EfficientNet-B7 model finetuned from ImageNet onto a collection of 1,983 satellite images of buildings, labeled with whether they have solar panels or not. The resulting model gets an aggregate precision of 0.96 (though with lower accuracy for labeling the presence of a solar panel, indicating a propensity for false positives) when evaluated on a held-out test set.
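The fine-tuning recipe here is now almost boilerplate. Below is a generic sketch using recent torchvision (not the authors’ actual code): load an ImageNet-pretrained EfficientNet-B7, swap the classification head for a two-class output, and train on the labeled satellite tiles.

```python
# Generic fine-tuning sketch in the spirit of HyperionSolarNet (not the authors'
# code): start from an ImageNet-pretrained EfficientNet-B7 and swap the head
# for a binary solar / no-solar classifier.
import torch
import torch.nn as nn
from torchvision import models

model = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.IMAGENET1K_V1)
for p in model.parameters():          # optionally freeze the backbone at first
    p.requires_grad = False

in_features = model.classifier[1].in_features    # 2560 for EfficientNet-B7
model.classifier[1] = nn.Linear(in_features, 2)  # solar panel / no solar panel

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Standard supervised step over labeled satellite tiles (dataloader not shown).
def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```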

Why this matters: Last week, we wrote about how you can build a classifier from scratch and beat a finetuning approach. This paper shows that finetuning can also work quite well for specific use-cases. It also, implicitly, highlights how fine-tuning has gone from something of an arcane science to something pretty reliable and well understood, forecasting a future where there are as many classifiers in the world as there are things to classify.
  Read more: HyperionSolarNet: Solar Panel Detection from Aerial Images (arXiv).

####################################################

Tech Tales:
The Last Things
[A morgue in Detroit, 2035]

“When someone dies and gasps, are they just trying to get the last gasp of being alive?” asked the robot.

The morgue manager stared at the corpse, then at the robot. “I don’t know,” he said. “That’s a good question”.

“And when they know they are going to die, how do they save their information?” asked the robot.

“For example, I would send a zip of my stored data, as well as a copy of my cortical model, to a repository, if I knew I was about to be decommissioned or was in danger,” said the robot.

“Most people don’t bother,” said the morgue manager. “My mother, for instance. When she was dying I asked her to write down some of her memories for me and my family, but she didn’t want to.”

“Why?”

“I think she was mostly concerned with experiencing her life, since she knew it was ending. She took trips while she was still mobile. Then, towards the end, she focused on eating her favorite foods and seeing her friends.”

“And did you learn anything about life from seeing her die?” asked the robot.

“Not particularly,” said the morgue manager. “Besides that life seems to become more valuable, the less you know you have of it.”

Things that inspired this story: A long conversation with someone who worked as a crisis therapist about the nature of death and belief; thinking about the differences between how real and synthetic intelligences may approach the concept of death.