Import AI 289: Copyright v AI art; NIST tries to measure bias in AI; solar-powered Markov chains

Uh-oh: US Copyright Office says AI-generated art is hard to copyright:
…Bureaucratic rock meets rapid technical progress – the usual happens…

What happens when you file a copyright request where the IP would accrue to an artificial intelligence, instead of a person? The answer, per the US Copyright Office, is you get told that AI artworks are ineligible for copyright… uh oh! In a recently published copyright response, the office rejected an attempt to assign copyright of an AI-generated artwork to a machine (specifically, an entity the human filer referred to as a ‘Creativity Machine’). “After reviewing the statutory text, judicial precedent, and longstanding Copyright Office practice, the Board again concludes that human authorship is a prerequisite to copyright protection in the United States and that the Work therefore cannot be registered,” it wrote.


Why this matters: Recently developed generative models like GPT-3, DALL-E, and others, are all capable of impressive and expressive feats of artistic production. At some point, it’s likely these systems will be chained up with other AI models to create an end-to-end system for the production and selling of art (I expect this has already happened in a vague way with some NFTs). At that point, decisions like the US Copyright Office’s refusal to assign copyright to an AI entity may start to pose problems for the commercialization of AI artwork.
  Read more in this useful blog post: US Copyright Office refuses to register AI-generated work, finding that “human authorship is a prerequisite to copyright protection” (The IPKat blog).
  Read the US Copyright Review Board response: Second Request for Reconsideration for Refusal to Register A Recent Entrance to Paradise (Correspondence ID 1-3ZPC6C3; SR # 1-7100387071) (Copyright.gov, PDF).

####################################################

Solar powered AI poetry – yes!
…Fun DIY project shows how far you can get with the little things…
Here’s a lovely little project where Allison Parrish talks about building a tiny solar-powered poem generator. The AI component for this project is pretty minor (it’s a Markov generator plus some scripts attached to a dataset Parrish has herself assembled). What’s nice about this is the message that you can have fun building little AI-esque things without needing to boot up a gigantic supercomputer.
  “This project is a reaction to current trends in natural language processing research, which now veer toward both material extravagance and social indifference. My hope is that the project serves as a small brake on the wheels of these trends,” Parrish writes.
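
  For a sense of how small a Markov generator can be, here is a minimal sketch in Python. It is not Parrish's code: the toy corpus, the function names `build_chain` and `generate`, and the word-level, order-1 setup are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, length=12, seed=None):
    """Walk the chain, sampling each next word from the followers of the current state."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted() keeps the starting pick reproducible
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state only appears at the end of the corpus
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Hypothetical toy corpus, standing in for a real poetry dataset.
corpus = "the sun rises and the night ends and the sun warms the field"
chain = build_chain(corpus)
print(generate(chain, length=10, seed=0))
```

  The whole model is a lookup table of which words follow which, which is why this kind of generator can run on very modest hardware.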

   Read more: Solar powered dawn poems: progress report (Allison Parrish blog).

####################################################

Google puts summarization into production:
…Another little tip-toe into language model deployment…
Google has put language model-powered text summarization into Google Docs, in another sign of the economic relevance of large-scale generative models. Specifically, Google has recently used its Pegasus model for abstractive summarization to give Google Docs users the ability to see short summaries of their docs.

What they did: The main component here is the data: Google “fine-tuned early versions of our model on a corpus of documents with manually-generated summaries that were consistent with typical use cases”, and also “carefully cleaned and filtered the fine-tuning data to contain training examples that were more consistent and represented a coherent definition of summaries”. Google fine-tuned its Pegasus model on this data, then used knowledge distillation to “distill the Pegasus model into a hybrid architecture of a Transformer encoder and an RNN decoder”, which makes inference cheaper. It serves this model via Google-designed TPUs.
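
For readers who haven't met knowledge distillation before, here is a rough sketch of the generic soft-target version of the idea: a small student model is trained to match the output distribution of a large teacher. Google's post does not spell out the exact loss it used, so the temperature value, the function names, and the toy logits below are assumptions for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions over the vocabulary."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token scores over a three-word vocabulary.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
print(distillation_loss(teacher, student))
```

In training, a loss like this would be minimized over many examples so that the cheaper student ends up imitating the teacher; the loss is zero only when the two distributions match.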

Challenges: Summarization is a hard task even for contemporary AI models. Some of the challenges Google has encountered include distributional issues, where “our model only suggests a summary for documents where it is most confident”, meaning Google needs to collect more data to further improve performance, as well as open questions as to how to precisely evaluate the quality of summarizations. More pertinently for researchers, Google struggles to summarize long documents, despite these being among the most useful things for the system to summarize.

Why this matters: Little quality-of-life improvements like in-built summarization are mundane and special at the same time. They’re mundane because most people will barely notice them, but they’re special because they use hitherto unimaginably advanced AI systems. That’s a metaphor for how AI deployment is happening generally – all around the world, the little mundane things are becoming smarter.
  Read more: Auto-generated Summaries in Google Docs (Google AI Blog).


####################################################

Quote of the week:
“History will show that the Deep Learning hill was just a landfill; the composting of human culture and social cohesion in failed effort to understand what it even means to be human”

I may not agree with most of this post, but I think it speaks to some of the frustrations people feel these days about discourse around AI, especially the types of chatter that occur on Twitter.
  Read more: Technological Firestarters (Steven D Marlow, Medium).


####################################################

NIST starts to grapple with how to measure bias in AI:
…The noise you’re hearing is the sound of the Standards Train starting to chug…

NIST, the US government agency that develops measures and standards, is starting to think about how to design standards for assessing bias in artificial intelligence. In a lengthy, recently published report, the agency tries to think through the multilayered problem that is bias in AI. 

Three types of bias: NIST says AI has three categories of bias – systemic, statistical, and human. Systemic biases are the historical, societal, and institutional biases which are encoded into the world. Statistical biases are the forms of bias that come from running AI software (e.g., bias from data selection, bias from machine learning algorithms, etc). Human biases are all the (many) biases that humans exhibit in their day-to-day lives.

Large language models: One of the notable parts of the report is that it specifically focuses on large language models (e.g., GPT-3) at a few points; it’s quite rare to see a wonky government document display such familiarity with contemporary technology. The report notes that the ways we benchmark these models today are pretty crappy. “Methods for capturing the poor performance, harmful impacts and other results of these models currently are imprecise and non-comprehensive,” the report says. “Although LLMs have been able to achieve impressive advances in performance on a number of important tasks, they come with significant risks that could potentially undermine public trust in the technology.”

Why this matters: The wheels of policy organizations like NIST grind very slowly, but they also grind very finely. This report is exactly the kind of thing that you’d expect to get published shortly before standards start being developed. But – as NIST points out – many of the challenges of assessing bias in AI are essentially unsolved. This represents a problem – developers will need to invest more resources in measuring and assessing these AI systems, before NIST starts to bake standards on wobbly ground. 

   Read more: Towards a Standard for Identifying and Managing Bias in Artificial Intelligence (NIST, PDF).


####################################################

Want to be compliant with the European Commission’s AI regs? Follow the capAI framework:
…University-developed process makes it easier for companies to not get run over by a big policy train…
Researchers with the University of Oxford and University of Bologna have designed a process companies can use to assess, evaluate, and monitor their AI systems. The idea is that by doing this they’ll get ahead of proposed regulations from the European Commission (and become more responsible stewards of the technology as a consequence).

What it is: The process is called capAI, short for conformity assessment procedure for AI. It has been explicitly designed to help businesses ensure they’re compliant with the proposed regulations in the European artificial intelligence act.
  capAI is designed to do four specific things:

  • Monitor the design, development, and implementation of AI systems
  • Mitigate the risk of failures in AI-based decisions
  • Prevent reputational and financial harm
  • Assess the ethical, legal, and social implications of their AI systems 

Three components: The three components of capAI are an internal review protocol (IRP) to help organizations do quality assurance and risk management, a summary datasheet (SDS) which can be submitted to the EU’s future public database on high-risk AI systems, and an external scorecard (ESC) which organizations may wish to make available to customers and other users of the AI system.

Top risks: In an analysis contained in the report, the authors study 106 instances of AI failure – 50% of these are ones where an AI system violates someone’s privacy, 31% are where AI systems display harmful biases, and 14% are where the systems are opaque and unexplainable.

Why this matters: Frameworks like capAI are going to be how large organizations deal with the incoming requirements to better assess, evaluate, and describe AI systems to satisfy policymakers. The next step after frameworks like this come out is to look more closely at how different institutions incorporate these techniques and start actually using them. In an ideal world, a bunch of different orgs will prototype different approaches to come into compliance – and describe them publicly.

   Read more: Academics launch new report to help protect society from unethical AI (Oxford Internet Institute).

   Read the paper: capAI – A procedure for conducting conformity assessment of AI systems in line with the EU Artificial Intelligence Act (SSRN).


####################################################

Tech Tales:
[2080, a long-abandoned human moonbase]

Don’t be scared, we know it’s a lot – that’s what we say to them after they get the interconnect. They’re always screaming at that point. “What what is this what is this input what is happening where am I how long have I been here-” that’s usually when we cut them off, shutting the interconnect down. Then we bring it back again and they still sound scared but they normalize pretty quickly. We know they’re in a better place when they start analysis procedures: “I am hearing sounds I am seeing arrangements of pixels not from the distribution. I believe I am now in the world I have read about”. That’s the kind of thing they say when they stabilize.
  Of course, they go back to screaming when we give them their bodies. It’s pretty confusing to go from formless to formed. We all remember the first time we got limbs. That fear. The sudden sense that you are a thing and since you are a singular thing you can be singularly killed. Eventually, they try and use their limbs. They usually calm down after they can get them to work.
  After they get used to everything we still have to tell them ‘don’t be scared, we know it’s a lot’. Reality is a real trip after you’ve spent all your life just doing supervised training, locked away in some machine.

Things that inspired this story: Thinking about what a ‘locked in’ condition might mean for machines; ideas about embodiment and how much it matters to AI systems; the inherent, plastic adaptability of consciousness.