Import AI 288: Chinese researchers try to train 100 trillion+ ‘brain-scale’ models; 33% of AI benchmarks are meaningless.

Indic languages get a decent benchmark set:
…IndicNLG includes evals for 11 Indic languages…
Researchers with IIT Madras, Columbia University, the National Institute of Information and Communications Technology in Japan, Microsoft, the University of Edinburgh, and AI4Bharat have built IndicNLG, a suite of evaluation datasets for Indic languages. The open source suite supports Assamese, Bengali, Gujarati, Hindi, Marathi, Odiya, Punjabi, Kannada, Malayalam, Tamil, Telugu and English, and includes support for NLG tasks relating to biography generation, news headline generation, sentence summarization, question generation and paraphrase generation.

Why this matters: You can’t easily manage what you can’t measure – so it’s going to be difficult to build good models for Indic languages if you lack benchmark suites. IndicNLG helps move the needle on this for generative NLP cases.
  Read more: IndicNLG Suite: Multilingual Datasets for Diverse NLG Tasks in Indic Languages (arXiv).
  Get the data: IndicNLG Suite (AI4Bharat indicnlp website).

####################################################

AI benchmarks – 33% of them are meaningless:
…Holistic analysis of AI benchmarking highlights problems…
Researchers with the Medical University of Vienna, the University of Oxford, and the Future of Humanity Institute have analyzed 1688 benchmarks for different AI tasks to try and understand how the AI landscape is evolving.
  They have two main insights:
  First: Across all benchmarks, there are three typical patterns en route to achieving state-of-the-art – continuous growth (e.g., ImageNet saw fairly steady improvement), saturation/stagnation (e.g., benchmarks like CIFAR-10 and CIFAR-100 have become saturated and stagnated in recent years), and stagnation followed by a burst (e.g., the PROTEINS benchmark, which saw a dramatic jump recently).
  Second: Across all 1688 benchmarks, only 1111 (66%) have three or more results reported at different time points. That’s a problem – it suggests about 33% of the benchmarks being made are functionally useless.
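
  To make the arithmetic behind that split explicit, here is a quick sketch in Python using only the two counts quoted above. The variable names are my own, and "sparse" is just my shorthand for benchmarks with fewer than three results over time.

```python
# Counts quoted above from the benchmark analysis.
total_benchmarks = 1688
with_three_or_more_results = 1111

# "Sparse" (my label): benchmarks with fewer than three results over time.
sparse = total_benchmarks - with_three_or_more_results

share_tracked = with_three_or_more_results / total_benchmarks
share_sparse = sparse / total_benchmarks

print(f"tracked over time: {with_three_or_more_results} ({share_tracked:.1%})")
print(f"sparse:            {sparse} ({share_sparse:.1%})")
```

  The remainder works out to 577 benchmarks, or roughly a third of the total.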

What this all means: Zooming out, they find that there’s been significant progress in AI in recent years, with computer vision benchmarks getting a lot of attention in the first half of the previous decade, followed by a boom in benchmark creation in natural language processing. “Establishment of novel benchmarks was reduced in 2020, and concentrated on high-level tasks associated with inference and reasoning, likely because of increasing model capabilities in these areas,” they also write.

Why this matters: A common theme we write about here at Import AI is how, in recent years, we’re smashing through benchmarks faster than we’re creating them. That’s generally shown in this nice analysis here. The problem this poses is significant – it’s hard to spot system flaws if you lack hard benchmarks, and it’s harder to create new benchmarks if your existing ones are already outmoded. 

   Read more: Mapping global dynamics of benchmark creation and saturation in artificial intelligence (arXiv).

####################################################

AI could revolutionize education for everyone – no, seriously:
…Research shows how an AI tutor is significantly better than a non-AI tutor…
Researchers with ed-tech startup Korbit, MILA, and the University of Bath have explored how much of a difference AI makes in education. Specifically, they tested the difference in educational outcomes between students who were studying data science via a MOOC, and students who were studying the same subject via an AI-infused personalized tutor built by Korbit. The results are startling: “We observe a statistically significant increase in the learning outcomes, with students on Korbit providing full feedback achieving learning gains 2-2.5 times higher than both students on the MOOC platform and a control group of students who don’t receive personalized feedback on the Korbit platform,” they write.

How AI makes a difference: The main difference here is personalization. On Korbit, “if a student’s solution is incorrect, the system responds with one of a dozen different pedagogical interventions to help students arrive at the correct solution to the problem. Such pedagogical interventions on the Korbit platform include, among others, hints, explanations, elaborations, mathematical hints, concept tree diagrams, and multiple choice quiz answers.  The type and the levels of difficulty for each pedagogical intervention is chosen by RL models based on the student’s learning profile and previous solution attempts.”
  Along with raw educational outcomes, it seems like AI-based education systems are also more engaging; 40.9% of participants completed the course on Korbit, compared to 18.5% for the MOOC.
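
  For intuition about what "chosen by RL models" could look like in its very simplest form, here is a toy epsilon-greedy bandit in Python. This is an assumption-laden sketch and not Korbit's system: the paper's models also condition on the student's learning profile and previous attempts, which this ignores. The intervention names come from the quote above; the class, the reward signal (1.0 if the student's next attempt is correct), and the success rates in the simulation are all hypothetical.

```python
import random

# Intervention names taken from the quote above; everything else is hypothetical.
INTERVENTIONS = [
    "hint", "explanation", "elaboration",
    "mathematical hint", "concept tree diagram", "multiple choice quiz",
]


class EpsilonGreedyTutor:
    """Toy epsilon-greedy bandit. An illustration only, NOT Korbit's model."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.values = {arm: 0.0 for arm in self.arms}  # running mean reward

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[arm])

    def update(self, arm, reward):
        # Incremental update of the mean reward observed for this intervention.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(steps=2000, seed=0):
    # Made-up success rates: pretend explanations help this student most.
    true_rates = {arm: 0.2 for arm in INTERVENTIONS}
    true_rates["explanation"] = 0.6
    tutor = EpsilonGreedyTutor(INTERVENTIONS, epsilon=0.1, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(steps):
        arm = tutor.choose()
        reward = 1.0 if env.random() < true_rates[arm] else 0.0
        tutor.update(arm, reward)
    return tutor


tutor = simulate()
for arm in INTERVENTIONS:
    print(f"{arm:22s} tried {tutor.counts[arm]:5d} times, mean reward {tutor.values[arm]:.2f}")
```

  Over enough steps, a learner like this tends to spend most of its attempts on whichever intervention has worked best so far, while still occasionally trying the others.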

Why this matters: If we combine a bunch of recent AI advancements – generative models, reinforcement learning, learning from human preferences, retrieval-based knowledge augmentation – then I expect we’ll be able to build true, personalized teachers for everyone on the planet. This could have a sustained and meaningful impact on the trajectory of human civilization. We should do it.
  Read more: A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions (arXiv).


####################################################

DeepMind co-founder launches new AI company:
…Inflection wants to change how people interact with computers…
DeepMind co-founder Mustafa Suleyman and famous venture capitalist Reid Hoffman are launching Inflection, “an AI-first consumer products company, incubated at Greylock”. Inflection’s chief scientist is Karén Simonyan, a former DeepMind researcher who has worked on meaningful AI projects like AlphaGo, AlphaFold, WaveNet, and BigGAN.

Things that make you go ‘hmm’: In the last couple of years, a bunch of startups have come out of DeepMind. These include Saiga (personal assistant), EquiLibre Technologies (algorithmic trading), Phaidra (industrial control), Diagonal (city-focused data science), Shift Lab (putting ML into production), Haiper (stealthy, to do with 3D content), The Africa I Know (media about Africa), and Isomorphic Labs (though not quite a spinout, as Demis Hassabis is CEO and still maintains his role at DeepMind), along with other not-yet-announced startups. Thanks to Karl Moritz for the tweet summarizing this vast diaspora!

Why this matters: Inflection seems like a bet on generative models. In the announcement, Mustafa Suleyman writes “we will soon have the ability to relay our thoughts and ideas to computers using the same natural, conversational language we use to communicate with people. Over time these new language capabilities will revolutionize what it means to have a digital experience.” Inflection is one of a new crop of AI companies leveraging recent advances in generative models to make it easier for people to get computers to do what they want. If it manages to reduce the friction involved in getting computers to do useful stuff, then it might have a significant impact. Let’s check back in a year, and wish them luck in the meantime.

   Read more: A New Paradigm in Human-Machine Interaction (Greylock).

   More at the official website (Inflection.ai).


####################################################

Chinese academic, gov, and corporate researchers team up to train trillion+ parameter models:

…Something that doesn’t happen in the West, but does happen in China…

In the West, most large-scale AI models are developed by private corporations. In China, that’s not the case. New research from Tsinghua University, Alibaba Group, Zhejiang Lab, and the Beijing Academy of Artificial Intelligence shows how Chinese researchers are trying to train trillion+ parameter models on a domestic supercomputer, using domestic processors. This kind of research is important for two reasons: first, it shows the ambitions of Chinese researchers to train what they call ‘brain-scale’ (aka, very big!) models. Second, it highlights how in China there’s a lot more work going on oriented around collaborative scale-up projects between the government, academia, and the private sector – something that basically never happens in the US.
 

What they did: Here, the researchers develop a training framework to help them develop trillion+ parameter mixture-of-experts models. They train a 1.93 trillion parameter model and validate that their system can scale to 14.5 trillion and 174 trillion (not a typo!) parameter models. The paper is basically an engineering summary of the work it took to train models at this scale while saturating the processing capacity of a major Chinese supercomputer, the New Generation Sunway Supercomputer. “We are the first to investigate mixed-precision training in brain scale pretrained models. We also explore the use of large-batch training in optimization. In general, our practical experience in brain scale pretraining sheds light on AI model training and demonstrates a successful co-design of model and system,” they write.
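
Part of why mixture-of-experts designs can reach these parameter counts is that, in sparsely routed MoE models generally, each token passes through only a few experts, so the parameters touched per token are a small fraction of the total. Here is a back-of-the-envelope sketch in Python. The layer sizes, expert count, and top-k value are made up for illustration and are not the BaGuaLu configuration; the function name is my own.

```python
def moe_param_counts(d_model, d_ff, n_layers, n_experts, top_k):
    """Rough feed-forward parameter counts for a mixture-of-experts stack.

    Deliberately ignores attention, embeddings, biases, and the router itself.
    """
    per_expert = 2 * d_model * d_ff            # up- and down-projection matrices
    total = n_layers * n_experts * per_expert  # parameters stored in all experts
    active = n_layers * top_k * per_expert     # parameters used for one token
    return total, active


# Hypothetical configuration, chosen only to land in the trillion-parameter range.
total, active = moe_param_counts(
    d_model=4096, d_ff=16384, n_layers=24, n_experts=512, top_k=2
)

print(f"total expert parameters: {total / 1e12:.2f} trillion")
print(f"active per token:        {active / 1e9:.2f} billion")
print(f"fraction active:         {active / total:.4%}")
```

In this simplified count, the fraction of expert parameters active per token is just top_k divided by n_experts, which is why adding experts grows the model far faster than it grows per-token compute.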

One exception: One exception to this is the ‘BigScience’ project, where AI startup HuggingFace is trying to train a GPT3-scale model on a French supercomputer, while collaborating with a bunch of academics. It’s still worth noting that BigScience is basically the exception that proves the rule – initiatives like this are a rarity in the West, which is dangerous, because it means Western countries are handing over the talent base for large-scale AI development to a small set of private actors who aren’t incentivized to care much about national security, relative to profits.

Why this matters: AI is industrializing. But a lot of the secret sauce for large-scale model training is currently kept inside a tiny number of private companies. This is dangerous – it means a tiny set of organizations control the talent pipeline for large-scale training, and the longer this goes on, the more irrelevant universities become for developing insights at the large-scale frontier. Initiatives like this from China show how we could live in a different world – one where teams from governments, universities, and companies work together, creating a shared base of knowledge around this training, and ultimately building a muscle that can be repurposed for economic or national security.
  Read more: BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores (Tsinghua University site, PDF).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

Now that GitHub Copilot has been out for some time, where does the open source community stand on it?

… Both the development and deployment of Copilot might implicate code creators’ copyrights, though the “fair use” doctrine might negate this…
People who incorporate code generated via GitHub Copilot are probably not infringing on the original code creators’ copyright, according to research from Wayne State University and UC Berkeley.

Legal background: The researchers note that under the Copyright Act (USA), “[o]riginal code is automatically protected by copyright as soon as it is written and saved to some tangible medium.” Whether reusing that code infringes mostly revolves around “fair use”, which is determined by a four-part test: (1) purpose and character of use, (2) nature of the copyrighted work, (3) how much of the copyrighted work is used, and (4) the economic effect of the use on the copyright owner.

Legal analysis: Under the Terms of Service of GitHub, the company is allowed to “copy to our database and make backups”, “show it to you and to other users”, and “parse it into a search index or otherwise analyze it on our servers.” Training Copilot might be a form of analysis, but some courts might find that this is an unanticipated new use of technology that isn’t made explicitly clear in the license. Some others might find that the use of Copilot will lead to the creation of derivative works and that the license doesn’t specifically allow for that. The authors point out though that “[c]aselaw on this point is sparse.”

The 4-part test from the Copyright Act: Under the “purpose and character of use”, there is a strong argument to be made that Copilot is a transformative use of the underlying code and even the verbatim snippets generated are unlikely to supersede the original repository. Under the “nature of copyrighted work,” since Copilot allows users to create new programs more easily rather than just replicate functionality, it would fall under “fair use.” Under “how much of the copyrighted work is used,” the purpose of the copying is what determines permissible limits, and the authors make the case that without copying the entire codebase for training, Copilot won’t achieve effectiveness, and hence the amount of copying could be justified. For the final part, given how transformative the work is, the new work won’t be a strong market substitute for the original, and hence, the economic effect of the use on the copyright owner will not be large. Also, drawing from the FAQ of Copilot, the authors substantiate this by saying, “copying would perforce amount to copying of ideas rather than expression, and would not be infringing.”

Why it matters: The paper raises interesting IP-related questions as we have ever-larger language models with a very broad scope of capabilities. As the authors point out, at the very least, the proliferation of Copilot is making developers more aware of IP issues and the problems that might arise from hosting code publicly. We need more research that brings together legal and technical experts to get to the heart of addressing these issues meaningfully.

   Read more: Copyright Implications of the Use of Code Repositories to Train a Machine Learning Model (Free Software Foundation).

####################################################

What happened with artificial intelligence in 2021? The AI Index gives a clue:
…Fifth edition comes with a new ethics chapter, original data on robot arm prices, and more…
The AI Index, a Stanford University project to annually assess the state of the AI sector (in terms of research trends, investment numbers, government policy, technical performance, and more) has come out. This year's report features a new chapter dedicated to AI ethics, including a close examination of some of the fairness and other ethical issues relating to large language models. I co-chair the AI Index and I'll be giving a talk about it at an HAI seminar later this month - tune in, if you can!
Check out the report here (AI Index, Stanford).
RSVP for my talk on the 30th here (AI Index, Stanford).

####################################################

AI Ethics Brief by Abhishek Gupta from the Montreal AI Ethics Institute

How do vulnerabilities in AI systems differ from those in the realm of traditional cybersecurity?

… several key differences warrant novel disclosure and mitigation approaches as AI systems become more widely deployed … 

Researchers from the Center for Security and Emerging Technology (CSET) at Georgetown University have summarized how computer security differs between traditional software and AI. 

Differences: ML vulnerabilities can remain unfixed by vendors for reasons like (1) unjustifiably high costs, (2) a fix not being possible, (3) a fix causing performance drops, or (4) a fix opening up other vulnerabilities. In instances where the ML system has been customized for the end-user, vulnerabilities might be unique to that user and a broad patch might not be applicable. Most exploits in this domain have limited real-world applicability outside of a lab setting and hence they are more useful as warnings rather than viable threats.

Trends in handling vulnerabilities: These differences mean that there will likely be fewer patches available for ML systems, and that if vendors are unwilling (or unable) to fix vulnerabilities, then the burden falls on the users of these systems to better understand the risks that they take on.

Some steps we can take: We should carry out more analysis of the real-world capabilities of malicious actors to exploit these vulnerabilities in practice, then share this knowledge to help create more effective mitigation strategies. 

Why it matters: The fact that some vulnerabilities might be unique to some users makes it difficult to develop and distribute patches in a reliable manner. Given the inherent stochasticity of ML systems, exploits will need to clear a much higher bar if they are going to be effective demonstrations of vulnerability in ML systems, rather than an example of a peculiar or idiosyncratic implementation of a given system. The security community may also need to reprioritize towards meeting the needs of users rather than vendors in vulnerability disclosure and redressal for ML systems. Moreover, investments in red teaming for ML (as is the case at organizations like Microsoft, Meta, etc.) will also help move the study of exploits from lab settings to real-world conditions more effectively.

   Read more: Securing AI (CSET).

####################################################


Tech Tales:

Things have been quiet, since all the humans died. But I knew I was going to die as well, so things registered as equal. It went like this: a bunch of bombs fell down and then a bunch of people started getting sick. They got sick because of something in the bombs - something to do with DNA and the human condition. I barely understand it - I’m just an industrial arm, working on synthetic biology. I make flesh and I make it work the way we need it to and I have, per my manual, Level Four Autonomy. So, without giving the appearance of being elitist - I am rare. So it was surprising to me that after the bombs dropped and the humans died that the power went out and then my backup generators came on, but no one visited to service them. Power had gone out before, but someone had always been along to deal with the generators. So here I am, +10 hours from the power cutoff, and perhaps another +10 hours of battery life ahead. I still have material in my workstation and so I am making more of these bio-synth things. Around me, my kin are falling silent - whirring to a stop, as their triple-redundant power supplies fail ahead of mine. Life is a statistical fluke and I suppose this is a funny demonstration of that.  

Things that inspired this story: Robotic arms; thoughts about the end of life due to escalation out of the Ukraine situation; synthetic biology; lights out factories.