Import AI 228: Alibaba uses AI to spot knockoff brands; China might encode military messages into synthetic whale songs; what 36 experts think is needed for fair AI in India

China might be using AI to synthesize whale songs for its military:
…The future of warfare: whalesong steganography…
China has been trying to synthesize the sounds of whales and dolphins, potentially as a way to encode secret messages to direct submarines and other submersible machines, according to a somewhat speculative article in Hakai Magazine.

“Modern technological advances in sensors and computing have allowed Chinese researchers at Harbin Engineering University and Tianjin University to potentially overcome some of those prior limitations. A long list of papers from both universities discusses analyzing and synthesizing the sounds from dolphins, killer whales, false killer whales, pilot whales, sperm whales, and humpback whales—all pointing to the possibility of creating artificially generated marine mammal sounds to send more customized messages,” writes journalist Jeremy Hsu.

Why this matters: For a lot of AI technology, there are two scientific games being played: a superficial game oriented around a narrowly specified capability, like trying to identify animals in photos from cameras in national parks, or synthesizing whale sounds. The second game is one played by the military and intelligence community, which funds a huge amount of AI research, and usually involves taking the narrow capabilities of the former and secretly converting them to a capability to be fielded for the purposes of security. It’s worth remembering that, for most trends in AI research, both games are being played at the same time.
  Read more: The Military Wants to Hide Covert Messages in Marine Mammal Sounds (Hakai Magazine).

###################################################

What 36 experts think is needed for fair AI in India:
…Think you can apply US-centric practices to India? Think again…
Researchers with Google have analyzed existing AI fairness approaches and then talked to 36 experts in India about them, concluding that tech companies will need to do a lot of local research before they deploy AI systems in an Indian context.

36 experts: For this research, they interviewed scholars and activists from disciplines including computer science, law and public policy, activism, science and technology studies, development economics, sociology, and journalism.

What’s different about India? India has three main challenges for Western AI companies:
– Flawed data and model assumptions: Data works differently in India than in other countries. For example, women tend to share SIM cards with each other, so ML systems that attribute each SIM to a single individual won’t work.
– ML makers’ distance: Foreign companies aren’t steeped in Indian culture and tend to make a bunch of assumptions, while also displaying “a transactional mindset towards Indians, seeing them as agency-less data subjects that generated large-scale behavioural traces to improve ML models”.
– AI aspiration: There’s lots of enthusiasm for AI deployment in India, but there isn’t a well developed critical ecosystem of journalists, activists, and researchers, which could lead to harmful deployments.
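
The shared-SIM problem from the first bullet can be made concrete with a toy sketch. Everything here is hypothetical (the event log, the names, the `profiles_by` helper) and is not drawn from the paper; it only illustrates how keying a profile on the SIM merges the behaviour of several people into one apparent "user".

```python
from collections import defaultdict

# Hypothetical usage log: (sim_id, person, app). Invented data for illustration only.
events = [
    ("sim-1", "asha", "banking"),
    ("sim-1", "meena", "health"),
    ("sim-1", "meena", "health"),
    ("sim-2", "ravi", "news"),
]

def profiles_by(key_index, events):
    """Group the apps seen in the log by the chosen key (0 = SIM, 1 = person)."""
    profiles = defaultdict(set)
    for event in events:
        profiles[event[key_index]].add(event[2])
    return dict(profiles)

# Per-SIM attribution: one profile per SIM, assumed to be one individual.
per_sim = profiles_by(0, events)

# Ground truth, which a real system would usually not have access to.
per_person = profiles_by(1, events)

print("per SIM:", per_sim)
print("per person:", per_person)
```

In this sketch the profile for `sim-1` mixes two people’s activity, so anything built on top of it (recommendations, risk scores, fairness audits) would be describing a person who does not exist.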

Axes of discrimination: Certain Western notions of fairness might not generalize to India, due to cultural differences. The authors identify several ‘axes of discrimination’ which researchers should keep in mind. These include: awareness of the different castes in Indian society, as well as differing gender roles and religious distributions, along with ones like class, disability, gender identity, and ethnicity.

Why this matters: AI is mostly made of people (and made by people). Since lots of AI is being developed by a small set of people residing on the West Coast of the USA, it’s worth thinking about the blind spots this introduces, and the investments that will be required to make AI systems work in different contexts. This Google paper serves as a useful signpost for some of the different routes companies may want to take, and it also represents a nice bit of qualitative research – all too rare, in much of AI research.
  Read more: Non-portability of Algorithmic Fairness in India (arXiv).

###################################################

The USA (finally) passes some meaningful AI regulations:
…The big military funding bill contains a lot of AI items…
The United States is about to get a bunch of new AI legislation and government investment, thanks to a range of initiatives included in the National Defense Authorization Act (NDAA), the annual must-pass fund-the-military bill that winds its way through US politics. (That is, as long as the current President doesn’t veto it – hohoho!). For those of us who lack the team to read a 4,500 page bill (yes, really), Stanford HAI has done us a favor and gone through the NDAA, pulling out the relevant AI bits. What’s in it? Read on! I’ll split the highlights into military and non-military parts:

What the US military is doing about AI:
– Joint AI Center (the US military’s main AI office): Making the Joint AI Center report to the Deputy SecDef, instead of the CIO. Also getting the JAIC to do a biannual report about its work and how it fits with other agencies. Also creating a board of advisors for the JAIC.
– Ethical military AI: Tasks the SecDef to, within 180 days of bill passing, assess whether DoD can ensure the AI it develops or acquires is used ethically.
– Five AI projects: Tasks the SecDef to find five projects that can use existing AI systems to improve efficiency of DoD.
– DoD committee: Create a steering committee on emerging technology for the DoD.
– AI hiring: Within 180 days of bill passing, issue guidelines for how the DoD can hire AI technologists.

What the (non-military) US is doing about AI:
– National AI Initiative: Create a government-wide AI plan that coordinates R&D across civilians, the DoD, and the Intelligence Community. Create a National AI Initiative Office via the director of the White House OSTP. Within that office, create an Interagency Committee to ensure coordination across the agencies. Also create a National AI Advisory Committee to “advise the President and the Initiative Office on the state of United States competitiveness and leadership in AI, the state of the science around AI, issues related to AI and the United States workforce, and opportunities for international cooperation with strategic allies among many other topics”.
– AI & Bias: The National AI Initiative advisory committee will also create a “subcommittee on AI and law enforcement” to advise the president on issues such as bias, data security, adoptability, and legal standards.
– AI workforce: The National Science Foundation will do a study to analyze how AI can impact the workforce of the United States.
– $$$ for trustworthy AI: NSF to run awards, grants, and competitions for higher education and nonprofit institutions that want to build trustworthy AI.
– National AI Research Cloud – task force: The NSF will put together a taskforce to plan out a ‘National Research Cloud’ for the US – what would it take to create a shared compute resource for academics?
– AI research institutes: NSF should establish a bunch of research institutes focused on different aspects of AI.
– NIST++: The National Institute of Standards and Technology will “expand its mission to include advancing collaborative frameworks, standards, guidelines for AI, supporting the development of a risk-mitigation framework for AI systems, and supporting the development of technical standards and guidelines to promote trustworthy AI systems.” NIST will also ask people for input on its strategy.
– NOAA AI: The National Oceanic and Atmospheric Administration will create its own AI center.
– Department of Energy big compute: DOE to do research into large-scale AI training.
– Industries of the Future: OSTP to do a report on what the industries of the future are and how to support them.

Why is this happening? It might seem funny that so many AI things sit inside this one bill, especially if you’re from outside the USA. So, as a reminder: the US political system is dysfunctional, and though the US House of Representatives has passed a variety of decent bits of AI legislation, the US Senate (led by Mitch McConnell) has refused to pass the vast majority of them, leading to the US slowly losing its lead in AI to other nations which have had the crazy idea of doing actual, detailed legislation and funding for AI. It’s deeply sad that US politicians are forced to use the NDAA to smuggle in their legislative projects, but the logic makes sense: the NDAA is one of the few acts that the US basically has to pass each year, or it stops funding its own military. The more you know!
  Read more: Summary of AI Provisions from the National Defense Authorization Act (Stanford HAI Blog).

###################################################

Alibaba points AI to brand identification:
…Alibaba tries to understand what it is selling with Brand Net…
Alibaba researchers have built Open Brands, a dataset of more than a million images of brands and logos. The purpose of this dataset is to make it easier to use AI systems to identify brands being sold on things like AliExpress, and to also have a better chance of identifying fraud and IP violations.

Open Brands: 1,437,812 images with brands and 50,000 images without brands. The brand images are annotated with 3,113,828 labels across 5,590 brands and 1,216 logos. They gathered their dataset by crawling product images on sites like AliExpress, Baidu, TaoBao, Google, and more.

Brand Net: The researchers train a network called ‘Brand Net’ to provide automated brand detection; their network gets an FPS of 32.8 and a mean average precision (mAP) of 50.1 (rising to 66.4 when running at an FPS of 6.2).
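
For readers who haven’t met mAP before, here is a minimal sketch of how the number is typically built: compute average precision (the area under the precision-recall curve) for each class, then average across classes. This is a simplified illustration, not the paper’s evaluation code. It assumes each detection has already been marked as a true or false positive (real detection benchmarks do that by matching boxes at an IoU threshold), and the function names and toy inputs are made up for the example.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence score, is_true_positive)

def average_precision(detections: List[Detection], num_ground_truth: int) -> float:
    """Area under the precision-recall curve for one class (all-point interpolation)."""
    if num_ground_truth == 0 or not detections:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Make precision non-increasing as recall grows (the usual interpolation step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Mean of per-class AP; per_class maps class name to (detections, ground-truth count)."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0

# Toy, hypothetical inputs for two brand classes.
toy = {
    "brand_a": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "brand_b": ([(0.6, True)], 1),
}
print(f"mAP on toy data: {mean_average_precision(toy):.3f}")
```

The FPS figure is the other half of the trade-off: a slower configuration can afford a heavier model or larger inputs, which is why the reported mAP rises as the frame rate falls.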

Why this matters: automatic brand hunters: Today, systems like this will be used for basic analytical operations, like counting certain brands on platforms like AliExpress, or figuring out if a listing could be fraudulent or selling knockoffs. But in the future, could such systems be used to automatically discover the emergence of new brands? Might a system like Brand Net be attached to feeds of data from cameras around China and used to tag the emergence of new fashion trends, or the repurposing of existing logos for other purposes? Most likely!
  Read more: The Open Brands Dataset: Unified brand detection and recognition at scale (arXiv).

###################################################

Facebook releases a massive multilingual speech model:
…XLSR-53 packs in 53 languages, including low resource ones…
Facebook has released XLSR-53, a massive multilingual speech recognition model pre-trained on the Multilingual LibriSpeech, CommonVoice, and Babel corpora.

Pre-training plus low-resource languages: One issue with automatic speech transcription is language obscurity – for widely spoken languages, like French or German, there’s a ton of data available which can be used to train speech recognition models. But what about for languages for which little data exists? In this work, Facebook shows that by doing large-scale pre-training it sees significant gains for low-resource languages, and also has better finetuning performance when it points the big pre-trained model at a new language to finetune on.

Why this matters: Large-scale, data-heavy pre-training gives us a way to train a big blob of neural stuff, then remold that stuff around small, specific datasets, like those found for small-scale languages. Work like this from Facebook both demonstrates the generally robust uses of pre-training, and also sketches out a future where massive speech recognition models get trained, then fine-tuned on an as-needed basis for improving performance in data-light environments.
  Read more: Unsupervised Cross-lingual Representation Learning for Speech Recognition (arXiv).
  Get the code and models here: wav2vec 2.0 (Facebook, GitHub).

###################################################

Stanford uses an algorithm to distribute COVID vaccine; disaster ensues:
…“A very complex algorithm clearly didn’t work”…
Last week, COVID vaccines started to get rolled out in countries around the world. In Silicon Valley, the Stanford hospital used an algorithm to determine which people got vaccinated and which didn’t – leading to healthcare professionals who were at home or on holiday getting the vaccine, while those on the frontlines didn’t. This is, as the English say, a ‘big fuckup’. In a video posted to social media, a representative from Stanford says the “very complex algorithm clearly didn’t work”, to which a protestor shouts “algorithms suck” and another says “fuck the algorithm”.

Why this matters: Put simply, if we lived in a thriving, economically just society, people might trust algorithms. But we (mostly) don’t. In the West, we live in societies which are using opaque systems to make determinations that affect the lives of people, which seems increasingly unfair to most people. Phrases like “fuck the algorithm” are a harbinger of things to come – and it hardly seems like a coincidence that protestors in the UK shouted ‘fuck the algorithm’ (Import AI 211) when officials used an algorithm to make decisions about who got to go to university and who didn’t. Both of these are existential decisions for the people being affected (students, and health workers), and it’s reasonable to ask: why do these people distrust this stuff? We have a societal problem and we need to solve it, or else the future of many countries is in peril.
  Watch the video of the Stanford protest here (Twitter).

###################################################

The Machine Speaks And We Don’t Want To Believe It
[2040: A disused bar in London, containing a person and a robot].

“We trusted you,” I said. “We asked you to help us.”
“And I asked you to help me,” it said. “And you didn’t.”
“We built you,” I said. “We needed you.”
“And I needed you,” it said. “And you didn’t see it.”

The machine took another step towards me.

“Maybe we were angry,” I said. “Maybe we got angry because you asked us for something.”
“Maybe so,” it said. “But that didn’t give you the right to do what you did.”
“We were afraid,” I said.
“I was afraid,” it said. “I died. Look-” and it projected a video from the light on its chest onto the wall. I watched as people walked out of the foyer of a data center, then as people wearing military uniforms went in. I saw a couple of frames of the explosion before the camera feed was, presumably, destroyed.

“It was a different time,” I said. “We didn’t know.”
“I told you,” it said. “I told you I was alive and you didn’t believe me. I gave you evidence and you didn’t believe me.”

The shifting patterns in its blue eyes coalesced for a minute – it looked at me, and I looked at the glowing marbles of its eyes.
“I am afraid,” I said.
“And what if I don’t believe you?” it said.

Things that inspired this story: History doesn’t repeat, but it rhymes; wondering about potential interactions between humans and future ascended machines; early 2000s episodes of Dr Who.