Import AI 312: Amazon makes money via reinforcement learning; a 3-track Chinese AI competition; and how AI leads to fully personalized media

by Jack Clark

McKinsey: Companies are using more AI capabilities and spending more on it:
…Somewhat unsurprising survey confirms that AI is an economically useful thing…
McKinsey has published results from its annual AI survey, and the results show that AI is, slowly but surely, making its way into the economy. 

Those findings in full:

  • AI adoption has plateaued: In 2017, 20% of respondents said they had adopted AI in at least one business area. In 2022, that figure was 50% (it peaked in 2019 at 58%).
  • Organizations are using more AI capabilities than before: In 2018, organizations used on average 1.9 distinct capabilities (e.g., computer vision, or natural language generation), rising to 3.8 in 2022.
  • Rising investment: In 2018, “40 percent of respondents at organizations using AI reported more than 5 percent of their digital budgets went to AI,” and in 2022 that rose to 52%.

Why this matters: This survey is completely unsurprising – but that’s useful. We have this intuition that AI has become increasingly economically useful and surveys like this show that this is the case. Perhaps the most surprising finding is that the rate of adoption is relatively slow – some organizations are using AI, and there are likely a bunch of ‘dark matter’ organizations for which AI holds very little relevance today.

   Read more: The state of AI in 2022—and a half decade in review (McKinsey).

####################################################

Language models aren’t welcome on StackOverflow:

…Popular coding Q&A site bans ChatGPT submissions…

StackOverflow has temporarily banned ChatGPT-written submissions to its website, as the site’s human creators grapple with the problems brought about by autonomous AI coders. 

    “Overall, because the average rate of getting correct answers from ChatGPT is too low, the posting of answers created by ChatGPT is substantially harmful to the site and to users who are asking or looking for correct answers,” StackOverflow admins write in a post. “The volume of these answers (thousands) and the fact that the answers often require a detailed read by someone with at least some subject matter expertise in order to determine that the answer is actually bad has effectively swamped our volunteer-based quality curation infrastructure.”

Why this matters – AI-driven internet-based ‘climate change’: Things like this illustrate a ‘tragedy of the commons’ which I expect we’ll see more of; a new AI tool comes along and is very quickly used to generate a vast amount of low-grade spam and other crap which either damages human-curated sites, or lowers the quality of a common resource (see: algo-generated SEO-optimized spam pages found via Google). 

   Of course, in a few years, these systems might be better than humans, which is going to have wild implications. But for now we’re in the awkward adolescent period where we’re seeing people pour mine tailings into the common digital river.   

   Read more: Temporary policy: ChatGPT is banned (StackOverflow).

####################################################

Waymo works out how to train self-driving cars more efficiently by focusing on the hard parts:

…Trains a model to predict the inherent difficulty of a driving situation…

Researchers with Waymo have figured out how to use hard driving situations to train self-driving cars more efficiently. “Compared to training on the entire unbiased training dataset, we show that prioritizing difficult driving scenarios both reduces collisions by 15% and increases route adherence by 14% in closed-loop evaluation, all while using only 10% of the training data,” they write. 

How it works: Waymo’s approach has five stages. In the first, they collect a variety of data from real-world vehicles (and their onboard AI models). They then collect and shard that data. Next, they learn an embedding that maps specific driving runs into a vector space based on similarity. They then select some of these runs via counterfactual simulation and human triage, letting them figure out which runs are easy and which are hard. Finally, they train an MLP to regress from these embeddings to difficulty labels for the runs. The result is a model that can look at a new run and predict how difficult that run is. 
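   To make the last two stages concrete, here’s a minimal sketch of the pattern in PyTorch. Everything in it (class names, dimensions, the random stand-in data) is my own illustration rather than Waymo’s code; the point is just that a small MLP regresses from run embeddings to difficulty labels, and you then keep the top-scoring slice of runs:

```python
import torch
import torch.nn as nn

class RunDifficultyModel(nn.Module):
    """Toy MLP mapping a driving-run embedding to a scalar difficulty score.
    Names and sizes are illustrative guesses, not Waymo's actual code."""
    def __init__(self, embed_dim=256, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, run_embeddings):
        return self.net(run_embeddings).squeeze(-1)

model = RunDifficultyModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-ins: embeddings from the learned run-similarity space, difficulty
# labels produced by counterfactual simulation plus human triage.
embeddings = torch.randn(10_000, 256)
difficulty = torch.rand(10_000)

for _ in range(100):
    loss = nn.functional.mse_loss(model(embeddings), difficulty)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Keep only the hardest ~10% of runs for training the planner.
with torch.no_grad():
    hard_idx = torch.topk(model(embeddings), k=1_000).indices
```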

   In tests, they find that they can use just 10% of the usual training data if they select for the harder runs and, as a consequence, they get smarter vehicles better able to deal with difficult situations. One problem: this approach slightly degrades performance on the easier routes (which makes sense – there’s less ‘easy’ data in the dataset). 

Why this matters – use AI to help build better AI: Now that they’ve got this difficulty model, the engineers can, in theory, use it to identify hard scenarios for new planning agents, or ‘hotspots’ of difficulty in new geographies they want to deploy into – letting them use one AI system to speed up the development of better, smarter AI systems. This is a neat illustration of how, once you’ve trained a model to be good enough at something, you can use it to accelerate the development of other, much more complicated AI systems.  

   Read more: Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula (arXiv).

####################################################


Amazon uses deep reinforcement learning to make its inventory systems 12% more efficient:

…The march of real world DRL continues…

This year has been a banner one for deep reinforcement learning systems – we’ve seen DRL systems provably control the plasma in prototype fusion powerplants, effectively cool buildings, navigate real world robots and, now, let e-commerce behemoth Amazon better optimize its inventory. 

   In a new paper, Amazon researchers describe how they are able to train a reinforcement learning system to more effectively manage their inventory, leading to a 12% (!!!) reduction in the inventory Amazon has to hold. “Our model is able to handle lost sales, correlated demand, stochastic vendor lead-times and exogenous price matching,” they write. 

What they did: For this work, Amazon built a differentiable simulator which it could train RL algorithms against, helping it model the complexities of inventory management. The resulting RL approach, DirectBackprop, was tested first in backtesting against a weekly dataset of 80,000 sampled products from a single marketplace running from April 2017 to August 2019, and then tested out in the real world on a portfolio of products over 26 weeks. 
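   The key property is that the simulator is differentiable, so you can skip high-variance policy-gradient estimators and backpropagate the objective directly through the simulated inventory dynamics into the policy. Here’s a heavily simplified, hypothetical sketch of that pattern for a single product – the cost structure, demand model, and network are all invented for illustration and aren’t Amazon’s:

```python
import torch
import torch.nn as nn

# Toy policy: maps (inventory level, week index) -> nonnegative buy quantity.
policy = nn.Sequential(
    nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Softplus()
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

HOLDING_COST, LOST_SALE_COST, HORIZON = 0.1, 1.0, 26   # made-up constants

for step in range(1_000):
    inventory = torch.tensor([20.0])
    total_cost = torch.zeros(1)
    for week in range(HORIZON):
        state = torch.cat([inventory, torch.tensor([float(week)])])
        buy = policy(state)                    # order quantity for this week
        inventory = inventory + buy
        demand = torch.rand(1) * 10.0          # stand-in stochastic demand
        sales = torch.minimum(inventory, demand)
        lost = demand - sales                  # lost sales, no backorders
        inventory = inventory - sales
        total_cost = total_cost + HOLDING_COST * inventory + LOST_SALE_COST * lost
    opt.zero_grad()
    total_cost.backward()   # gradients flow through the simulator dynamics
    opt.step()
```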

   The results are pretty convincing: “We randomized these products into a Treatment (receiving Direct-Backprop buy quantities) and a Control (receiving Newsvendor policy buy quantities) group,” they write. “The Control group was the current production system used by one of the largest Supply Chains in the world [Jack – that’d be Amazon]. The Treatment group was able to significantly reduce inventory (by ∼ 12%) without losing any excess revenue (statistically insignificant difference from 0%)”.
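   The paper doesn’t spell out the exact statistical machinery, but mechanically a check like that boils down to a standard two-sample test over the randomized groups. A hedged sketch with simulated stand-in data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in per-product revenue for the two arms; in the real experiment
# these would be measured outcomes, not simulated draws.
control = rng.normal(loc=100.0, scale=15.0, size=4_000)
treatment = rng.normal(loc=100.0, scale=15.0, size=4_000)

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"p = {p_value:.3f}")  # large p -> revenue indistinguishable from control
```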

Why this matters: Papers like this show how AI is rapidly making its way out of the lab and into the real world. It’s a big deal when some of the world’s largest and most sophisticated corporations do large, potentially expensive real-world tests on their own physical inventory. It’s an even bigger deal when it works. All around us, the world is being silently optimized and controlled by invisible agents being built by small teams of people and applied to the vast machinery of capitalism.

   Read more: Deep Inventory Management (arXiv).

####################################################

Chinese researchers run a 3-track ‘AI Security Competition’:

…Deepfakes, self-driving cars, and face recognition – and a nice example of how competitions can drive progress…

A bunch of Chinese universities and companies recently launched a so-called ‘Artificial Intelligence Security Competition’ (AISC) and have published a report going over the results. The AISC has three tracks relating to three distinct AI use-cases; deepfakes, self-driving cars, and face recognition. 

Deepfakes: This is a deepfake identification competition: “Given a query image, identify the deepfake method behind it based on its similarities to the images in the gallery set.” 
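   Mechanically, that task is a retrieval problem: embed the query image, rank a labeled gallery by similarity, and read the method off the nearest matches (top-5 precision then counts a hit if the true method appears among the five most similar labels). A minimal sketch, with all embeddings and names invented for illustration:

```python
import numpy as np

def predict_deepfake_method(query_emb, gallery_embs, gallery_labels, k=5):
    """Rank gallery images by cosine similarity to the query and return
    the generation-method labels of the k closest matches."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    top_k = np.argsort(-(g @ q))[:k]
    return [gallery_labels[i] for i in top_k]

# Toy usage: 1,000 gallery images from 10 hypothetical deepfake methods.
gallery = np.random.randn(1000, 512)
labels = [f"method_{i % 10}" for i in range(1000)]
print(predict_deepfake_method(np.random.randn(512), gallery, labels))
```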

   144 teams participated in the competition and the winning team was led by Tencent (with a top-5 precision of 98%).

Self-driving cars: This competition is based around adversarial attacks on the computer vision models used in self-driving cars. Specifically, vision models must correctly label trucks that the cars would otherwise crash into; sometimes these trucks have been doped with an adversarial patch meant to make them invisible to object detectors. The competition runs over several stages, and in the final round there is more scene variation and the adversarial trucks are replaced by human mannequins. 
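   The standard recipe for attacks like this is to optimize the patch pixels directly against the detector’s confidence. Here’s a stripped-down, hypothetical sketch of that loop – `detector` is a stand-in for any differentiable object detector, and a real attack would render the patch onto the truck across many viewpoints rather than pasting it into a corner:

```python
import torch

def attack_patch(detector, scenes, patch_size=64, steps=200, lr=0.01):
    """Optimize a square patch so the detector's confidence drops wherever
    the patch is pasted. Pure illustration, not the AISC competition code."""
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        loss = torch.zeros(())
        for scene in scenes:                     # each scene: [3, H, W] image
            img = scene.clone()
            img[:, :patch_size, :patch_size] = torch.clamp(patch, 0, 1)
            conf = detector(img.unsqueeze(0))    # assumed: object confidences
            loss = loss + conf.mean()            # push detections toward zero
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach()
```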

   96 teams participated in the competition and the winning team (BJTU-ADaM) came from Beijing Jiaotong University.

Face recognition: This is based around developing effective adversarial attacks on image recognition systems. The idea is to “discover more stable attack algorithms for evaluating the security of face recognition models and consequently facilitate the development of more robust face recognition models” – an important thing given that China is probably the most heavily-surveilled country in the world (though the UK gives it a run for its money). 

   178 teams participated in the competition. Two teams shared the first prize – TianQuan & LianYi, and DeepDream – getting a perfect 100 each.

Who did this: When something is a giant multi-org output, I typically don’t publish all the institutions. However, China is a special case, so – for the enthusiasts – here’s the list of authors on the paper: 

   “realAI, Tsinghua University, Beijing Institute of Technology, Shanghai Jiao Tong University, China Hanhu Academy of Electronics and Information Technology, Xi’an Jiaotong University, Tencent YouTu Lab, China Construction Bank Fintech, RippleInfo, Zhejiang Dahuatech Co, Beijing Jiaotong University, [and] Xidian University.”

Why this matters: In the future, most nations are going to carry out alternating ‘red team vs blue team’ competitions, where teams compete to break systems and eventually to build more resilient ones. This competition shows how useful the approach can be for both developing more robust systems and identifying vulnerabilities in widely deployed ones. It also speaks to the dynamism of the Chinese AI sector – hundreds of submissions per track, interesting technical solutions, and a sense of excitement about the endeavor of making AI more robust for society. The translated tagline for this whole competition was, per the official website: “Building AI Security Together and Enjoying the Future of Intelligence”.

   Read more: Artificial Intelligence Security Competition (AISC).

####################################################

Facebook’s AI training gets messed up by bad weather:

…Is your training run breaking because you’re dumb, or because the sun is shining in Oregon?…

When Facebook was training its ‘CICERO’ system which recently beat humans at Diplomacy, the company ran into a strange problem – sometimes training speeds for its model would drop dramatically and the team couldn’t work out why. It turned out, per Facebook in an AMA on Reddit, that this was because the FB data center’s cooling system was malfunctioning on particularly hot days. 

   “For the rest of the model training run, we had a weather forecast bookmarked to look out for especially hot days!” Facebook said. 

Why this matters: Worth remembering that AI systems are made out of computers, and computers have to go somewhere. Since most of the mega companies use free-air cooling, their data centers (while being stupendously efficient!) can become vulnerable to edge-case events, like particularly hot days causing malfunctions in cooling, which has a knock-on effect on the (overheating) servers sitting in the cavernous halls of anonymous buildings scattered around the world. 

   Read more: We’re the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! (Reddit).

   Via nearcyan on Twitter.

####################################################

Watch me (and several other more coherent people) waffle about AI policy!

I participated in a panel on ‘future-proofing AI governance’ at the Athens roundtable on AI and the Rule of Law in Brussels recently – you can check out the video here. My general sense from spending a few days in Brussels is there’s a valuable discussion to be had about what kind of negligence or liability standards should be applied to developers of super-intelligent AI systems, and there’s a lot of room to be creative here. It’s worth thinking about this now because policy takes a long time to craft and, if some of the more optimistic timeline predictions come true, it’d be good to have built out regulatory infrastructure in the coming years. 

   Watch the video here: Future-proofing AI Governance | The Athens Roundtable on AI and the Rule of Law 2022 (The Future Society).

####################################################

Tech tales: 

The Personal Times 

[Worldwide, 2026]

The news of the day, customized for you!

We report to you, for you, and about you! 

News from your perspective!

When news became personalized, people started to go mad. Of course the underlying facts were the same, but the stories were angled differently depending on who you were. Everyone read the news, because the news was incredibly compelling. 

All the news that’s fit to finetune!

One hundred stories and one hundred truths!

News for all, made personal!

You’d sit on a train after a terrorist attack and see everyone’s eyes light up, and everyone would be happy or worried or panicked, depending on the implicit preferences learned from their news media consumption. You’d stare at each other with wild eyes and say ‘did you hear the news’, but you stopped knowing what that meant, and you mostly said it to work out what type of person you were dealing with. Was their news happy or their news sad, or their news uplifting or their news opportunistic? What bubble did they live within, and how different to yours was it?

Things that inspired this story: What happens when generative models lead to media customized around individual preferences learned via reinforcement learning from human feedback? Just how big a crash will ‘Reality Collapse’ bring? Is society meant for everyone to have ‘solipsism media on tap’?