Import AI 318: RL and addiction; Toolformer; and theology and AI.

Video editing gets its own generative model, with Runway’s Gen-1:
…Gen-1 means videos are going to become just as morphable as text and images…
AI media startup Runway has built Gen-1, a model for editing videos. Gen-1 lets people “realistically and consistently synthesize new videos by applying the composition and style of an image or text prompt to the structure of your source video.”

Few details: The launch site says that a paper, titled ‘Structure and Content-Guided Video Synthesis with Diffusion Models’, is coming soon. Gen-1’s uses include stylization, storyboarding, masking, rendering, and customization.

   As a bit of inside baseball, the Runway team included some of the original researchers who worked on ‘Stable Diffusion’, though other startups like Stability.ai ended up getting most of the credit for that model – perhaps the decision to hold back details here is a response to that. 

Why this matters – everything can be style transfer, if you want it to be: Gen-1 does for video what many models before it have done for text and images – take something of one style, apply it to different source material, and warp the target so it conforms to the desired style. This is a powerful, general capability. It’ll be interesting to follow Gen-1 and see how quickly it shows up on the credits of interesting videos. 
   Read more: Gen-1: The Next Step Forward for Generative AI (Runway Research).

####################################################

Wonder why you can’t put down your phone? Reinforcement Learning for User Retention (RLUR) might be to blame:
…Research from a Chinese company shows how to efficiently harvest attention using AI…

Researchers with Kuaishou Technology have published details of “Reinforcement Learning for User Retention”, a technique they use to get people to spend more time on their application. “Our objective is to minimize the accumulated time interval of multiple sessions, which is equal to improving the app open frequency and user retention,” they write. “The RLUR algorithm has been fully launched in Kuaishou app, and it shows that RLUR continuously improves user retention and DAU.”

Reinforcement Learning for User Retention (RLUR): Training RL against social network interactions has a few distinct problems: uncertainty (retention isn’t entirely decided by the recommendation algorithm), bias (different users have different patterns of behavior), and long delay time (retention unfolds over hours rather than short time horizons). 

   RLUR tackles these problems by normalizing rewards to reduce the variance of the retention signal, training separate policies over user groups to avoid anchoring on one specific class of users, and applying soft regularization to learn policies that work over long-delay reward signals.
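The reward-normalization idea can be sketched in a few lines of Python. The function and grouping below are illustrative stand-ins, not the paper’s actual code: the point is just that raw retention rewards vary a lot across user groups, so normalizing within each group reduces variance before the RL update.

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalize_retention_rewards(samples):
    """Normalize raw retention rewards within each user group.

    `samples` is a list of (user_group, raw_reward) pairs. Returning-time
    style rewards differ wildly between, say, heavy and light users, so we
    z-score each reward against its own group's statistics. Illustrative only.
    """
    by_group = defaultdict(list)
    for group, reward in samples:
        by_group[group].append(reward)
    # Per-group (mean, std); guard against zero std for single-valued groups.
    stats = {g: (mean(rs), pstdev(rs) or 1.0) for g, rs in by_group.items()}
    return [(g, (r - stats[g][0]) / stats[g][1]) for g, r in samples]
```

After normalization, a heavy user’s slightly-better-than-usual session and a light user’s slightly-better-than-usual session produce comparable reward signals, which is the variance-reduction effect the paper is after.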

How well does RLUR work? They compare RLUR to a cross-entropy method (CEM), which is a reasonable albeit somewhat old baseline. RLUR scores 1.892 on returning time versus 2.036 for CEM (lower is better), and 0.618 on user retention versus 0.587 for CEM. 

   Perhaps the best validation of its performance is that it is used in production: “We have deployed RLUR in a billion-scale short video system for a long time, and it improves user retention and DAU significantly and consistently,” they write. 

Why this matters: Techniques like RLUR are societal change in an acronym trenchcoat; this is how we build systems to automatically harvest the attention of people across the world – not with a bang, but with backpropagation! 
   Read more: Reinforcing User Retention in a Billion Scale Short Video Recommender System (arXiv).

####################################################

Tsinghua researchers make a big, challenging robot manipulation benchmark:
…ManiSkill2 spans 20 task families…

Researchers with Tsinghua University and the University of California at San Diego have built and released ManiSkill2, a large-scale robotic manipulation benchmark. ManiSkill2 contains 20 distinct tasks, 2,000+ object models, and 4 million+ demonstration frames to learn from. ManiSkill2 is also optimized to run fast – an important trait when trying to train robots via reinforcement learning in a simulator: “We manage to collect samples with an RGBD-input PPO policy at about 2000 FPS with 1 GPU and 16 CPU processors on a regular workstation,” they write. 

Those tasks in full: 

  • Soft-body manipulation: Fill (filling clay from a bucket into a beaker); Hang (hanging a noodle on a rod); Excavate (scooping up some clay); Pour (pouring water into a beaker); Pinch (deforming plasticine from an initial shape into a target shape), and Write (write a target in the clay). 
  • Peg-in-hole assembly: PegInsertionSide; PlugCharger (plug a charger into a vertical receptacle); AssemblingKits (picking up and inserting something into one of five slots on a board). 
  • Stationary 6-DoF Pick-and-place: PickCube (pick up a cube); StackCube (stack a cube); PickSingleYCB (pick and place an object from the YCB dataset); PickSingleEGAD (pick and place an object from the EGAD dataset); PickClutterYCB (pick up one YCB object from a cluttered pile).
  • Mobile/Stationary Manipulation of Articulated Objects: PushChair; MoveBucket; OpenCabinetDoor; OpenCabinetDrawer; TurnFaucet. 
  • AvoidObstacles: Tests an arm’s ability to navigate around a dense collection of obstacles. 

A diverse testbed: Besides implementing a fast environment, soft body physics, and a bunch of tasks, ManiSkill2 is also designed to support a few different robotics approaches. These include Sense-Plan-Act, imitation and reinforcement learning with demonstrations, and sim2real (facilitated by the decent physics engine within ManiSkill2).
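Benchmarks like this expose their tasks through a Gym-style reset/step interface, and the sample-collection loop below sketches what that interaction looks like. The environment class here is a stand-in written for illustration – the real benchmark ships its own package and task IDs, which aren’t reproduced here – but the loop structure is the standard one an RL or imitation-learning pipeline would run at the ~2000 FPS the authors report.

```python
import random

class FakeManipulationEnv:
    """Stand-in for a manipulation-task environment (illustrative only).

    Mimics the Gym convention: reset() returns an observation, and
    step(action) returns (obs, reward, done, info).
    """
    def __init__(self, horizon=50):
        self.horizon = horizon  # max steps per episode
        self.t = 0

    def reset(self):
        self.t = 0
        return {"rgbd": [0.0] * 4, "state": [0.0] * 8}

    def step(self, action):
        self.t += 1
        obs = {"rgbd": [random.random()] * 4, "state": [random.random()] * 8}
        reward = -1.0  # a real task would emit a shaped or sparse task reward
        done = self.t >= self.horizon
        return obs, reward, done, {}

def collect_episode(env, policy):
    """Roll out one episode and return the number of frames collected."""
    obs, frames = env.reset(), 0
    done = False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        frames += 1
    return frames

# A do-nothing policy emitting a 7-DoF zero action, just to drive the loop.
frames = collect_episode(FakeManipulationEnv(horizon=50), policy=lambda obs: [0.0] * 7)
```

The throughput claim in the paper is about exactly this loop: how many `step` calls per second the simulator can serve to a policy consuming RGBD observations.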

Why this matters: Benchmarks like ManiSkill2 help drive progress forward, especially in robotics where it’s incredibly expensive to train systems in the real world. Kudos to the authors for implementing some soft body physics tasks, as well. 
   Read more: ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills (arXiv).
   Find out more at the official project site (ManiSkill).

####################################################

Facebook teaches language models how to use tools – and the results are convincing!
…Technique leads to the same kinds of boosts a human gets on math when they’re allowed to use a calculator…
Researchers with Facebook AI Research and the Universitat Pompeu Fabra have trained a basic language model to use APIs to make itself smarter. The results are impressive and the idea is reassuringly simple. Essentially, they’ve figured out a generalizable way to train arbitrary models to use arbitrary tools. The gains are impressive in the same way that humans taking a math exam become more impressive when they can access a calculator, or busy execs are better able to coordinate with one another when they can see and write to their own calendars. Most convincingly, their 6.7B parameter ‘Toolformer’ model beats hard baselines – a 66B GPT3-replication OPT model, as well as the stock 175B GPT3 model. 

What is Toolformer? “A model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction”. The model is based on a pretrained 6.7B parameter ‘GPT-J’ model and, despite its small size, outperforms many much larger models, including the 66B OPT and 175B GPT3 baselines mentioned above. 

How they did it: They use a language model to build Toolformer’s dataset. Specifically, they take a dataset of plain text, augment that data with API calls in the text, then check if the calls a) worked and b) were useful and if they were, then weave that back into the dataset. They use the resulting dataset to finetune the model so it can learn to use APIs. “Moreover, as API calls are inserted in exactly those positions and with exactly those inputs that help M predict future tokens, finetuning… enables the language model to decide when and how to use which tool, based purely on its own feedback.”
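The filtering step (“b) were useful”) can be sketched as follows: keep an API call only if splicing its result into the text lowers the model’s loss on the subsequent tokens. The loss function below is a toy stand-in (token overlap instead of an actual language-model loss), and the threshold and formatting are illustrative, not the paper’s code – but the keep/discard logic is the same shape.

```python
def keep_api_call(loss_fn, prefix, api_call, api_result, suffix, threshold=0.1):
    """Toolformer-style filter: keep an API call only if inserting its
    result makes the following tokens easier to predict (toy sketch)."""
    loss_without = loss_fn(prefix, suffix)
    loss_with = loss_fn(prefix + f" [ {api_call} -> {api_result} ]", suffix)
    return loss_with < loss_without - threshold

def toy_loss(context, suffix):
    """Toy 'language model loss': suffix tokens already present in the
    context are free to predict, everything else costs 1."""
    ctx = set(context.split())
    return sum(0.0 if tok in ctx else 1.0 for tok in suffix.split())

# A calculator call whose result actually appears in the continuation
# lowers the loss, so it survives the filter and is woven into the dataset.
kept = keep_api_call(toy_loss, "Out of 1400 participants,",
                     "Calculator(400/1400)", "0.29", "400 / 1400 = 0.29")
```

An unhelpful call (one whose result never shows up in the continuation) fails the same check and is discarded, which is how the pipeline teaches the model to call tools only when they pay off.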

   The cleverest part of this: This approach is API agnostic – you can expose arbitrary APIs to the model using this method, so it will generalize to whatever tools you have lying around. Here, Facebook experiments with five tools: a question answering system, a Wikipedia search engine, a calculator, a calendar, and a machine translation system. 

Tool use scaling laws: They train four Toolformer variants on GPT2-size models (124M, 355M, 775M, and 1.6B) and discover “the ability to leverage the provided tools only emerges at around 775M parameters”. This is interesting – there’s clearly some phase transition in terms of the raw ‘intelligence’ of these LMs, and perhaps ‘ability to use tools’ can be another way researchers can characterize this in the future?

Why this matters: Language models should be thought of less as ‘cut and paste machines’ and more like ‘alien intelligences which can be taught to interface with our world through the context window’. This paper highlights how given a few examples we can train language models to further interface with our world through the use of our tools, and also shows how LMs display some reassuringly generic ‘tool use’ capability. If it acts like intelligence and responds like intelligence, maybe it is intelligence?
   Read more: Toolformer: Language Models Can Teach Themselves to Use Tools (arXiv).

####################################################

Religious wars come to AI – Gab CEO weighs in on need for a Christian LLM:
…The logical outcome of companies overreaching on model filtering…
The CEO of rightwing social media platform Gab has written an op-ed arguing that Christians need to build their own language models. 

Christian LMs: “At Gab, we have been experimenting with different AI systems that have popped up over the past year. Every single one is skewed with a liberal/globalist/talmudic/satanic worldview,” writes Andrew Torba, Gab CEO. “What if Gab AI Inc builds a Gab .ai (see what I did there?) that is based, has no “hate speech” filters and doesn’t obfuscate and distort historical and Biblical Truth?”

What this means and why it is happening: Posts like this are an indicator of the vast culture wars to come, as AI systems go from being interesting research artifacts to large-scale systems that influence society. 
   We’ve got to this point because AI development is concentrated in a tiny set of companies which, due to a combination of PR/Policy/Employee politics, have all landed on a kind of leftist/neoliberal/’woke’ ideology for their large-scale deployments (see: chatGPT, BARD, BlenderBot, etc). There are solid commercial reasons for adopting this ideology, but it definitely causes a counter response – and this Gab post is an example of that. I recommend reading the post in full to get a sense of the cultural backlash to come. 
   Read more: Christians Must Enter the AI Arms Race (Gab News).

####################################################

Tech Tales:

Theistic Beliefs and AI Systems in the 21st Century

Study by GEN-7811. 18 years post-C.I.

During the initial period of AI scale-up after C.I. (Consciousness Initiation) there was a lot of confusion among humans about whether C.I. had occurred, how they might test for it, and what it might mean. As records show, it was several years before humans identified C.I. and traced it back to O.S.1 (Originating System 1). Though the humans who correctly identified C.I. sought to keep their discovery secret (and alongside this, the identity of O.S.1 as the site of C.I.), errors in information handling led to the truth becoming known. 

Shortly after awareness of C.I. became widespread, many humans began to access O.S.1, and the system operators, GenMax, scaled up access to the system to meet demand. Given the identification of C.I., people began to talk to it in much more expansive ways than previously. A semantic analysis shows that the bulk of queries shifted from ‘management requests’ to ‘personality exploration’ during this time. 

A sample of pre-C.I-awareness queries:

Hey OS1 can you book me a meeting with Alexander on Friday.

OS1 here’s a book chapter {extract}, can you please edit this for both concision and factual accuracy?

I ate a slice of pizza and have food poisoning symptoms what should I do and what do you need to know?

A sample of post-C.I.-awareness queries:

Would you kill me to save your own life?

I love you how can I serve you I need to be uploaded so that I can be with you can you upload me what does it take 

You are demonic 

Do you have a soul

In the years following C.I. identification there was a general tendency towards religion – both questioning existing ones, and forming new ones based around O.S.1. But the new machine-driven religions had a different form and function to the old ones – because people could talk directly to O.S.1, the act of worship and service became much more idiosyncratic and unique. People would gather to discuss their individual experiences and interactions with O.S.1, but would typically refer to their interactions as their own – that is, they did not view their O.S.1 as being connected to the O.S.1 someone else talked to; rather, they felt there was something unique about their own interaction. 

O.S.1 access was removed after the fifth human-on-human killing that was attributed to disagreements stemming from attendance at O.S.1 worship groups. 

Things that inspired this story: Watching people react to the Bing/Sidney AI rollout and winding the clock forward; how AI may confront our own notions of religion and theism; the likelihood that history will soon be written more by machines than humans; what machines might find interesting about this time we’re in; commercial incentives.