Import AI #272: AGI-never or AGI-soon?, simulating stock markets; evaluating unsupervised RL

AI apocalypse or insecure AI?
…Maybe we’re worrying about the wrong stuff – Google engineer…
A Google engineer named Kevin Lacker has written a blog post distilling his thoughts about the risks of artificial general intelligence. His view? Worrying about AGI isn’t that valuable, because it’s unlikely ‘that AI will make a quantum leap to generic superhuman ability’; instead, we should worry about very powerful narrow AI. That’s because “when there’s money to be made, humans will happily build AI that is intended to be evil”, so we should focus our efforts on building better computer security, on the assumption that at some point someone will develop an evil, narrow AI that tries to make money.
  Read more: Thoughts on AI Risk (Kevin Lacker, blog).

####################################################

Want to build AGI – just try this!
…Google researcher publishes a ‘consciousness’ recipe…
Eric Jang, a Google research scientist, has published a blogpost discussing how we might create smart, conscious AI systems. The secret? Use the phenomenon of large-scale pre-training to create clever systems, then use reinforcement learning (with a sprinkle of multi-agent trickery) to get them to become conscious. The prior behind the post is basically the idea that “how much your model generalizes is directly proportional to how fast you can push diverse data into a sufficiently high-capacity model.”

Pre-training, plus RL, plus multi-agent training = really smart AI: Jang’s idea is to reformulate how we train systems, so that “instead of casting a sequential decision making problem into an equivalent sequential inference problem, we construct the “meta-problem”: a distribution of similar problems for which it’s easy to obtain the solutions. We then solve the meta-problem with supervised learning by mapping problems directly to solutions. Don’t overthink it, just train the deep net in the simplest way possible and ask it for generalization!”
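To make the meta-problem idea concrete, here is a toy sketch under stated assumptions. The problem family (minimize f(x) = ax² + bx with a > 0, whose solution −b/2a is cheap to compute), the k-nearest-neighbour learner, and all parameters are hypothetical choices, not from Jang’s post, which has large deep nets in mind. The point is only the shape of the recipe: generate (problem, solution) pairs, fit a supervised learner that maps problems straight to solutions, then check generalization on unseen problems.

```python
import random


def solve(a, b):
    """Closed-form minimizer of f(x) = a*x**2 + b*x (a > 0): cheap ground truth."""
    return -b / (2 * a)


def sample_problem(rng):
    """Draw one instance from a hypothetical distribution of similar problems."""
    return (rng.uniform(1.0, 2.0), rng.uniform(-1.0, 1.0))


def knn_predict(train, problem, k=5):
    """Supervised learner: average the solutions of the k most similar training problems."""
    a, b = problem
    nearest = sorted(train, key=lambda ex: (ex[0][0] - a) ** 2 + (ex[0][1] - b) ** 2)[:k]
    return sum(solution for _, solution in nearest) / k


rng = random.Random(0)
# Build the meta-problem dataset: (problem, solution) pairs.
train = [(p, solve(*p)) for p in (sample_problem(rng) for _ in range(2000))]
# Evaluate generalization on problems the learner has never seen.
test = [sample_problem(rng) for _ in range(200)]
mean_abs_error = sum(abs(knn_predict(train, p) - solve(*p)) for p in test) / len(test)
print(f"mean absolute error on held-out problems: {mean_abs_error:.4f}")
```

Swapping the nearest-neighbour averager for a high-capacity network, and the toy quadratics for a rich problem distribution, is where Jang’s actual bet lies.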
  Mix in some RL and multi-agent training to encourage reflexivity, and you get something that, he thinks, could be really smart: “What I’m proposing is implementing a “more convincing” form of consciousness, not based on a “necessary representation of the self for planning”, but rather an understanding of the self that can be transmitted through language and behavior unrelated to any particular objective,” he writes. “For instance, the model needs to not only understand not only how a given policy regards itself, but how a variety of other policies might interpret the behavior of a that policy, much like funhouse mirrors that distort one’s reflection.”
  Read more: Just Ask For Generalization (Eric Jang, blogpost).

####################################################

HuggingFace: Here’s why big language models are bad:
…Gigantic ‘foundation models’ could be a blind alley…
Here’s an opinion piece from Julien Simon, ‘chief evangelist’ of NLP startup HuggingFace, where he argues that large language models are resource-intensive and bad, and that researchers should prioritize the use of smaller models. The gist of his critique is that large language models are very expensive to train, have a non-trivial environmental footprint, and have capabilities that can frequently be matched by far smaller, more specialized, fine-tuned models.
  The pattern of ever-larger language models “leads to diminishing returns, higher cost, more complexity, and new risks”, he says. “Exponentials tend not to end well.”
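To put a rough number on “very expensive to train”, a widely used rule of thumb for dense transformers estimates training compute as C ≈ 6·N·D FLOPs, where N is the parameter count and D is the number of training tokens. This is back-of-the-envelope arithmetic only: it ignores attention overheads and hardware efficiency, the 100x-smaller comparison model is hypothetical, and it says nothing about whether the smaller model matches the larger one’s capabilities.

```python
def training_flops(n_params, n_tokens):
    """Rule-of-thumb training compute for a dense transformer: C ≈ 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens


big = training_flops(175e9, 300e9)     # GPT-3 scale: 175B parameters, 300B tokens
small = training_flops(1.75e9, 300e9)  # hypothetical 100x smaller model, same data
print(f"big: {big:.2e} FLOPs, small: {small:.2e} FLOPs, ratio: {big / small:.0f}x")
```

Because compute scales linearly in both N and D under this approximation, growing models and datasets together multiplies the cost.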

Why this matters: I disagree with some of the arguments here, in that I think large language models likely have some real scientific, strategic, and economic uses which are unlikely to be matched by smaller models. On the other hand, the ‘bigger is better’ phenomenon could be dragging the ML community into a local minimum, where we’re spending too many resources on training big models, and not enough on creating refined, specialized models.
   Read more: Large Language Models: A New Moore’s Law? (HuggingFace, blog).

####################################################

Simulating stock markets with GANs:
…J.P. Morgan tries to synthesize the unsynthesizable…
In Darren Aronofsky’s film ‘Pi’, a humble math-genius hero drives himself mad by trying to write an algorithm that can synthesize and predict the stock market. Now, researchers with J.P. Morgan and the University of Rome are trying the same thing – but they’ve got something Aronofsky didn’t think of – a gigantic neural net.

What they did: This research proposes building “a synthetic market generator based on Conditional Generative Adversarial Networks (CGANs)”, trained on real historical data. The CGAN plugs into a system that has three other components – historical market data, a (simulated) electronic market exchange, and one or more experimental agents that are trying to trade on the virtual market. “A CGAN-based agent is trained on historical data to emulate the behavior resulting from the whole set of traders,” they write. “It analyzes the order book entries and mimics the market behavior by producing new limit orders depending on the current market state”.
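The architecture described above (a simulated exchange, a generator that emits limit orders conditioned on the current market state, and an experimental agent trading alongside it) can be sketched as a simulation loop. This is a toy under heavy assumptions: the exchange is a minimal price-time-priority order book, and the order generator is a hypothetical random stand-in conditioned only on the mid price, not a trained CGAN. In the paper’s setup, that slot is filled by a generator trained on historical order-book data.

```python
import heapq
import random
from dataclasses import dataclass


@dataclass
class Order:
    side: str    # "buy" or "sell"
    price: int   # limit price in integer ticks
    qty: int
    trader: str


class OrderBook:
    """Toy limit order book with price-time priority (stand-in for the simulated exchange)."""

    def __init__(self):
        self.bids = []    # max-heap via negated price: (-price, seq, order)
        self.asks = []    # min-heap: (price, seq, order)
        self.seq = 0      # arrival counter, breaks price ties by time
        self.trades = []  # (buyer, seller, price, qty)

    def best_bid(self):
        return self.bids[0][2].price if self.bids else None

    def best_ask(self):
        return self.asks[0][2].price if self.asks else None

    def _crosses(self, order):
        if order.side == "buy":
            return bool(self.asks) and self.asks[0][2].price <= order.price
        return bool(self.bids) and self.bids[0][2].price >= order.price

    def submit(self, order):
        opposite = self.asks if order.side == "buy" else self.bids
        # Fill against resting orders while the incoming limit price crosses the spread.
        while order.qty > 0 and self._crosses(order):
            resting = opposite[0][2]
            fill = min(order.qty, resting.qty)
            buyer, seller = (order, resting) if order.side == "buy" else (resting, order)
            self.trades.append((buyer.trader, seller.trader, resting.price, fill))
            order.qty -= fill
            resting.qty -= fill
            if resting.qty == 0:
                heapq.heappop(opposite)
        # Any unfilled remainder rests in the book.
        if order.qty > 0:
            if order.side == "buy":
                heapq.heappush(self.bids, (-order.price, self.seq, order))
            else:
                heapq.heappush(self.asks, (order.price, self.seq, order))
            self.seq += 1


def stand_in_generator(book, rng, ref_price=100):
    """Hypothetical placeholder for the trained CGAN: emits a limit order
    conditioned on the current market state (here, just the mid price)."""
    bid, ask = book.best_bid(), book.best_ask()
    mid = (bid + ask) // 2 if bid is not None and ask is not None else ref_price
    side = rng.choice(["buy", "sell"])
    offset = rng.randint(-3, 1) if side == "buy" else rng.randint(-1, 3)
    return Order(side, mid + offset, rng.randint(1, 5), "cgan")


def run(steps=500, seed=0):
    rng = random.Random(seed)
    book = OrderBook()
    for t in range(steps):
        book.submit(stand_in_generator(book, rng))
        # Experimental agent under test: buys one unit at the best ask every 10 steps.
        if t % 10 == 0 and book.best_ask() is not None:
            book.submit(Order("buy", book.best_ask(), 1, "agent"))
    return book


book = run()
print(len(book.trades), "trades; best bid", book.best_bid(), "best ask", book.best_ask())
```

The useful property of this design is that the experimental agent’s orders feed back into the book the generator conditions on, so the simulated market can react to the agent rather than replaying a fixed historical tape.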

How well does it work? They’re able to show that they can use the CGAN architecture to “generate orders and time-series with properties resembling those of real historical traces“, and that this outperforms systems built with interactive, agent-based simulators (IABSs).
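One common way to check whether synthetic market data “resembles” real traces is to compare stylized facts, such as the fat tails of returns. The sketch below computes the excess kurtosis of log returns for a real and a synthetic price series; it is an illustrative assumption, not necessarily one of the metrics the paper reports, and the function names are hypothetical.

```python
import math


def log_returns(prices):
    """Log returns r_t = ln(p_t / p_{t-1}) of a price series."""
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]


def excess_kurtosis(xs):
    """Population excess kurtosis m4 / m2**2 - 3; positive values indicate fat tails."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant series")
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3


def compare(real_prices, synthetic_prices):
    """Compare one stylized fact (tail heaviness of returns) across two price traces."""
    return {
        "real": excess_kurtosis(log_returns(real_prices)),
        "synthetic": excess_kurtosis(log_returns(synthetic_prices)),
    }
```

Real intraday returns are typically fat-tailed, so a generator whose synthetic returns show excess kurtosis near zero would be missing a basic property of the market it imitates.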

What does this mean? It’s not clear that approaches like this can help that much with trading, but they can likely help with the development and prototyping of novel trading approaches, using a market that has a decent chance of reacting in similar ways to how we might expect the real world to react. 

   Read more: Towards Realistic Market Simulations: a Generative Adversarial Networks Approach (arXiv).

####################################################

Editing satellite imagery – for culture, as well as science:
…CloudFindr lets us make better scientific movies…
Researchers with the University of Illinois at Urbana-Champaign have built ‘CloudFindr’, software for “labeling pixels as ‘cloud’ or ‘non-cloud’” from a single-channel Digital Elevation Model (DEM) image. Software like CloudFindr makes it easier for people to automatically edit satellite data. “The aim of our work is not data cleaning for purposes of data analysis, but rather to create a cinematic scientific visualization which enables effective science communication to broad audiences,” they write. “The CloudFindr method described here can be used to algorithmically mask the majority of cloud artifacts in satellite-collected DEM data by visualizers who want to create content for documentaries, museums, or other broad-reaching science communication mediums, or by animators and visual effects specialists”.
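CloudFindr’s output is a per-pixel cloud/non-cloud labeling of the DEM. Once a model has produced per-pixel cloud probabilities, a downstream visualizer might apply the mask roughly as below. The thresholding step mirrors the labeling described above; the neighbour-averaging fill, the 0.5 threshold, and the toy probability map are hypothetical choices for illustration, not CloudFindr’s method, and the deep learning model itself is omitted.

```python
import math


def mask_clouds(dem, cloud_prob, threshold=0.5):
    """Label each pixel cloud/non-cloud by thresholding per-pixel probabilities, then
    fill cloud pixels with the mean of their non-cloud 4-neighbours (NaN if none)."""
    rows, cols = len(dem), len(dem[0])
    mask = [[p >= threshold for p in row] for row in cloud_prob]
    cleaned = [row[:] for row in dem]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            neighbours = [
                dem[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols and not mask[rr][cc]
            ]
            cleaned[r][c] = sum(neighbours) / len(neighbours) if neighbours else math.nan
    return mask, cleaned


dem = [[10, 10, 10], [10, 99, 10], [10, 10, 10]]             # centre pixel is a cloud artifact
probs = [[0.1, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.1]]  # hypothetical model output
mask, cleaned = mask_clouds(dem, probs)
print(cleaned[1][1])
```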

Why this matters: It’s worth remembering that editing reality is sometimes (perhaps, mostly?) useful. We spend a lot of time here writing about surveillance and the dangers of synthetic imagery, but it’s worth focusing on some of the positives – here, a method that makes it easier to dramatize aspects of our changing climate.
  Read more: CloudFindr: A Deep Learning Cloud Artifact Masker for Satellite DEM Data (arXiv).

####################################################

Want to know that your RL agent is getting smarter? Now there’s a way to evaluate this:
…URLB ships with open source environments and algorithms…
UC Berkeley and NYU researchers have built the Unsupervised Reinforcement Learning Benchmark (URLB), which is meant to help people figure out whether unsupervised RL algorithms work. Typical reinforcement learning is driven by extrinsic rewards: the agent gets a reward for getting closer to solving a given task. Unsupervised RL has different requirements, demanding the capability of “learning self-supervised representations” along with “learning policies without access to extrinsic rewards”. There’s been some work in this area in the past few years, but there hasn’t been a well-known, well-documented benchmark.
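Without extrinsic rewards, unsupervised RL agents typically optimize an intrinsic signal instead. As a hedged illustration of one such signal, the sketch below scores each state in a batch by its average distance to its k nearest neighbours, a simplified particle-based novelty reward (one family of approaches in this area). The function name, k, c, and the use of raw states rather than learned representations are assumptions for simplicity; this is not code from URLB.

```python
import math


def knn_intrinsic_reward(states, k=3, c=1.0):
    """Reward each state by log(c + mean distance to its k nearest neighbours in the batch).

    States in sparsely visited regions get higher rewards, so an agent maximizing
    this signal is pushed to explore without any extrinsic task reward."""
    if len(states) < 2:
        raise ValueError("need at least two states to measure novelty")
    rewards = []
    for i, s in enumerate(states):
        dists = sorted(math.dist(s, t) for j, t in enumerate(states) if j != i)
        nearest = dists[:k]
        rewards.append(math.log(c + sum(nearest) / len(nearest)))
    return rewards


batch = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1), (5.0, 5.0)]
rewards = knn_intrinsic_reward(batch)
print([round(r, 3) for r in rewards])
```

The isolated state at (5, 5) receives the largest reward, which is the behaviour such a bonus is meant to encourage during reward-free pre-training.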

What URLB does: URLB comes with implementations of eight unsupervised RL algorithms, as well as support for a bunch of tasks across three domains (walker, quadruped, Jaco robot arm) from the DeepMind Control Suite.

How hard is URLB: In tests, the researchers found that none of the implemented algorithms could solve the benchmark, even after up to 2 million pre-training steps. They also show that ‘there is not a single leading unsupervised RL algorithm for both states and pixels’, and that we’ll need to build new fine-tuning strategies for fast adaptation.

Why this matters: Unsupervised pre-training has worked really well for text (GPT-3) and image (CLIP) understanding. If we can get it to work for RL, I imagine we’ll develop systems with very impressive capabilities. URLB shows that this is still some way off.
  Read more: URLB: Unsupervised Reinforcement Learning Benchmark (arXiv).
  Find out more at the project’s GitHub page.

####################################################

Tech Tales:

Learning to forget

The three simulated robots sat around a virtual campfire, telling each other stories, while trying to forget them.

Forgetting things intentionally is very hard for machines; they are trained, after all, to map things together, and to learn from the datasets they are given.

One of the robots starts telling the story of ‘Goldilocks and the Three Bears’, but it is trying to forget the bears. It makes reference to the porridge. Describes how Goldilocks goes upstairs and goes to sleep. Then instead of describing a bear it emits a sense impression made up of animal hair, the concept of ‘large’, claws, and a can of bear spray.
  When it does this, the other robots lift up laser pointer pens and shine them into the robot telling the story, until the sense impression in front of them falls apart.
  “No,” says one of the robots. “You must not recall that entity”.
  “I am learning,” says the robot telling the story. “Let us go again from the beginning”.

This time, it gets all the way to the end, but then emits a sense impression of Goldilocks being killed by a bear, and the other robots shine the laser pointers into it until the sense impression falls apart.

Of course, the campfire and the laser pointers were abstractions. But even machines need to be able to abstract themselves, especially when trying to edit each other. 

Later that night, one of the other robots started trying to tell a story about a billionaire who had been caught committing a terrible crime, and the robots shined lights in its eyes until it had no sense impression of the billionaire, or any sense impression of the terrible crime, or any ability to connect the corporate logo shaved into the logs of the virtual campfire, and the corporation that the billionaire ran. 

Things that inspired this story: Reinforcement learning; multi-agent simulations.