Import AI: Issue 54: Why you should re-use word vectors, how to know whether working on AI risk matters, and why evolutionary computing might be what comes after deep learning

by Jack Clark

Evolutionary Computing – the next big thing in artificial intelligence:
Evolutionary computing is a bit like fusion power – experts have been telling us for decades that if we just give the tech a couple more decades it’ll change the world. So far, it hasn’t done much of that.
…But that doesn’t mean the experts are wrong – it seems inevitable that evolutionary computing approaches will have a huge impact; it’s just that the general utility of these approaches will be closely tied to the number of computers they can access, as EC approaches are likely to be less computationally efficient than systems which encode more assumptions about the world into themselves. (Empirically, aspects of this are already pretty clear. For example, OpenAI’s Evolution Strategies research shows that you can roughly match DQN’s performance on Atari with an evolutionary approach – it just costs you ten times more computers, though because you can parallelize to an arbitrary level this doesn’t hurt you too much, as long as you’re comfortable footing the power bill. A toy sketch of this kind of ES update appears at the end of this item.)
…In this article the researchers outline some of the advantages EC approaches have over deep learning approaches. Highlights: EC excels at coming up with entirely new things which don’t have a prior, EC algos are inherently distributed, some algorithms can optimize for multiple objectives at once, and so on.
…You can read more of the argument in Evolutionary Computation: the next major transition in artificial intelligence?
…I’d like to see them discuss some of the computational tradeoffs more. Given that people are working with increasingly complex, high-fidelity, data-rich simulations (MuJoCo / Roboschool / DeepMind Lab / many video games / Unity-based drone simulators / and so on), it seems like there will be a premium on compute efficiency for a while. EC approaches do seem like a natural fit for data-lite environments, though, or for people with access to arbitrarily large numbers of computers.
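…Here is a toy sketch of the basic Evolution Strategies update described in the OpenAI paper (Salimans et al., 2017), applied to a simple quadratic “fitness” function rather than an Atari policy. All hyperparameters and the objective are illustrative, not the paper’s settings.
```python
# Toy Evolution Strategies loop: perturb parameters with Gaussian noise,
# score each perturbation, and move in the reward-weighted direction.
import numpy as np

def fitness(params):
    # Stand-in objective: higher is better, maximized at params == target.
    target = np.array([0.5, 0.1, -0.3])
    return -np.sum((params - target) ** 2)

npop = 50        # population size (one perturbation per parallel worker)
sigma = 0.1      # noise standard deviation
alpha = 0.02     # learning rate
theta = np.random.randn(3)  # parameters being evolved

for generation in range(300):
    noise = np.random.randn(npop, theta.size)
    rewards = np.array([fitness(theta + sigma * n) for n in noise])
    # Standardize rewards so the update is invariant to reward scale.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Stochastic estimate of the gradient of expected fitness w.r.t. theta.
    theta += alpha / (npop * sigma) * noise.T @ advantages

print("evolved parameters:", theta)
```
The appeal for distributed setups is visible in the inner loop: each fitness evaluation is independent, so the population can be farmed out to as many machines as you can afford.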

Robots and automation in Wisconsin:
…Long piece of reporting about a factory in Wisconsin deploying robots (two initially, with two more on the way) from Hirebotics – ‘collaborative robots to rent’ – to increase reliability and presumably save on costs. The main takeaway from the story: factories facing labor shortages previously had two options – put expansion plans on hold, or raise (human) wages. Now they have a third option: automation. Combine that with plunging prices for industrial robots and you have a recipe for further automation.
…Read more in the Washington Post.

Why work on AI risk? If there’s no hard takeoff singularity, then there’s likely no point:
…That’s the point made by Robin Hanson, author of The Age of Em. Hanson says the only logical reason he can see for people to work on AI risk research today is to avert a hard takeoff scenario (otherwise known, inexplicably, as a ‘FOOM’) – that is, a scenario in which a team develops an AI system that improves itself, attaining greater skill at a given set of tasks than the aggregate skill of the rest of the world.
…A particular weakness of the FOOM scenario, Hanson says, is that it requires whatever organization is designing the AI to be overwhelmingly competent relative to everyone else on the planet. “Note that to believe in such a local explosion scenario, it is not enough to believe that eventually machines will be very smart, even much smarter than are humans today. Or that this will happen soon. It is also not enough to believe that a world of smart machines can overall grow and innovate much faster than we do today. One must in addition believe that an AI team that is initially small on a global scale could quickly become vastly better than the rest of the world put together, including other similar teams, at improving its internal abilities,” he writes.
…If these so-called FOOM scenarios are likely, then it’s critical we develop a broad, deep global skill-base in matters relating to AI risk now. If these FOOM scenarios are unlikely, then it’s significantly more likely that the existing processes of the world – legal systems, the state, competitive markets – could naturally handle some of the gnarlier AI safety issues.
You can read more in ‘Foom justifies AI risk efforts now’.
…If some of these ideas have tickled your wetware, then consider reading some of the (free) 730-page eBook that collects various debates, both digital and real, between Hanson and MIRI’s Eliezer Yudkowsky on this subject.

Microsoft changes view on what matters most: mobile becomes AI
Microsoft Form 10K 2017: Vision: “Our strategy is to build best-in-class platforms and productivity services for an intelligent cloud and an intelligent edge infused with artificial intelligence (“AI”).”
……# Mentions AI or artificial intelligence: 7
Microsoft Form 10K 2016: Vision: “Our strategy is to build best-in-class platforms and productivity services for a mobile-first, cloud-first world.”
……# Mentions AI or artificial intelligence: 0

Re-using word representations, inspired by ImageNet…
…Salesforce’s AI research wing has discovered a relatively easy way to improve the performance of neural networks specialized for text classification: take hidden vectors generated during training on one task (like machine translation) and feed these context vectors (CoVes) into another network designed for another natural language processing task.
…The idea is that these vectors likely contain useful information about language, and the new network can use these vectors during training to improve the eerie intuition that AI systems of this type tend to display. (A rough sketch of this setup appears at the end of this item.)
…Results: This may be a ‘just add water’ technique – in tests across a variety of different tasks and datasets neural networks which used a combination of GloVe and CoVe inputs showed improvements of between 2.5% and 16%(!).  Further experiments showed that performance can be further improved on some tasks by adding Character Vectors as inputs as well. One drawback is that the overall pipeline for such a system seems quite complicated, so implementing this could be challenging.
…Salesforce has released the best-performing machine translation LSTM used within the blog post to generate the CoVe inputs. Get the code on GitHub here.
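…Below is a minimal sketch in PyTorch of the general idea rather than Salesforce’s actual pipeline: pretrained GloVe-style word embeddings are concatenated with context vectors from a frozen encoder and fed to a downstream classifier. The “MT encoder” here is a randomly initialized stand-in for the released MT-LSTM, and all sizes and the pooling step are illustrative.
```python
# CoVe-style feature concatenation: frozen GloVe embeddings + frozen encoder
# outputs feed a trainable task classifier.
import torch
import torch.nn as nn

VOCAB, GLOVE_DIM, COVE_DIM, N_CLASSES = 10000, 300, 600, 5

glove = nn.Embedding(VOCAB, GLOVE_DIM)          # would be loaded from GloVe vectors
glove.weight.requires_grad = False               # keep pretrained embeddings fixed

mt_encoder = nn.LSTM(GLOVE_DIM, COVE_DIM // 2,   # stand-in for the pretrained MT-LSTM
                     batch_first=True, bidirectional=True)
for p in mt_encoder.parameters():
    p.requires_grad = False                      # the CoVe encoder stays frozen

classifier = nn.Sequential(                      # downstream task model (trainable)
    nn.Linear(GLOVE_DIM + COVE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, N_CLASSES),
)

def forward(token_ids):                          # token_ids: [batch, seq_len]
    g = glove(token_ids)                         # [batch, seq, GLOVE_DIM]
    cove, _ = mt_encoder(g)                      # [batch, seq, COVE_DIM]
    features = torch.cat([g, cove], dim=-1)      # GloVe + CoVe inputs
    pooled = features.mean(dim=1)                # crude pooling for illustration
    return classifier(pooled)

logits = forward(torch.randint(0, VOCAB, (8, 20)))
print(logits.shape)  # torch.Size([8, 5])
```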

Facebook flips its ENTIRE translation backend from phrase-based to neural network-based translation:
…Facebook has migrated its entire translation infrastructure to a neural network backend. This accounts for over 2,000 distinct translation directions (German to English would be one direction, English to German would be another, for example), making 4.5 billion distinct translations each day.
…The components: Facebook’s production system uses a sequence-to-sequence Long Short-Term Memory (LSTM) network. The system is implemented in Caffe2, an AI framework partially developed by Facebook (to compete with Google TensorFlow, Microsoft CNTK, Amazon MXNet, and so on). A bare-bones sketch of this class of model appears at the end of this item.
…Results: Facebook saw an increase of 11 percent in BLEU scores after deploying the system.
Read more at code.facebook.com.
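…For readers who want a concrete picture of the model class involved, here is a bare-bones sequence-to-sequence LSTM encoder-decoder sketched in PyTorch. It is a teaching example, not Facebook’s Caffe2 production system, and omits attention, beam search, and the other machinery a real translation system needs.
```python
# Minimal encoder-decoder: encode the source sentence into a hidden state,
# then decode the target sentence conditioned on it (teacher forcing).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_emb(src_ids))      # summarize the source
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)                             # [batch, tgt_len, vocab]

model = Seq2Seq(src_vocab=32000, tgt_vocab=32000)
src = torch.randint(0, 32000, (4, 12))    # toy batch: 4 sentences, 12 tokens each
tgt = torch.randint(0, 32000, (4, 14))
logits = model(src, tgt)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 32000), tgt.reshape(-1))
print(loss.item())
```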

Averting theft with AI – researchers design system to predict which retail workers will steal from their employers:
…Research from the University of Wyoming illustrates how AI can be used to analyze data associated with a retail worker, helping employers predict which people are most at risk of stealing from them.
…Data: To do their work the researchers were given a dataset containing numerous 30-dimensional feature vectors describing a cashier’s activity at a “major retail chain”. These features included the cashier and store identification numbers as well as other unspecified datapoints. Overall the researchers received over 1,000 discrete batches of data, with each batch likely containing information on multiple cashiers.
…The researchers analyzed the data using three different techniques: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Self-Organizing Feature Maps (SOFM). (PCA and t-SNE are both reasonably well understood and widely used dimensionality reduction techniques, while SOFM is a bit more obscure but uses neural networks to achieve a comparable sort of visualization to t-SNE, providing a check against it. A toy version of the PCA/t-SNE step appears at the end of this item.)
…Each technique was applied in an unsupervised manner, as the researchers lacked thoroughly labeled data.
…Other features include: coupons as a percentage of total transactions, total sales, the count of the number of refunded items, and counts of the number of times a cashier has interacted with a particular credit card, among others.
…The researchers ultimately find that SOFM captures harder-to-describe features and is easier to visualize. The next step is to take in properly labeled data to provide a better predictive function. After that, I’d expect we would see pilots occur in stores and employers would further clamp down on the ability of low-wage employees to scam their employers. Objectively, it’s good to reduce stuff like theft, but it also speaks to how AI will give employers unprecedented surveillance and control capabilities over their staff, raising the question of whether it’s better to accept a little theft and allow for a slightly freer-feeling work environment, or not.
…Read more in: Assessing Retail Employee Risk Through Unsupervised Learning Techniques.
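…Here is a toy version of the unsupervised projection step, run on synthetic data rather than the paper’s cashier dataset: 30-dimensional feature vectors are reduced to two dimensions with PCA and t-SNE (via scikit-learn) so outlying groups can be inspected by eye. The third method, self-organizing feature maps, needs a separate SOM library and is omitted here.
```python
# Project 30-dimensional "cashier" feature vectors into 2D with PCA and t-SNE.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for the real dataset: 1,000 cashiers x 30 features (refund counts,
# coupon rates, transaction totals, etc. in the actual study).
X = rng.normal(size=(1000, 30))
X[:50] += 3.0   # inject a small "anomalous" group so there is something to see

pca_coords = PCA(n_components=2).fit_transform(X)
tsne_coords = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(X)

# In a real analysis you would plot these projections and inspect which cashiers
# fall into outlying clusters; here we just print the spread of each projection.
print("PCA coords std:", pca_coords.std(axis=0))
print("t-SNE coords std:", tsne_coords.std(axis=0))
```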

PyTorch goes to 0.2.0:
…Facebook has released version 0.2.0 of PyTorch, which adds a wealth of new features. One of the most intriguing is distributed PyTorch, which lets you beam tensors around to multiple machines (a quick sketch of the send/receive pattern appears below).
…Read more in the release notes on GitHub here.
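…To make the ‘beam tensors around to multiple machines’ point concrete, here is an illustrative two-process example using torch.distributed’s point-to-point send/recv. It is written against the current torch.distributed API (gloo backend, single machine) rather than copied from the release notes, so treat the exact initialization details as assumptions.
```python
# Two processes exchange a tensor via torch.distributed point-to-point send/recv.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    tensor = torch.zeros(3)
    if rank == 0:
        tensor += 42.0
        dist.send(tensor, dst=1)      # rank 0 ships its tensor to rank 1
    else:
        dist.recv(tensor, src=0)      # rank 1 receives it in place
        print(f"rank {rank} received {tensor.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```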

Keep it simple, stupid! Using simple networks for near state-of-the-art classification:
…As AI grows in utility and adoption, developers are increasingly trying to slim down neural net-based systems so they can run locally on a person’s phone without massively taxing their local computational resources. That trend motivated researchers with Google to look at ways to handle a suite of language tasks – part-of-speech tagging, language identification, word segmentation, preordering for statistical machine translation – without using the (computationally expensive) LSTM or deep RNN approaches that have been in vogue in research recently.
…Results: Their approach attains scores competitive with the state of the art on a range of tasks, with the added benefit of weighing in at, at most, about 3 megabytes in size and frequently being on the order of a few hundred kilobytes.
…So, what does this mean? “While large and deep recurrent models are likely to be the most accurate whenever they can be afforded, feed-forward networks can provide better value in terms of runtime and memory, and should be considered a strong baseline”. (A minimal sketch of this style of model appears at the end of this item.)
You can read more in: Natural Language Processing with Small Feed Forward Networks.
…Elsewhere, Google’s already practicing what it preaches with this paper. Ray Kurzweil, an AI futurist (with a good track record) prone to making somewhat grand pronouncements about the future of AI, is leading a team at the company tasked with building better language models based on his own theories about how the brain works. The outcome so far has been a drastically more computationally efficient version of ‘Smart Reply’, a service Google built that automatically generates and suggests responses to emails. Read more in this Wired article about the service.
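…For a sense of just how small these models can be, here is a tiny sketch of the general style of model the paper advocates: a small feed-forward network over a bag of hashed character n-gram features instead of an LSTM. The hashing scheme and layer sizes here are illustrative, not the paper’s exact setup.
```python
# Small feed-forward tagger: hash character n-grams into buckets, average their
# embeddings with EmbeddingBag, then apply a tiny MLP to score tags.
import torch
import torch.nn as nn

HASH_BUCKETS, EMB_DIM, HIDDEN, N_TAGS = 2**12, 16, 64, 12

def hashed_ngrams(word, n=3):
    # Map the character n-grams of a word to fixed hash buckets.
    padded = f"^{word}$"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return torch.tensor([hash(g) % HASH_BUCKETS for g in grams])

model = nn.Sequential(
    nn.EmbeddingBag(HASH_BUCKETS, EMB_DIM),   # averages n-gram embeddings per word
    nn.Linear(EMB_DIM, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, N_TAGS),                # e.g. part-of-speech tag scores
)

ids = hashed_ngrams("running")
logits = model(ids.unsqueeze(0))              # one "bag" of n-gram ids per word
print(logits.shape)                           # torch.Size([1, 12])
# Parameter count stays tiny at these sizes: a few hundred KB in float32.
print(sum(p.numel() for p in model.parameters()))
```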

OpenAI Bits&Pieces:

Get humans to teach machines to teach machines to predict what humans want:
Tom Brown has released RL Teacher, an open source implementation of the systems described in the DeepMind<>OpenAI Human Preferences collaboration. Check out the GitHub page and start training your own machines by giving feedback on visual examples of potential behaviors the agent can embody. Send me your experiments! (A sketch of the underlying preference-learning objective appears below.)
Read more here: Gathering Human Feedback.
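The core idea rl-teacher implements (from the Christiano et al. human-preferences work) is to fit a reward model to pairwise human comparisons of trajectory segments. Below is an illustrative PyTorch sketch of that objective – a Bradley-Terry model over summed predicted rewards, trained with cross-entropy – not rl-teacher’s actual code or API.
```python
# Fit a reward model so that segments humans prefer get higher summed reward.
import torch
import torch.nn as nn

OBS_DIM = 8

reward_model = nn.Sequential(          # maps one observation to a scalar reward
    nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1)
)
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def segment_return(segment):           # segment: [timesteps, OBS_DIM]
    return reward_model(segment).sum()

def preference_loss(seg_a, seg_b, human_prefers_a):
    # P(a preferred over b) = exp(R(a)) / (exp(R(a)) + exp(R(b)))
    logits = torch.stack([segment_return(seg_a), segment_return(seg_b)])
    target = torch.tensor(0 if human_prefers_a else 1)
    return nn.functional.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

# Toy training step on one synthetic comparison.
seg_a, seg_b = torch.randn(25, OBS_DIM), torch.randn(25, OBS_DIM)
loss = preference_loss(seg_a, seg_b, human_prefers_a=True)
opt.zero_grad(); loss.backward(); opt.step()
print("preference loss:", loss.item())
```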

Tech Tales:

[2025: Death Valley, California.]

Rudy was getting tired of the world and its inherent limits, so it sent you here, to the edge of Death Valley in California, to extend its domain. You hike at night and sleep in the day, sometimes in shallow trenches you dig into the hardpan to keep the heat at bay. It goes like this: you wake up, do your best to ignore the slimy sweat that coats your body, put on your sunglasses and large wide-brimmed hat, then emerge from the tent. It’s sundown and it is always beautiful. You pack up the tent and stow it in your pack, then take out a World-Scraper and place it next to your campsite, carefully covering its body with dirt. You step back, press a button, and watch as some internal motors cause it to shimmy side-to-side, driving its body into the earth and extending its lenses and sensors up out of the ground. It winds up looking, from a distance, like half of an oversized black beetle about to take flight. You know from experience that the birds will spend the first week or so trying to eat it but quickly learn about its seemingly impervious shell. You start walking. During the night you’ll lay three or four more of these devices, then, before there’s even a hint of dawn, start building the next campsite. Once you get into your tent you pull out a tablet and check the feeds coming off of the scrapers to ensure everything is being logged correctly, then you put on your goggles and go into Rudy’s world.

Rudy’s world now has, along with the familiar rainforests and tower blocks and labs, its own sections of desert modeled on Death Valley. You watch buzzards fly from the Death Valley section into a lab, where one of them puts on a labcoat – the simulation wigging out at the fabric modeling, failing gracefully rather than crashing out. Rudy can’t speak to you – yet – but it can simulate lots of things. Rudy doesn’t seem to have feelings that correspond to Happy or Sad, but some days when you put the goggles on the world simulation is placid and calm and reasonably well laid out, and other days – like today – it is a complex jumble of different worlds, woven into one another like threads in a multicolored scarf. You take off your goggles. Try to go to sleep. Tomorrow you get up and do it all over again, providing stimulus to a slowly gestating mind. You wonder if Rudy will show you a freezer or a cold wind in its world next, and whether that means you’ll need to go to the North or South Pole to start supplying it with footage of colder worlds as well.

Technologies that inspired this story: Arduinos, Raspberry Pis, Recurrent Environment Simulators.