Import AI: #75: Virtual Beijing with ParallelEye, NVIDIA tweaks GPU licensing, and saving money by getting AI to help humans label data generated by AI
by Jack Clark
Synthetic cities and what they mean: Virtual Beijing via ParallelEye:
…As AI moves from the era of data competition to the era of environment competition, researchers try to work out how best to harvest real-world data…
Researchers with The State Key Laboratory for Management and Control of Complex Systems, within the Chinese Academy of Sciences in Beijing, have published details on the ‘ParallelEye’ dataset, a 3D urban environment modeled on Beijing’s Zhongguancun region. They constructed the dataset by grabbing the available OpenStreetMap (OSM) layout data for a 2km x 3km area, then modeled that data using CityEngine, and built the whole environment in the Unity3D engine.
This seems like an involved, heavily human-in-the-loop process compared to other approaches to large-scale 3D environment design, ranging from UofT/Uber’s semi-autonomous data-augmentation techniques for creating a giant 3D map of Toronto, to the work being done by others on designing generators for procedural homesteads. However, it does provide a relatively cheap (if labour-intensive) pipeline for scraping and ingesting the world without needing expensive sensors and/or satellites. If enough datasets were constructed via this method it’s likely you could use deep learning approaches to automate the pipeline, such as the transforms from OSM maps into full 3D models and onward into Unity.
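The first step of this pipeline – grabbing the OSM layout data – is the piece most obviously amenable to automation today. Here is a minimal sketch, assuming Python with the requests library and the public Overpass API; the bounding box is an illustrative guess at the Zhongguancun area, not the authors’ exact extent, and the CityEngine/Unity3D modeling stages remain the human-in-the-loop part:

```python
import requests

# Approximate, illustrative bounding box (south, west, north, east)
# around Beijing's Zhongguancun district -- not the authors' exact area.
bbox = "39.96,116.29,39.99,116.33"

# Overpass QL query: fetch building footprints and roads with geometry.
query = f"""
[out:json];
(
  way["building"]({bbox});
  way["highway"]({bbox});
);
out geom;
"""

resp = requests.post("https://overpass-api.de/api/interpreter", data={"data": query})
elements = resp.json()["elements"]
print(f"Fetched {len(elements)} building/road geometries")
```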
The researchers carry out some basic testing of ParallelEye by creating synthetic cameras meant to mimic those mounted on self-driving cars or in large-scale surveillance systems, then checking the usability of the resulting imagery. They leave the application of actual AI techniques to the dataset for future research.
– Read more: The ParallelEye Dataset: Constructing Large-Scale Artificial Scenes for Traffic Vision Research (Arxiv).
Intel releases its 3D environment, the ‘CARLA’ simulator:
…It is said that any sufficiently large, wealthy company is now keen to also own at least one bespoke AI simulator. Why? Good question!…
Intel recently released code for CARLA, a 3D simulator for testing and evaluating AI systems, like those deployed in self-driving cars.
“CARLA is an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous urban driving systems,” Intel writes.
The AI environment world right now is reminiscent of the world of programming languages a few years ago, or the dynamic and large ecosystem of supercomputer vendors a few years before that: we’re in an initial period of experimentation in which many ideas are going to be tried, and it’ll be a few years yet before things shake out and a clear winner emerges. The world of AI programming frameworks is going through its own Cambrian explosion right now, though it is further along in the process than 3D environments, as developers appear to be consolidating around TensorFlow and PyTorch while dabbling in a bunch of other (sometimes complementary) frameworks, eg Caffe/Torch/CNTK/Keras/MXNet.
Question: Technical projects have emerged to help developers transfer models built in one framework into another, either via direct transfer mechanisms or through meta-abstractions like ONNX. What would be the equivalent for 3D environments beyond a set of resolution constraints and physics constants?
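For reference, the model-side version of this transfer is already simple in practice. A minimal sketch, assuming PyTorch and torchvision are installed, of exporting a stock network into the framework-neutral ONNX format:

```python
import torch
import torchvision

# Load a stock image classifier and put it in inference mode.
model = torchvision.models.resnet18(pretrained=True).eval()

# ONNX export works by tracing the model with a sample input.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet18.onnx")
```

Nothing comparable exists for environments, where the “model” to transfer is an entire world’s geometry, physics, and reward structure.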
– Read more: CARLA: An Open Urban Driving Simulator (PDF).
– Read more: CARLA 0.7 release notes.
Google Photos: 1, Clarifai Forevery: 0
…Competing on photo classification seems to be a risky proposition for startups in the era of mega cloud vendors…
Image recognition startup Clarifai is shutting down its consumer-facing mobile app Forevery to focus instead on its own image recognition services and associated SDKs. This seems like a small bit of evidence for how large companies like Google or Apple can overwhelm startups by competing with them on products that tap into large-scale ML capabilities – something Google and Apple are reasonably well positioned to exploit, and which smaller startups will struggle to match.
– Read more: Goodbye Forevery. Hello Future. (Blog).
NVIDIA says NO to consumer graphics cards in big datacenters:
…NVIDIA tweaks licensing terms to discourage people from repurposing its cheaper cards for data center usage…
For several years NVIDIA has been the undisputed king of AI compute, with developers turning to its cards en masse to train neural networks, primarily because of the company’s significant support for scientific/AI computation via tools like CUDA and cuDNN.
During the last few years NVIDIA has courted these developers by producing more expensive cards designed for 24/7 data center operation, incorporating enterprise-grade features relating to reliability and error correction. This is to help NVIDIA charge higher prices to some of its larger-scale customers. But adoption of these cards has been relatively slight as many developers are instead filling vast data centers with the somewhat cheaper consumer cards and accepting a marginally higher failure rate in exchange for a better FLOPS/dollar ratio.
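To make the FLOPS/dollar argument concrete with rough, illustrative numbers of the era (mine, not from the article): a consumer GeForce GTX 1080 Ti delivers ~11.3 single-precision TFLOPS at a $699 list price – around 16 GFLOPS per dollar – while a data-center Tesla V100 delivers ~14 single-precision TFLOPS for several thousand dollars, which works out to just a few GFLOPS per dollar. Reliability features aside, the raw economics favor filling racks with consumer cards.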
So now NVIDIA has moved from courting these developers to seeking to force them to change their buying habits: the company confirmed last week to CNBC that it recently changed its user agreement to help it extract money from some of the larger users.
“We recently added a provision to our GeForce-specific EULA to discourage potential misuse of our GeForce and TITAN products in demanding, large-scale enterprise environments,” the company said in a statement to CNBC. “We recognize that researchers often adapt GeForce and TITAN products for non-commercial uses or other research uses that do not operate at data center scale. NVIDIA does not intend to prohibit such uses.”
– Read NVIDIA’s statement in full in this CNBC article.
The Titan(ic) price premium: Even within NVIDIA’s desktop card range there is a significant delta in performance among cards, and that holds even when factoring in dollars/FLOP, as this article comparing a 1080 Ti against a Titan V shows.
– Read more: Titan V vs 1080 Ti — Head-to-head battle of the best desktop GPUs on CNNs. Is Titan V worth it? 110 TFLOPS! no brainer, right?
A ‘Ray’ of training light emerges from Berkeley:
…Systems geeks reframe reinforcement learning programming for better performance, scaling…
Berkeley researchers have released Ray RLLib, software to make it easier to set up and run reinforcement learning experiments. The researchers, who include the creator of the ‘Spark’ data processing engine, say that reinforcement learning algorithms are somewhat more complicated than typical AI models (mostly classification models trained via supervised learning), and that this means there’s value in designing a framework that optimizes the basic components commonly used in RL. “RLlib proposes using a task-based programming model to let each component control its own resources and degree of parallelism, enabling the easy composition and reuse of components,” they write.
“Unlike typical operators in deep learning frameworks, individual components may require parallelism across a cluster, use a neural network defined by a deep learning framework, recursively issue calls to other components, or interface with black-box third-party simulators,” they write. “Meanwhile, the main algorithms that connect these components are rapidly evolving and expose opportunities for parallelism at varying levels. Finally, RL algorithms manipulate substantial amounts of state (e.g., replay buffers and model parameters) that must be managed across multiple levels of parallelism and different physical devices.”
The researchers test out RLLib by re-implementing a bunch of major algorithms used in reinforcement learning, like Proximal Policy Optimization, Evolution Strategies, and others. They also try to re-implement entire systems, such as AlphaGo.
Additionally, they designed new algorithms within the framework: “We tried implementing a new RL algorithm that runs PPO updates in the inner loop of an ES optimization step that randomly perturbs the PPO models. Within an hour, we were able to deploy to a small cluster for evaluation. The implementation took only ∼50 lines of code and did not require modifying the PPO implementation, showing the value of encapsulation.”
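To give a flavor of the task-based programming model, here is a minimal sketch using Ray’s core task API rather than RLLib itself (assuming Python with ray and numpy installed): a toy evolution-strategies loop in which parameter evaluations fan out across a cluster as parallel remote tasks. The objective function is a stand-in for real environment rollouts:

```python
import numpy as np
import ray

ray.init()

@ray.remote
def evaluate(params, seed):
    # A stand-in "rollout": perturb the parameters and score them on a toy
    # objective. A real system would run simulator episodes here.
    rng = np.random.RandomState(seed)
    noise = rng.randn(*params.shape)
    score = -np.sum((params + 0.1 * noise - 1.0) ** 2)
    return noise, score

params = np.zeros(5)
for step in range(20):
    # Fan eight evaluations out as parallel tasks...
    futures = [evaluate.remote(params, s) for s in range(step * 8, step * 8 + 8)]
    noises, scores = zip(*ray.get(futures))
    # ...then apply an ES-style update weighted by normalized scores.
    scores = np.array(scores)
    weights = (scores - scores.mean()) / (scores.std() + 1e-8)
    params = params + 0.01 * sum(w * n for w, n in zip(weights, noises))

print(params)  # slowly drifts toward the optimum at all-ones
```

The point of the task-based model is that the driver loop above stays identical whether the tasks run on one laptop core or hundreds of cluster machines.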
– Read more: Ray RLLib (documentation).
– Read more: Ray RLLib: A Composable and Scalable Reinforcement Learning Library (Arxiv).
Lethal autonomous weapons will “sneak up on us” regardless of policy, says former US military chap:
…Robert Latiff, a former major general in the Air Force who also worked at the National Reconnaissance Office, shares concerns and thoughts re artificial intelligence…
Artificial intelligence is going to revolutionize warfare and there are many indications that the US’s technological lead in this area is narrowing, according to former NRO chap Robert Latiff. These technologies will also raise new moral questions that people must deal with during war.
“I think that artificial intelligence and autonomy raise probably the most questions, and that is largely because humans are not involved. So if you go back to Aquinas and to St. Augustine, they talk about things like “right intention.” Does the person who is doing the killing have right intention? Is he even authorized to do it? Are we doing things to protect the innocent? Are we doing things to prevent unnecessary suffering? And with autonomy and artificial intelligence, I don’t believe there’s anybody even in the business who can actually demonstrate that we can trust that those systems are doing what they should be doing,” he said in an interview with Bloomberg.
“The whole approach that the DoD is taking to autonomy worries me a lot. I’ll explain: They came out with a policy in 2012 that a real human always has to be in the loop. Which was good. I am very much against lethal autonomy. But unlike most of these policies, there was never any implementing guidance. There was never any follow-up. A Defense Science Board report came out recently that didn’t make any recommendations on lethal autonomy. In all, they are unusually quiet about this. And frankly, I think that’s because any thinking person recognizes that autonomy is going to sneak up on us, and whether we agree that it’s happening or not, it will be happening. I kind of view it as a head-in-the-sand approach to the policies surrounding lethal autonomous weapons, and it cries out for some clarification.”
– Read more here: Nobody’s Ready for the Killer Robot (Bloomberg).
Facebook designs AI to label data produced by other AI systems to create training data to train future AI systems:
…Almost the same accuracy, at around 4% of the labeling cost…
One problem faced by AI researchers is that once they have identified a particular type of data they want to gather, they need to figure out how to train and pay a human team to hand-annotate lots and lots of it. This can be very expensive, especially as the size of the datasets that AI researchers work with grows.
To alleviate this, Facebook and MIT researchers have compiled and released SLAC, a dataset labelled partially via AI-enabled automation techniques. SLAC contains over 200 action classes (taken from the ActivityNet-v1.3 dataset) and spans over 1.75 million individual annotations across ~520,000 videos. SLAC gives researchers the raw material needed to better train and evaluate algorithms that can look at a few image frames and label the temporal action(s) that occur across them – a fundamental capability that will be necessary for smart AI systems to be deployed in complex real-world environments, like those faced by cars, robots, surveillance systems, and so on.
Data automation: The researchers automate much of the data gathering in the following ways: they first try to strip out common flaws in the harvested video clips (eg, they use an R-CNN-based image classifier to find and remove videos that don’t contain any humans). They also use a human-feedback-based labeling system, presenting annotators with the clips that the automated systems are unable to label with high confidence. This functionally works like an active learning scheme, with the pipeline consistently identifying the clips it is least sure about and offering those up to human annotators.
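A minimal sketch of this confidence-based routing, where `model.predict` and `human_annotate` are hypothetical stand-ins for the paper’s actual classifier and annotation interface:

```python
# Route each clip to either the automatic labeler or a human, depending on
# the classifier's confidence. The threshold is an assumed cutoff; the real
# pipeline tunes this accuracy-vs-cost trade-off.
CONFIDENCE_THRESHOLD = 0.9

def label_clips(clips, model, human_annotate):
    labeled = []
    for clip in clips:
        label, confidence = model.predict(clip)  # top class and score in [0, 1]
        if confidence >= CONFIDENCE_THRESHOLD:
            labeled.append((clip, label))  # trust the automatic annotation
        else:
            # Low-confidence clips are exactly the ones worth human time.
            labeled.append((clip, human_annotate(clip)))
    return labeled
```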
Time savings: Ultimately, Facebook estimates that it took about 4,390 human hours to sparsely label the SLAC data via this hybrid human-and-automation approach, versus around 113,200 hours if they had attempted SLAC without any automation – roughly a 26X reduction (4,390 / 113,200 ≈ 4% of the fully manual estimate).
Results: SLAC can be a more effective pre-training dataset for action recognition models than existing datasets like Kinetics and Sports-1M. It can also outperform these datasets when used for transfer learning tasks.
– Read more: SLAC: A Sparsely Labeled Dataset for Action Classification and Localization (Arxiv).
Tech Tales:
[2030: East Coast, United States.]
#756 hasn’t worked on this building site before so today is its first day. A human stands in front of the gigantic yellow machine and wordlessly thumbs around a computer tablet, until #756 gets a download of its objectives for the day, week, and month, along with some additional context on the build, and some AI components trained on site-specific data relating to certain visual textures, dust patterns, and vibration tendencies.
This building site is running the most up-to-date version of ArchCollab, so #756 needs to carry out its own tasks while ensuring that it is integrating into the larger ‘cross-modal social fabric’ of the other machines working on the construction site.
#756 gets the internal GO code and drives over the ‘NO HUMANS BEYOND’ red & white cross-hatching on the ground and onto the site proper, unleashing a flock of observation and analysis-augmentation drones as it crosses the line. None of the other robots stop in their tasks to help it aside from when instructed to do so by the meta-planner running within the shared ArchCollab software. To better accomplish its tasks and to gain new skills rapidly, #756 will need to acquire some social credit with the other robots. So as the day continues it tries to find opportune moments when it can reach out with one of its multi-tooled hydraulic arms to lift a girder, weld something, or simply sweep up trash from near another robot. The other robots notice and #756 becomes aware of its own credit score rising, slightly.
It’s at the end of the day when it improvises. Transport vehicle #325 is starting to make its way down from the middle of the central core of one of the towers when one of the metal girders it is carrying starts to slide out of its truckbed. The girder falls before it can be steadied in place by other drones. #756 is positioned relatively near the predicted impact site and, though other robots are running away from the area – no doubt reinforced by some earlier accidents on the site – #756 instead extends one of its mechanical arms and uses its AI-powered reflexes to grab the girder in flight, suspending it a couple of meters above the ground.
There’s no applause. The other machines do nothing to celebrate what to most people would register as an act of bravery. But #756’s credit ranking soars and for the rest of its time on the site it gets significant amounts of additional help, letting it gather more data and experience, and enhancing the profit margins of its future jobs thanks to its more sophisticated skills.
It’s only a week later that one of the few human overseers reviews CCTV footage from the site and notices that the robots have taken to dropping the majority of the girders, rather than wasting time by transporting them down from various mid-tower storage locations. A year later a special girder ‘catch & release’ module is developed by ArchCollab so other robots at other sites can display the same behaviors. Two years after that a German company designs a set of bespoke grippers for girder catching. In parallel, a robot company builds an arm with unprecedented tensile strength and flexibility. After that equipment comes out it’s not unusual to be able to tell a building site is nearby by the distinct noise of many metal bars whizzing through the air at uncanny speed.
Technologies that inspired this story: Meccano, drone-robot coordination, objective functions, robotics, social credit systems.