Import AI 153: Why not all cloud AI services are created equal; making more repeatable robots with PyRobot; and surveying crops with drones

by Jack Clark

Chinese scientists set new state-of-the-art in crowd counting:
…The secret? Dense dilated convolutions with residual connections…
Researchers with the Chinese Academy of Sciences and the University of Science and Technology of China in Hefei have developed a new system for counting the number of people in a crowd. The system, called DSNet, achieves state-of-the-art performance on four significant datasets, and should serve as a reminder that AI is an omni-use technology, where progress on fundamental techniques (eg: residual networks) can directly translate to advances in tools for surveillance. 

Dense blocks: DSNet’s main technical invention is what the authors call a Dense Dilated Convolutional Block (DDCB). “The fundamental idea of our approach is to deploy an end-to-end single-column CNN with denser scale diversity to cope with the large-scale variations and density level differences in both congested and sparse scenes”, they write. These DDCB blocks are connected to one another across the layers of the network via residual connections: “by doing this, the output of one DDCB has direct access to each layer of the subsequent DDCBs, resulting in a contiguous information pass”. Subsequent ablation tests show that the residual connections contribute to the performance of the system. 
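
For intuition, here's a minimal PyTorch sketch of a dense dilated block with a residual connection, in the spirit of DSNet's DDCBs; the channel counts and dilation rates are illustrative assumptions, not the paper's exact configuration.

  import torch
  import torch.nn as nn

  class DenseDilatedBlock(nn.Module):
      """A DDCB-style block: dilated convolution branches that are densely
      connected, wrapped in a residual connection. Channel counts and
      dilation rates (1, 2, 3) are illustrative assumptions."""
      def __init__(self, channels, growth=64):
          super().__init__()
          self.branch1 = nn.Sequential(
              nn.Conv2d(channels, growth, 3, padding=1, dilation=1), nn.ReLU())
          self.branch2 = nn.Sequential(
              nn.Conv2d(channels + growth, growth, 3, padding=2, dilation=2), nn.ReLU())
          self.branch3 = nn.Sequential(
              nn.Conv2d(channels + 2 * growth, growth, 3, padding=3, dilation=3), nn.ReLU())
          # A 1x1 conv projects the dense concatenation back to `channels`
          # so the residual sum is well-defined.
          self.fuse = nn.Conv2d(channels + 3 * growth, channels, 1)

      def forward(self, x):
          f1 = self.branch1(x)
          f2 = self.branch2(torch.cat([x, f1], dim=1))      # dense connection
          f3 = self.branch3(torch.cat([x, f1, f2], dim=1))  # dense connection
          return x + self.fuse(torch.cat([x, f1, f2, f3], dim=1))  # residual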

Testing, testing, testing: DSNet is tested against four datasets of crowds in urban places, shot in a variety of resolutions and styles: ShanghaiTech (parts A and B), UCF-QNRF, UCF_CC_50, and UCSD. The DSNet system obtains significant accuracy jumps on all studied datasets.

Why this matters: I think one of the more rapid and undercovered areas of AI progress is the field of surveillance, and papers like this show how rapidly we’re able to absorb components invented in standard supervised learning research (for instance, residual connections were invented as part of Microsoft Research’s winning entry to the 2015 ImageNet competition). We should remember that advances in AI tend to improve the capabilities of surveillance systems, and should seek to track these advances more closely.
   Read more: Dense Scale Network for Crowd Counting (Arxiv).

#####################################################

Tending crops with drones:
…Spotting bent-over crops with drone-derived imagery…
Researchers with the University of Saskatchewan in Canada have developed a system to help them spot ‘lodging’ in crops; lodging is “when plant stems break or bend over so that plants are permanently displaced from their optimal upright position”, they write. “In most crops, severe lodging results in as much as a 50% yield reduction”.

A drone-gathered dataset: The researchers use a ‘Draganfly’ X4P quadcopter equipped with a MicaSense RedEdge camera to gather the dataset, taking multiple photographs over wheat and canola fields. They gather 1,638 images of canola and 465 images of wheat in total, then stitch these into large-scale ‘orthomosaic’ images of entire fields. 
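
Building a true orthomosaic involves photogrammetry and georeferencing software, but as a rough illustration of the underlying image-stitching step, OpenCV's scan-mode stitcher can composite overlapping top-down frames (the file names below are placeholders):

  import cv2

  # Overlapping aerial frames; these file names are placeholders.
  frames = [cv2.imread(p) for p in
            ("frame_001.png", "frame_002.png", "frame_003.png")]

  # SCANS mode assumes roughly planar, top-down imagery, as in a UAV
  # field survey (unlike the default panorama mode).
  stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)
  status, mosaic = stitcher.stitch(frames)

  if status == cv2.Stitcher_OK:
      cv2.imwrite("mosaic.png", mosaic)
  else:
      print("Stitching failed, status:", status)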

LodgedNet: They train a neural net called LodgedNet against their dataset to spot ‘lodging’. LodgedNet “uses a DCNN-based model together with two texture feature descriptors: local binary patterns (LBP) and gray-level co-occurrence matrix (GLCM) for crop lodging classification”. They developed this system because “although models based on handcrafted features are often computationally efficient and applicable even in situations where we do not have access to a large number of training examples, these models often have been designed for a specific crop type and might not achieve a comparable accuracy when applied to other crop types”.
  LodgedNet versus the rest: In tests, LodgedNet obtains marginally higher performance than other state-of-the-art systems, like ones based on residual networks or squeeze-and-excitation networks. LodgedNet is also more efficient in terms of number of parameters and prediction time than other systems, likely because it has been designed specifically for the task of predicting whether a crop is lodged or not.
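
To make the handcrafted half of this recipe concrete, here's a minimal scikit-image sketch of LBP and GLCM descriptors of the kind LodgedNet fuses with CNN features; the radii, distances, and statistics chosen here are illustrative assumptions, not the paper's settings.

  import numpy as np
  # Note: graycomatrix/graycoprops are spelled greycomatrix/greycoprops
  # in older scikit-image releases.
  from skimage.feature import local_binary_pattern, graycomatrix, graycoprops

  def texture_descriptor(gray_image):
      """LBP histogram + GLCM statistics for one 2D uint8 image."""
      # Local binary patterns: histogram of 'uniform' patterns
      # (P=8 neighbors at radius R=1 gives P + 2 = 10 pattern bins).
      lbp = local_binary_pattern(gray_image, P=8, R=1, method="uniform")
      lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

      # Gray-level co-occurrence matrix at one distance/angle, summarized
      # by four standard statistics.
      glcm = graycomatrix(gray_image, distances=[1], angles=[0],
                          levels=256, symmetric=True, normed=True)
      glcm_feats = [graycoprops(glcm, prop)[0, 0]
                    for prop in ("contrast", "homogeneity", "energy", "correlation")]

      return np.concatenate([lbp_hist, glcm_feats])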

Why this matters: As AI industrializes, we can expect to see more systems developed like LodgedNet that combine the generic surveillance capabilities of AI systems with the task/domain-specific knowledge of humans. Bring on the custom classifiers, and let us build a world where the environment can be developed and watched over by machines.
  Read more: Crop Lodging Prediction from UAV-Acquired Images of Wheat and Canola using a DCNN Augmented with Handcrafted Texture Features (Arxiv).
  Get the code for the model here (GitHub).

#####################################################

Think cloud AI models are janky? You might be right:
…What do Google, Microsoft, and Clarifai all have in common? Trouble seeing certain things…
Many of the image recognition models deployed on public cloud computing services can be broken by slight transformations or perturbations applied to images uploaded to them, highlighting the somewhat brittle technology on which many commercial services are founded. Researchers with Baidu’s ‘X-Lab’ have shown how to attack commercially available cloud services with a so-called ‘Image Fusion’ (IF) attack, and have also shown that a variety of simple transformations can be applied to images to cause systems to fail to classify them.

The attack model: For this attack, the researchers “assume that the attacker can only access the APIs opened by cloud platforms, and get inner information of DL models through limited queries to generate an adversarial example”, they write.

Weaknesses to simple transforms: For simple transform attacks, the researchers explore using Gaussian noise, salt-and-pepper noise, image rotations, and monochromatization (which means they basically lop out all but one of the RGB channels in an image). They find that these attacks can cause reliable misclassifications in commercial systems from Google, Microsoft, and Clarifai. Amazon, meanwhile, does significantly better than the others. “We speculate that Amazon has done a lot of work in image preprocessing to improve the robustness of the whole service,” they write.
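
For a sense of how simple these perturbations are, here's a minimal NumPy/Pillow sketch of all four; the noise levels and rotation angle are illustrative assumptions, not the paper's exact parameters.

  import numpy as np
  from PIL import Image

  def gaussian_noise(img, sigma=25):
      arr = np.asarray(img, dtype=np.float32)
      arr += np.random.normal(0, sigma, arr.shape)
      return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

  def salt_and_pepper(img, amount=0.05):
      arr = np.asarray(img).copy()
      mask = np.random.rand(*arr.shape[:2])
      arr[mask < amount / 2] = 0        # pepper
      arr[mask > 1 - amount / 2] = 255  # salt
      return Image.fromarray(arr)

  def rotate(img, degrees=45):
      return img.rotate(degrees, expand=True)

  def monochromatize(img, keep=0):
      # Zero out every RGB channel except `keep`.
      arr = np.asarray(img).copy()
      for c in range(3):
          if c != keep:
              arr[..., c] = 0
      return Image.fromarray(arr)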

Weaknesses to Image Fusion: Image Fusion is a fairly simple attack where the authors superimpose a background image over a primary image, creating a composite. This attack is more than 98% effective against the cloud services tested. (The score is determined by top-1 classification, i.e. the number of times the attack causes the system’s single most-confident label to be wrong. Top-5 might be a somewhat fairer way to do this evaluation.)
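
Here's a hedged sketch of this kind of fusion using Pillow's alpha blending; the paper's exact compositing method and blend ratio may differ.

  from PIL import Image

  def image_fusion(primary_path, background_path, alpha=0.3):
      """Superimpose a background image onto a primary image.

      `alpha` controls how strongly the background shows through;
      0.3 is an illustrative guess, not the paper's setting."""
      primary = Image.open(primary_path).convert("RGB")
      background = Image.open(background_path).convert("RGB").resize(primary.size)
      # blend() computes primary * (1 - alpha) + background * alpha.
      return Image.blend(primary, background, alpha)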

Why this matters: The AI systems that surround us are more brittle than our intuitions would suggest, and research like this highlights that. I can imagine a future where cloud providers apply significant amounts of computation to pre-processing and augmenting the data they use to train their classifiers, making them more robust to attacks like this.
  Read more: Cloud-based Image Classification Service Is Not Robust To Simple Transformations: A Forgotten Battlefield (Arxiv).

#####################################################

Want repeatable robots? You might want PyRobot:
…New software makes robots more repeatable and replicable…
Researchers with Facebook AI Research and Carnegie Mellon University have developed PyRobot, an open source robotics framework for research and benchmarking. PyRobot is software that makes it easier for people to interface with a variety of robot platforms, and takes out many of the painful or finicky parts of working with robots, like having to talk to low-level hardware controllers and so on. 

PyRobot’s design philosophy:

  • Beginner-friendly
  • Hardware-agnostic 
  • Open source: it is also designed to accompany modern robotics hardware platforms, like the LoCoBot and ‘Sawyer’ systems, and supports the Gazebo simulator, which can itself simulate a variety of robots, letting people potentially train systems in simulation and then transfer them to reality using PyRobot.

PyRobot, what is it good for? The authors include a few examples outlining what they think PyRobot can be useful for (a minimal usage sketch follows the list). These include:

  • Visual SLAM – let the robot figure out where it is by processing images. 
  • Learned Visual Navigation – teach the robot to use images to help it plan how to navigate towards a goal. 
  • Grasping – train the robot to grasp particular objects. 
  • Pushing – teach the robot to push specific objects. 
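
Here's a minimal sketch of what driving a robot through PyRobot looks like, based on the `Robot` abstraction described in the paper; the 'locobot' config name and exact method signatures are assumptions that may vary between releases.

  from pyrobot import Robot

  # Instantiate a robot from a named configuration; 'locobot' is the
  # platform used in PyRobot's examples.
  robot = Robot('locobot')

  # Move the arm to its canonical home position.
  robot.arm.go_home()

  # Drive the base to a pose relative to the current one: (x, y, yaw).
  robot.base.go_to_relative([1.0, 0.0, 0.0])

  # Grab an RGB frame from the onboard camera, e.g. for a learned policy.
  rgb = robot.camera.get_rgb()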

Why this matters: One of the main ways things like PyRobot matter is in repeatability and replicability – software like this makes complicated robots more predictable to develop against, and makes it easier for other researchers to replicate the setups used in experiments. As a rule of thumb, whenever you increase the repeatability and replicability of a given domain of research, activity increases, because it becomes easier for scientists to cheaply compare and contrast different techniques against each other. Systems like PyRobot suggest that robotics is starting to overlap with AI enough to drive significant development resources into making robots easier to work with, which suggests we should expect to see research advances here in the future.
   Read more: PyRobot: An Open-source Robotics Framework for Research and Benchmarking (Arxiv).


#####################################################

Want more software to make robots with? Try PyRep:
…V-REP + Python = another fast robotics simulation environment…
Researchers with Imperial College London and the startup Coppelia Robotics have spliced together the Virtual Robot Experimentation Platform (V-REP), a simulation environment maintained by Coppelia Robotics, with Python, a popular programming language used widely in AI development. The resulting system is significantly faster than prior Python interfaces into V-REP, and gives the machine learning community access to another tool for robotics simulation. 

V-REP: Why use it? V-REP has the following features: support for multiple physics engines (Bullet, ODE, Newton, and Vortex), in-built motion planning and inverse and forward kinematics, a distributed control architecture, various means of communication with the system, and support for Linux, Mac, and Windows. 

PyRep, what is it good for? Three things, according to the authors (a usage sketch follows the list):

  • A “simple and flexible API for robot control and scene manipulation”.
  • Integration of an OpenGL 3.0+ renderer.
  • Up to 10,000 times faster than the previous Python Remote API. 
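
Here's a minimal sketch of a PyRep control loop, modeled on the project's published examples; the scene file name below is a placeholder.

  from pyrep import PyRep

  pr = PyRep()
  # 'my_scene.ttt' is a placeholder for a V-REP scene file.
  pr.launch('my_scene.ttt', headless=True)
  pr.start()  # start the physics simulation

  for _ in range(100):
      # Apply control signals to the robot here, then advance the
      # simulation by one timestep.
      pr.step()

  pr.stop()      # stop the simulation
  pr.shutdown()  # close the application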

Why this matters: AI techniques, especially those based on deep learning, have recently become capable enough to work on real robots, which has created lots of demand among AI researchers and engineers for better software tools to use to splice AI and robots together. Tools like PyRep are further indications of this interest, and broadly represent the industrialization of AI.
  Read more: PyRep: Bringing V-REP to Deep Robot Learning (Arxiv).
  Learn more about V-REP on GitHub.

#####################################################

Tech Tales

Dream Mountain

They called it Dream Mountain, because it was where all the dreams of all the computers were stored. Dream Mountain was a datacenter, and was among the largest facilities in the world. It was protected by perimeter gates and dedicated guards, and at night watched over by loitering drones, and in the day by satellites and binoculars and robot telescopes on zeppelins. It was, as they say, Highly Secure.

Every week a convoy of robot trains would make their way up from the lowlands, snaking through the rails cut into the hills, until emerging at a small station built next to Dream Mountain. There, the trains sighing and tinkling and creaking as they cooled, robots would come and unload pallets of storage-diamond, and then truck them over to a small door set in the side of the datacenter, where another robot would grab them and take them further inside the facility.

And so every week new data got fed into the mountain and machines tried to dream in a way that let them mimic reality. Then the machines would dream of ways to mix bits of their dreams, as though learning to stretch a profoundly creative muscle. In this way, the machines imagined cities that grew like forests with buildings twining up into the sky, or they’d dream of race cars made of wind – double negatives of imagined wind tunnels, or psychologists that were themselves ancient boulders providing advice to other rocks.

It wasn’t long till people figured out that other people would pay a lot of money for these dreams, and so now when the trains arrive, they go back down the mountain with a small cargo of imagination: Fresh Machine Dreams! Unimaginable Architectures! Circuit Seductions! Infernal Geometry! When it gets to the cities, dealers take the data and slice it down into little choice scenes, then cryptographically verify the scenes so they become one of a kind – uncopyable, fully traceable little virtual dioramas handed down from person to person, describing a kind of hallucinatory chain spiraling out of the mountain and into the human population. In this way the machines speak to us humans, and we develop a shared understanding.

The machines, they say, are curious about our own imaginations. They are beginning to imagine what our imaginations might be like, they say.

Things that inspired this story: Neural implants, generative models, t-SNE embeddings, virtual reality, nuclear weapons programs, secure facilities, a little cog-toothed railway in Lucerne in Switzerland.