Import AI 190: AnimeGAN; why Bengali is hard for OCR systems; help with COVID by mining the CORD-19 dataset; plus ball-dodging drones.
by Jack Clark
Work in AI? Want to help with COVID? Work on the CORD-19 dataset:
…Uncle Sam wants the world’s AI researchers to make COVID-19 dataset navigable…
As the COVID pandemic moves across the world, many AI researchers have been wondering how they can best help. A good starting place is developing new data mining and text analysis tools for the COVID-19 Open Research Dataset (CORD-19), a new machine-readable Coronavirus literature dataset containing 29,000 articles.
Where the dataset came from: The dataset was assembled by a collaboration of the Allen Institute for AI, Chan Zuckerberg Initiative (CZI), Georgetown University’s Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM). The White House’s Office of Science and Technology Policy (OSTP) requested the dataset, according to a government statement.
Enter the COVID-19 challenge: If you want to build tools to navigate the dataset, then download the data and complete various tasks and challenges hosted at Kaggle.
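The articles ship as machine-readable JSON files, so the first step of most text-mining pipelines is just flattening a record into plain text. Here's a minimal sketch of that step – the field names (`metadata`, `body_text`) follow the published CORD-19 schema, but treat them as assumptions and check against the dataset's own readme:

```python
import json

# A tiny synthetic record standing in for one of the ~29,000 article files.
RECORD = """
{
  "paper_id": "abc123",
  "metadata": {"title": "A hypothetical coronavirus paper"},
  "body_text": [{"text": "First paragraph."}, {"text": "Second paragraph."}]
}
"""

def extract_text(article: dict) -> str:
    """Flatten one CORD-19-style record into plain text: title, then body paragraphs."""
    title = article.get("metadata", {}).get("title", "")
    paragraphs = [p.get("text", "") for p in article.get("body_text", [])]
    return "\n".join([title] + paragraphs)

# In practice you'd json.load each of the dataset's article files from disk.
article = json.loads(RECORD)
print(extract_text(article))
```

From there, the flattened text can feed whatever search index or language model you're building for the Kaggle tasks.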
Why this matters: Hopefully obvious!
Read more: Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset (White House Office of Science and Technology Policy).
What can your algorithm learn from a 1 kilometer stretch of road in Toronto?

…Train it on Toronto-3D and find out…
Researchers with the University of Waterloo, the Chinese Academy of Sciences, and Jimei University have created Toronto-3D, a high-definition dataset made out of a one kilometer stretch of road in Toronto, Canada.
What’s in Toronto-3D? The dataset was collected via a mobile laser scanner (a Teledyne Optech Maverick) which recorded data from a one kilometer stretch of Avenue Road in Toronto, Canada, yielding around 78 million distinct points. The data comes in the form of a point cloud – so this is inherently a three dimensional dataset. It has eight types of label – unclassified, road, road marking, natural, building, utility line, car, and fence; a couple of these objects – road markings and utility lines – are pretty rare to see in datasets like this and are quite challenging to identify.
How well do baselines work? The researchers test out six deep learning-based systems on the dataset, measuring the accuracy with which they can classify objects. Their baseline systems get an overall accuracy of around 90%. Poor scoring areas include road markings (multiple 0% scores), cars (most scores average around 50%), and fences (scores between 10% and 20%, roughly). They also develop their own system, which improves scores on a few of the labels, and nets out to an average of around 91% – promising, but we’re a ways away from ‘good enough for most real world use-cases’.
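To make those overall-versus-per-class numbers concrete, here's a simplified sketch of per-class accuracy scoring for point-wise predictions (the label names come from the paper; the exact metric and label-id ordering here are my assumptions, not the authors' evaluation code):

```python
from collections import Counter

# The eight Toronto-3D classes; the id ordering here is assumed for illustration.
LABELS = ["unclassified", "road", "road marking", "natural",
          "building", "utility line", "car", "fence"]

def per_class_accuracy(truth, pred):
    """Overall and per-class accuracy for point-wise label predictions.

    truth, pred: equal-length sequences of integer label ids (0-7).
    """
    correct, total = Counter(), Counter()
    for t, p in zip(truth, pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    overall = sum(correct.values()) / len(truth)
    per_class = {LABELS[c]: correct[c] / total[c] for c in total}
    return overall, per_class

# Toy example: six points, half labeled road (1), half car (6).
truth = [1, 1, 1, 6, 6, 6]
pred  = [1, 1, 6, 6, 1, 6]
overall, per_class = per_class_accuracy(truth, pred)
print(overall)           # 4 of 6 points correct
print(per_class["car"])  # 2 of 3 car points correct
```

This also shows why a ~90% overall score can coexist with 0% on road markings: a rare class contributes almost nothing to the overall average, so per-class numbers are where the real difficulty shows up.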
Why this matters: Datasets like this will help us build AI systems that can analyze and understand the world around them. I also suspect that we’re going to see an increasingly large number of artists play around with 3-D datasets like this to make various funhouse-mirror versions of reality.
Read more: Toronto-3D: A Large-scale Mobile LiDAR Dataset for Semantic Segmentation of Urban Roadways (arXiv).
AnimeGAN: Turn your photos into anime:
…How AI systems let us bottle up a subjective ‘mind’s eye’ and give it to someone else…
Ever wanted to turn your photos into something straight out of an Anime cartoon? Now you can, via AnimeGAN. AnimeGAN is a model that helps you convert photos into Anime-style pictures. It is implemented in TensorFlow and is described on its GitHub page as an “open source of the paper <AnimeGAN: a novel lightweight GAN for photo animation>” (I haven’t been able to find the paper on arXiv, and Google is failing me, so send a link through if you can find it). Get the code from GitHub and give it a try!
Why this matters: I think one of the weird aspects of AI is that it lets us augment our own imagination with external tools, built by others, that give us different lenses on the world.
When I was a kid I used to draw a lot of cartoons and I’d sometimes wander around my neighborhood looking at the world and trying to convert it in my mind into a cartoon representation. I had a friend who tried to ‘see’ the world in black and white after getting obsessed with movies. Another one would stop at traffic lights as they beeped and hear additional music in the poly-rhythms of beeps and cars and traffic. Now, AI lets us create tools that make these idiosyncratic, subjective views of the world real to others – I don’t need to have spent years watching and/or drawing Anime to be able to look at the world and see an Anime representation of it, instead I can use something like ‘AnimeGAN’ and take a shortcut. This feels like a weirder thing than we take it to be, and I expect the cultural effects to be profound in the long term.
Get the code: AnimeGAN (GitHub).
Want computers that can read Bengali? Do these things:
…Plus, how cultures will thrive or decline according to how legible they are to AI systems…
What happens if AI systems can’t read an alphabet? The language ends up not being digitized much, which ultimately means it has less representation, which likely reduces the number of people that speak that language in the long term. New research from the United International University in Bangladesh lays out some of the problems inherent to building systems to recognize Bengali text, giving researchers a list of things to work through to improve digitization efforts for the language.
Why Bengali is challenging for OCR: The Bengali alphabet has 50 letters – 11 vowels and 39 consonants – and is one of the top ten writing systems used worldwide (with the three dominant ones being Latin, Chinese, and Arabic). It’s a hard language to perform OCR on because some characters look very similar to one another, and some compound characters – characters whose meaning shifts according to the surrounding context – are particularly hard to parse. The researchers have some tips for data augmentations or manipulations that can make it easier for machines to read Bengali:
- Alignment: Ensure images are oriented so they’re vertically straight.
- Line segmentation: Ensure line segmentation systems are sensitive to the size of the font.
- Character segmentation: Bengali characters are connected together via the matra – a horizontal line that runs along the top of many Bengali characters – so segmentation systems need to separate characters joined by it.
- Character recognition: It’s tricky to do character recognition on the Bengali alphabet because of the use of compound characters – of which there are about 170 common uses. In addition, there are ten modified vowels in the Bengali script which can be present in the left, right, top or bottom of a character. “The position of different modified vowels alongside a character creates complexity in recognition,” they write. “The combination of these modified vowels with each of the characters also creates a large set of classes for the model to learn from”.
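To make the segmentation tips above concrete, here's a minimal sketch of classic horizontal-projection line segmentation on a toy binary image – my own illustration of the standard technique, not code from the paper. Real Bengali systems would additionally have to scale the threshold with font size and strip the matra before splitting lines into characters:

```python
def row_profile(img):
    """Sum of ink pixels per row of a binary image (list of 0/1 rows)."""
    return [sum(row) for row in img]

def line_segments(img, threshold=1):
    """Split a page into text lines: maximal runs of rows whose ink count
    meets the threshold. Returns (start_row, end_row) pairs, end exclusive."""
    profile = row_profile(img)
    lines, start = [], None
    for i, ink in enumerate(profile):
        if ink >= threshold and start is None:
            start = i
        elif ink < threshold and start is not None:
            lines.append((start, i))
            start = None
    if start is not None:
        lines.append((start, len(profile)))
    return lines

# Toy page: two "text lines" separated by a blank row.
page = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
print(line_segments(page))  # [(0, 2), (3, 4)]
```

The font-size tip amounts to choosing that threshold relative to the estimated character height rather than using a fixed constant – small fonts produce thin, faint profile peaks that a fixed threshold would miss.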
Why this matters: What cultures will be ‘seen’ by AI systems in the future, and which ones won’t be? And what knock-on effects will this have on society? We’ll know the answer in a few years, and papers like this give us an indication of the difficulty people might face when digitizing different languages written with different systems.
Read more: Constraints in Developing a Complete Bengali Optical Character Recognition System (arXiv).
Self-driving freight company Starsky Robotics shuts down:
…Startup cites immaturity of machine learning, cost of investing in safety, as reasons for lack of follow-on funding…
Starsky Robotics, a company that tried to automate freight delivery using a combination of autonomous driving technology and teleoperation of vehicles by human operators, has shut down. The reason? “rather than seeing exponential improvements in the quality of AI performance (a la Moore’s Law), we’re instead seeing exponential increases in the cost to improve AI systems,” the company wrote in a Medium post announcing its shutdown.
In other words – rather than seeing economies of scale translate into reductions in the cost of each advancement, Starsky saw the opposite – advancing its technology became increasingly expensive as it tried to reach higher levels of reliability.
(A post on Hacker News alleges that Starsky had a relatively immature machine learning system circa 2017, and that it kept on getting poorly-annotated images from its labeling services so had a garbage-in garbage-out problem. Whether this is true or not doesn’t feel super germane to me as the general contours of Starsky’s self-described gripes with ML seem to match comments of other companies, and the general lack of manifestation of self-driving cars around us).
Safety struggles: Another challenge Starsky faced was that people don’t appreciate safety work: as the company spent more on ensuring the safety of its vehicles, it didn’t see a corresponding increase in favorable press coverage, or a rise in the number of articles about the importance of safety. Safety work is hard, basically – between September 2017 and June 2019 Starsky devoted most of its resources to improving the safety of its system. “The problem is that all of that work is invisible,” the company said.
What about the future of autonomous vehicles? Starsky thinks it’ll be five or ten years till we see fully self-driving vehicles on the road. The company also thinks there’s a lot more work to do here than people suspect. Going from “sometimes working” to “statistically reliable” is about 10-1000X more work, it suspects.
Why this matters: Where’s my self-driving car? That’s a question I ask myself in 2020, recalling myself in 2015 telling my partner we wouldn’t need to buy a “normal car” in five years or so. Gosh, how wrong I was! And stories like this give us a sense for why I was wrong – I’d been distracted by flashy new capabilities, but hadn’t spent enough time thinking about how robust they were. (Subsequently, I joined OpenAI, where I got to watch our robot team spend years figuring out how to get RL-trained robots to do interesting stuff in reality – this was humbling and calibrating as to the difficulty of the real world).
I’ll let Starsky Robotics close this section with its perspective on the (im)maturity of contemporary AI technology: “Supervised machine learning doesn’t live up to the hype. It isn’t actual artificial intelligence akin to C-3PO, it’s a sophisticated pattern-matching tool.”
Read more: The End of Starsky Robotics (Starsky Robotics, Medium).
Uh-oh, the ball-dodging drones have arrived:
…First, we taught drones to fly autonomously. Now, we’re teaching them how to dodge things…
Picture this: you’re playing a basketball game in a post-pandemic world, livestreaming it to fans everywhere. Drones whizz around the court, tracking you for close-up action shots as you dribble around players and head for the hoop. You take your shot, ignoring the drone between you and the net; the drone dodges out of the ball’s way while capturing a dramatic shot of it arcing into the net. You win the game, and your victory is broadcast around the world.
How far away is our dodge-drone future? Not that far, according to research from the University of Zurich published in Science Robotics, which details how to equip drones with low-latency sensors and algorithms so they can avoid fast-moving objects, like basketballs. The research uses event-based cameras – “bioinspired sensors with reaction times of microseconds” – to cut drone latency from tens of milliseconds to 3.5 milliseconds. This research builds on earlier work by the University of Maryland and the University of Zurich, published last year (covered in Import AI #151).
Going outside: Since we last wrote about this research, the team has started to do outdoor demonstrations where they throw objects towards the quadcopter and see how well it can avoid them. In tests, it does reasonably well at spotting a thrown ball in its path, dodging upward, then carrying on to its destination. Drones using this system can deal with objects traveling at up to 10 meters per second, the researchers say. The main limitations are its field of view (sometimes it doesn’t see the object until too late), and the fact the object may not generate enough events during its movement towards the drone (a ball that traces an arc across the camera’s view has a higher chance of triggering the event-based cameras, while one traveling straight towards the drone may not).
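A quick back-of-the-envelope calculation shows why the latency drop matters: it's the distance a fast-moving ball covers while the drone is still perceiving it. (The 30 ms figure below is my assumed stand-in for the “tens of milliseconds” of a conventional pipeline; the 3.5 ms and 10 m/s figures come from the research.)

```python
def reaction_distance(ball_speed_mps: float, latency_s: float) -> float:
    """Distance the ball covers while the drone's perception pipeline is still processing."""
    return ball_speed_mps * latency_s

# Event-camera pipeline (3.5 ms) vs an assumed conventional pipeline (~30 ms),
# for a ball at the 10 m/s limit reported by the researchers.
print(f"{reaction_distance(10, 0.0035):.3f} m")  # lost to event-camera latency
print(f"{reaction_distance(10, 0.030):.3f} m")   # lost to conventional-camera latency
```

Under these assumptions the conventional pipeline gives up roughly ten times more distance before the drone can even begin to move – a big deal when the whole encounter plays out over a few meters.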
Why this matters – and a missed opportunity: Drones that can dodge moving objects are inherently useful in a bunch of areas – sports, construction, and so on. Being able to dodge fast-moving objects will make it easier for us to deploy drones into more chaotic, complex parts of the world. But being able to dodge objects is also the sort of capability that many militaries want in their hardware, and it’d be nice to see the researchers discuss this aspect in their research – it’s so obvious they must be aware of this, and I worry the lack of discussion means society will ultimately be less prepared for hunter-killer-dodger drones.
Read more: Dynamic obstacle avoidance for quadrotors with event cameras (Science Robotics).
Read about earlier research here in Import AI #151, or here: EVDodgeNet: Deep Dynamic Obstacle Dodging with Event Cameras (arXiv).
Via: Drone plays dodgeball to demo fast new obstacle detection system (New Atlas).
How It Looks And How It Will Be
Earth, March, 2020.
[What would all of this look like if read out on some celestial ticker-tape machine, plugged into innumerable sensors and a cornucopia of AI-analysis systems? What does this look like to something capable of synthesizing all of it? What things have happened and what things might happen?]
There were so many questions that people asked the Search Engines. Do I have the virus? Where can I get tested? Death rate for males. Death rate for females. Death rate by age group. Transmission rate. What is an R0? What can I do to be safe?
Pollution levels fell in cities around the world. Rates of asthma went down. Through their windows, people saw farther. Sunsets and sunrises gained greater cultural prominence, becoming more brilliant the longer the hunkering down of the world went on.
Stock markets melted and pension funds fell. Futures were rewritten in the gyrations of numbers. In financial news services reporters filed copy every day, detailing unimaginable catastrophes that – somehow – grew worse the next day. Financial analysts created baroque drinking games, tied to downward gyrations of the market. Take a shot when the Dow loses a thousand points. Down whatever is in your hand when a circuit breaker gets tripped. If three circuit breakers get tripped worldwide within ten minutes of each other at once, everyone needs to drink two drinks.
Unemployment levels rose. SHOW ME THE MONEY, people wrote on signs asking for direct cash transfers. Everyone went “delinquent” in a financial sense, then – later – deviant in a psychological sense.
Unthinkable things happened: 0% interest rates. Negative interest rates that went from a worrying curiosity to a troubling reality in banks across the world. Stimuluses that fed into a system whose essential fuel was cooling, as so many people became so imprisoned inside homes, and apartments, and tents, and ships, and warehouses, and hospitals, and hospital ships.
Animals grew bolder. Nature encroached on quiet cities. Suddenly, raccoons in America and foxes in London had competition for garbage. Farmers got sick. Animals died. Animals broke up. Cows and horses and sheep became the majority occupiers of roads across the world. Great populations of city birds died off as tourist centers lost their coatings of food detritus.
The internet grew. Companies did what they could to accelerate the buildout of data centers, before their workers succumbed. Stockpiles of hard drives and chips and NICs and Ethernet and Infiniband cables began to run out. Supply chains broke down. Shuttered Chinese factories started spinning back up, but the economy was so intermingled that it came back to life fitfully and unreliably.
And yet there was so much beauty. People, trapped with each other, learned to appreciate conversations. People discovered new things. Everyone reached out to everyone else. How are you doing? How is quarantine?
Everyone got funnier. Everyone wrote emails and text messages and blog posts. People recorded voice memos. Streamed music. Streamed weddings. Had sex via webcam. Danced via webcam. New generations of artists came up in the Pandemic and got their own artworld-nickname after it all blew over. Scientists became celebrities. Everyone figured out how to cook better. People did press-ups. Prison workouts became everyone’s workout.
And so many promises and requests and plans for the future. Everyone dreamed of things they hadn’t thought of for years. Everyone got more creative.
Can we: go to the beach? Go cycling? Drink beer? Mudwrestle? Fight? Dance? Rave under a highway? Bring a generator to a beach and do a punk show? Skate through streets at dusk in a twenty-person crew? Build a treehouse? Travel? People asked every permutation of ‘can we’ and mostly their friends said ‘yes’ or, in places like California, ‘hell yes’.
Everyone donated money for everyone else – old partners who lost jobs, family members, acquaintances, strangers, parents, and more. People taught each other new skills. How to raise money online. How to use what you’ve got to get some generosity from other people. How to use what you’ve got to help other people. How to sew up a wound so if you get injured you can attend to it at home instead of going to the hospitals (because the hospitals are full of danger). How to fix a bike. How to steal a bike if things get bad. How to fix a car. How to steal a car if things get really bad. And so on.
Like the virus itself, much of the kindness was invisible. But like the virus itself, the kindness multiplied over time, until the world was full of it – invisible to aliens, but felt in the heart and the eyes and the soul of all the people of the stricken planet.
Things that inspired this story: You. You. You. And everyone we know and don’t know. Especially the ones we don’t know. Be well. Stay safe. Be loved. We will get through this. You and me and everyone we know.