CMU method makes more data available for training self-driving cars

June 17, 2020

PITTSBURGH--For safety's sake, a self-driving car must accurately track the movement of pedestrians, bicycles and other vehicles around it. Training those tracking systems may now be more effective thanks to a new method developed at Carnegie Mellon University.

Generally speaking, the more road and traffic data available for training tracking systems, the better the results. And the CMU researchers have found a way to unlock a mountain of autonomous driving data for this purpose.

"Our method is much more robust than previous methods because we can train on much larger datasets," said Himangi Mittal, a research intern working with David Held, assistant professor in CMU's Robotics Institute.

Most autonomous vehicles navigate primarily based on a sensor called a lidar, a laser device that generates 3D information about the world surrounding the car. This 3D information isn't images, but a cloud of points. One way the vehicle makes sense of this data is by using a technique known as scene flow. This involves calculating the speed and trajectory of each 3D point. Groups of points moving together are interpreted via scene flow as vehicles, pedestrians or other moving objects.

In the past, state-of-the-art methods for training such a system have required the use of labeled datasets -- sensor data that has been annotated to track each 3D point over time. Manually labeling these datasets is laborious and expensive, so, not surprisingly, little labeled data exists. As a result, scene flow training is instead often performed with simulated data, which is less effective, and then fine-tuned with the small amount of labeled real-world data that exists.

Mittal, Held and robotics Ph.D. student Brian Okorn took a different approach, using unlabeled data to perform scene flow training. Because unlabeled data is relatively easy to generate by mounting a lidar on a car and driving around, there's no shortage of it.

The key to their approach was to develop a way for the system to detect its own errors in scene flow. At each instant, the system tries to predict where each 3D point is going and how fast it's moving. In the next instant, it measures the distance between the point's predicted location and the actual location of the point nearest that predicted location. This distance forms one type of error to be minimized.

The system then reverses the process, starting with the predicted point location and working backward to map back to where the point originated. At this point, it measures the distance between the predicted position and the actual origination point, and the resulting distance forms the second type of error.

The system then works to correct those errors.

"It turns out that to eliminate both of those errors, the system actually needs to learn to do the right thing, without ever being told what the right thing is," Held said.

As convoluted as that might sound, Okorn found that it worked well. The researchers calculated that scene flow accuracy using a training set of synthetic data was only 25%. When the synthetic data was fine-tuned with a small amount of real-world labeled data, the accuracy increased to 31%. When they added a large amount of unlabeled data to train the system using their approach, scene flow accuracy jumped to 46%.

The research team presented their method at the Computer Vision and Pattern Recognition (CVPR) conference, which was held virtually June 14-19. The CMU Argo AI Center for Autonomous Vehicle Research supported this research, with additional support from a NASA Space Technology Research Fellowship.

Carnegie Mellon University

Related Data Articles from Brightsurf:

Keep the data coming
A continuous data supply ensures data-intensive simulations can run at maximum speed.

Astronomers are bulging with data
For the first time, over 250 million stars in our galaxy's bulge have been surveyed in near-ultraviolet, optical, and near-infrared light, opening the door for astronomers to reexamine key questions about the Milky Way's formation and history.

Novel method for measuring spatial dependencies turns less data into more data
Researcher makes 'little data' act big through, the application of mathematical techniques normally used for time-series, to spatial processes.

Ups and downs in COVID-19 data may be caused by data reporting practices
As data accumulates on COVID-19 cases and deaths, researchers have observed patterns of peaks and valleys that repeat on a near-weekly basis.

Data centers use less energy than you think
Using the most detailed model to date of global data center energy use, researchers found that massive efficiency gains by data centers have kept energy use roughly flat over the past decade.

Storing data in music
Researchers at ETH Zurich have developed a technique for embedding data in music and transmitting it to a smartphone.

Life data economics: calling for new models to assess the value of human data
After the collapse of the blockchain bubble a number of research organisations are developing platforms to enable individual ownership of life data and establish the data valuation and pricing models.

Geoscience data group urges all scientific disciplines to make data open and accessible
Institutions, science funders, data repositories, publishers, researchers and scientific societies from all scientific disciplines must work together to ensure all scientific data are easy to find, access and use, according to a new commentary in Nature by members of the Enabling FAIR Data Steering Committee.

Democratizing data science
MIT researchers are hoping to advance the democratization of data science with a new tool for nonstatisticians that automatically generates models for analyzing raw data.

Getting the most out of atmospheric data analysis
An international team including researchers from Kanazawa University used a new approach to analyze an atmospheric data set spanning 18 years for the investigation of new-particle formation.

Read More: Data News and Data Current Events is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to