Recovering data: NIST's neural network model finds small objects in dense images

August 04, 2020

In an effort to automatically capture important data from scientific papers, computer scientists at the National Institute of Standards and Technology (NIST) have developed a method that can accurately detect small, geometric objects such as triangles within dense, low-quality plots contained in image data. The NIST model uses a neural network approach designed to detect patterns, and it could be applied to other image analysis tasks in which small, dense objects must be located.

NIST's neural network model captured 97% of objects in a defined set of test images, locating the objects' centers to within a few pixels of manually selected locations.

"The purpose of the project was to recover the lost data in journal articles," NIST computer scientist Adele Peskin explained. "But the study of small, dense object detection has a lot of other applications. Object detection is used in a wide range of image analyses, self-driving cars, machine inspections, and so on, for which small, dense objects are particularly hard to locate and separate."

The researchers took the data from journal articles dating as far back as the early 1900s in a database of metallic properties at NIST's Thermodynamics Research Center (TRC). Often the results were presented only in graphical format, sometimes drawn by hand and degraded by scanning or photocopying. The researchers wanted to extract the locations of data points to recover the original, raw data for additional analysis. Until now such data have been extracted manually.

The images present data points with a variety of different markers, mainly circles, triangles, and squares, both filled and open, of varying size and clarity. Such geometrical markers are often used to label data in a scientific graph. Text, numbers and other symbols, which can falsely appear to be data points, were manually removed from a subset of the figures with graphics editing software before training the neural networks.

Accurately detecting and localizing the data markers was a challenge for several reasons. The markers are inconsistent in clarity and exact shape; they may be open or filled and are sometimes fuzzy or distorted. Some circles appear extremely circular, for example, whereas others do not have enough pixels to fully define their shape. In addition, many images contain very dense patches of overlapping circles, squares, and triangles.

The researchers sought to create a network model that identified plot points at least as accurately as manual detection, that is, within 5 pixels of the actual location on a plot several thousand pixels per side.
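The 5-pixel criterion can be illustrated with a short Python sketch. This is not the evaluation code used in the study: the function name `match_detections`, the greedy nearest-pair matching, and the sample coordinates are all assumptions made for illustration.

```python
import math

def match_detections(truth, detected, tolerance=5.0):
    """Greedily pair manually marked points with the nearest unused detection.

    Returns the fraction of manually marked points that have a detection
    within `tolerance` pixels (Euclidean distance). Hypothetical helper,
    not taken from the NIST paper.
    """
    # Every (distance, truth index, detection index) pair, closest first.
    candidates = sorted(
        (math.dist(t, d), ti, di)
        for ti, t in enumerate(truth)
        for di, d in enumerate(detected)
    )
    used_truth, used_det = set(), set()
    for dist, ti, di in candidates:
        if dist > tolerance:
            break  # all remaining pairs are farther apart than the tolerance
        if ti in used_truth or di in used_det:
            continue  # each point may be matched only once
        used_truth.add(ti)
        used_det.add(di)
    return len(used_truth) / len(truth) if truth else 0.0

# Made-up coordinates: two close matches, one detection 10 pixels off,
# and one manually marked point with no detection at all.
truth = [(100, 200), (105, 204), (1500, 2300), (40, 40)]
detected = [(101, 201), (107, 203), (1510, 2300)]
recall = match_detections(truth, detected)
print(recall)
```

One-to-one matching matters in dense patches: without it, a single detection could be counted as a hit for several overlapping markers.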

As described in a new journal paper, NIST researchers adopted a network architecture originally developed by German researchers for analyzing biomedical images, called U-Net. First the image dimensions are contracted to reduce spatial information, and then layers of feature and context information are added to build up precise, high-resolution results.
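The contract-then-expand structure can be pictured by tracing how the feature-map size changes through such a network. The Python sketch below makes several assumptions that are not stated in the article: padded convolutions that preserve size, 2x2 pooling, channel counts that double at each level, a depth of 4, and a 512-pixel input. The function `unet_shapes` is a hypothetical helper, and the NIST model may differ in every one of these details.

```python
def unet_shapes(size, base_channels=64, depth=4):
    """Trace (side length, channels) through a U-Net-like encoder and decoder.

    Illustrative assumptions: convolutions keep the spatial size, pooling
    halves it on the way down, and upsampling doubles it on the way up.
    """
    if size % (2 ** depth) != 0:
        raise ValueError("size must be divisible by 2**depth")
    down = []
    channels = base_channels
    for _ in range(depth):
        down.append((size, channels))  # saved for the skip connection
        size //= 2                     # pooling halves the resolution
        channels *= 2                  # feature channels double
    bottleneck = (size, channels)
    up = []
    for skip_size, skip_channels in reversed(down):
        size *= 2                      # upsampling restores resolution
        channels //= 2
        # The decoder level lines up with the saved encoder level, which is
        # what lets fine spatial detail be merged back in.
        assert (size, channels) == (skip_size, skip_channels)
        up.append((size, channels))
    return down, bottleneck, up

down, bottleneck, up = unet_shapes(512)
print(down)        # contracting path
print(bottleneck)  # coarsest level
print(up)          # expanding path
```

The saved encoder levels are the reason the output can return to full resolution, which is what a pixel-level localization task requires.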

To help train the network to classify marker shapes and locate their centers, the researchers experimented with four ways of marking the training data with masks, using different-sized center markings and outlines for each geometric object.
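A training mask of this kind can be sketched in a few lines of Python. The label values, the function `circle_mask`, and the specific sizes below are assumptions for illustration only; the article does not give the four mask designs in detail.

```python
def circle_mask(size, center, radius, center_radius=1, outline_width=1):
    """Build a label mask for one circular marker on a size x size grid.

    Labels (hypothetical): 0 = background, 1 = outline ring, 2 = center mark.
    """
    cx, cy = center
    mask = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            d = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            if d <= center_radius:
                mask[y][x] = 2  # center marking used for localization
            elif abs(d - radius) <= outline_width / 2:
                mask[y][x] = 1  # outline used for shape classification
    return mask

def count(mask, label):
    """Number of pixels in the mask carrying the given label."""
    return sum(row.count(label) for row in mask)

# Two variants of the same marker: a small center with a thin outline,
# and a larger center with a thick outline.
thin = circle_mask(21, (10, 10), radius=6, center_radius=1, outline_width=1)
thick = circle_mask(21, (10, 10), radius=6, center_radius=2, outline_width=3)
print(count(thin, 1), count(thick, 1))
print(count(thin, 2), count(thick, 2))
```

Varying `center_radius` and `outline_width` changes how many pixels carry shape information and how many carry location information, which is the kind of trade-off the researchers explored.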

The researchers found that adding more information to the masks, such as thicker outlines, increased the accuracy of classifying object shapes but reduced the accuracy of pinpointing their locations on the plots. In the end, the researchers combined the best aspects of several models to get the best classification and smallest location errors. Altering the masks turned out to be the best way to improve network performance, more effective than other approaches such as small changes at the end of the network.

The network's best performance, an accuracy of 97% in locating object centers, was possible only for a subset of images in which plot points were originally represented by very clear circles, triangles, and squares. The performance is good enough for the TRC to use the neural network to recover data from plots in newer journal papers.

Although NIST researchers currently have no plans for follow-up studies, the neural network model "absolutely" could be applied to other image analysis problems, Peskin said.

Paper: A. Peskin, B. Wilthan, and M. Majurski. Detection of Dense, Overlapping, Geometric Objects. International Journal of Artificial Intelligence and Applications. Posted online Aug. 4, 2020.

National Institute of Standards and Technology (NIST)
