Big data used to predict the future

November 09, 2018

Technology is advancing by leaps and bounds, and so is the volume of information that society handles every day. That data, however, needs to be organized, analyzed and cross-referenced before patterns can be predicted. This is one of the main functions of what is known as 'Big Data', the 21st-century crystal ball capable of predicting the response to a specific medical treatment, the workings of a smart building and even the behavior of the Sun based on certain variables.

Researchers in the KIDS research group at the University of Córdoba's Department of Computer Science and Numerical Analysis have improved models that predict several variables simultaneously from the same set of input variables, reducing the amount of data needed for an accurate forecast. One example is a method that predicts several soil-quality parameters from a set of variables such as the crops planted, tillage and the use of pesticides.
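To make the idea of predicting several outputs from one set of inputs concrete, here is a minimal sketch in Python. It is not the researchers' model: it uses a plain nearest-neighbour average as a stand-in, and the function name, the toy inputs (tillage intensity, pesticide dose) and the two soil-quality targets are all hypothetical values invented for illustration.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_multi_output_predict(train_x, train_y, query, k=3):
    """Predict every target at once by averaging the target vectors
    of the k training examples closest to the query."""
    # Rank training examples by distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    nearest = order[:k]
    n_targets = len(train_y[0])
    # Average each target column over the selected neighbours.
    return [sum(train_y[i][t] for i in nearest) / len(nearest) for t in range(n_targets)]

# Hypothetical toy data (assumption, not from the study):
# inputs are (tillage intensity, pesticide dose),
# targets are two made-up soil-quality indicators.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
train_y = [(10.0, 1.0), (8.0, 2.0), (6.0, 3.0), (4.0, 4.0)]

prediction = knn_multi_output_predict(train_x, train_y, query=(0.1, 0.1), k=1)
print(prediction)  # one value per target, produced by a single model
```

The point of the sketch is only that a single call returns a whole vector of predictions, one per target, instead of training a separate model for each output.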

"When you are dealing with a large volume of data, there are two solutions. You either increase computer performance, which is very expensive, or you reduce the quantity of information needed for the process to be done properly," says researcher Sebastian Ventura, one of the authors of the research article.

When building a predictive model, two issues need to be dealt with: the number of variables that come into play and the number of examples fed into the system to obtain the most reliable results. Working on the idea that less is more, the study reduced the number of examples by eliminating those that are redundant or "noisy" and therefore contribute no useful information to building a better predictive model.
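The filtering of redundant and noisy examples described above can be illustrated with a rough sketch. This is a simple neighbour-based heuristic chosen for readability, not the ensemble-based method from the study; the function names, the thresholds (noise_tol, redundancy_radius) and the toy data are assumptions made for this example.

```python
import math
import statistics

def euclidean(p, q):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def neighbour_guess(xs, ys, index, k):
    """Estimate the targets of example `index` as the per-target median
    of its k nearest neighbours (the example itself is left out)."""
    others = [i for i in range(len(xs)) if i != index]
    nearest = sorted(others, key=lambda i: euclidean(xs[i], xs[index]))[:k]
    return [statistics.median(ys[i][t] for i in nearest) for t in range(len(ys[0]))]

def select_instances(xs, ys, k=3, noise_tol=3.0, redundancy_radius=0.05):
    """Return the indices of the examples worth keeping (heuristic sketch)."""
    kept = []
    for i in range(len(xs)):
        # Noisy: the targets disagree strongly with what the neighbours suggest.
        if euclidean(neighbour_guess(xs, ys, i, k), ys[i]) > noise_tol:
            continue
        # Redundant: almost identical inputs to an example already kept.
        if any(euclidean(xs[i], xs[j]) < redundancy_radius for j in kept):
            continue
        kept.append(i)
    return kept

# Hypothetical toy data (assumption): one input, two targets following (x, 2x).
# Example 1 nearly duplicates example 0; example 4 has corrupted targets.
xs = [(0.0,), (0.01,), (0.4,), (1.0,), (1.25,), (1.5,), (2.0,)]
ys = [(0.0, 0.0), (0.0, 0.0), (0.4, 0.8), (1.0, 2.0), (9.0, 9.0), (1.5, 3.0), (2.0, 4.0)]

kept = select_instances(xs, ys)
print(kept)                                   # [0, 2, 3, 5, 6]
print(f"kept {len(kept)} of {len(xs)} examples")
```

A median is used instead of a mean so that one corrupted neighbour does not cause its clean neighbours to be flagged as noisy as well. The thresholds would need tuning on real data.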

As Oscar Reyes, the lead author of the research, points out, "we have developed a technique that can tell you which set of examples you need so that the forecast is not only reliable but could even be better." In some of the 18 databases analyzed, they were able to reduce the amount of data by 80% without affecting predictive performance, meaning that only about a fifth of the original data was used. All of this, says Reyes, "means saving energy and money in the building of a model, as less computing power is required." It also saves time, which matters for applications that work in real time, since "it doesn't make sense for a model to take half an hour to run if you need a prediction every five minutes."

As the authors of the research point out, systems that predict several variables simultaneously (variables that may be related to one another) from several input variables are known as multi-output regression models. They are gaining importance due to the wide range of applications that "could be analyzed under this paradigm of machine learning," such as those related to healthcare, water quality, cooling systems for buildings and environmental studies.

Reyes, O.; Fardoun, H. M.; Ventura, S. An ensemble-based method for the selection of instances in the multi-target regression problem. Integrated Computer-Aided Engineering, vol. 25, no. 4, pp. 305-320, 5 September 2018. DOI: 10.3233/ICA-180581

University of Córdoba
