Science Current Events | Science News | Brightsurf.com
 

Statistical Analysis of Complex data sets with Robust Statistical methods

April 12, 2007

Robust statistical analysis methods capable of dealing with large complex data sets are required more than ever before in almost all branches of science. The European Science Foundation's three-year SACD network, which was completed in December 2006, developed new methods for extracting key structural features within the data. Such features can include outlying values that may be particularly significant within the increasingly large and complex data sets generated in financial markets, medical diagnostics, environmental surveys, and other sources.

"Outliers often indicate the most interesting data points, like polluted areas for environmental data, or irregularities in online monitoring of patients," said SACD chair Christophe Croux. According to Croux, the programme has almost completely achieved its objectives on this front. "A lot of work has been done in developing new methods, especially for analyzing large data sets, that can cope with outlying atypical values. This resulted in a number of publications related to the subject of the network."
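Why robust methods "cope" with outlying values can be shown with a textbook comparison that is not taken from SACD's own work. A classical z-score is built on the mean and standard deviation. A robust z-score is built on the median and the median absolute deviation (MAD). The readings below are invented for illustration, and the cutoffs of 3 and 3.5 are common rules of thumb rather than values from the article.

```python
import statistics

def classical_z(values):
    """z-scores from the mean and sample standard deviation (not robust)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def robust_z(values):
    """z-scores from the median and MAD; the factor 1.4826 makes the MAD
    estimate the standard deviation when the data are normally distributed."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]

def flag(scores, threshold):
    """Indices whose absolute score exceeds the threshold."""
    return [i for i, z in enumerate(scores) if abs(z) > threshold]

# Hypothetical sensor readings with one gross error at index 6.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0]

print(flag(classical_z(readings), 3.0))  # → []  (the outlier masks itself)
print(flag(robust_z(readings), 3.5))     # → [6]
```

The single extreme reading inflates the standard deviation so much that its own classical z-score stays below 3. The median and MAD are barely affected by it, so the robust score flags it clearly.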




Particular progress has been made in detecting outliers in multivariate time series, Croux added. This is a significant development for many analysis and monitoring applications that involve measurements of different but related quantities varying over time. Such applications include monitoring telecommunication networks to assess how performance and reliability are affected by events such as upgrades, surges in demand, and local link failures; monitoring noise in the vicinity of an airport; modeling the behaviour of financial markets in response to geopolitical events; and tracking the condition of intensive-care patients through several measurements, such as pulse rate, blood pressure and lung water.

Without robust analysis methods it is easy to miss significant outliers in such multivariate data. In some cases the outliers only show up clearly when all the variables are considered together, yet they may signal something important that could easily be overlooked, such as a sudden deterioration in a critically ill patient's condition.
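An outlier can be visible only when the variables are considered together. The Mahalanobis distance illustrates this: it measures how far an observation lies from the centre of the data, taking the covariance between the variables into account. This is a minimal sketch, not a method described by SACD. The two strongly correlated variables are invented, and the sketch uses the classical mean and covariance for brevity. Robust variants, for example those based on the minimum covariance determinant (MCD) estimator, substitute outlier-resistant estimates of the centre and covariance. That substitution matters when several outliers could otherwise mask one another. The chi-square cutoff used below is a common approximation.

```python
import math
import statistics

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample mean,
    using the classical (non-robust) sample covariance matrix."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    cyy = sum((y - my) ** 2 for y in ys) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = cxx * cyy - cxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) S^-1 (dx, dy)^T written out with the explicit 2x2 inverse
        out.append((cyy * dx * dx - 2 * cxy * dx * dy + cxx * dy * dy) / det)
    return out

# Two invented, strongly correlated variables. The last point (3, 8) has
# ordinary values for each variable taken alone, but an unusual combination.
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1),
        (6, 6.0), (7, 6.8), (8, 8.1), (9, 9.0), (10, 9.9), (3, 8.0)]

# The suspect point's marginal z-scores stay well inside +/-1.
for k in (0, 1):
    col = [p[k] for p in data]
    z = (data[-1][k] - statistics.mean(col)) / statistics.stdev(col)
    print(f"variable {k}: z = {z:.2f}")

d2 = mahalanobis_sq_2d(data)
# For two variables the chi-square(2) 97.5% quantile is -2*ln(0.025), about 7.38.
cutoff = -2 * math.log(0.025)
print([i for i, v in enumerate(d2) if v > cutoff])  # → [10]
```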

SACD has also advanced the field of chemometrics, the application of multivariate analysis methods to chemical data, and some of these developments are now implemented in software written by members of the network. The same principles have been applied to analyzing the risk of stock investments and measuring the volatility of financial markets.

In some cases it is desirable to eliminate outliers from a data set in order to identify the most likely response of a particular variable to different events. Within SACD, a method was developed to do this when analyzing the relationship between various economic parameters and stock yields. This requires concentrating on the bulk of the data rather than on the exceptions. "In order to do so we have to identify these extreme observations in order to downweight or reject them from the computations," said Croux. This is more difficult when there are multiple variables. One of SACD's major achievements has been to find new ways of condensing and summarizing the data so that its main structure can be retrieved, which also makes the outliers easier to detect.
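A Huber M-estimator fitted by iteratively reweighted least squares (IRLS) shows how downweighting extreme observations lets a fit follow the bulk of the data. It is a standard robust-regression technique, chosen here for illustration. The article does not say which estimator SACD developed for stock yields. The single predictor, the invented data, the tuning constant c = 1.345 and the MAD-based residual scale are all assumptions of this sketch.

```python
import statistics

def wls_line(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return ybar - b * xbar, b

def huber_line(xs, ys, c=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of a straight line via IRLS.
    Residuals within c*scale get weight 1; larger ones get c*scale/|r|."""
    a, b = wls_line(xs, ys, [1.0] * len(xs))  # start from ordinary least squares
    for _ in range(max_iter):
        resid = [y - (a + b * x) for x, y in zip(xs, ys)]
        med = statistics.median(resid)
        # Robust residual scale: MAD rescaled to match a normal standard deviation.
        scale = 1.4826 * statistics.median([abs(r - med) for r in resid])
        if scale == 0:
            break
        ws = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
        a_new, b_new = wls_line(xs, ys, ws)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b

# Invented data: y is roughly 1 + 2x, except for one gross outlier at x = 9.
xs = list(range(10))
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.0, 0.1, -0.1, 0.0]
ys = [1 + 2 * x + e for x, e in zip(xs, noise)]
ys[9] = 40.0

print("OLS slope:  ", round(wls_line(xs, ys, [1.0] * len(xs))[1], 3))
print("Huber slope:", round(huber_line(xs, ys)[1], 3))
```

With ordinary least squares, the one bad point drags the slope well above 2. The Huber weights shrink that point's influence, so the slope stays close to the underlying value. Huber's estimator limits the influence of large residuals but not of unusual predictor values. Data with high-leverage points therefore usually call for higher-breakdown methods.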

Croux admits there is more work to be done, particularly in dealing with highly complex data sets, and with problems involving many variables and small sample sizes. "Important steps to be taken include robust methods that can deal with categorical data and missing values."

SACD has laid the groundwork for progress in all these areas. It stimulated interest at its three workshops by presenting data from real research in progress rather than artificial samples. "We got much more interest than expected from colleagues in other fields. This was due to the interesting network workshops, where cutting edge scientific research was presented," said Croux.

Most important of all, the material presented at the workshops held in 2004, 2005 and 2006 was often work in progress. This led to an exchange of ideas and to the initiation of joint research projects among the partners. In this way the work of SACD will continue and expand after the network itself has ended.

The European Science Foundation is an association of 75 member organisations devoted to scientific research in 30 European countries. Since its inception in 1974, it has coordinated a wide range of pan-European scientific initiatives.

European Science Foundation



© 2008 BrightSurf.com