
Advances in Bayesian methods for Big Data

May 31, 2017

In the Big Data era, many scientific and engineering domains are producing massive data streams, with petabyte and exabyte scales becoming increasingly common. Besides the explosive growth in volume, Big Data also has high velocity, high variety, and high uncertainty. These complex data streams require ever-increasing processing speeds, economical storage, and timely response for decision making in highly uncertain environments, and have raised various challenges to conventional data analysis.

With the primary goal of building intelligent systems that automatically improve from experience, machine learning (ML) is becoming an increasingly important field for tackling Big Data challenges. This has given rise to the emerging field of Big Learning, which covers theories, algorithms, and systems for addressing Big Data problems.

Bayesian methods have been widely used in machine learning and many other areas. However, skepticism often arises when we talk about Bayesian methods for Big Data. Practitioners also criticize Bayesian methods as often too slow for even small-scale problems, owing to many factors such as non-conjugate models with intractable integrals. Nevertheless, Bayesian methods have several advantages.

First, Bayesian methods provide a principled theory for combining prior knowledge and uncertain evidence to make sophisticated inference of hidden factors and predictions.

Second, Bayesian methods are conceptually simple and flexible: hierarchical Bayesian modeling offers a flexible tool for characterizing uncertainty, missing values, latent structures, and more. Moreover, regularized Bayesian inference (RegBayes) further augments this flexibility by introducing an extra dimension (i.e., a posterior regularization term) to incorporate domain knowledge or to optimize a learning objective.

Finally, there exist very flexible algorithms (e.g., Markov chain Monte Carlo) for performing posterior inference.
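To make the first and last of these points concrete, here is a minimal Python sketch; it is an illustration only and is not taken from the review. It assumes a toy coin-flip model with a Beta prior and Bernoulli observations, where the posterior happens to be available in closed form, and then approximates the same posterior with a simple random-walk Metropolis sampler, one basic member of the Markov chain Monte Carlo family. The function names, the prior, the step size, and the data are all hypothetical choices made for this example.

```python
import math
import random


def beta_bernoulli_posterior(alpha, beta, data):
    """Closed-form update: a Beta(alpha, beta) prior combined with 0/1 data."""
    successes = sum(data)
    failures = len(data) - successes
    return alpha + successes, beta + failures


def log_unnormalized_posterior(theta, alpha, beta, data):
    """Log prior plus log likelihood, up to an additive constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior mass outside (0, 1)
    successes = sum(data)
    failures = len(data) - successes
    return ((alpha - 1 + successes) * math.log(theta)
            + (beta - 1 + failures) * math.log(1.0 - theta))


def metropolis(alpha, beta, data, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler (illustrative, not tuned)."""
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting point inside (0, 1)
    log_p = log_unnormalized_posterior(theta, alpha, beta, data)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_unnormalized_posterior(proposal, alpha, beta, data)
        # Accept with probability min(1, p_new / p_old).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            theta, log_p = proposal, log_p_new
        samples.append(theta)
    return samples
```

With a Beta(2, 2) prior and the hypothetical data `[1, 1, 0, 1, 0, 1, 1, 1, 0, 1]` (seven successes, three failures), `beta_bernoulli_posterior` returns `(9, 5)`, so the exact posterior mean is 9/14. Averaging the `metropolis` samples after discarding an initial burn-in should land close to that value. The sampler only ever evaluates the unnormalized posterior, which is why the same recipe carries over to non-conjugate models where no closed form exists, at the cost of many likelihood evaluations over the data.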

In a new overview published in the Beijing-based National Science Review, scientists at Tsinghua University, China present the latest advances in Bayesian methods for Big Data analysis. Co-authors Jun Zhu, Jianfei Chen, Wenbo Hu, and Bo Zhang cover the basic concepts of Bayesian methods, and review the latest progress on flexible Bayesian methods, efficient and scalable algorithms, and distributed system implementations.

These scientists likewise outline the potential development directions of future Bayesian methods.

"Bayesian methods are becoming increasingly relevant in the Big Data era to protect high capacity models against overfitting, and to allow models adaptively updating their capacity. However, the application of Bayesian methods to big data problems runs into a computational bottleneck that needs to be addressed with new (approximate) inference methods."

The scientists review recent advances in nonparametric Bayesian methods, regularized Bayesian inference, scalable algorithms, and system implementation.

The scientists also discuss the connection with deep learning. "A natural and important question that remains under addressed is how to conjoin the flexibility of deep learning and the learning efficiency of Bayesian methods for robust learning," they note.

Finally, the scientists comment that "The current machine learning methods in general still require considerable human expertise in devising appropriate features, priors, models, and algorithms. Much work has to be done in order to make ML more widely used and eventually become a common part of our day to day tools in data sciences".

This research received funding from the National 973 Project (2013CB329403), NSFC Projects (Nos. 61620106010, 61621136008, 61332007), and the National Youth Top-notch Talent Support Program.

See the article: Jun Zhu, Jianfei Chen, Wenbo Hu, Bo Zhang
Big Learning with Bayesian Methods
Natl Sci Rev (May 2017), DOI: 10.1093/nsr/nwx044

The National Science Review is the first comprehensive scholarly journal released in English in China that is aimed at linking the country's rapidly advancing community of scientists with the global frontiers of science and technology. The journal also aims to shine a worldwide spotlight on scientific research advances across China.

Science China Press
