
Advances in Bayesian methods for Big Data

May 31, 2017

In the Big Data era, many scientific and engineering domains are producing massive data streams, with petabyte and exabyte scales becoming increasingly common. Besides the explosive growth in volume, Big Data also exhibits high velocity, high variety, and high uncertainty. Such complex data streams demand ever-increasing processing speed, economical storage, and timely responses for decision making in highly uncertain environments, posing serious challenges to conventional data analysis.

With the primary goal of building intelligent systems that automatically improve from experience, machine learning (ML) has become an increasingly important field for tackling Big Data challenges. This has given rise to the emerging field of Big Learning, which covers the theories, algorithms, and systems for addressing Big Data problems.

Bayesian methods have been widely used in machine learning and many other areas. However, skepticism often arises when we talk about Bayesian methods for Big Data. Practitioners criticize that Bayesian methods are often too slow even for small-scale problems, owing to factors such as non-conjugate models with intractable integrals. Nevertheless, Bayesian methods have several advantages.

First, Bayesian methods provide a principled theory for combining prior knowledge and uncertain evidence to make sophisticated inference of hidden factors and predictions.
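As a small illustration of this prior-plus-evidence updating (a hypothetical sketch, not an example from the paper), consider the classic Beta-Binomial conjugate model, where the posterior over a coin's head probability is available in closed form:

```python
# Beta-Binomial conjugate update: a minimal sketch of combining a prior
# belief with observed evidence to obtain a posterior.
# Prior: Beta(alpha, beta); evidence: k heads observed in n coin flips.
# The posterior is Beta(alpha + k, beta + n - k).

def posterior_mean(alpha, beta, k, n):
    """Posterior mean of the head probability under a Beta(alpha, beta) prior."""
    return (alpha + k) / (alpha + beta + n)

# A uniform Beta(1, 1) prior updated with 7 heads in 10 flips:
print(posterior_mean(1, 1, 7, 10))  # (1 + 7) / (2 + 10) = 8/12 ≈ 0.667
```

The posterior mean lies between the prior mean (0.5) and the raw data frequency (0.7), showing how the prior tempers the evidence; with more data, the evidence dominates.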

Second, Bayesian methods are conceptually simple and flexible: hierarchical Bayesian modeling offers a flexible tool for characterizing uncertainty, missing values, latent structures, and more. Moreover, regularized Bayesian inference (RegBayes) further extends this flexibility by introducing an extra dimension (i.e., a posterior regularization term) to incorporate domain knowledge or to optimize a learning objective.

Finally, flexible general-purpose algorithms, such as Markov chain Monte Carlo (MCMC), exist to perform posterior inference.
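To make the idea of such general-purpose inference concrete, here is a minimal random-walk Metropolis sampler in Python (a textbook sketch for illustration only; the function names and parameters are our own, not from the paper). It needs only the target's unnormalized log-density:

```python
import math
import random

def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log-density."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, target(proposal) / target(x)),
        # computed in log space for numerical stability.
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return samples

# Sample from a standard normal target (log-density up to a constant):
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
print(sum(draws) / len(draws))  # sample mean, close to 0
```

Because only density ratios are needed, the normalizing constant of the posterior never has to be computed, which is exactly what makes MCMC applicable to non-conjugate models with intractable integrals.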

In a new overview published in the Beijing-based National Science Review, scientists at Tsinghua University, China, present the latest advances in Bayesian methods for Big Data analysis. Co-authors Jun Zhu, Jianfei Chen, Wenbo Hu, and Bo Zhang cover the basic concepts of Bayesian methods, and review the latest progress on flexible Bayesian methods, efficient and scalable algorithms, and distributed system implementations.

The authors also outline potential directions for the future development of Bayesian methods.

According to the authors, "Bayesian methods are becoming increasingly relevant in the Big Data era to protect high-capacity models against overfitting, and to allow models to adaptively update their capacity. However, the application of Bayesian methods to Big Data problems runs into a computational bottleneck that needs to be addressed with new (approximate) inference methods."

The review covers recent advances in nonparametric Bayesian methods, regularized Bayesian inference, scalable algorithms, and system implementations.

The authors also discuss the connection with deep learning: "A natural and important question that remains underaddressed is how to conjoin the flexibility of deep learning and the learning efficiency of Bayesian methods for robust learning," they note.

Finally, the authors comment that "The current machine learning methods in general still require considerable human expertise in devising appropriate features, priors, models, and algorithms. Much work has to be done in order to make ML more widely used and eventually become a common part of our day-to-day tools in data sciences."

This research received funding from the National 973 Project (2013CB329403), NSFC Projects (Nos. 61620106010, 61621136008, 61332007), and the National Youth Top-notch Talent Support Program.

See the article: Jun Zhu, Jianfei Chen, Wenbo Hu, Bo Zhang
Big Learning with Bayesian Methods
Natl Sci Rev (May 2017), DOI: 10.1093/nsr/nwx044

The National Science Review is the first comprehensive scholarly journal released in English in China that is aimed at linking the country's rapidly advancing community of scientists with the global frontiers of science and technology. The journal also aims to shine a worldwide spotlight on scientific research advances across China.

Science China Press
