
Querying big data just got universal

June 23, 2019

To solve one of the key obstacles in big-data science, KAUST researchers have created a framework for searching very large datasets that runs easily on different computing architectures. Their achievement allows researchers to concentrate on advancing the search engine, or query engine, itself rather than on painstakingly coding for specific computing platforms.

Big data is one of the most promising yet challenging aspects of today's information-heavy world. While the huge and ever-expanding sets of information, such as online-collected data or genetic information, could hold powerful insights for science and humanity, processing and interrogating all this data require highly sophisticated techniques.

Many different approaches to querying big data have been explored. But one of the most powerful and computationally effective is based on analyzing data with a subject-predicate-object triplestore structure, in which each fact is stored as a triple such as (apple, is a, fruit). This structure lends itself to being treated like a graph with edges and vertices, and this characteristic has been used to code query engines for specific computing architectures for maximum efficiency. However, such architecture-specific approaches cannot be readily ported to different platforms, limiting the opportunities for innovation and advancement in analytics.
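
To make the triple-as-graph idea concrete, here is a minimal Python sketch. It is an illustration only, not the KAUST engine: the helper names `build_graph` and `objects_of` are hypothetical, and every triple except (apple, is a, fruit) is made up for the example. Each subject and object becomes a vertex, and each predicate becomes a labelled edge between them.

```python
from collections import defaultdict

# Toy data: only (apple, is a, fruit) comes from the article;
# the other triples are invented for illustration.
TRIPLES = [
    ("apple", "is a", "fruit"),
    ("pear", "is a", "fruit"),
    ("fruit", "grows on", "tree"),
]

def build_graph(triples):
    """Map each subject vertex to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

def objects_of(graph, subject, predicate):
    """Follow every edge labelled `predicate` that leaves `subject`."""
    return [obj for pred, obj in graph.get(subject, []) if pred == predicate]

GRAPH = build_graph(TRIPLES)
print(objects_of(GRAPH, "apple", "is a"))
```

Answering a query this way means walking edges one vertex at a time, which is the graph-traversal style that architecture-specific engines tend to hand-tune.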

"Modern computing systems provide diverse platforms and accelerators, and programming them can be intimidating and time consuming," say Fuad Jamour and Yanzhao Chen, Ph.D. candidates in Panos Kalnis's group in KAUST's Extreme Computing Research Center. "Our research group focuses on building systems and algorithms for processing and analyzing very large datasets. This research addresses the desire to write a program once and then use it across different platforms."

Rather than the previously used graph-traversal or exhaustive relational-indexing approaches, the group queried triplestore data by using an applied mathematical approach called sparse-matrix algebra.
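
As a rough illustration of how a query can become matrix algebra, the sketch below stores one sparse Boolean matrix per predicate and answers a two-step query with a single sparse matrix product. This is a toy under stated assumptions, not the group's implementation: the function names `predicate_matrices` and `bool_matmul` are hypothetical, the dictionary-of-sets storage is chosen only for brevity, and the triples other than (apple, is a, fruit) are invented.

```python
from collections import defaultdict

# Toy data: only (apple, is a, fruit) comes from the article.
TRIPLES = [
    ("apple", "is a", "fruit"),
    ("pear", "is a", "fruit"),
    ("fruit", "grows on", "tree"),
]

def predicate_matrices(triples):
    """One sparse Boolean matrix per predicate: matrices[p][s] = {o, ...}.

    Only the nonzero entries (existing triples) are stored.
    """
    matrices = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        matrices[p][s].add(o)
    return matrices

def bool_matmul(a, b):
    """Sparse Boolean product: row i reaches k if a[i] has j and b[j] has k."""
    result = {}
    for i, cols in a.items():
        reached = set()
        for j in cols:
            reached |= b.get(j, set())
        if reached:
            result[i] = reached
    return result

M = predicate_matrices(TRIPLES)

# Query: find ?x and ?z such that (?x, is a, ?y) and (?y, grows on, ?z).
two_hop = bool_matmul(M["is a"], M["grows on"])
print(two_hop)
```

Because the query is expressed as a matrix product rather than as hand-written traversal code, the same query logic could in principle run on any platform that supplies a sparse-matrix multiplication routine, which is the portability argument the researchers make.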

"Our paper describes the first research graph-query engine with matrix algebra at its core to address the issue of portability," says Jamour. "Most existing graph-query engines are designed for single computers or small distributed-memory systems. And porting existing engines to large distributed-memory systems, like supercomputers, involves significant engineering effort. Our sparse-matrix algebra scheme can be used to build scalable, portable and efficient graph-query engines."

In the team's experiments on large-scale real and synthetic datasets, the scheme achieved performance comparable with, or better than, that of existing specialized approaches for complex queries. It can also scale up to very large computing infrastructures handling datasets of up to 512 billion triples.

"These ideas can facilitate building analytics components in graph databases with cutting-edge performance, which is currently in high demand," says Chen.


King Abdullah University of Science & Technology (KAUST)
