A system purely for developing high-performance, big data codes

June 11, 2018

HOUSTON -- (June 11, 2018) -- Computer scientists from Rice University's DARPA-funded Pliny Project believe they have the answer for every stressed-out systems programmer who has struggled to implement complex objects and workflows on 'big data' platforms like Spark and thought: "Isn't there a better way?"

Rice's PlinyCompute will be unveiled here Thursday at the 2018 ACM SIGMOD conference. In a peer-reviewed conference paper, the team describes PlinyCompute as "a system purely for developing high-performance, big data codes."

Like Spark, PlinyCompute aims for ease of use and broad versatility, said Chris Jermaine, the Rice computer science professor leading the platform's development. Unlike Spark, PlinyCompute is designed to support the intense kinds of computation that have previously been possible only with supercomputers, or high-performance computers (HPC).

"With machine learning, and especially deep learning, people have seen what complex analytics algorithms can do when they're applied to big data," Jermaine said. "Everyone, from Fortune 500 executives to neuroscience researchers, is clamoring for more and more complex algorithms, but systems programmers have mostly bad options for providing that today. HPC can provide the performance, but it takes years to learn to write code for HPC, and perhaps worse, a tool or library that might take days to create with Spark can take months to program on HPC.

"Spark was built for big data, and it supports things that HPC doesn't, like easy load balancing, fault tolerance and resource allocation, which are an absolute must for data-intensive tasks," he said. "Because of that, and because development times are far shorter than with HPC, people are building new tools that run on top of Spark for complex tasks like machine learning, graph analytics and more."

Because Spark wasn't designed with complex computation in mind, its computational performance can only be pushed so far, said Jia Zou, a Rice research scientist and first author of the ACM SIGMOD paper describing PlinyCompute.

"Spark is built on top of the Java Virtual Machine, or JVM, which manages runtimes and abstracts away most of the details regarding memory management," said Zou, who spent six years researching large-scale analytics and data management systems at IBM Research-China before joining Rice in 2015. "Spark's performance suffers from its reliance on the JVM, especially as computational demands increase for tasks like training deep neural networks for deep learning.

"PlinyCompute is different because it was designed for high performance from the ground up," Zou said. "In our benchmarking, we found PlinyCompute was at least twice as fast and in some cases 50 times faster at implementing complex object manipulation and library-style computations as compared to Spark."

She said the tests showed that PlinyCompute outperforms comparable systems for building high-performance tools and libraries.

Jermaine said not all programmers will find it easy to write code for PlinyCompute. Unlike the Java-based coding required for Spark, PlinyCompute libraries and models must be written in C++.

"There's more flexibility with PlinyCompute," Jermaine said. "That can be a challenge for people who are less experienced and knowledgeable about C++, but we also ran a side-by-side comparison of the number of lines of code that were needed to complete various implementations, and for the most part there was no significant difference between PlinyCompute and Spark."

The Pliny Project, which launched in 2014, is an $11 million, DARPA-funded effort to create sophisticated programming tools that can both "autocomplete" and "autocorrect" code for programmers, in much the same way that software completes search queries and corrects spelling in web browsers and on smartphones. Pliny uses machine learning to read and learn from billions of lines of open-source computer programs, and Jermaine said PlinyCompute was born from this effort.

"It's a computationally complex machine learning application, and there really wasn't a good tool for creating it," he said. "Early on, we recognized that PlinyCompute was a tool that could be applied to problems far beyond what we were using it for in the Pliny Project."
-end-
Installation and deployment information, an API, FAQ, tutorials and more are available at plinycompute.rice.edu.

The research was also supported by the National Science Foundation.

Additional co-authors on the PlinyCompute SIGMOD paper include Matthew Barnett, Tania Lorido-Botran, Shangyu Luo, Carlos Monroy, Sourav Sikdar, Kia Teymourian and Binhang Yuan, all of Rice.

High-resolution IMAGES are available for download at:

http://news.rice.edu/files/2018/06/0611_PLINYCOMPUTE-grp-lg-1xou8nw.jpg
CAPTION: Rice University's PlinyCompute team includes (from left) Shangyu Luo, Sourav Sikdar, Jia Zou, Tania Lorido, Binhang Yuan, Jessica Yu, Chris Jermaine, Carlos Monroy, Dimitrije Jankov and Matt Barnett. (Photo by Jeff Fitlow/Rice University)

http://news.rice.edu/files/2018/06/0611_PLINYCOMPUTE-cj-lg-20v1t5r.jpg
CAPTION: Rice University computer scientist Chris Jermaine leads the PlinyCompute project. (Photo by Jeff Fitlow/Rice University)

http://news.rice.edu/files/2018/06/0611_PLINYCOMPUTE-jz-lg-1nimwwi.jpg
CAPTION: Rice University research scientist Jia Zou is first author of a new peer-reviewed study about PlinyCompute. (Photo by Jeff Fitlow/Rice University)

The DOI of the SIGMOD paper is: 10.1145/3183713.3196933

A copy of the SIGMOD paper is available at: https://dl.acm.org/citation.cfm?id=3196933

PlinyCompute homepage: http://plinycompute.rice.edu/

Related research from Rice:

Rice U. turns deep-learning AI loose on software development -- April 25, 2018 http://news.rice.edu/2018/04/25/rice-u-turns-deep-learning-ai-loose-on-software-development/

Next for DARPA: 'Autocomplete' for programmers -- Nov. 5, 2014 http://news.rice.edu/2014/11/05/next-for-darpa-autocomplete-for-programmers/

This release can be found online at news.rice.edu.

Follow Rice News and Media Relations via Twitter @RiceUNews.

Located on a 300-acre forested campus in Houston, Rice University is consistently ranked among the nation's top 20 universities by U.S. News & World Report. Rice has highly respected schools of Architecture, Business, Continuing Studies, Engineering, Humanities, Music, Natural Sciences and Social Sciences and is home to the Baker Institute for Public Policy. With 3,970 undergraduates and 2,934 graduate students, Rice's undergraduate student-to-faculty ratio is just under 6-to-1. Its residential college system builds close-knit communities and lifelong friendships, just one reason why Rice is ranked No. 1 for quality of life and for lots of race/class interaction and No. 2 for happiest students by the Princeton Review. Rice is also rated as a best value among private universities by Kiplinger's Personal Finance. To read "What they're saying about Rice," go to http://tinyurl.com/RiceUniversityoverview.
