New framework pushes the limits of high-performance computing

November 12, 2018

Large-scale, advanced high-performance computing, often called supercomputing, is essential to answering questions that are both complex and large in scale.

Everything from answering fundamental questions about the origins of the universe to discovering cancer-fighting drugs to supporting high-speed streaming services requires processing huge amounts of data.

But the storage platforms these advanced computer systems depend on have been stuck in rigid designs that force users to choose between customized features and high availability.

Now, Virginia Tech researchers have developed a first-of-its-kind framework called BespoKV that gives high-performance computing (HPC) data systems this flexibility. The work could one day help HPC reach its goal of exascale performance, or 1 billion billion calculations per second.

The researchers will present their findings at the Association for Computing Machinery/IEEE Supercomputing Conference in Dallas, Texas, on Nov. 13.

The new platform is built around key-value (KV) systems. KV systems store and retrieve data, indexed by a unique key, in very fast memory-based storage instead of on slower disks. They are increasingly used in today's high-performance applications that run on distributed systems, which spread a problem across many computers. High-performance computing relies on computers taking in, processing, and analyzing huge amounts of data at unprecedented speeds. Currently, the best systems operate at the petascale, performing quadrillions of calculations per second (one petaflop is a quadrillion floating-point operations per second).
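To make the key-value model concrete, here is a minimal sketch of a single-server, in-memory KV store in Python. The class and method names are hypothetical illustrations of the general idea, not code from BespoKV.

```python
from typing import Dict, Optional


class InMemoryKVStore:
    """Hypothetical single-server key-value store kept entirely in RAM."""

    def __init__(self) -> None:
        # A dict gives average constant-time lookups by key.
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Insert a new value or overwrite the one stored under `key`.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Return the stored value, or None if the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove `key` and report whether it was present.
        return self._data.pop(key, None) is not None
```

Every operation is addressed by a key alone, which is what makes such stores simple to reason about and easy to spread across many machines.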

The research is relevant to industries that process large amounts of data, whether it be the space-hogging, intense visual graphics of movie streaming sites, millions of financial transactions at large credit card companies, or user-generated content at social media outlets. Consider a large social media site like Facebook, where content is ever-changing and continually accessed. When users upload content to their profile pages, that information resides on multiple servers.

For content that is accessed continually, KV systems can be a far more efficient storage medium, because the content loads from a fast in-memory store nearby rather than from a far-away storage server. This lets the system complete tasks and serve requests with very high performance.
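The pattern described above, serving repeat requests from nearby memory and going to the distant server only on a miss, can be sketched as a read-through cache. This is a hypothetical Python illustration: the `fetch_remote` callback stands in for the slow storage server, and the least-recently-used (LRU) eviction policy is an assumption chosen for the example, not a detail from the article.

```python
from collections import OrderedDict
from typing import Callable, Optional


class ReadThroughCache:
    """Hypothetical in-memory cache in front of a slower backing store.

    Hits are served from memory; misses call `fetch_remote` and cache the
    result. LRU eviction is an illustrative assumption.
    """

    def __init__(self, fetch_remote: Callable[[str], Optional[bytes]], capacity: int = 1024) -> None:
        self._fetch_remote = fetch_remote
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[bytes]:
        if key in self._entries:
            # Fast path: serve from memory and mark as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Slow path: fetch from the far-away storage server.
        self.misses += 1
        value = self._fetch_remote(key)
        if value is not None:
            self._entries[key] = value
            if len(self._entries) > self._capacity:
                # Evict the least recently used entry.
                self._entries.popitem(last=False)
        return value
```

The more often the same keys are requested, the larger the share of requests that take the fast path.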

"I got interested in key value systems because this very fundamental and simple storage platform has not been exploited in high-performance computing systems where it can provide a lot of benefits," said Ali Anwar, first author on the paper being presented and a recent Virginia Tech graduate who is currently employed at IBM Research. "BespoKV is a novel framework that can enable HPC systems to provide a lot of flexibility and performance and not be chained to rigid storage design."

The main innovation of BespoKV is that it supports composing a range of KV stores with desirable features. It takes a single-server KV store, called a datalet, and turns it into an immediately usable distributed KV store. Instead of redesigning a system from scratch to accomplish a specific task, a developer can drop a datalet into BespoKV and offload the "messy plumbing" of distributed systems to the framework. BespoKV splits KV store design into a control plane, which handles distributed management, and a data plane, which handles local data storage.
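The control-plane/data-plane split can be illustrated with a short Python sketch. The article does not describe BespoKV's actual interfaces, so the `Datalet` protocol, the `ControlPlane` class, hash-based key placement, and the write-to-all-replicas policy below are all hypothetical assumptions. The point is only that the datalet knows nothing about distribution, while the control plane decides which datalets hold each key.

```python
import hashlib
from typing import Dict, List, Optional, Protocol


class Datalet(Protocol):
    """Hypothetical single-server KV interface (the data plane)."""

    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...


class DictDatalet:
    """Toy datalet backed by a dict; any store with put/get could stand in."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class ControlPlane:
    """Hypothetical control plane: places keys on datalets and replicates them."""

    def __init__(self, datalets: List[Datalet], replicas: int = 2) -> None:
        if not 1 <= replicas <= len(datalets):
            raise ValueError("replicas must be between 1 and the number of datalets")
        self._datalets = datalets
        self._replicas = replicas

    def _owners(self, key: str) -> List[Datalet]:
        # Hash the key to pick a primary; the following nodes hold replicas.
        digest = hashlib.sha256(key.encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(self._datalets)
        n = len(self._datalets)
        return [self._datalets[(start + i) % n] for i in range(self._replicas)]

    def put(self, key: str, value: bytes) -> None:
        # Write to every replica that owns the key.
        for datalet in self._owners(key):
            datalet.put(key, value)

    def get(self, key: str) -> Optional[bytes]:
        # Return the value from the first owning replica that has it.
        for datalet in self._owners(key):
            value = datalet.get(key)
            if value is not None:
                return value
        return None
```

Swapping `DictDatalet` for a different single-server store would leave `ControlPlane` unchanged, which is the kind of reuse the decoupling is meant to enable.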

The framework also enables new HPC services for workloads that businesses and institutions have yet to anticipate.

One major limitation of current state-of-the-art KV stores is that they are designed with pre-existing distributed services in mind and are often specialized for one specific setting. Another is their inflexible, monolithic design, in which distributed features are baked deeply into a system whose backend data stores handle tasks such as managing inventory, orders, and supply. This rigid design cannot adapt to users' ever-changing demands for different backends, topologies, consistency guarantees, and a host of other services.

"Developers from large companies can really sink their teeth into designing innovative HPC storage systems with BespoKV," said Ali Butt, professor of computer science. "Data-access performance is a major limitation in HPC storage systems and generally employs a mix of solutions to provide flexibility along with performance, which is cumbersome. We have created a way to significantly accelerate the system behavior to comply with desired performance, consistency, and reliability levels."

BespoKV can be nimble because it allows an arbitrary mapping between desired services and available components, while its distributed management services turn a datalet into the corresponding distributed KV store.
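As a hypothetical illustration of mapping one component to different services, the Python sketch below composes the same simple replica type with either of two consistency policies chosen at construction time. The policy names, the `ReplicatedKV` class, and the `sync()` step are assumptions made for this example; the article does not say which consistency options BespoKV provides or how it implements them.

```python
from typing import Dict, List, Optional, Tuple


class ReplicatedKV:
    """Hypothetical composition: one replica type, a pluggable consistency policy.

    "strong":   a put updates every replica before returning.
    "eventual": a put updates the primary now; secondaries catch up on sync().
    """

    POLICIES = ("strong", "eventual")

    def __init__(self, num_replicas: int, consistency: str) -> None:
        if consistency not in self.POLICIES:
            raise ValueError(f"unknown consistency policy: {consistency!r}")
        if num_replicas < 1:
            raise ValueError("need at least one replica")
        self.consistency = consistency
        # Each replica is a plain dict standing in for a datalet.
        self.replicas: List[Dict[str, bytes]] = [{} for _ in range(num_replicas)]
        self._pending: List[Tuple[str, bytes]] = []

    def put(self, key: str, value: bytes) -> None:
        if self.consistency == "strong":
            for replica in self.replicas:
                replica[key] = value
        else:
            self.replicas[0][key] = value
            self._pending.append((key, value))

    def sync(self) -> None:
        # Propagate queued writes to secondary replicas (nothing queued under "strong").
        for key, value in self._pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self._pending.clear()

    def read(self, replica_index: int, key: str) -> Optional[bytes]:
        return self.replicas[replica_index].get(key)
```

Under the "eventual" policy, a read from a secondary replica can return stale or missing data until `sync()` runs; under "strong", every put pays the cost of updating all replicas up front.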

"Now that we have proven that we can make the efficient and simple action of using KV systems in powerful HPC systems, customers won't have to choose between scalability and flexibility," said Butt.

This research is funded by the National Science Foundation. In addition to Anwar and Butt, collaborators include Dongyoon Lee, an assistant professor of computer science at Virginia Tech; Jingoo Han, also from Virginia Tech; and researchers from Oak Ridge National Laboratory, George Mason University, and Perspecta Labs.

Written by Amy Loeffler

Virginia Tech