SDSC, Calit2 awarded $1.4 million NSF grant for new bioinformatics tools

October 18, 2011

Researchers at the San Diego Supercomputer Center (SDSC) and the California Institute for Telecommunications and Information Technology (Calit2) at the University of California, San Diego, have been awarded a three-year, $1.4 million grant from the National Science Foundation (NSF) to create a Kepler Scientific Workflow System module. Researchers will develop new tools to help manage ever-growing data sets used in next-generation DNA sequencing.

"Next-generation DNA sequencing is now creating such a large amount of sequence data that it is overwhelming current computational tools and resources," said Ilkay Altintas, director of the Scientific Workflow Automation Technologies (SWAT) Lab within SDSC's Cyberinfrastructure Research, Education And Development (CI-RED) group, and Principal Investigator for the project. "New computational techniques and efficient implementation mechanisms for this data-intensive workload are needed to enable rapid analysis of these next-generation sequence data."

The project receiving the NSF award is called Advances in Biological Informatics Development: bioKepler: A Comprehensive Bioinformatics Scientific Workflow Module for Distributed Analysis of Large-Scale Biological Data. Bioinformatics refers to a field of science that combines biology, information technology, computers and statistical techniques to create research-driven solutions such as customized medications and treatments to help prevent disease, three-dimensional models of genomes and proteins, and advanced agricultural technologies.

"The enormous growth in data-intensive research means that as these data sets get larger, moving data over the network becomes more complicated, error-prone and costly to maintain," said Altintas, who also serves as SDSC's deputy coordinator for research.

The bioKepler project is motivated by three challenges that remain unsolved. To address them, the project will create scientific workflow components that support an array of bioinformatics tools using distributed execution techniques. Once customized, these components will run on multiple distributed platforms, including various cloud and grid computing platforms. The tools will be selected to meet the diverse needs of researchers and organized into eight groups covering most aspects of bioinformatics applications: sequence database searches; mapping; sequence assembly; gene prediction; clustering; multiple sequence alignment, phylogeny and taxonomy; protein annotation; and other miscellaneous utilities such as data format transformation and parsing.

Training Next-Generation Scientists

"These tools will be applicable to a wide range of bioinformatics and computational biology problems," said Altintas, noting that "a key part of this project will also focus on education and outreach efforts, underscoring the importance of training next-generation scientists, as well as the need to narrow the gap between bioinformatics and technology."

All the resources, materials, and open-source software products produced by the bioKepler project will be integrated with Calit2's Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA), a data repository and a bioinformatics resource for metagenomic analysis.

"The Kepler workflow system has already been used comprehensively in the CAMERA project," said project co-investigator Weizhong Li, a research scientist at Calit2 and the Center for Research in Biological Systems (CRBS), and Bioinformatics group leader for CAMERA. "With the proposed developments in bioKepler, the CAMERA project and its large user communities will benefit from a larger set of next-generation sequence analysis tools with much better scalability and flexibility. Other projects that rely heavily on next-generation sequencing, such as various microbiome projects, can also take advantage of the bioKepler software."

Moreover, bioKepler will be packaged for installation on diverse, distributed execution environments (e.g., as a Web service and as virtual machines tuned for various grid and cloud systems), which in turn will enable deployment of bioKepler on public and private clusters and clouds.
In addition to Altintas and Li, the bioKepler research team includes Eric E. Allen, assistant professor of marine biology at the Scripps Institution of Oceanography (SIO); Jianwu Wang, project scientist with SWAT; Daniel Crawl, workflow specialist with SWAT; and Shulei Sun and Sitao Wu, bioinformaticians at CRBS.

The bioKepler project is funded by NSF DBI-1062565 under the CI Reuse and Advances in Bioinformatics programs.

University of California - San Diego
