Next-gen bioinformatics tool enables big data analysis without programming expertise

September 24, 2020

HOUSTON -- A new data analysis tool developed by researchers at The University of Texas MD Anderson Cancer Center incorporates a user-friendly, natural-language interface to allow biomedical researchers without specialized expertise in bioinformatics or programming languages to conduct intuitive analysis of large datasets.

The open-access, artificial intelligence (AI)-driven program, called DrBioRight, was created to lower barriers for all researchers to make full use of the increasingly large amounts of data generated in modern research methods. A report of this platform was published today in Cancer Cell.

"We felt that we could improve the current model for conducting routine bioinformatics analysis and greatly speed up turnaround time by creating a tool that any researcher could use," said Han Liang, Ph.D., professor of Bioinformatics and Computational Biology. "Our long-term goal for DrBioRight is to be an intelligent collaborator for every researcher."

High-throughput technologies used in modern biomedical research generate large, complex datasets that provide comprehensive information about the patients, animal models or cell lines being studied. These datasets may capture, for example, the whole of genetic information (genomics), gene expression (transcriptomics) or protein expression (proteomics).

Because these "omics" datasets are so complex, it can be challenging to answer specific biological questions without specialized analytical approaches, explained Liang. These analyses are usually done using computer scripts written in any of a variety of programming languages, which requires some understanding of both programming and bioinformatics.

Bioinformaticians can help to navigate and process these complex datasets, but the work can be time-consuming. Therefore, the research team developed DrBioRight to enable researchers to more easily conduct routine analyses of their own data through a user-friendly chat interface with natural-language interactions.

The natural language-oriented program allows users to ask questions of the program as if they were speaking naturally rather than in complex programming languages, explained Liang.

DrBioRight is freely available to academic researchers. Initially, the program has a number of ready-built modules to handle the most common types of bioinformatics questions and includes some of the most frequently used public cancer datasets, such as The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia.

As a confirmation of the approach, the researchers replicated the analysis of a classic cancer genomics paper using DrBioRight and found it to accurately reproduce the previously published results.

Because the program is driven by AI, it also has the ability to learn from each inquiry and improve analysis, becoming a more useful tool over time. Going forward, the researchers hope to improve DrBioRight to enable users to analyze their own datasets as well as allow open development for new modules.

"As we work to improve the program, we also want to enable other bioinformaticians to contribute their algorithms and teach DrBioRight," said Liang. "Involvement from the entire research community will help to create a tool that is useful in answering complex research questions more efficiently."
-end-
This research was supported by the National Institutes of Health (U24CA209851, U01CA217842, P50CA221703 and P30CA016672), the MD Anderson Faculty Scholar Award to Liang and The Lorraine Dell Bioinformatics for Personalization of Cancer Medicine Program.

Additional collaborators include: Jun Li, Ph.D., Hu Chen, Yumeng Wang, Ph.D. and Mei-Ju May Chen, Ph.D., all of Bioinformatics and Computational Biology. H. Chen and Y. Wang also are members of the graduate program in Quantitative and Computational Biosciences at the Baylor College of Medicine, Houston, TX. A full list of author disclosures can be found with the full paper.

About MD Anderson

The University of Texas MD Anderson Cancer Center in Houston ranks as one of the world's most respected centers focused on cancer patient care, research, education and prevention. The institution's sole mission is to end cancer for patients and their families around the world. MD Anderson is one of only 51 comprehensive cancer centers designated by the National Cancer Institute (NCI). MD Anderson is ranked No. 1 for cancer care in U.S. News & World Report's "Best Hospitals" survey. It has ranked as one of the nation's top two hospitals for cancer care since the survey began in 1990, and has ranked first 16 times in the last 19 years. MD Anderson receives a cancer center support grant from the NCI of the National Institutes of Health (P30 CA016672).

University of Texas M. D. Anderson Cancer Center
