A cure for medical researchers' big data headache

December 07, 2015

As medical research has become more specialized, the scientific community's understanding of the human body has increased, resulting in enhanced treatments, new drugs, and better health outcomes.

A side effect of this information explosion, however, is the fragmentation of knowledge. With thousands of new articles being published by medical journals every day, developments that could inform and add context to medicine's global body of knowledge often go unnoticed.

Uncovering these overlooked gaps is the primary objective of literature-based discovery, a practice that seeks to connect existing knowledge. The advent of online databases and advanced search techniques has aided this pursuit, but existing methods still lean heavily on researchers' intuition and chance discovery. Better tools could help uncover previously unrecognized relationships, such as the link between a gene and a disease, a drug and a side effect, or an individual's environment and risk of developing cancer.

For the past five years, Sreenivas Rangan Sukumar, a data scientist at the Department of Energy's Oak Ridge National Laboratory, has been working with health data and the high-performance computing resources of ORNL's Compute and Data Environment for Science (CADES) to improve health care in the United States. His most recent success, called Oak Ridge Graph Analytics for Medical Innovation (ORiGAMI), supplies researchers with an advanced data tool for literature-based discovery that has the potential to accelerate medical research and discovery.

"Humans' limited bandwidth constrains the ability to reason with the vast amounts of available medical information," Sukumar said. "By design, ORiGAMI can reason with the knowledge of every published medical paper every time a clinical researcher uses the tool. This helps researchers find unexplored connections in the medical literature. By allowing computers to do what they do best, doctors can do better at answering health-related questions."

The result of collaboration between ORNL and the US National Library of Medicine (NLM), a division of the National Institutes of Health, ORiGAMI unites three emerging technologies that are shaping the future of health care: big data, graph computing, and the Semantic Web, a common framework that allows data to be shared more freely between people and machines.

A Better Way to Search

When medical researchers and clinicians want to know the latest biomedical research, they turn to MEDLINE, NLM's comprehensive database of life sciences and biomedical information. MEDLINE draws from more than 5,600 journals worldwide, adding 2,000 to 4,000 new citations each day to its archive.

A conventional search engine query of MEDLINE can yield results in the thousands--more information than a researcher can review. To improve the usefulness of MEDLINE searches, NLM information research specialist Tom Rindflesch developed software called Semantic MEDLINE that is capable of "reading" key words pulled from the titles and abstracts of articles and summarizing the most relevant information in an interactive graph. The graph, a network of words connected by lines, draws attention to key relationships between the texts and serves as a guide to further exploration. Currently, more than 70 million articles in the MEDLINE database can be searched in this way.

"Semantic MEDLINE is kind of like having a research assistant who looks at a ton of articles and organizes them for you," Rindflesch said.

One of the primary limitations of NLM's Semantic MEDLINE, however, is computing. To produce its results, the application must plow through millions of subject-verb-object groupings pulled from the articles and identify the strongest relationships or--better still--the strongest potential relationships, a specialized task that requires a specialized computer. Although conventional data analysis computers excel at reducing large datasets to smaller, more significant datasets, they struggle to compute large graphs capable of linking concepts and weak--yet relevant--associations.
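To make the idea of linking concepts through subject-verb-object groupings concrete, here is a minimal Python sketch. It is not ORiGAMI's or Semantic MEDLINE's actual method; it only illustrates the general idea of finding two concepts that are never linked directly but share an intermediate concept. The triples, the concept names, and the functions build_graph and indirect_links are all hypothetical placeholders.

```python
from collections import defaultdict

# Toy subject-verb-object triples. Every name is a made-up placeholder,
# not data taken from MEDLINE or from any real study.
TRIPLES = [
    ("substance_a", "AFFECTS", "process_b"),
    ("substance_a", "AFFECTS", "process_c"),
    ("process_b", "ASSOCIATED_WITH", "disease_d"),
    ("process_c", "ASSOCIATED_WITH", "disease_d"),
    ("substance_a", "TREATS", "disease_e"),
]


def build_graph(triples):
    """Map each subject to the set of objects it points to."""
    graph = defaultdict(set)
    for subject, _verb, obj in triples:
        graph[subject].add(obj)
    return graph


def indirect_links(graph, start):
    """Return {target: sorted intermediates} for every target that is
    reachable from `start` in exactly two hops but has no direct link."""
    direct = graph.get(start, set())
    found = defaultdict(set)
    for middle in direct:
        for target in graph.get(middle, set()):
            if target != start and target not in direct:
                found[target].add(middle)
    return {target: sorted(mids) for target, mids in found.items()}


links = indirect_links(build_graph(TRIPLES), "substance_a")
print(links)
```

In this toy graph, substance_a is never linked directly to disease_d, but two intermediate concepts connect them, so the pair surfaces as a candidate for a closer look. A real system would also have to weigh each link and handle graphs with millions of edges, which is where the specialized hardware comes in.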

Fortunately, CADES, an integrated compute and data ecosystem within ORNL's Computing and Computational Sciences Directorate, houses a machine with just the right attributes. Apollo, a Cray Urika graph computer, possesses massively multithreaded processors and 2 terabytes of shared memory, attributes that allow it to host the entire MEDLINE database and compute multiple pathways on multiple graphs simultaneously. Combined with Helios, CADES' Cray Urika extreme analytics platform, Apollo gave Sukumar's team the cutting-edge hardware needed to process large datasets quickly--about 1,000 times faster than a workstation--and at scale.

Once the MEDLINE database was brought into the CADES environment, Sukumar's team applied advanced graph theory models that implement semantic, statistical, and logical reasoning algorithms to create ORiGAMI. The result is a free online application capable of delivering health insights in less than a second based on the combined knowledge of a worldwide medical community.

The Future of Research

In the hands of medical experts and clinicians, ORiGAMI has the potential to increase the efficiency of medical research by directing researchers toward the right questions, an outcome that could reduce costs and speed up delivery of new treatments. The tool is currently being enhanced beyond literature-based reasoning to data-driven, evidence-supported reasoning using cohort and intervention assessment methods.

Georgia Tourassi, director of ORNL's Health Data Sciences Institute, offered an example of how ORiGAMI is impacting research. Tourassi's team is investigating environmental factors and migration patterns that affect people's cancer risk for a study proposed by the National Cancer Institute's Provocative Questions Initiative. As part of the investigation, the team searched for connections between lung cancer and airborne carcinogens recognized by the US Environmental Protection Agency (EPA).

"When we threw the EPA's top 10 carcinogens at ORiGAMI, we noticed that there were a few elements that appeared over and over as connecting links. Some of these elements made sense from a reasoning point of view, but there was one that we had never seen before," Tourassi said.

The surprising connection was xylene, a common solvent used in the printing, rubber, paint, and leather industries. Past EPA studies focused on xylene as a potential carcinogen have proven inconclusive, but ORiGAMI's results suggested further inquiry. Using publicly available health-related datasets and an advanced web crawler called iCRAWL, Tourassi's team built profiles of xylene exposure for lung cancer patients and non-cancer patients and compared the two.

"The people who had lung cancer had much larger and longer exposures to xylene than the people without cancer," Tourassi said. "This is not confirmation that xylene causes cancer--in order to have confirmation, we need a carefully designed longitudinal cohort study--but this is one more red flag that we should be looking at xylene closely."

In addition to population health, Tourassi's team has used ORiGAMI to explore genomic literature. Tourassi refers to the utility of ORiGAMI as "computer-assisted serendipity," meaning the tool enhances rather than replaces the person making the discovery.

"All of us have those moments of epiphany when certain thoughts click into our head and we move on to explore hypotheses deeper," Tourassi said. "This tool enables that serendipity. It helps guide you in certain ways."

The development of ORiGAMI was supported by ORNL's Laboratory Directed Research and Development program.

Oak Ridge National Laboratory is supported by the US Department of Energy's Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

DOE/Oak Ridge National Laboratory
