Comprehensive evaluation of large language models in mining gene relations and pathway knowledge

07.10.24 | Higher Education Press


Understanding complex biological pathways, such as gene-gene interactions and gene regulatory networks, is crucial for exploring disease mechanisms and advancing drug development. However, manual literature curation of these pathways cannot keep pace with the exponential growth of discoveries. Large language models (LLMs) trained on extensive text corpora contain rich biological information and can be queried much like a biological knowledge graph to support pathway curation.

Recently, Quantitative Biology published a study titled "A Comprehensive Evaluation of Large Language Models in Mining Gene Relations and Pathway Knowledge." The study assesses 21 LLMs, including both API-based and open-source models, on their ability to retrieve biological knowledge. The evaluation covers two tasks: predicting gene regulatory relations (activation, inhibition, and phosphorylation) and identifying the gene components of pathways. Both tasks use Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways as the ground truth, as illustrated in Figure 1.

The results reveal a significant disparity in model performance, with API-based models outperforming their open-source counterparts. GPT-4 and Claude-Pro emerged as the top performers in predicting gene regulatory relations, achieving higher precision and recall than the other models. The findings suggest that while LLMs are informative for gene network analysis and pathway mapping, their effectiveness varies, so models must be selected carefully. The study underscores the importance of choosing appropriate computational tools for specific tasks in biological research, and it provides a case study of using LLMs as knowledge graphs for data mining in general.
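
As a rough illustration of how precision and recall could be computed for predicted gene regulatory relations against a reference set, here is a minimal Python sketch. The study's actual scoring protocol is not described in this summary, so the exact-match rule, the `score_relations` function, the `Scores` type, and the placeholder gene names are all assumptions made for illustration; the toy data does not come from the study or from KEGG.

```python
from typing import NamedTuple, Set, Tuple

# A relation is (source gene, target gene, relation type).
# This representation is an assumption made for the sketch.
Relation = Tuple[str, str, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_relations(predicted: Set[Relation], reference: Set[Relation]) -> Scores:
    """Score predicted relations against a reference set by exact match."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Toy data for illustration only; not taken from the study or from KEGG.
reference = {
    ("GENE_A", "GENE_B", "activation"),
    ("GENE_B", "GENE_C", "inhibition"),
    ("GENE_C", "GENE_D", "phosphorylation"),
    ("GENE_A", "GENE_D", "activation"),
}
predicted = {
    ("GENE_A", "GENE_B", "activation"),       # matches the reference
    ("GENE_B", "GENE_C", "activation"),       # right gene pair, wrong relation type
    ("GENE_C", "GENE_D", "phosphorylation"),  # matches the reference
}

scores = score_relations(predicted, reference)
print(f"precision={scores.precision:.2f} recall={scores.recall:.2f} f1={scores.f1:.2f}")
```

In this toy case two of the three predictions match the reference and two of the four reference relations are recovered, so a model can score well on precision while still missing half of the known relations. That gap is one reason both measures are reported together.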

Journal: Quantitative Biology

DOI: 10.1002/qub2.57

Method of Research: Experimental study

Subject of Research: Not applicable

Article Title: A comprehensive evaluation of large language models in mining gene relations and pathway knowledge

Article Publication Date: 19-Jun-2024

Contact Information

Rong Xie
Higher Education Press
xierong@hep.com.cn

How to Cite This Article

APA:
Higher Education Press. (2024, July 10). Comprehensive evaluation of large language models in mining gene relations and pathway knowledge. Brightsurf News. https://www.brightsurf.com/news/LMJPN6NL/comprehensive-evaluation-of-large-language-models-in-mining-gene-relations-and-pathway-knowledge.html
MLA:
"Comprehensive evaluation of large language models in mining gene relations and pathway knowledge." Brightsurf News, Jul. 10 2024, https://www.brightsurf.com/news/LMJPN6NL/comprehensive-evaluation-of-large-language-models-in-mining-gene-relations-and-pathway-knowledge.html.