Spanish researchers review the state-of-the-art text mining technologies for chemistry

June 21, 2017

In a recent Chemical Reviews article, the Biological Text Mining Unit at the Spanish National Cancer Research Centre (CNIO), together with researchers at the Center for Applied Medical Research (CIMA) of the University of Navarra, in Pamplona, and the Barcelona Supercomputing Centre (BSC-CNS), has published the first exhaustive review of the state-of-the-art methodologies underlying chemical search engines, named entity recognition and text mining systems.

The rapidly growing field of big data applications in biomedical research, together with the use of machine learning and artificial intelligence technologies for text data mining, has resulted in promising tools. "This review," state the authors, "is organised to serve as a practical guide to researchers entering in this field but also to help them to envision the next steps in this emerging data science field."

"Through the release of Gold Standard datasets and the organisation of several community challenge benchmark events, the Biological Text Mining Unit has played a critical role in the development and evaluation of current chemical text mining systems, as highlighted in this article," explains Martin Krallinger, head of the Unit and co-first author of the review.


A considerable fraction of biomedically relevant data is only available in unstructured form. This type of data includes the rapidly growing scientific literature, medicinal chemistry patents, electronic health records and clinical trial documents. In fact, every year, over 20,000 new compounds are published in medicinal and biological chemistry journals.

Being able to transform unstructured biomedical research data into structured databases that can be more efficiently processed by machines or queried by humans is becoming critical for a range of very heterogeneous applications. These include the identification of new drug targets and of chemical probes to validate or discard those potential targets, the repurposing of approved drugs, the identification of adverse drug events, and the retrieval of systems biology information associated with chemical-disease or chemical-gene networks.

Chemical compounds constitute a key entity type of critical relevance for biomedical research, as they serve as a therapeutic strategy to treat medical needs. In fact, "the construction of large chemical knowledge bases, integrating chemical information with biological and clinical data, is crucial to identify and validate new therapeutic targets for unmet medical needs as well as to speed up the drug discovery process," explains Julen Oyarzabal, Director of Translational Sciences at CIMA and co-leader of this report.

Centro Nacional de Investigaciones Oncológicas (CNIO)
