Analysis of COVID-19 publications identifies research gaps

September 17, 2020

Since the start of the COVID-19 pandemic, scientific and medical journals have published over 100,000 studies on SARS-CoV-2. But according to data scientists who created a machine-learning tool to analyze the deluge of publications, basic lab-based studies on the microbiology of the virus, including research on its pathogenesis and mechanisms of viral transmission, are lacking. Their analysis appears September 16 in the journal Patterns.

"In a crisis like this pandemic, we would expect research outside the lab to happen at a faster pace than lab research," says first author Anhvinh Doanvo (@AnhvinhDoanvo), a volunteer data scientist with the COVID-19 Dispersed Volunteer Research Network. "Nevertheless, the relative lack of lab-based studies seems to be unique to SARS-CoV-2, compared to other human coronaviruses. This shortage of lab-based research means that the scientific community may miss key aspects of the virus that could impact our ability to contain this pandemic and to counter future ones."

The investigators used research abstracts obtained from CORD-19 (COVID-19 Open Research Dataset). CORD-19 is updated daily and includes peer-reviewed studies from PubMed Central, as well as preprints from bioRxiv and medRxiv. At the time they conducted their first analysis at the end of May, the dataset included more than 137,000 studies. The analysis was later updated with data through July 31.

The team used two computational methods to analyze the data. The first was dimensionality reduction, which helps to find broad patterns across many documents, such as abstracts from scientific studies, and to identify trends based on those patterns. The second method, topic modeling, allowed them to group the documents into topics and to compare research on SARS-CoV-2 with research on other coronaviruses. Unlike previous studies that focused only on keywords, both of these tools enabled the team to analyze the full text of the abstracts.
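To give a rough sense of what dimensionality reduction over abstracts can look like, here is a minimal sketch. It is not the team's pipeline: the article does not say which algorithms or libraries they used. The four toy abstracts, the function names, and the choice of principal component analysis computed by power iteration are all assumptions made for illustration. Topic modeling is not shown.

```python
from collections import Counter
import math

# Hypothetical toy abstracts; not taken from CORD-19.
abstracts = [
    "clinical symptoms of patients in hospital",
    "hospital patients clinical outcomes",
    "viral protein binding in cell culture",
    "cell culture assay of viral protein",
]

def term_matrix(docs):
    """Build a document-by-term count matrix and the sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([float(counts[w]) for w in vocab])
    return rows, vocab

def center(rows):
    """Subtract each column's mean so the component describes variation."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def first_component(rows, iters=200):
    """Approximate the top principal direction by power iteration on X^T X."""
    dim = len(rows[0])
    v = [1.0 / (i + 1) for i in range(dim)]  # deterministic starting vector
    for _ in range(iters):
        # Project every document onto the current direction: X v
        proj = [sum(x * y for x, y in zip(row, v)) for row in rows]
        # Map back into term space: X^T (X v)
        w = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

X, vocab = term_matrix(abstracts)
Xc = center(X)
pc1 = first_component(Xc)

# One number per abstract: its position along the strongest axis of variation.
scores = [sum(x * y for x, y in zip(row, pc1)) for row in Xc]
for text, s in zip(abstracts, scores):
    print(f"{s:+.2f}  {text}")
```

In this toy example the two clinical abstracts receive scores of one sign and the two lab-based abstracts receive scores of the opposite sign, so a single axis separates the two kinds of research. Which group comes out positive is arbitrary.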

"Broadly speaking, we found that the research community has produced a lot of work on the clinical manifestations of the virus, epidemiological models of its spread, and other work based on data collected from the field," says senior author Maimuna Majumder (@maiamajumder), a computational epidemiologist at Harvard Medical School and Boston Children's Hospital's Computational Health Informatics Program.

The researchers also note that research has changed over time, with an acceleration in studies examining public health responses, clinical issues related to the virus, the societal impact of the outbreak, and how the disease spreads across populations, while reporting on the status of the outbreak has begun to plateau. "This is a positive development, as it indicates that the scientific community has transitioned from the role of a passive observer of the virus into a group studying ways to fight its spread," Majumder says.

"But basic microbiological research has been slow to pick up the pace, leaving potential knowledge gaps in its wake," Doanvo says. "It's possible that stronger resourcing in these time- and resource-intensive efforts would better enable the scientific community to respond quickly to this virus."

The researchers hope this analysis will help raise awareness about the importance of prioritizing lab-based studies on SARS-CoV-2 moving forward. They plan to conduct another analysis of scientific studies in about a year, using the tools they have already developed.
This project is part of the COVID-19 Dispersed Volunteer Research Network.

Patterns, Doanvo et al. "Machine Learning Maps Research Needs in COVID-19 Literature"

Patterns (@Patterns_CP), published by Cell Press, is a data science journal publishing original research focusing on solutions to the cross-disciplinary problems that all researchers face when dealing with data, as well as articles about datasets, software code, algorithms, infrastructures, etc., with permanent links to these research outputs.

Cell Press
