Google Scholar renders documents not in English invisible

February 10, 2021

The visibility of scientific articles and conference papers depends on how easily they can be found in academic search engines, especially Google Scholar. To enhance this visibility, search engine optimization (SEO) has in recent years been applied to academic search engines, a practice known as academic search engine optimization (ASEO), so that documents are optimized and rank higher on results pages.

Recent research, published in Future Internet, has examined whether the language of a document is a factor in the algorithm that sorts search results on Google Scholar. The study authors are Cristòfol Rovira, Lluís Codina and Carlos Lopezosa, members of the Department of Communication at UPF.

"To implement this optimization we need to further our understanding of Google Scholar's relevance ranking algorithm, so that, based on this knowledge, we can highlight or improve those characteristics that academic documents already present and which are taken into account by the algorithm", says Rovira, first author of the study. To prevent fraudulent practices, Google Scholar does not explain this algorithm and, therefore, this kind of research becomes necessary.

For the study, the authors applied a reverse engineering research methodology based on statistical analysis using Spearman's correlation coefficient. Three types of search were conducted (by author, by year, and by keyword), yielding a sample of 45 searches with 1,000 results each (45,000 documents).
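To illustrate the kind of statistic involved, the following is a minimal sketch of Spearman's rank correlation between a document's position in a results list and its citation count. It is not the authors' code: the function names and the six data points are hypothetical, and the sketch assumes neither input is constant (which would make the coefficient undefined).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical sample: position in the results list and citation count.
scholar_position = [1, 2, 3, 4, 5, 6]
citations = [950, 720, 800, 310, 120, 45]

rho = spearman(scholar_position, citations)
print(round(rho, 3))
```

A coefficient close to -1 in a sample like this would indicate that documents nearer the top of the list tend to have more citations. In practice, a library routine such as `scipy.stats.spearmanr` would normally be used instead of a hand-written function.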

Quality articles with hundreds of citations are treated in a discriminatory manner

The results show that when a search is performed on Google Scholar with results in various languages, the vast majority (90%) of documents in languages other than English are systematically relegated to positions that render them totally invisible. These documents are almost always placed beyond position 900, even though they are quality articles with hundreds of citations. Thus, it can be stated that Google Scholar discriminates against documents not written in English in searches with multilingual results.

A lack of awareness of this factor could be detrimental to researchers from all over the non-English-speaking world, making them believe that there is no literature in their national language when they conduct searches with multilingual results.

"This is particularly the case in the most frequent searches, that is, those conducted by year. Nevertheless, it can also occur in searches using certain keywords that are the same in languages around the world, including trademarks, chemical compounds, industrial products, acronyms, drugs, and diseases, with Covid-19 being the most recent example", the study authors reveal.

They add: "Moreover, if we consider the results of this study from the perspective of ASEO, it is more than evident that until this bias is addressed, the chances of being ranked in a multilingual Google Scholar search increase remarkably if the researchers opt for publication in English".

Graph of the results of the study

The scatter plot above summarizes the research results. There are 45,000 dots, one per document. Grey dots represent documents written in English, red dots represent documents in other languages, and blue marks the median positions.

The graph shows how articles written in languages other than English appear beyond the 900th position in the Google Scholar ranking. This is so even for quality documents that have hundreds of citations and are well placed in the ranking by number of citations.

The most striking cases are the red dots located in the bottom-right corner. They correspond to documents written in languages other than English that rank within the top 100 by number of citations yet sit beyond position 900 in the Google Scholar ranking. This means that all of them have received over a thousand citations and still appear in Google Scholar in the same positions as documents in English cited just a few dozen times.
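The selection of these striking cases can be sketched as a simple filter: rank the documents by citation count, then keep the non-English ones whose citation rank is below one threshold while their Google Scholar position is over another. The `Doc` record, the `relegated` function and the sample data below are hypothetical, introduced only for illustration; the defaults of 100 and 900 mirror the figures quoted in the article.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    language: str          # language code, e.g. "en" (assumed convention)
    scholar_position: int  # 1-based position in the results list
    citations: int


def relegated(docs, citation_rank_below=100, position_over=900):
    """Return non-English docs that are highly cited but ranked very low."""
    by_citations = sorted(docs, key=lambda d: d.citations, reverse=True)
    flagged = []
    for rank, doc in enumerate(by_citations, start=1):
        if (doc.language != "en"
                and rank < citation_rank_below
                and doc.scholar_position > position_over):
            flagged.append(doc)
    return flagged


# Hypothetical four-document sample; a small citation-rank threshold is
# used so that the filter is meaningful on such a short list.
sample = [
    Doc("A", "en", 1, 1500),
    Doc("B", "es", 950, 1200),
    Doc("C", "en", 2, 40),
    Doc("D", "fr", 940, 5),
]

flagged = relegated(sample, citation_rank_below=3, position_over=900)
print([d.title for d in flagged])
```

In this sample only document B is flagged: it is the second most cited item but sits at position 950, whereas D is also ranked low but has too few citations to count as a striking case.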

Universitat Pompeu Fabra - Barcelona
