
CLICS: World's largest database of cross-linguistic lexical associations

January 13, 2020

Every language has cases in which two or more concepts are expressed by the same word, such as the English word fly, which refers both to the act of flying and to the insect. By comparing patterns in these cases, which linguists call colexifications, across languages, researchers can gain insights into a wide range of issues, including human perception, language evolution, and language contact. The third installment of the CLICS database significantly increases the number of languages, concepts, and data sources compared with earlier versions, allowing researchers to study colexifications on a global scale in unprecedented detail and depth.
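To make the idea of a colexification concrete, the following minimal Python sketch counts which pairs of concepts share a word form within each language. It is an illustration only, not the CLICS implementation: the function name `colexifications`, the concept labels, and the languages "LangA" and "LangB" with their word forms are invented placeholders. Only the English word fly comes from the text above.

```python
from collections import defaultdict
from itertools import combinations

# Toy wordlist of (language, concept, form) entries.
# ASSUMPTION: apart from English "fly", every entry here is an invented
# placeholder for illustration, not real linguistic data from CLICS.
WORDLIST = [
    ("English", "FLY (MOVE THROUGH AIR)", "fly"),
    ("English", "FLY (INSECT)", "fly"),
    ("English", "ARM", "arm"),
    ("English", "HAND", "hand"),
    ("LangA", "ARM", "taka"),
    ("LangA", "HAND", "taka"),
    ("LangA", "FLY (INSECT)", "zumi"),
    ("LangB", "ARM", "ruk"),
    ("LangB", "HAND", "ruk"),
    ("LangB", "FLY (MOVE THROUGH AIR)", "leti"),
]


def colexifications(wordlist):
    """Map each colexified concept pair to the set of languages showing it."""
    # Step 1: group concepts by (language, form), so that concepts sharing
    # one word form in one language end up in the same set.
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)

    # Step 2: every pair of concepts inside such a set is a colexification
    # in that language; record the language for the pair.
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)
    return dict(languages_by_pair)


result = colexifications(WORDLIST)
for pair, languages in sorted(result.items()):
    print(pair, sorted(languages))
```

In this toy data, the pair ARM and HAND is colexified in the two invented languages, while the two senses of fly are colexified only in English. Comparing such counts across many languages is what makes recurring patterns visible.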

With detailed computer-assisted workflows, CLICS facilitates the standardization of linguistic datasets and provides solutions to many of the persistent challenges in linguistic research. "While data aggregation was generally based on ad-hoc procedures in the past, our new workflows and guidelines for best practice are an important step to guarantee the reproducibility of linguistic research," says Tiago Tresoldi.

Effectiveness of CLICS demonstrated in research applications

The ability of CLICS to provide new evidence to address cutting-edge questions in psychology and cognition has already been illustrated in a recent study published in Science, which concentrated on the world-wide coding of emotion concepts. The study compared colexification networks of words for emotion concepts from a global sample of languages, and revealed that the meanings of emotions vary greatly across language families.

"In this study, CLICS was used to study differences in the lexical coding of emotion in languages around the world, but the potential of the database is not limited to emotion concepts. Many more interesting questions can be tackled in the future," says Johann-Mattis List.

New standards and workflows allow for the reproducible harvesting of global lexical data

Building on the new guidelines for standardized data formats in cross-linguistic research, which were first presented in 2018 (DOI: 10.1038/sdata.2018.205), the CLICS team was able to increase the amount of data from 300 language varieties and 1200 concepts in the original database to 3156 language varieties and 2906 concepts in the current installment. The new version also guarantees the reproducibility of the data aggregation process, conforming to best practices in research data management. "Thanks to the new standards and workflows we developed, our data is not only FAIR (findable, accessible, interoperable, and reusable), but the process of lifting linguistic data from their original forms to our cross-linguistic standards is also much more efficient than in the past," says Robert Forkel.

The effectiveness of the workflow developed for CLICS has been tested and confirmed in various validation experiments involving a wide range of scholars and students. Two different student tasks were conducted, resulting in the creation of new datasets and the progressive improvement of the existing data. Students were tasked with working through the different steps of dataset creation described in the study, e.g. data extraction, data mapping (to reference catalogs), and identification of sources. "Having people from outside of the core team use and test your tools is essential and helps tremendously in fine-tuning all processes," says Christoph Rzymski.

With CLICS and its workflow being accessible to a wider audience, scholars can not only contribute directly to the database in the future, but also profit from the established machinery and start their own targeted collections. "The number of linguists who actively use our standards and workflows is constantly increasing. We hope that the release of this new version of CLICS will propagate them further," says Simon Greenhill.

Max Planck Institute for the Science of Human History

