What 26,000 books reveal when it comes to learning language

October 25, 2019

BUFFALO, N.Y. - What can reading 26,000 books tell researchers about how language environment affects language behavior? Brendan T. Johns, an assistant professor of communicative disorders and sciences in the University at Buffalo's College of Arts and Sciences, has some answers that are helping to inform questions ranging from how we use and process language to better understanding the development of Alzheimer's disease.

But let's be clear: Johns didn't read all of those books. He's an expert in computational cognitive science who has published a computational modeling study suggesting that our experience and interaction with specific learning environments, like the characteristics of what we read, lead to differences in language behavior that were once attributed to differences in cognition.

"Previously in linguistics it was assumed a lot of our ability to use language was instinctual and that our environmental experience lacked the depth necessary to fully acquire the necessary skills," says Johns. "The models that we're developing today have us questioning those earlier conclusions. Environment does appear to be shaping behavior."

Johns' findings, with his co-author, Randall K. Jamieson, a professor in the University of Manitoba's Department of Psychology, appear in the journal Behavior Research Methods.

Advances in natural language processing and computational resources allow researchers like Johns and Jamieson to examine once-intractable questions.

The models, called distributional models, serve as analogies to the human language learning process. The 26,000 books that support the analysis of this research come from 3,000 different authors (about 2,000 from the U.S. and roughly 500 from the U.K.) who used over 1.3 billion total words.
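The article does not spell out how the study's models are built, so the following is only a minimal sketch of the general idea behind a distributional model: a word is represented by counts of the words that appear near it, and two words count as similar when those counts look alike. The toy corpus, the window size and the function names are all assumptions made for illustration, not details from the paper.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented for illustration only (not from the 26,000 books).
toy_corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the pilot flew the plane",
]

vectors = cooccurrence_vectors(toy_corpus)
sim_doctor_nurse = cosine(vectors["doctor"], vectors["nurse"])
sim_doctor_pilot = cosine(vectors["doctor"], vectors["pilot"])
```

In this toy corpus "doctor" and "nurse" occur in identical contexts, so their similarity comes out higher than that of "doctor" and "pilot". A model trained on real books works on the same principle at a vastly larger scale.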

George Bernard Shaw is often credited with saying Britain and America are two countries separated by a common language. The two varieties of English are indeed not identical, and to establish and represent potential cultural differences, the researchers located each of the 26,000 books in both time (when the author was born) and place (where the book was published).

With that information established, the researchers analyzed data from 10 different studies involving more than 1,000 participants, using multiple psycholinguistic tasks.

"The question this paper tries to answer is, 'If we train a model with similar materials that someone in the U.K. might have read versus what someone in the U.S. might have read, will they become more like these people?'" says Johns. "We found that the environment people are embedded in seems to shape their behavior."

The culture-specific books in this study explain much of the variance in the data, according to Johns.

"It's a huge benefit to have a culture-specific corpus, and an even greater benefit to have a time-specific corpus," says Johns. "The differences we find in language environment and behavior as a function of time and place is what we call the 'selective reading hypothesis.'"

These machine-learning approaches demonstrate how richly informative language environments are, and Johns has been working toward building machine-learning frameworks to optimize education. This latest paper shows how a person's language behavior can be used to estimate the types of materials they've read.
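One simple way to picture that kind of estimate is sketched below. It is a stand-in technique, not the paper's method: it assumes a participant's familiarity ratings for a few words and checks which of two tiny invented "reading environments" has word frequencies that track those ratings more closely. The texts, the ratings, the word list and the function names are all hypothetical.

```python
import math
from collections import Counter


def log_frequencies(text, words):
    """Log of (1 + count) for each target word in a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    return [math.log(1 + counts[w]) for w in words]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Invented stand-ins for two reading environments (assumption, not study data).
us_text = "truck truck truck apartment apartment cookie cookie lorry"
uk_text = "lorry lorry lorry flat flat biscuit biscuit truck"

words = ["truck", "lorry", "cookie", "biscuit"]
# Hypothetical familiarity ratings (1 to 7) from one participant, same order as `words`.
ratings = [6.5, 2.0, 6.0, 2.5]

# Correlate the participant's ratings with word frequencies in each environment.
fit = {
    "US": pearson(log_frequencies(us_text, words), ratings),
    "UK": pearson(log_frequencies(uk_text, words), ratings),
}
best_match = max(fit, key=fit.get)
```

Here the hypothetical participant rates "truck" and "cookie" as far more familiar than "lorry" and "biscuit", so the U.S.-style text fits the ratings better and `best_match` is "US".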

"We want to take someone's past experience with language and develop a model of what that person knows," says Johns. "That lets us identify which information can maximize that person's learning potential."

But Johns also studies clinical populations, and his work with Alzheimer's patients has him thinking about how to apply his models to potentially help people at risk of developing the disease.

He says some people show slight memory loss without other indications of cognitive decline. These patients, who have mild cognitive impairment, have a 10-15% chance of being diagnosed with Alzheimer's in any given year, compared with 2% for the general population over age 65.

"We're finding that people who go on to develop Alzheimer's across time are showing specific types of language loss and production where they seem to be losing long-distance semantic associations between words, as well as low-frequency words," he says. "Can we develop tasks and stimuli that will allow that group to retain their language ability for longer, or develop a more personalized assessment to understand what type of information they're losing in their cognitive system?

"This research program has the potential to inform these important questions."

University at Buffalo
