Machine learning uncovers missing info about ethnicity in population health data: Study

November 18, 2020

Machine learning can be used to fill a significant gap in Canadian public health data related to ethnicity and Aboriginal status, according to research published today in PLOS ONE by a University of Alberta research epidemiologist.

Kai On Wong, senior data scientist at the Real World Evidence unit of the Northern Alberta Clinical Trials and Research Centre (NACTRC), said ethnicity and Aboriginal status are recognized as key social determinants of health but are often not reported in large databases that track acute and chronic diseases such as asthma, influenza, cancer, cardiovascular diseases, diabetes, disability and mental illness.

"If a database currently lacks ethnicity information, we will not be able to tell whether certain ethnic groups have higher rates of disease or worse clinical outcomes," Wong said. "This is a way to unlock that missing dimension from existing data sources, which may help us understand, monitor and address issues such as social inequities and racism in Canada."

Wong created a machine learning framework to analyze the names and geographic locations of 4.8 million people surveyed in the 1901 census, examining features such as spelling and phonetics to predict which of 13 ethnic groups each person belonged to.

"Different ethnic and linguistic groups have different manifestations of features such as how the name sounds, how many letters in the name, how many vowels and unique letter sequences, and so on," said Wong, who created the program and shared it as a public GitHub repository as part of his doctoral thesis at the U of A's School of Public Health.
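
Wong's actual framework is in his public GitHub repository; the sketch below is not that code. It is a minimal, hypothetical Python illustration of the kinds of name features Wong describes: name length, vowel count, character n-grams standing in for "unique letter sequences", and a phonetic code for "how the name sounds". For the phonetic feature it assumes standard American Soundex. The function names `name_features` and `soundex`, and the choice to count "y" as a vowel, are illustrative assumptions.

```python
from collections import Counter

# Assumption: 'y' is counted as a vowel; a real system may treat it differently.
VOWELS = set("aeiouy")

# American Soundex digit classes for consonants (vowels, h, w and y have no code).
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        # Skip letters with no code and repeats of the previous code.
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # 'h' and 'w' do not separate letters with the same code; vowels do.
        if c not in "hw":
            prev = code
    return result.ljust(4, "0")


def name_features(name: str, n: int = 2) -> Counter:
    """Bag of simple spelling and sound features for one name (hypothetical set)."""
    letters = [c for c in name.lower() if c.isalpha()]
    feats = Counter()
    feats["len"] = len(letters)
    feats["vowels"] = sum(1 for c in letters if c in VOWELS)
    # Boundary markers give prefixes and suffixes their own n-grams.
    padded = "^" + "".join(letters) + "$"
    for i in range(len(padded) - n + 1):
        feats["ng:" + padded[i:i + n]] += 1
    feats["sx:" + soundex(name)] += 1
    return feats
```

For example, `name_features("Wong")` records a length of 4, one vowel, the bigrams `^w`, `wo`, `on`, `ng` and `g$`, and the Soundex code `W520`. A classifier would then learn which of these features are typical of each group.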

"Machine learning is like having a team of agents who are given vast amounts of information. They are instructed to detect and retain useful patterns to solve practical problems such as predicting the ethnicity from the readily available information," he said.

Wong said the program was most accurate at identifying individuals of Chinese, French, Japanese and Russian heritage from names alone, while accuracy for the Aboriginal classification improved when locations were also included.
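
To make the name-only versus name-plus-location comparison concrete, here is a minimal sketch that swaps in multinomial naive Bayes, a simple classifier chosen purely for illustration; it is not necessarily what Wong's framework uses. The location is added as one extra token alongside the name's character bigrams. All names, regions and group labels are invented placeholders, not census data.

```python
import math
from collections import Counter, defaultdict
from typing import Optional


def features(name: str, region: Optional[str] = None) -> Counter:
    """Character bigrams of a name, plus an optional location token."""
    padded = "^" + name.lower() + "$"
    feats = Counter(padded[i:i + 2] for i in range(len(padded) - 1))
    if region is not None:
        feats["loc:" + region] += 1
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.feat_counts: defaultdict = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, examples):
        for feats, label in examples:
            self.class_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats: Counter) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)  # log prior
            for f, k in feats.items():
                if f not in self.vocab:
                    continue  # ignore features never seen in training
                score += k * math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: invented names and placeholder labels, not census records.
train = [
    (features("kalani", "north"), "group_a"),
    (features("malaki", "north"), "group_a"),
    (features("dvorsk", "south"), "group_b"),
    (features("borsky", "south"), "group_b"),
]
model = NaiveBayes().fit(train)
name_only = model.predict(features("kalaki"))
with_location = model.predict(features("xx", "south"))
```

Here the query `kalaki` is assigned to `group_a` from its spelling alone, while the uninformative name `xx` is assigned to `group_b` only because of its location token: a toy version of the idea that location can add signal when a name carries little on its own.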

Both the World Health Organization and the Government of Canada recognize ethnicity and Indigeneity as determinants of health, along with other factors such as income, education and gender. Wong first became interested in inequities in health care that affect Indigenous groups when he served as acting territorial epidemiologist for the Government of the Northwest Territories.

Wong said that while American health records tend to include questions about ethnicity, this information is not collected consistently in Canadian databases, from hospital discharge records to cancer registries.

By using machine learning to uncover this missing information, researchers and policy-makers will be able to learn more from existing records rather than having to carry out new population-level surveys, which are expensive and time-consuming.

"A future step forward will be to validate this research with real-world applications using health evidence augmented with ethnicity generated by the machine learning framework and comparing it with existing literature, particularly on health and social inequities," Wong said.

Wong recommends first updating the ethnicity prediction tool using more recent census information and testing its accuracy when applied to various health records.

"It is unrealistic to expect machine learning predictions to be 100 per cent accurate at all times," Wong said. "The goal is to make predictions that are accurate and generalizable enough to discern underlying patterns in a meaningful way for a particular problem or application."

Wong's research was funded by the Canadian Institutes of Health Research Frederick Banting and Charles Best Doctoral Research Award, the University of Alberta President's Doctoral Prize of Distinction and Queen Elizabeth II Doctoral Scholarship, and the Alberta Machine Intelligence Institute (Amii).

University of Alberta Faculty of Medicine & Dentistry
