Altered data sets can still provide statistical integrity and preserve privacy

February 16, 2019

Synthetic networks may increase the availability of some data while still protecting individual or institutional privacy, according to a Penn State statistician.

"My key interest is in developing methodology that would enable broader sharing of confidential data in a way that can aid in scientific discovery," said Aleksandra Slavkovic, professor of statistics and associate dean for graduate education, Eberly College of Science, Penn State. "Being able to share confidential data with minimal quantifiable risk for discovery of sensitive information, and still ensure statistical accuracy and integrity, is the goal."

Slavkovic has found solutions to this data privacy problem through interdisciplinary collaborations, especially with computer and social scientists. Her research focuses on various types of data, including network data that capture relationship information between entities such as individuals or institutions. She reported her approaches to providing synthetic networks that satisfy a notion of differential privacy today (Feb. 16) during the 2019 annual meeting of the American Association for the Advancement of Science in Washington, D.C.

Differential privacy provides a mathematically provable guarantee of the level of privacy loss to individuals.
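
For readers who want the guarantee stated precisely, the standard textbook definition of differential privacy is sketched below. The article does not say which variant Slavkovic's networks satisfy, so the symbols here (a randomized release mechanism M, neighboring datasets D and D' that differ in one individual's record, and a privacy-loss parameter epsilon) are the usual notation, given as an assumption rather than as her formulation.

```latex
% M satisfies \varepsilon-differential privacy if, for every pair of
% neighboring datasets D and D' and every set S of possible outputs,
\Pr[\, M(D) \in S \,] \;\le\; e^{\varepsilon} \, \Pr[\, M(D') \in S \,]
```

A smaller epsilon means the released output depends less on any one individual's record, which is what makes the privacy loss quantifiable.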

Scientists want access to data collected by others for their research, but such access could also compromise personal privacy, even after removal of so-called personally identifiable data.

"An abundance of auxiliary data is the main culprit," said Slavkovic. "With methodological and technological advances in data collection and record linkage, easier access to a variety of data sources that could be linked with a dataset in hand, and funding agencies' requirements to share data, the risks to data privacy are increasing. But finding good solutions for managing privacy loss is essential for enabling sound scientific discovery."

Publicly available information from a drug trial on an HIV drug, for example, would indicate who was in the treatment group and who was in the control group. The treatment group would contain only people diagnosed with HIV and even though the data owners withheld personal particulars from that data set, some identifying information would remain. Because so much information is today available online in social media and in other datasets, it is possible to connect the dots and identify people, potentially revealing their HIV status.

"Techniques to link two data sets, say voter records and health insurance data, have greatly improved," said Slavkovic. "In one of the earliest findings, Latanya Sweeney (now at Harvard) showed that by linking these types of data, you can identify 87 percent of the people in the U.S. Census from 1990 based on their date of birth, gender and 5-digit zip code. More recently, researchers used tweets and associated Twitter metadata to show that they can identify users with 96.7 percent accuracy."
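
The kind of linkage described in the quote above can be illustrated with a small Python sketch. It is not taken from any of the studies mentioned: the records, names and the helper `link_records` are all hypothetical, and the matching rule (an exact, unique match on date of birth, sex and ZIP code) is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical "de-identified" health data: names removed, quasi-identifiers kept.
health_records = [
    {"dob": "1961-07-14", "sex": "F", "zip": "16801", "diagnosis": "asthma"},
    {"dob": "1975-03-02", "sex": "M", "zip": "16801", "diagnosis": "diabetes"},
    {"dob": "1975-03-02", "sex": "M", "zip": "16803", "diagnosis": "flu"},
]

# Hypothetical public voter roll: names present, same quasi-identifiers.
voter_roll = [
    {"name": "A. Example", "dob": "1961-07-14", "sex": "F", "zip": "16801"},
    {"name": "B. Sample", "dob": "1975-03-02", "sex": "M", "zip": "16801"},
    {"name": "C. Placeholder", "dob": "1980-11-30", "sex": "F", "zip": "16802"},
]

QUASI_IDENTIFIERS = ("dob", "sex", "zip")


def key(record):
    """Build the linkage key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(anonymous, identified):
    """Return (name, diagnosis) pairs for records with exactly one match."""
    names_by_key = defaultdict(list)
    for person in identified:
        names_by_key[key(person)].append(person["name"])

    matches = []
    for record in anonymous:
        names = names_by_key.get(key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link_records(health_records, voter_roll)
for name, diagnosis in reidentified:
    print(name, "->", diagnosis)
```

In this toy data, two of the three "anonymous" health records link to a single named voter, even though no name appears in the health data itself.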

Slavkovic notes that the risk is not limited to the people or institutions whose data are contained in the databases; people outside the database can also suffer an invasion of privacy, directly or by association. Linkages between information in a dataset and information on social media might lead to a serious privacy breach -- something like HIV status or sexual orientation could have severe repercussions if revealed.

While privacy is important, collected datasets make up an essential source of information for researchers. Currently, in some cases when the data are exceptionally sensitive, researchers must physically go to the data repositories to do their research, making research more difficult and expensive.

Slavkovic is interested in network data: information that shows the interconnectedness of people or institutions -- the nodes -- and the connections between nodes. Her approach is to create slightly altered, mirrored network datasets with a few of the nodes moved, connections shifted or edges altered.

"The aim is to create new networks that satisfy the rigorous differential privacy requirements and at the same time capture most of the statistical features from the original network," said Slavkovic.
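
One simple, well-known way to build such a network is randomized response applied to edges, sketched below in Python. This is a textbook mechanism swapped in for illustration, not the method Slavkovic presented, and the article does not give her algorithm. The sketch assumes edge-level privacy (two networks are "neighbors" if they differ in one connection), and the function names and the parameter `epsilon` are hypothetical.

```python
import math
import random
from itertools import combinations


def flip_probability(epsilon):
    """Chance of flipping each potential edge; smaller epsilon means more noise."""
    return 1.0 / (1.0 + math.exp(epsilon))


def synthetic_network(nodes, edges, epsilon, rng):
    """Return a noisy copy of an undirected graph as a set of (u, v) pairs, u < v.

    Every pair of nodes is considered independently: its true status
    (edge or no edge) is reported honestly with probability 1 - p and
    flipped with probability p, which gives edge-level epsilon-privacy.
    """
    p = flip_probability(epsilon)
    original = {frozenset(edge) for edge in edges}
    noisy = set()
    for u, v in combinations(sorted(nodes), 2):
        present = frozenset((u, v)) in original
        if rng.random() < p:
            present = not present
        if present:
            noisy.add((u, v))
    return noisy


def estimate_edge_count(noisy_edges, n_nodes, epsilon):
    """Correct the noisy edge count for the expected number of flips.

    Requires epsilon > 0, so that the flip probability is below 0.5.
    """
    p = flip_probability(epsilon)
    n_pairs = n_nodes * (n_nodes - 1) // 2
    return (len(noisy_edges) - p * n_pairs) / (1.0 - 2.0 * p)


# Usage on a hypothetical six-node ring network.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
noisy_ring = synthetic_network(range(6), ring, epsilon=2.0, rng=random.Random(0))
print(sorted(noisy_ring))
print(estimate_edge_count(noisy_ring, n_nodes=6, epsilon=2.0))
```

The correction in `estimate_edge_count` shows the trade-off in the quote: the released network hides whether any single connection is real, yet an aggregate feature such as the number of edges can still be estimated from it.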

These synthetic datasets might be sufficient for some researchers to satisfy their research needs. For others, they would be sufficient to test their approaches and hypotheses before having to go to the data storage site. Researchers could test code, do exploratory research and perhaps basic analysis while waiting for permission to use the original data at its repository site.

"We can't satisfy demands for all statistical analysis with the same type of altered data," said Slavkovic. "Some people will need the original data, but others might go a long way with synthetic data such as synthetic networks."


Penn State
