
Altered data sets can still provide statistical integrity and preserve privacy

February 16, 2019

Synthetic networks may increase the availability of some data while still protecting individual or institutional privacy, according to a Penn State statistician.

"My key interest is in developing methodology that would enable broader sharing of confidential data in a way that can aid in scientific discovery," said Aleksandra Slavkovic, professor of statistics and associate dean for graduate education, Eberly College of Science, Penn State. "Being able to share confidential data with minimal quantifiable risk for discovery of sensitive information and still ensure statistical accuracy and integrity, is the goal."

Slavkovic has found solutions to this data privacy problem through interdisciplinary collaborations, especially with computer and social scientists. Her research focuses on various data, including network data that capture relationship information between entities such as individuals or institutions. She reported her approaches to providing synthetic networks that satisfy a notion of differential privacy today (Feb 16) during the 2019 annual meeting of the American Association for the Advancement of Science in Washington, D.C.

Differential privacy provides a mathematically provable guarantee of the level of privacy loss to individuals.
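The guarantee is usually written as an inequality. The version below is the standard textbook definition, not a formula quoted from Slavkovic's talk, and the symbols (a randomized release mechanism M, datasets D and D', an output set S, and a privacy-loss parameter ε) are notation assumed here for illustration.

```latex
\[
\Pr\bigl[\,M(D) \in S\,\bigr] \;\le\; e^{\varepsilon}\,\Pr\bigl[\,M(D') \in S\,\bigr]
\]
```

The inequality must hold for every pair of datasets D and D' that differ in one individual's record and for every set S of possible outputs. A smaller ε means the released output depends less on any single record, so the privacy loss to any one individual is smaller.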

Scientists want access to data collected by others for their research, but such access could also compromise personal privacy, even after removal of so-called personally identifiable data.

"An abundance of auxiliary data is the main culprit," said Slavkovic. "With methodological and technological advances in data collection and record linkage, easier access to variety of data sources that could be linked with a dataset in hand, and funding agencies requirements to share data, the risks to data privacy are increasing. But, finding good solutions for managing privacy loss are essential for enabling sound scientific discovery."

Publicly available information from a trial of an HIV drug, for example, would indicate who was in the treatment group and who was in the control group. The treatment group would contain only people diagnosed with HIV, and even though the data owners withheld personal particulars from that data set, some identifying information would remain. Because so much information is available online today in social media and in other datasets, it is possible to connect the dots and identify people, potentially revealing their HIV status.

"Techniques to link two data sets, say voter records and health insurance data, have greatly improved," said Slavkovic. "In one of the earliest findings, Latanya Sweeny (now at Harvard) showed that by linking these type of data, you can identify 87 percent of the people in the U.S. Census from 1990 based on their date of birth, gender and 5-digit zip code. More recently, researchers used tweets and associated Twitter metadata to show that they can identify users with 96.7 percent accuracy."

Slavkovic notes that the risk is not limited to the people or institutions whose data are contained in a database; people outside the database can also suffer an invasion of privacy, directly or by association. Linkages between information in a dataset and information on social media might lead to a serious privacy breach -- something like HIV status or sexual orientation could have severe repercussions if revealed.
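The kind of linkage described above, matching a "de-identified" file to a named file on date of birth, gender and zip code, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `link_records`, the field names and every record below are made up for the example and do not come from any real dataset or from the studies mentioned.

```python
from collections import defaultdict

# Quasi-identifiers: fields that are not names but can single a person out.
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip")


def link_records(deidentified, identified, keys=QUASI_IDENTIFIERS):
    """Attach a name to each de-identified row that matches exactly one named row."""
    index = defaultdict(list)
    for row in identified:
        index[tuple(row[k] for k in keys)].append(row)

    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append({**row, "name": candidates[0]["name"]})
    return matches


# Entirely fictional toy data (hypothetical voter list and health file).
voters = [
    {"name": "A. Example", "birth_date": "1960-01-02", "sex": "F", "zip": "16801"},
    {"name": "B. Example", "birth_date": "1975-07-30", "sex": "M", "zip": "16802"},
    {"name": "C. Example", "birth_date": "1975-07-30", "sex": "M", "zip": "16802"},
]
health = [
    {"birth_date": "1960-01-02", "sex": "F", "zip": "16801", "diagnosis": "condition-1"},
    {"birth_date": "1975-07-30", "sex": "M", "zip": "16802", "diagnosis": "condition-2"},
]

linked = link_records(health, voters)
```

In this toy case only the first health record is re-identified, because its combination of quasi-identifiers is unique in the voter list; the second matches two voters and stays ambiguous. The more auxiliary fields an attacker can match on, the more combinations become unique.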

While privacy is important, collected datasets make up an essential source of information for researchers. Currently, in some cases when the data are exceptionally sensitive, researchers must physically go to the data repositories to do their research, making research more difficult and expensive.

Slavkovic is interested in network data, that is, information that shows the interconnectedness of people or institutions -- the nodes -- and the connections between nodes. Her approach is to create slightly altered, mirrored network datasets with a few of the nodes moved, connections shifted or edges altered.
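One simple, well-known way to alter edges while meeting a differential privacy guarantee is randomized response, sketched below. It is not presented here as Slavkovic's method, which the article does not detail. The sketch assumes an undirected network and an edge-level notion of privacy (two networks count as neighbors if they differ in a single edge); the function names and the parameter `epsilon` are hypothetical.

```python
import itertools
import math
import random


def flip_probability(epsilon):
    """Probability of flipping each edge indicator under randomized response."""
    return 1.0 / (1.0 + math.exp(epsilon))


def synthetic_network(nodes, edges, epsilon, rng=None):
    """Return a perturbed edge set: every possible pair of nodes has its
    edge/no-edge status flipped independently with the same probability."""
    if rng is None:
        rng = random.Random()
    p = flip_probability(epsilon)
    original = {frozenset(e) for e in edges}  # undirected: (u, v) == (v, u)

    released = set()
    for u, v in itertools.combinations(sorted(nodes), 2):
        present = frozenset((u, v)) in original
        if rng.random() < p:
            present = not present  # flip: add a missing edge or drop a real one
        if present:
            released.add((u, v))
    return released
```

A smaller `epsilon` pushes the flip probability toward one half, which gives stronger privacy but a noisier network; a large `epsilon` leaves the network nearly unchanged. Because every non-edge can also flip, this naive scheme tends to add many spurious edges to a sparse network, which is one reason more refined methods are needed to keep the statistical features of the original.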

"The aim is to create new networks that satisfy the rigorous differential privacy requirements and at the same time capture most of the statistical features from the original network," said Slavkovic.

These synthetic datasets might be sufficient for some researchers to satisfy their research needs. For others, they would be sufficient for testing approaches and hypotheses before having to go to the data storage site. Researchers could test code, do exploratory research and perhaps basic analysis while waiting for permission to use the original data at its repository site.

"We can't satisfy demands for all statistical analysis with the same type of altered data," said Slavkovic. "Some people will need the original data, but others might go a long way with synthetic data such as synthetic networks."

Penn State
