Protecting confidential data with math

December 16, 2011

Statistical databases (SDBs) are collections of data that are used to gather and analyze information from a variety of sources. The data may be derived from sales transactions, customer files, voter registrations, medical records, employee rosters, product inventories, or other compilations of facts and figures.

Securing such databases requires multiple processes and controls, which makes it a major challenge for organizations. With the computerization of databases in healthcare, forensics, telecommunications, and other fields, ensuring this kind of security has become increasingly important.

In a paper published Thursday in the SIAM Journal on Discrete Mathematics, authors Rudolf Ahlswede and Harout Aydinian analyze a security-control model for statistical databases.

"Providing privacy and confidentiality in SDBs is not a new issue," Aydinian points out. "Privacy interests have evolved from the very first census in the United States. Recorded protests until the mid-20th century reflect constitutional issues resulting from the requirement for U.S. residents to provide sensitive personal information. Questions on census forms about diseases, mortgage values, and other items have raised many concerns."

While such databases are very helpful in aggregating data, there is a risk that confidential information about an individual's record may be deliberately compromised. "Since such data sets also contain sensitive information, such as the disease of an individual, or the salary of an employee, it is necessary to provide security against the disclosure of confidential information," says Aydinian. "Even in cases where a user has no direct access to sensitive information, sometimes confidential data about an individual can be inferred by correlating enough statistics."

Typically, statistical databases are designed to accept only queries that involve specific statistical functions, such as sum, average, count, min, and max. However, even these queries may leave a database susceptible to compromise. For instance, it may be possible to infer information about specific individuals by combining the results of a sequence of statistical queries, by using prior knowledge of an individual, or through collusion among users.
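
To make the inference risk concrete, here is a minimal sketch of a so-called differencing attack using two SUM queries. The names, salaries, and the `sum_query` helper are hypothetical and are not taken from the paper.

```python
# Hypothetical toy records; in a real SDB the user never sees these values.
salaries = {"alice": 52000, "bob": 61000, "carol": 58000, "dave": 75000}


def sum_query(names):
    """Return only the aggregate, as a SUM-only interface would."""
    return sum(salaries[name] for name in names)


# Two individually harmless-looking aggregate queries...
everyone = sum_query(["alice", "bob", "carol", "dave"])
all_but_dave = sum_query(["alice", "bob", "carol"])

# ...whose difference reveals one person's confidential salary.
inferred_dave = everyone - all_but_dave
print(inferred_dave)
```

Neither query returns an individual record, yet together they disclose one.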

An SDB is considered secure if no protected data can be inferred from available queries. "In the literature, many scenarios of compromise and inference control methods have been proposed to protect SDBs," Aydinian says. "However, to date no one security control method is capable of completely preventing compromise."

Query restriction is one of several general approaches used for security control. A "query request" retrieves a subset of data from a database that meets a set of conditions. In query restriction, the kind and amount of data that such queries can retrieve are limited, for example by restricting the size of the returned data or the amount of overlap between the data returned by different queries.

In one type of query restriction method, only certain sums of individual records (called "SUM queries") that meet a minimum specified size or number, and satisfy a specified set of conditions, are available to users.
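
The following is a minimal sketch of such a restriction, assuming a minimum query-set size and a cap on the overlap with previously answered queries. The class name `RestrictedSumDatabase` and its parameters are illustrative assumptions, not the model analyzed in the paper.

```python
class RestrictedSumDatabase:
    """Toy SUM-only database with two hypothetical restrictions."""

    def __init__(self, records, min_size, max_overlap):
        self._records = dict(records)
        self._min_size = min_size        # smallest query set that is answered
        self._max_overlap = max_overlap  # largest overlap with an earlier query
        self._answered = []              # query sets answered so far

    def sum_query(self, ids):
        query_set = frozenset(ids)
        if not query_set.issubset(self._records):
            raise KeyError("unknown record id")
        if len(query_set) < self._min_size:
            raise PermissionError("query set is too small")
        for previous in self._answered:
            if len(query_set & previous) > self._max_overlap:
                raise PermissionError("too much overlap with an earlier query")
        self._answered.append(query_set)
        return sum(self._records[i] for i in query_set)


db = RestrictedSumDatabase(
    {"a": 10, "b": 20, "c": 30, "d": 40, "e": 50, "f": 60},
    min_size=3,
    max_overlap=1,
)
first = db.sum_query(["a", "b", "c"])   # answered
second = db.sum_query(["c", "d", "e"])  # answered: overlaps the first in one record
```

In this sketch a request such as `db.sum_query(["a", "b", "d"])` would be refused, because it shares two records with the first answered query. Restrictions of this kind reduce the risk of inference but do not by themselves guarantee security.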

Aydinian explains with an example. "Consider a company with a large number of employees. Suppose that for each member of the company, the sex, age, rank, length of employment, salary etc. is recorded. The salaries of individual employees are confidential. Suppose that only SUM queries are allowed, i.e. the sum of the salaries of the specified people is returned. Then one might pose the query: What is the sum of salaries for males, above 50, and during the last 10 years?"

The task addressed in the paper is to provide an optimal collection of SUM queries that prevents compromise of confidential information, such as individual salaries. A natural solution is to maximize the number of available SUM queries. The authors obtain tight bounds for the maximum number of such queries that return subsets of data without compromising groups of entries.
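
One common way to reason about compromise, sketched here as an assumption rather than as the paper's own construction, is to treat each SUM query as a 0/1 vector over the records. If records are arbitrary real numbers, a single record can be deduced exactly when its unit vector is a linear combination of the query vectors. The helper names `rref` and `compromised_records` are hypothetical, and the check covers single records only, not groups of entries.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form, computed with exact rational arithmetic."""
    matrix = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(matrix[0]) if matrix else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == len(matrix):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        source = next(
            (r for r in range(pivot_row, len(matrix)) if matrix[r][col] != 0),
            None,
        )
        if source is None:
            continue
        matrix[pivot_row], matrix[source] = matrix[source], matrix[pivot_row]
        pivot = matrix[pivot_row][col]
        matrix[pivot_row] = [x / pivot for x in matrix[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(matrix)):
            if r != pivot_row and matrix[r][col] != 0:
                factor = matrix[r][col]
                matrix[r] = [
                    a - factor * b for a, b in zip(matrix[r], matrix[pivot_row])
                ]
        pivot_row += 1
    return matrix


def compromised_records(queries, n_records):
    """Indices of records whose values follow from the answered SUM queries."""
    rows = [[1 if i in q else 0 for i in range(n_records)] for q in queries]
    exposed = []
    for row in rref(rows):
        nonzero = [i for i, x in enumerate(row) if x != 0]
        if len(nonzero) == 1:  # the row is a unit vector: one record is isolated
            exposed.append(nonzero[0])
    return sorted(exposed)


leaky = compromised_records([{0, 1, 2}, {0, 1}], n_records=3)
safe = compromised_records([{0, 1}, {2, 3}], n_records=4)
print(leaky, safe)
```

Under these assumptions, a collection of queries is free of single-record compromise when `compromised_records` returns an empty list. The optimization problem described above asks how large such a collection can be.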

"Future work in the query-restriction approach includes evaluation of new security-control mechanisms, which are easy to implement and guarantee absolute security," says Aydinian. "At the same time, it is desirable that these methods satisfy other criteria like richness of available queries, consistency, cost etc. It also seems promising to develop methods combining different security control mechanisms."

Source article:

On Security of Statistical Databases. Rudolf Ahlswede and Harout Aydinian, SIAM Journal on Discrete Mathematics, 25, pp 1778-1791 (Online publish date: December 15, 2011)

Society for Industrial and Applied Mathematics
