## Researchers Explore Differential Privacy on Pure AI

I contributed to an article titled “Researchers Explore Differential Privacy” in the November 2021 edition of the Pure AI web site. See https://pureai.com/articles/2021/11/08/differential-privacy.aspx.

In a nutshell, differential privacy consists of a set of techniques designed to prevent the leakage of sensitive data. In most situations, when a dataset is queried, instead of returning an exact answer, a small amount of random noise is added to the query result. The query result is now an approximation, but an adversary cannot easily use the result to discover sensitive information.

Differential privacy is complicated and cannot be explained quickly. But loosely, one way to think about DP is that an algorithm is differentially private if you generate a data query result and can’t use the result to determine information about a particular person. For example, suppose you have an artificially tiny dataset of just five items with sensitive age information:

| ID  | Name  | Age |
|-----|-------|-----|
| 002 | Baker | 30  |
| 003 | Chang | 50  |
| 004 | Dunne | 20  |
| 005 | Eason | 36  |

And suppose you can query the dataset for the average age of two or more data items. If the average age of people 001 – 003 is 40.0 (so the sum of their ages is 120.0) and the average age of people 001 – 004 is 35.0 (so the sum of their ages is 140.0), then the age of person 004 is the difference of the sums = 140.0 – 120.0 = 20.0 years. Sensitive data has been leaked, so this system is not differentially private.
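The arithmetic above can be sketched in a few lines of Python. This is only an illustration: `average_age` and `differencing_attack` are hypothetical names, and the age of 40 for person 001 does not appear in the table; it is inferred from the stated average of 40.0 for people 001 – 003.

```python
# Ages keyed by ID. The value for "001" is not shown in the table; 40 is
# inferred from the stated average of 40.0 for people 001-003.
ages = {"001": 40, "002": 30, "003": 50, "004": 20, "005": 36}

def average_age(ids):
    """Hypothetical exact query: average age of two or more people."""
    if len(ids) < 2:
        raise ValueError("query must cover at least two people")
    return sum(ages[i] for i in ids) / len(ids)

def differencing_attack(target_id, other_ids):
    """Recover one person's age from two permitted average queries."""
    avg_without = average_age(other_ids)
    avg_with = average_age(other_ids + [target_id])
    # Convert each average back into a sum, then subtract the sums.
    sum_without = avg_without * len(other_ids)
    sum_with = avg_with * (len(other_ids) + 1)
    return sum_with - sum_without

leaked = differencing_attack("004", ["001", "002", "003"])
print(leaked)  # 20.0
```

Neither query asks about a single person, yet together they reveal one person's exact age.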

When adding noise to a query result, the noise is usually drawn from the Laplace distribution, which has theoretical advantages over the Gaussian (bell-shaped) distribution.
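A minimal sketch of adding Laplace noise to an average-age query, using only the Python standard library. Everything here is an assumption made for illustration rather than something taken from the article: the function names, the privacy parameter `epsilon`, the age bounds of 0 and 100, the idea that the number of people in the query is public, and the age of 40 for person 001 (inferred from the averages in the example).

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_average(values, epsilon, lower, upper, rng):
    """Average of values clamped to [lower, upper], plus Laplace noise.

    Assumes the number of values is public, so changing one person's
    value moves the average by at most (upper - lower) / n.
    """
    n = len(values)
    clamped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / n
    true_average = sum(clamped) / n
    # Smaller epsilon means a larger noise scale and stronger privacy.
    return true_average + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
ages_001_to_004 = [40, 30, 50, 20]  # 40 for person 001 is inferred
print(noisy_average(ages_001_to_004, epsilon=1.0, lower=0, upper=100, rng=rng))
```

Each call returns a different value scattered around the true average of 35.0, so subtracting two noisy query results no longer pins down one person's age exactly.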

The term “differential” in differential privacy is related to the idea of query results that differ by one item. Differentially private systems allow aggregate results, such as the average age of residents in a city from census data, to be made available without revealing specific information about an individual.
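For readers who want the formal version, the usual textbook statement of the "differ by one item" idea is shown here. It is not quoted from the article, and the symbols are the conventional ones: $M$ is the randomized query mechanism, $D$ and $D'$ are any two datasets that differ in one item, $S$ is any set of possible outputs, and $\varepsilon$ is the privacy parameter.

$$
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S]
$$

A small $\varepsilon$ means the output distribution barely changes when one person's data is added, removed, or altered, so the result says little about that person.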

Differential privacy is similar in some respects to cryptography in the sense that DP is paradoxically simple and complicated at the same time. In cryptography, to make a message secret all you have to do is scramble the message in some way. But scrambling a message so that an adversary cannot de-scramble it is astonishingly tricky. In the same way, to make a data query differentially private, all you have to do is add some random noise to the result. But the details and tradeoffs between query result accuracy and system privacy are quite tricky.
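The accuracy versus privacy tradeoff can be made concrete with a rough simulation. This sketch assumes Laplace noise with scale equal to sensitivity divided by a privacy parameter `epsilon`; the sensitivity of 25.0 years and the three `epsilon` values are hypothetical numbers chosen only to show the trend.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def mean_abs_error(epsilon, sensitivity, trials, rng):
    """Estimate the typical size of the noise added at a given epsilon."""
    scale = sensitivity / epsilon
    return sum(abs(laplace_noise(scale, rng)) for _ in range(trials)) / trials

rng = random.Random(1)
sensitivity = 25.0  # hypothetical: largest change one person can cause
for epsilon in (0.1, 1.0, 10.0):
    err = mean_abs_error(epsilon, sensitivity, trials=10_000, rng=rng)
    print(f"epsilon={epsilon:>4}: typical error about {err:.1f} years")
```

Strong privacy (small `epsilon`) makes the answer nearly useless for a dataset this small, while a large `epsilon` gives an accurate answer but little protection.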

There’s a lot that can go wrong in computer security. And there’s a lot that can go wrong when taking photographs with monkeys. Left: A baby monkey climbing on an arm is not a happy experience. Center: The monkey is a little bit too unfriendly. Right: The male monkey is feeling a little bit too romantic.
