One of the biggest challenges corporations face in unlocking the full value of their data assets is addressing the regulatory and security risks related to private information. Recent years have witnessed the advent of Privacy Enhancing Technologies (PETs), which aim to reconcile the vast data needs of analytics and AI projects with privacy protection. Differential Privacy, one of these PETs, was invented in 2006 and has emerged as the best candidate for defining, on scientific grounds, what "anonymous" may mean. Unlike most PETs, which focus on carrying out a data processing task without revealing data during processing ("input privacy"), Differential Privacy protects sensitive data by limiting the personal information revealed by the output of a computation, for example specific statistics or a trained machine learning model ("output privacy"). Hence, it makes it possible to share insights about a group of people without putting at risk the personal information of any single individual present in the collected data.
Differential Privacy is based on the addition of statistical noise to computation results. The noise introduces a level of uncertainty that limits how much information about any one individual may be revealed. The noise must be sufficient to hide the effect of a single individual, but not so large that the result becomes inaccurate.
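As an illustration, here is a minimal sketch of the Laplace mechanism, the most common noise-addition scheme in Differential Privacy, applied to an average. The function name, the clipping bounds, and the epsilon value are assumptions made for the example, not Sarus's implementation:

```python
import numpy as np

def dp_average(values, lower, upper, epsilon):
    """Differentially private average via the Laplace mechanism (illustrative).

    Each value is clipped to [lower, upper] so that one individual's
    contribution is bounded; the noise is then calibrated to that bound.
    """
    clipped = np.clip(values, lower, upper)
    n = len(clipped)
    # With n known, changing one person's value shifts the average
    # by at most (upper - lower) / n: this bound is the sensitivity.
    sensitivity = (upper - lower) / n
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

salaries = np.array([42_000, 55_000, 61_000, 48_000, 75_000])
print(dp_average(salaries, lower=0, upper=200_000, epsilon=1.0))
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means less noise and more accurate results.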
Metaphorically, you can picture the beautiful movement of a flock of birds: a murmuration. If you remove or add one bird, you will not notice any difference in the overall movement. Differential Privacy is an ideal tool for studying the behavior of groups without revealing any individual's information!
Unlike legacy data protection methods such as data masking or pseudonymization, Differential Privacy irreversibly prevents re-identification, no matter what additional information an attacker may possess, today or tomorrow. Imagine a statistic, the average salary of employees, is published every year. If you know which single person left the company this year, you can compare this year's average with last year's and compute their salary exactly. This simple example illustrates how published statistics can lead to the reconstruction of a substantial part of the personal information on which they were computed. To learn more, we recommend this paper about a successful reconstruction attack on the 2010 US Census:
The 2010 Census Confidentiality Protections Failed, Here's How and Why. Differential Privacy prevents such attacks, which makes it a recognized gold standard in privacy protection.
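To make the differencing attack described above concrete, here is a toy calculation with made-up headcounts and averages, showing how two exact published statistics reveal one person's salary:

```python
# Hypothetical published figures: exact averages, no noise added.
n_last_year, avg_last_year = 100, 50_000   # headcount and average salary
n_this_year, avg_this_year = 99, 49_800    # one known employee has left

# The totals before and after differ by exactly the leaver's salary.
leaver_salary = n_last_year * avg_last_year - n_this_year * avg_this_year
print(leaver_salary)  # 69800: recovered exactly from two "anonymous" statistics
```

Had the averages been released with calibrated noise, as in the sketch above, the same subtraction would yield only a noisy estimate, with uncertainty controlled by epsilon.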
This page lists useful resources about Differential Privacy.
If you have more questions about Differential Privacy and how it is implemented in Sarus, contact us!