Working with privacy-sensitive data today is a complex and slow process, often involving long manual reviews by compliance teams. The recent development of differential privacy has helped standardize what privacy protection means, and as such it has the potential to unlock the automation and generalization of data analysis and ML on privacy-sensitive data.
To help realize this promise, we designed and built a framework in which a data owner, responsible for data protection, can open access to an analyst or data scientist who is not an expert in privacy. The analyst can explore safe synthetic data, write data analysis and ML jobs with common data-science tools and languages (SQL, numpy, pandas, scikit-learn) without changing their habits, and have those jobs compiled into differentially private jobs executed remotely on the sensitive data.
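To make this workflow concrete, here is a minimal, self-contained sketch of the idea. It is not the framework's actual API: the class and method names (`RemoteDataset`, `synthetic_sample`, `submit_count`) are illustrative assumptions, and the differentially private compilation is reduced to a single Laplace-noised counting query.

```python
import numpy as np
import pandas as pd

class RemoteDataset:
    """Stand-in for a handle to sensitive data the analyst never sees directly."""

    def __init__(self, sensitive_df: pd.DataFrame):
        self._sensitive = sensitive_df  # stays with the data owner

    def synthetic_sample(self, n: int = 100) -> pd.DataFrame:
        # Placeholder for safe synthetic data: resample each column
        # independently so no real row is exposed as-is.
        return pd.DataFrame({
            col: self._sensitive[col].sample(n, replace=True).to_numpy()
            for col in self._sensitive.columns
        })

    def submit_count(self, predicate, epsilon: float = 1.0) -> float:
        # A counting query has sensitivity 1 (adding or removing one row
        # changes the count by at most 1), so Laplace noise with scale
        # 1/epsilon yields epsilon-differential privacy.
        true_count = int(predicate(self._sensitive).sum())
        return true_count + np.random.laplace(scale=1.0 / epsilon)

# Data-owner side: load the sensitive data and open access to it.
dataset = RemoteDataset(pd.DataFrame({"age": np.random.randint(18, 90, size=1000)}))

# Analyst side: explore synthetic data with familiar pandas idioms,
# then submit a job that runs remotely with DP noise added to the result.
print(dataset.synthetic_sample().head())
print(dataset.submit_count(lambda df: df["age"] > 65, epsilon=0.5))
```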
We are proud to present our work at PEPR '22 alongside Tumult Labs, the University of Massachusetts Amherst, the Wikimedia Foundation, Google, and Colgate University. In our talk, we will show how a data science job is described in Python, then analyzed and compiled into a differentially private job executed on the hidden data, in a fully declarative manner with lazy evaluation of each step in the program. If you are interested in the future of privacy-preserving analytics and AI, come listen to us.
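To give an intuition of what a declarative job description with lazy evaluation looks like, here is a small framework-agnostic sketch. All names are our own illustrative assumptions, not the framework's: each pandas-like operation only appends a node to a computation graph, which can be inspected (e.g., to insert DP mechanisms) before any data is touched.

```python
import pandas as pd

class LazyOp:
    """A node in a lazily evaluated computation graph."""

    def __init__(self, name, fn, parent=None):
        self.name, self.fn, self.parent = name, fn, parent

    # Each "transformation" returns a new node instead of computing anything.
    def filter(self, predicate):
        return LazyOp("filter", lambda df: df[predicate(df)], parent=self)

    def mean(self, column):
        return LazyOp("mean", lambda df: df[column].mean(), parent=self)

    def plan(self):
        """Walk the graph to list the steps, e.g. for privacy analysis."""
        node, steps = self, []
        while node is not None:
            steps.append(node.name)
            node = node.parent
        return list(reversed(steps))

    def execute(self, df):
        """Only here is the pipeline actually evaluated, step by step."""
        if self.parent is not None:
            df = self.parent.execute(df)
        return self.fn(df)

source = LazyOp("source", lambda df: df)
job = source.filter(lambda df: df["age"] > 65).mean("age")

# The plan can be analyzed before any data access happens.
print(job.plan())  # ['source', 'filter', 'mean']

# Evaluation is deferred until the owner's side runs the compiled job.
print(job.execute(pd.DataFrame({"age": [70, 30, 80]})))  # 75.0
```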