Technology

Sarus packages the state of the art of privacy research for you. 

Differential Privacy

Differential Privacy (DP) is a rigorous mathematical definition of privacy. It has emerged as a leading technique for analyzing a dataset in such a way that no one can determine whether a particular individual was present in the data. DP can be applied to any statistical analysis, from simple analytics to advanced AI modeling. It works by adding a small amount of noise to the analysis so that the results do not reveal personal information about any given individual, even when combined with auxiliary data.
Sarus implements Differential Privacy in queries ranging from simple or complex SQL queries to entire data processing pipelines in Python, and anything in between. This approach is based on years of applied research presented at tier-one conferences (PEPR 22, PSD 22, PPAI 24).
Sarus automatically optimizes the privacy/utility trade-off when enforcing Differential Privacy. This way, neither data owners nor data scientists need privacy expertise to leverage sensitive data assets in a privacy-preserving manner.
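
To make the idea of a "slightly noisy" analysis concrete, here is a minimal sketch of the classic Laplace mechanism applied to a count. It is a textbook illustration, not Sarus code: the function names `laplace_noise` and `dp_count`, the sample data, and the choice of epsilon are all assumptions made for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a differentially private count of the values matching predicate.

    Adding or removing one individual changes a count by at most 1
    (sensitivity 1), so Laplace noise of scale 1/epsilon yields epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of eight individuals.
ages = [23, 35, 41, 52, 29, 67, 38, 44]
rng = random.Random(0)  # seeded only to make the example reproducible
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means the opposite. This sketch ignores floating-point subtleties that production libraries such as OpenDP take care of, so it should not be used as is on real sensitive data.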

Privacy-preserving Synthetic Data

The Sarus synthetic data generator is a multi-table, composable model based on generative AI. It preserves multivariate distributions as well as links across tables, without the need for manual adjustments by the data owner.

And of course, it is fitted with differential privacy so that synthetic data can truly be considered anonymous.
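
As a rough intuition for what "synthetic data fitted with differential privacy" means, the sketch below swaps the generative model for something far simpler: a noisy histogram over a single categorical column, from which synthetic values are then sampled. It is not the Sarus generator (no multi-table support, no generative AI). The function name `dp_synthetic_column`, the data, and the parameters are assumptions for illustration only.

```python
import random
from collections import Counter


def dp_synthetic_column(values, domain, epsilon, n_samples, rng):
    """Sample synthetic values from an epsilon-DP noisy histogram.

    `domain` must be a fixed, public list of categories (not derived from
    the data). One individual affects one bin by at most 1, so Laplace noise
    of scale 1/epsilon per bin gives epsilon-DP. Clamping and sampling are
    post-processing and do not consume additional privacy budget.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(values)
    weights = []
    for category in domain:
        # Difference of two Exp(rate=epsilon) draws is Laplace(0, 1/epsilon).
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        weights.append(max(counts.get(category, 0) + noise, 0.0))
    if sum(weights) == 0:
        weights = [1.0] * len(domain)  # fall back to uniform sampling
    return rng.choices(domain, weights=weights, k=n_samples)


# Hypothetical column of subscription plans.
plans = ["free"] * 60 + ["pro"] * 30 + ["enterprise"] * 10
rng = random.Random(0)
synthetic = dp_synthetic_column(
    plans, ["free", "pro", "enterprise"], epsilon=1.0, n_samples=100, rng=rng
)
print(Counter(synthetic))
```

The synthetic column follows roughly the same proportions as the original, yet no synthetic row corresponds to a real individual.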

Read the paper published by the Sarus Research Team to learn more!

Privacy-first Query Rewriting

Any query from a data scientist may pose privacy risks, from a simple SQL query to a full data science pipeline. But in most cases, scientists only want to extract non-sensitive information. How can they be sure that what they ask for complies with the privacy policy set by the data owner?

Sarus resolves this tension with the privacy rewriter. The scientist writes queries in SQL or programs using their favorite Python libraries (pandas, numpy, sklearn...). The programs are sent to Sarus through the BI connector or the SDK, where the rewriter transforms them to comply with the privacy policy. Rewriting may involve transforming the program into a differentially private mechanism, running it against the synthetic dataset, or running it against the real source data on an exception basis.

This is fully automated, even for extremely complex computation graphs. Sarus automates and enforces the privacy protection so that neither the data administrator nor the analysts have to worry about it.
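
The three rewriting outcomes described above can be pictured as a routing decision. The following is only a sketch under stated assumptions: the `Policy`, `Query`, and `Route` types, the budget check, and the decision order are hypothetical and do not describe the actual Sarus rewriter.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DP_MECHANISM = "differentially private mechanism"
    SYNTHETIC_DATA = "synthetic dataset"
    REAL_DATA = "real source data (exception)"


@dataclass(frozen=True)
class Policy:
    """Hypothetical privacy policy set by the data owner."""
    epsilon_remaining: float                 # privacy budget still available
    exempt_users: frozenset = frozenset()    # users granted an exception


@dataclass(frozen=True)
class Query:
    """Hypothetical description of an incoming query."""
    user: str
    is_aggregate: bool         # True if a DP equivalent exists (e.g. COUNT, SUM)
    epsilon_cost: float = 0.0  # budget the DP version would consume


def rewrite(query: Query, policy: Policy) -> Route:
    """Pick how the query is executed so that the policy is respected."""
    if query.user in policy.exempt_users:
        return Route.REAL_DATA
    if query.is_aggregate and query.epsilon_cost <= policy.epsilon_remaining:
        return Route.DP_MECHANISM
    # No DP equivalent, or not enough budget left: fall back to synthetic data.
    return Route.SYNTHETIC_DATA


policy = Policy(epsilon_remaining=1.0, exempt_users=frozenset({"auditor"}))
print(rewrite(Query("alice", is_aggregate=True, epsilon_cost=0.5), policy))
print(rewrite(Query("alice", is_aggregate=False), policy))
print(rewrite(Query("auditor", is_aggregate=False), policy))
```

The point of the sketch is that the scientist submits the same query in every case. Only the execution path changes.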

Output-level Control

The application distinguishes between the transformations that scientists can perform and the outputs that they can retrieve. This enables scientists to work on datasets that they cannot retrieve and to carry out several preprocessing tasks before running a final transformation that may be authorized.

Each time an output is requested, the application will validate the entire computational graph against the privacy policy (and possibly rewrite it) before sharing the output with the scientist.
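
One way to picture this is a lazy computation graph where nothing is evaluated until an output is requested, and the whole graph is checked at that moment. The sketch below is an illustration under assumptions: the `Node` class, the `ALLOWED_OPS` and `RELEASABLE_OPS` sets, and the validation rule are hypothetical and much simpler than a real policy check.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple

# Hypothetical policy: operations allowed anywhere in a graph, and operations
# whose result may actually be returned to the scientist.
ALLOWED_OPS = {"source", "filter", "dp_count"}
RELEASABLE_OPS = {"dp_count"}


@dataclass
class Node:
    op: str
    fn: Callable
    parents: Tuple["Node", ...] = ()


def ancestors(node: Node) -> Iterator[Node]:
    """Yield the node and every node it depends on."""
    yield node
    for parent in node.parents:
        yield from ancestors(parent)


def validate(node: Node) -> bool:
    """The final op must be releasable and every op in the graph allowed."""
    return node.op in RELEASABLE_OPS and all(
        n.op in ALLOWED_OPS for n in ancestors(node)
    )


def evaluate(node: Node):
    return node.fn(*(evaluate(p) for p in node.parents))


def request_output(node: Node):
    """Validate the entire graph, then (and only then) compute the output."""
    if not validate(node):
        raise PermissionError(f"output of '{node.op}' cannot be retrieved")
    return evaluate(node)


rng = random.Random(0)
rows = Node("source", lambda: [23, 35, 41, 52, 29, 67])
adults = Node("filter", lambda r: [a for a in r if a >= 40], (rows,))
# Laplace(0, 1) noise as the difference of two exponential draws.
count = Node(
    "dp_count",
    lambda r: len(r) + rng.expovariate(1.0) - rng.expovariate(1.0),
    (adults,),
)

print(request_output(count))   # allowed: noisy aggregate
try:
    request_output(adults)     # refused: raw filtered rows
except PermissionError as err:
    print("refused:", err)
```

The scientist can build `adults` and keep transforming it freely. Only the final, policy-compliant result ever leaves the platform.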

Advancing privacy research

Sarus relies on open-source, peer-reviewed primitives and contributes significantly to Privacy Enhancing Technologies research. Check out our scientific work!

PEPR

OpenDP

NIST

128 rue La Boétie
75008 Paris — France
©2023 Sarus Technologies.
All rights reserved.