Qrlew: the library that turns any SQL query into a privacy-safe one

A new open source product to easily rewrite a wide variety of SQL queries into differentially private ones.

Differential Privacy
SQL
Rust
Open Source
Analytics
Andi Cuko

We are excited to announce that Sarus released Qrlew, a powerful open source library that turns any SQL query into a privacy-safe one.

A lot of statistical analysis is done in SQL today. In many cases, the row-level data is privacy sensitive but the outcome of the analysis (e.g: aggregates) does not have to be. Running those SQL analytics in a privacy safe way unlocks plenty of opportunities to extract value from sensitive datasets. Differential privacy (DP) is the standard for privacy protection, but despite various open-source tools for DP analysis, we lack a universal SQL-to-SQL query rewriter that makes analyses differentially private across any SQL database. That’s where Qrlew comes in, filling this gap!

Qrlew simplifies the process of converting a wide range of SQL queries into privacy-safe ones. It thus enables data owners to develop applications for data analysis with strong privacy protection, without requiring a deep understanding of privacy theory. As a data analyst, you can unlock the value of sensitive datasets by having each SQL query rewritten by qrlew.

The main features are:

  • Ease of use: standard SQL queries can be easily transformed into DP ones as shown below.
  • Extensive SQL grammar support: Qrlew’s parser supports many dialects (e.g. BigQuery, MSSQL, PostgreSQL, etc.).
  • Extensive SQL query coverage: it supports an extensive range of SQL queries, including JOINs, Common Table Expressions (CTEs), and nested queries.
  • Privacy-optimized: Maintains accurate bounds and ranges at each step, minimizing the required noise to achieve differential privacy.
  • Speed: it is developed in Rust to leverage its efficiency, memory and type safety. Python bindings are also provided in pyqrlew for python applications.
  • Versatility beyond privacy: Parses and compiles SQL queries into an intermediate relation suitable for various rewriting tasks, including acting as an SQL dialect translator to convert between dialects (e.g., PostgreSQL to BigQuery and vice versa)

Why open source

When it comes to privacy and security, it’s wise not to take anyone’s word for it. We open-sourced Qrlew for transparency so that users and the privacy community can access and scrutinize the source code to understand how the software works and what security measures are in place. Moreover, Qrlew’s white paper is peer reviewed by top researchers in the field and it was accepted by and presented in one of the most important privacy-specialized workshops, PPAI 2024 (an AAAI workshop).

How it works

Let’s assume that you want to run SQL queries on some private data and want to make sure that results will not disclose anything about an individual. Even if you wrote the query, it may not be possible to know whether there is a privacy leakage risk in the output (what if there are buckets of one person in the result? What if there is public information that enables isolating a user?). And if the query has been submitted by an external user, it’s even more dangerous!

Qrlew can be used as a layer so that the inbound queries can be sanitized. Here we show with python how easy it is to do so.

Let’s start by installing pyqrlew.

Let’s Import the needed libraries and functions

For the purpose of this example we use a Postgres database containing a census_schema and census_table with sensitive data about individuals such as first_name, last_name, age, gender, occupation, city and state. We are using sqlalchemy engine to interact with it.

As a simple analysis we would like to know the number of rows for each gender in the dataset. We could run such query using panda’s read_sql function as in the following:

This result is unsafe since individuals’ information can be reidentified (e.g. via a reconstruction attack). Rewriting the query into a privacy-safe one with qrlew is straightforward:

Notice that to rewrite the query we need some inputs such as the privacy_unit which defines what we want to protect (each row in this case) and a privacy budget epsilon_delta, a parameter to control for the amount of protection we want to provide. As we can see from the results, they contain some noise which is proportional to the privacy budget used. The privacy consumption for our analysis, dp_event, is compatible with Google’s dp accounting library, a popular library for tracking differential privacy budgets.

Find out more about technical details in the Qrlew documentation. What has been shown here is a very simple example but there are several more complex ones here. Join us in the discord channel, we would love to hear from you for any question or feedback. Give Qrlew a star on github if you like the project.

About the author

Andi Cuko

Data Scientist R&D @ Sarus

Ready?

Ready to unlock the value of your data? We can set you up in no time.
main.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

Shell

Subscribe to our newsletter

You're on the list! Thank you for signing up.
Oops! Something went wrong while submitting the form.
128 rue La Boétie
75008 Paris — France
Resources
Blog
©2023 Sarus Technologies.
All rights reserved.