Sarus and NIST release SDNist

An open-source Python library to benchmark differentially private synthetic data generators

To benchmark its synthetic data model, Sarus joined forces with NIST to develop SDNist, an open-source library that makes it easy to benchmark synthetic data on the NIST PSCR Differential Privacy Temporal Map Challenge.

The library is easy to use. First, install the sdnist Python package:

# Optionally create a virtualenv
# and install the package from pypi.org
pip install sdnist importlib_resources

You can then use the library to evaluate a synthetic dataset:

import sdnist
# Fetch data
dataset, schema = sdnist.census()
# Synthesize data
# (replace this line with your synthetic data generator function)
synthetic = dataset.sample(n=20000)
# Evaluate your synthetic data
result = sdnist.score(dataset, synthetic, schema, challenge="census")
# Print the score
print(result.score)
# Display the results on a map
result.html()
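As a rough illustration of what could replace the `dataset.sample(n=20000)` line, here is a minimal sketch of a baseline generator that samples every column independently from its empirical distribution. The function name `sample_independent_marginals` and the toy columns are hypothetical and not part of sdnist, and this baseline is not differentially private; it only shows the expected shape of a generator, which takes a pandas DataFrame and returns a synthetic DataFrame with the same columns.

```python
import numpy as np
import pandas as pd


def sample_independent_marginals(df: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Draw n synthetic rows by sampling each column independently.

    Hypothetical baseline for illustration only: it is NOT differentially
    private and it discards every correlation between columns.
    """
    rng = np.random.default_rng(seed)
    columns = {
        # Resample the observed values of each column with replacement
        name: rng.choice(df[name].to_numpy(), size=n, replace=True)
        for name in df.columns
    }
    return pd.DataFrame(columns, columns=df.columns)


# Toy stand-in for the real dataset (made-up columns and values)
toy = pd.DataFrame({"AGE": [23, 35, 35, 61], "SEX": [1, 2, 2, 1]})
synthetic = sample_independent_marginals(toy, n=10)
print(synthetic.head())
```

Because the columns are sampled independently, a generator like this would likely score poorly on any metric that depends on joint distributions, which makes it a convenient lower bound to compare against.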

You can also submit the generative model itself:

# You can also subclass sdnist.challenge.submission.Model
from sdnist.challenge.submission import run
from sdnist.challenge.subsample import SubsampleModel
model = SubsampleModel()
run(model, challenge="census")

Running the model this way gives you its score for various levels of privacy loss (ε).
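To give some intuition for why scores depend on ε, the sketch below adds Laplace noise to a toy histogram at several privacy levels. This is a generic textbook mechanism, not how sdnist or any challenge entry works; the function names and counts are made up for illustration, and the sketch assumes each individual contributes to exactly one bin, so the sensitivity is 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution with that scale
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, rng):
    # Assumption: sensitivity 1, so the Laplace scale is 1 / epsilon
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def mean_abs_error(counts, epsilon, seed=0):
    rng = random.Random(seed)
    noisy = noisy_counts(counts, epsilon, rng)
    return sum(abs(a - b) for a, b in zip(counts, noisy)) / len(counts)


# Toy histogram with made-up counts
counts = [120, 45, 300, 8] * 50
errors = {eps: mean_abs_error(counts, eps) for eps in (0.1, 1.0, 10.0)}
for eps, err in errors.items():
    print(f"epsilon={eps}: mean absolute error = {err:.2f}")
```

A smaller ε means stronger privacy and more noise, so a generator is generally expected to score worse at low ε and better at high ε.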

The results can be displayed on a map to figure out where the synthetic data model performed better.

Examples that use sdnist to evaluate some of the top-performing generative models from the Differential Privacy Temporal Map Challenge have been implemented and shared on GitHub.

This work was presented at an AAAI-22 workshop: the Third AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI-22).

See the video and the paper.

If you like sdnist, feel free to star it on GitHub.



About the author

Nicolas Grislain
