We are thrilled to introduce the latest version of our synthetic data generation model. It now preserves the multivariate distribution across all columns of a table. This makes synthetic data an even more useful tool for analysts and data scientists who need insight into data they cannot directly access.
Synthetic data is particularly useful for preparing analyses, designing machine learning pipelines, and debugging or testing code. It is the natural first step before running the analyses on the source data, which remains fully protected throughout.
This new deep-learning model was designed by the Sarus research team. It is based on Transformers and implemented in JAX, a Python library for high-performance numerical computing and machine learning. If you want to learn more, we published a research paper on the topic.
Of course, this model integrates differential privacy to ensure that the generated synthetic data protects all personal information stored in the source data (more info on how to train a model in JAX with differential privacy).
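To give a feel for what training with differential privacy in JAX can involve, here is a minimal sketch of the gradient step used in DP-SGD, a common technique: compute one gradient per example, clip each to a maximum norm, sum them, and add Gaussian noise. This is an illustration under our own assumptions, not the Sarus implementation. The toy linear model, the function names `loss_fn` and `dp_gradient`, and the values of `clip_norm` and `noise_multiplier` are all hypothetical.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    """Squared error of a toy linear model on a single example."""
    pred = jnp.dot(x, params["w"]) + params["b"]
    return (pred - y) ** 2


def dp_gradient(params, xs, ys, key, clip_norm=1.0, noise_multiplier=1.1):
    """Clipped and noised mean gradient over a batch (hypothetical helper)."""
    # Per-example gradients: vmap over the batch axis of xs and ys only.
    per_example = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)

    # Global L2 norm of each example's gradient across all parameters.
    leaves = jax.tree_util.tree_leaves(per_example)
    sq_norms = sum(jnp.sum(g.reshape(g.shape[0], -1) ** 2, axis=1) for g in leaves)
    norms = jnp.sqrt(sq_norms)

    # Scale factor that brings each example's gradient norm down to clip_norm.
    scale = jnp.minimum(1.0, clip_norm / (norms + 1e-12))

    batch_size = xs.shape[0]
    treedef = jax.tree_util.tree_structure(per_example)
    keys = jax.random.split(key, len(leaves))
    noisy = []
    for g, k in zip(leaves, keys):
        # Broadcast the per-example scale over the parameter dimensions.
        clipped_sum = jnp.sum(g * scale.reshape((-1,) + (1,) * (g.ndim - 1)), axis=0)
        # Gaussian noise calibrated to the clipping norm.
        noise = noise_multiplier * clip_norm * jax.random.normal(k, clipped_sum.shape)
        noisy.append((clipped_sum + noise) / batch_size)
    return jax.tree_util.tree_unflatten(treedef, noisy)
```

Clipping bounds how much any single record can influence an update, and the noise masks what remains. Turning the noise level into an actual privacy guarantee requires a privacy accountant, which this sketch leaves out.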
This new model helps analysts and data scientists work with sensitive data that they cannot directly access, opening up many opportunities for privacy-safe analysis in healthcare, finance, energy, HR, and more. It is useful wherever companies or public authorities want to leverage data to innovate while that data must stay protected for security, compliance, or ethical reasons.
Want to see what the high-fidelity synthetic data looks like? Reach out!