Sarus is now compatible with big data sources stored in data warehouses like BigQuery, Redshift, Synapse, or Hive. With this new integration, all SQL queries are analyzed by Sarus and rewritten to comply with privacy policies before being sent to the data warehouse engine for execution. In addition, when a source is big data, the synthetic data sample is scaled down to avoid unnecessary cost. This is the perfect solution for BI analyses on sensitive data of any size.
And that is not all! If you want to use in-memory Python libraries like pandas or scikit-learn, you can run a big data SQL job to do the extraction and then work with your favorite data science libraries using the Sarus Private Learning SDK, just as usual. Of course, in this case, both the SQL extract and the machine learning code run behind the Sarus proxy, and all final results are produced with privacy guarantees.
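To make the "extract in SQL, then analyze in memory" pattern concrete, here is a minimal sketch. It does not use the Sarus SDK or any real warehouse: an in-memory SQLite database stands in for the data warehouse, the `visits` table and the `extract_then_analyze` helper are hypothetical names invented for illustration, and no privacy mechanism is applied. It only shows the division of labor, where the heavy aggregation runs in the SQL engine and the small result is processed in Python.

```python
import sqlite3
import statistics


def extract_then_analyze(conn, query):
    """Run the aggregation in the SQL engine, then analyze the small result in memory.

    Hypothetical helper for illustration only; expects rows of (group, total).
    """
    rows = conn.execute(query).fetchall()
    totals = [total for _, total in rows]
    return {"groups": len(rows), "mean_total": statistics.mean(totals)}


# Assumption: an in-memory SQLite database stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 20.0)],
)

# The GROUP BY runs in the SQL engine; only one row per region comes back.
summary = extract_then_analyze(
    conn, "SELECT region, SUM(amount) FROM visits GROUP BY region"
)
print(summary)
```

With Sarus, both steps would run behind the proxy as described above; the sketch leaves that part out entirely.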
This way, size is never a limitation for using Sarus to do analytics and data science in a privacy-safe way. A new step towards manipulating any data asset with full security and compliance!
Coming next: support for additional big data libraries and tools like Spark.
Want to try this new integration in just a few minutes? Reach out!