In this video series, we explore how Sarus empowers data practitioners to conduct powerful, privacy-first analysis across various data landscapes. From initial setup to advanced processing, each video dives into essential steps and insights. Here’s a breakdown of what you’ll learn:
Getting started with Sarus installation
The series kicks off with a step-by-step guide on installing Sarus, whether on a single VM using Docker or across a Kubernetes cluster. You’ll discover:
- Installation flexibility: deploy Sarus on a standalone VM or scale it with Kubernetes, making it adaptable to different infrastructure needs.
- Configuration essentials: learn how to set environment variables, configure admin credentials, and unpack the configuration files.
- Monitoring your setup: use Kubernetes tools to monitor pod statuses and ensure everything runs smoothly.
This foundational tutorial ensures you’re set up to leverage Sarus’s full capabilities.
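To make the monitoring step a little more concrete, here is a minimal sketch of how pod statuses could be summarized from the JSON that `kubectl get pods -o json` prints. It is not taken from the video: the helper name `summarize_pods` and the pod names in the sample are hypothetical, and the sample stands in for real cluster output.

```python
import json
from collections import Counter


def summarize_pods(pod_list_json: str) -> dict:
    """Summarize pod phases from the output of `kubectl get pods -o json`."""
    items = json.loads(pod_list_json).get("items", [])
    # Count how many pods are in each phase (Running, Pending, Failed, ...).
    phases = Counter(
        item.get("status", {}).get("phase", "Unknown") for item in items
    )
    # List the pods that are neither running nor completed.
    needs_attention = sorted(
        item["metadata"]["name"]
        for item in items
        if item.get("status", {}).get("phase") not in ("Running", "Succeeded")
    )
    return {"phases": dict(phases), "needs_attention": needs_attention}


# Hypothetical sample standing in for real kubectl output; pod names are made up.
SAMPLE = json.dumps(
    {
        "items": [
            {"metadata": {"name": "sarus-app-0"}, "status": {"phase": "Running"}},
            {"metadata": {"name": "sarus-worker-0"}, "status": {"phase": "Pending"}},
        ]
    }
)

if __name__ == "__main__":
    print(summarize_pods(SAMPLE))
```

In practice you would feed the function the real output of `kubectl get pods -o json` for the namespace your installation uses, which depends on how you deployed Sarus.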
Performing privacy-compliant data science with the Sarus SDK
Next, we delve into the Sarus SDK, demonstrating how to perform data science tasks remotely without accessing private data directly. Highlights include:
- Privacy-preserving processing: conduct data analysis and model training without exposing sensitive data, thanks to Sarus’s secure remote processing.
- Synthetic data for testing: practice preprocessing and model training with synthetic data, ensuring workflows are privacy-compliant before real-world application.
- Seamless model deployment: once trained, models can be retrieved without ever viewing the underlying data, maintaining security at every step.
This video empowers data scientists to work with real data securely and effectively.
Sarus demo - Data science basics
Connecting Metabase to Sarus for data visualization
In the third video, we integrate Metabase with Sarus using the Spark SQL connector, creating a powerful data visualization platform. Key takeaways include:
- Streamlined configuration: set up the connection for a seamless data flow between Sarus and Metabase.
- Intuitive data exploration: leverage Metabase’s interface to explore transaction datasets and analyze purchase trends.
- Cost-effective BI solution: by using open-source Metabase, organizations can conduct in-depth data exploration without heavy financial investment.
This integration opens up new possibilities for interactive data analysis, enhancing reporting capabilities for your team.
Sarus demo - Connecting a BI tool (Metabase)
Ensuring privacy with output-level controls and differential privacy
This video dives deep into the privacy mechanisms Sarus provides, focusing on output-level controls and differential privacy:
- Privacy unit tracking: track data transformations to maintain privacy across complex computations.
- Differential privacy protections: limit how much any result can reveal about a single individual, helping you meet compliance standards.
These privacy controls highlight Sarus’s commitment to security, especially when handling sensitive information.
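For readers new to differential privacy, here is a minimal, generic sketch of the textbook Laplace mechanism applied to a counting query. It is an illustration of the general idea only, not Sarus's implementation: the function `dp_count`, the sample purchase amounts, and the choice of epsilon are all assumptions made for this example.

```python
import random


def dp_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the values matching `predicate` (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    # Assumption: each individual contributes at most one record, so adding or
    # removing one person changes the count by at most 1 (sensitivity 1).
    # If a person can own several rows, the sensitivity must be scaled up.
    scale = 1.0 / epsilon
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(42)
    purchases = [12.0, 250.0, 40.0, 999.0, 5.0]  # made-up amounts
    # Noisy answer to "how many purchases are above 30?"
    print(dp_count(purchases, lambda amount: amount > 30, epsilon=1.0, rng=rng))
```

A smaller epsilon adds more noise and gives stronger protection, while a larger epsilon gives more accurate answers. The comment on sensitivity is also why tracking the privacy unit through transformations matters: the amount of noise depends on how much one individual can influence the result.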
Sarus Deep Dive - Understanding output-level controls
Leveraging Sarus LLM for privacy-preserving analysis of unstructured data
Finally, we showcase Sarus LLM, which enables privacy-protected analysis of unstructured data, such as medical records. Key insights include:
- Synthetic data generation: generate realistic datasets that analysts can work with safely.
- Fine-tuning with privacy: train generative models on synthetic data to gain insights without breaching privacy.
- Structured insights from complex data: use Sarus LLM to transform JSON data into actionable insights, ideal for sensitive fields like healthcare.
This advanced tutorial demonstrates how Sarus LLM makes it possible to derive valuable insights from unstructured data while prioritizing data protection.
Sarus demo - Use Sarus LLM to deal with unstructured data
The goal of this series is to offer a comprehensive, self-serve guide to setting up, integrating, and analyzing data with Sarus. Whether you’re setting up for the first time or conducting advanced analysis on sensitive data, Sarus provides the tools and flexibility needed for privacy-first data science.
If you want to start testing Sarus, need additional information or are just curious, book a slot here.