BREAKING NEWS

Sarus-AI at ELSA General Assembly

We are proud to be one of the five winners of the first ELSA Industry Call. During this program, we extended Sarus' capabilities with Private LLM Fine-Tuning.
This achievement was made possible thanks to the valuable advice from world-class AI researchers within ELSA and the ELLIS network.
Our work was presented this week at the ELSA General Assembly meeting in Windermere, UK.

What is Sarus-AI?

Sarus-AI is a novel way of working with privacy-sensitive data. With Sarus-AI, you can:

  • Build a data pipeline with Sarus, relying on Differentially Private (DP) Synthetic Data, so no private data is exposed.
  • Fine-tune a pre-trained LLM such as Mistral or Llama 3 and output a specialized LLM with DP guarantees (see the sketch after this list).
  • Use your LLM for conditional generation, such as regression or classification, even when the data is loosely structured (regressors or classes may be expressed in natural language).
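
To make the fine-tuning step concrete, here is a minimal sketch of the general DP-SGD recipe using the open-source Opacus library with a Hugging Face causal LM. It is an illustration, not Sarus' actual implementation; the model name, corpus, hyperparameters, and (epsilon, delta) budget are placeholders.

```python
# Minimal DP fine-tuning sketch with Opacus + Hugging Face transformers.
# All names and numbers below are illustrative placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from opacus import PrivacyEngine

model_name = "mistralai/Mistral-7B-v0.1"  # any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Mistral/Llama have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# `private_texts` stands in for the sensitive training corpus.
private_texts = ["record 1 ...", "record 2 ..."]
enc = tokenizer(private_texts, truncation=True, padding=True, return_tensors="pt")
data_loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"]), batch_size=8
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Opacus wraps model, optimizer, and loader so that each step clips
# per-sample gradients and adds Gaussian noise (DP-SGD), calibrated to
# reach a target (epsilon, delta) over the whole training run.
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    epochs=3,
    target_epsilon=8.0,   # illustrative privacy budget
    target_delta=1e-5,
    max_grad_norm=1.0,    # per-sample gradient clipping bound
)

for _ in range(3):
    for input_ids, attention_mask in data_loader:
        optimizer.zero_grad()
        # Mask padding out of the loss with the usual -100 label convention.
        labels = input_ids.masked_fill(attention_mask == 0, -100)
        loss = model(input_ids=input_ids, attention_mask=attention_mask,
                     labels=labels).loss
        loss.backward()
        optimizer.step()

print(f"privacy spent: epsilon = {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```

In practice, DP fine-tuning of billion-parameter models is usually paired with parameter-efficient methods such as LoRA, so that gradient clipping and noise apply to a small set of trainable weights rather than the full model.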

Why Fine-Tune an LLM with Differential Privacy?

A fine-tuned LLM can be used for tasks like data generation (e.g., synthetic data) or conditional generation (e.g., data completion based on prompts). At first glance, using an LLM for regression or classification tasks might seem like overkill—and, in general, it is, unless your data is expressed in natural language or loosely structured (e.g., JSON without a strict schema).
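
To see what classification via conditional generation looks like, the hypothetical snippet below serializes a loosely structured record into a prompt and lets the fine-tuned model complete the label. The field names, model path, and label set are invented for this example.

```python
# Classification as conditional generation: the model's completion
# acts as the predicted class. All names below are invented examples.
from transformers import pipeline

generator = pipeline("text-generation", model="path/to/dp-finetuned-model")

record = {
    "notes": "Patient reports mild chest pain after exercise.",
    "age": "67",
}
prompt = (
    f"Notes: {record['notes']}\n"
    f"Age: {record['age']}\n"
    "Risk level (low/medium/high):"
)
print(generator(prompt, max_new_tokens=3)[0]["generated_text"])
```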

However, when privacy constraints come into play, the situation changes. Learning from private datasets inevitably reveals some information, and Differential Privacy is about limiting how much is revealed. Seen this way, learning public knowledge from a private dataset is exactly what you want to avoid: you would risk leaking private information to learn something that is already publicly available.
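
For reference, this intuition is captured by the standard (epsilon, delta)-differential privacy guarantee: no single record can change the output distribution of the learning mechanism by more than a bounded amount.

```latex
% (epsilon, delta)-differential privacy: for any two datasets D and D'
% differing in a single record, and any set of outputs S,
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S] + \delta
\]
```

The smaller epsilon and delta, the less any single record can influence what is learned; that limited budget is precisely what you do not want to waste on facts that are already public.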

Pre-trained LLMs already contain a vast amount of public knowledge, so when you use them for regression or classification, you avoid having to:

  • Learn from private data that the typical age range for patients is between 0 and 130.
  • Teach the model how to understand the English language from Electronic Health Records (EHRs).
  • Risk leaking private information to learn things that are already publicly known, since that knowledge is embedded in the LLM.

By using privately fine-tuned LLMs for regression or classification, you can focus on what is truly private during the learning process. This allows you to use the limited freedom afforded by strict privacy constraints in the most efficient way.
