Clean rooms have emerged as the best way to combine data from multiple parties in privacy-sensitive contexts. The processing happens in an environment that guarantees that the source data is not exposed to the other party or the data vendor.
But there is a catch: they leave it to the data owners to assess whether confidential data may leak into the output of the processing. This pre-approval is done manually, which raises three significant problems:
- Understanding privacy leakage risk is hard. Naive approaches (PII removal, masking, aggregation) have all shown their weaknesses in high-profile leakage cases. Not everyone is familiar with recent privacy research and concepts like Differential Privacy!
- It does not scale. It is probably fine if there is one simple computation. But what if you want to try multiple user matching strategies? Adjust preprocessing, or tune machine learning models with different parameters?
- The processing cannot be kept secret. The analyst may want to keep what they are working on confidential. Whether it is their area of focus or the parameters of their machine learning models, this may include sensitive information.
The missing component of traditional clean rooms is a solution that automates the validation of each processing job.
Enter Sarus’s use of Azure Confidential Clean Rooms, announced today in preview at Microsoft Ignite in Chicago. Azure Confidential Clean Rooms is designed for organizations that require secure multi-party data collaboration.
By having Sarus serve as a proxy layer between the data analyst and the clean room, input parties can fully automate the validation of processing workloads with strong privacy guarantees instead of relying on intuition. This brings scale and strong privacy in one go!
How it works
Azure Confidential Clean Rooms uses the latest confidential computing technology to provide both software and hardware security to a clean room. Only code that has been cryptographically attested by all input parties is allowed to retrieve the corresponding data and process it.
To allow for dynamic validation of processing jobs, all we have to do is validate the deployment of the Sarus app within the clean room, alongside all the access rights and parameters. Once it is running in Azure Confidential Clean Rooms, it can accept new data processing requests at run-time, validate them, and return compliant results without any manual intervention.
The critical part is the Sarus deployment script. It defines all output privacy policies in a way that cannot be modified later. No administrator will be able to punch a hole and extract more data than they should.
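To make this more concrete, here is a minimal sketch of what an immutable output policy fixed at deployment time could look like. The structure and field names below are illustrative assumptions made for this post, not the actual Sarus configuration format.

```python
# Illustrative sketch only: the field names and structure are assumptions,
# not the actual Sarus deployment configuration format.
output_policy = {
    "analyst_outputs": {
        # Results returned to analysts must satisfy differential privacy.
        "mechanism": "differential_privacy",
        "epsilon_per_query": 0.5,
        "total_epsilon_budget": 10.0,
        # If a job cannot be evaluated with differential privacy,
        # fall back to evaluating it on synthetic data.
        "fallback": "synthetic_data",
    },
    "audit": {
        # Every accepted or rejected job is logged for the input parties.
        "log_requests": True,
    },
}
```

Because a policy like this is part of the attested deployment, relaxing it later would change what the input parties attested, so it cannot be weakened after the fact.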
Once the clean room is provisioned with Sarus, authorized data practitioners can dynamically submit any computation job. Each job is analyzed by Sarus and validated against the privacy policy, possibly falling back to differentially private evaluations or synthetic data (see our video explaining privacy policies).
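As an illustration of what this run-time loop might look like from the analyst's side, here is a hypothetical sketch. The CleanRoomClient class, its submit_query method, and the endpoint are assumptions made for the example and do not describe the actual Sarus SDK.

```python
# Hypothetical analyst-side workflow; class, method, and endpoint names
# are illustrative and do not reflect the real Sarus SDK.
class CleanRoomClient:
    """Stand-in for a proxy client forwarding jobs to Sarus inside the clean room."""

    def __init__(self, endpoint: str, token: str):
        self.endpoint = endpoint
        self.token = token

    def submit_query(self, sql: str) -> dict:
        # In a real deployment this would send the job to the Sarus proxy,
        # which validates it against the privacy policy and returns either
        # a differentially private result or a rejection.
        return {"status": "accepted", "result": None}  # placeholder response


client = CleanRoomClient("https://cleanroom.example.net", token="<analyst-token>")

# Each job is checked against the policy; results come back privacy-safe.
response = client.submit_query(
    "SELECT merchant_category, COUNT(*) FROM transactions GROUP BY merchant_category"
)
print(response["status"])
```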
Now the clean room can be queried in real time with any number of data processing pipelines; each time, results are automatically made privacy-safe before being returned.
Use case: Pooling financial transactions to tackle human trafficking
Sarus is joining forces with EY to revolutionize how transaction data is combined to address financial crime. Combining data from multiple banks undeniably improves the ability to go after sophisticated criminals. It also raises enormous compliance and security challenges.
This project calls for the most robust security environment to justify bringing such hyper-sensitive datasets under one roof, hence the choice of Azure Confidential Clean Rooms.
But tracking financial crime is not as easy as counting overlapping customers between a retailer and an advertiser. One will want to test out many different matching patterns, explore and tune a large variety of rules-based or machine learning-based models. There will be hundreds if not thousands of different computation tasks to execute, none of which are trivially safe. This is only conceivable if we can automate the validation dynamically using Sarus.
We set up Azure Confidential Clean Rooms so that the researchers receive only strictly anonymous (read: differentially private) outputs, while a special data sink managed by the Financial Intelligence Authority can receive suspicious transactions. The researchers can try any number of iterations and validate them, only receiving counts of the suspicious activities caught or statistics related to them. Once they are confident in the detection strategy, they push the suspicious transactions to the regulator's data sink, where they appear in the clear (and remain, of course, inaccessible to the researchers themselves).
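For illustration, the two output channels could be expressed along these lines. Again, this is a hypothetical sketch with assumed names, not the configuration actually used in the project.

```python
# Hypothetical sketch of the dual-output routing; all names are illustrative.
output_channels = {
    "researchers": {
        # Researchers only ever receive differentially private aggregates,
        # e.g. counts of flagged transactions or summary statistics.
        "recipient": "research_team",
        "allowed_outputs": "dp_aggregates",
        "epsilon_per_query": 1.0,
    },
    "regulator_sink": {
        # Transaction-level records of suspicious activity go, in the clear,
        # to the data sink controlled by the Financial Intelligence Authority.
        "recipient": "financial_intelligence_authority",
        "allowed_outputs": "record_level",
        # Records are only pushed when the researchers explicitly promote
        # a validated detection strategy.
        "trigger": "explicit_push",
    },
}
```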
Sarus is able to discriminate between different types of outputs in real time: transaction-level outputs can be sent to the regulator, while only anonymous outputs are returned to the researchers, realizing the full promise of confidential clean rooms.