Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Apr 19, 2024
Open Peer Review Period: May 16, 2024 - Jul 11, 2024
Date Accepted: Jan 30, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints (unless they show as "accepted") should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Distributed Analytics for Research in Hospitals (DARAH): Federated Analysis with Differential Privacy in a Real-World Oncology Study
ABSTRACT
Background:
Federated analytics in healthcare allows researchers to perform statistical queries on remote datasets without accessing the raw data. This approach arose from the need to analyze larger datasets collected at multiple healthcare centers while avoiding the regulatory, governance, and privacy issues that could arise if the raw data were pooled at a central location outside those centers. Despite some pioneering work, federated analytics is still not widely used on real-world data, and, to our knowledge, no real-world study has yet combined it with other privacy-enhancing techniques such as differential privacy.
Keywords:
federated analysis, differential privacy, real-world oncology study, non-small cell lung cancer, COVID-19
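The core idea of federated analytics, computing statistics without moving raw data, can be sketched as follows: each site releases only a sum and a count, and the analyst's node combines them into a pooled mean. This is a minimal Python illustration under stated assumptions, not the DataSHIELD API (DataSHIELD is an R library); SiteAggregate, local_aggregate, federated_mean, and the toy values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteAggregate:
    """Hypothetical summary a site releases instead of its raw records."""
    total: float
    count: int


def local_aggregate(values: list[float]) -> SiteAggregate:
    # Intended to run inside each hospital: only the sum and the count leave the site.
    return SiteAggregate(total=sum(values), count=len(values))


def federated_mean(aggregates: list[SiteAggregate]) -> float:
    # Intended to run on the analyst's node: it sees site-level aggregates only.
    n = sum(a.count for a in aggregates)
    if n == 0:
        raise ValueError("no records across sites")
    return sum(a.total for a in aggregates) / n


if __name__ == "__main__":
    # Hypothetical toy data held at three sites (never pooled centrally).
    site_data = {
        "site_a": [61.0, 70.0, 58.0],
        "site_b": [66.0, 72.0],
        "site_c": [55.0, 63.0, 69.0, 71.0],
    }
    aggregates = [local_aggregate(v) for v in site_data.values()]
    print(federated_mean(aggregates))  # prints 65.0
```

The result equals the mean of all nine values had they been pooled, yet no individual value leaves its site. Production platforms such as DataSHIELD additionally apply disclosure controls (for example, refusing aggregates computed on too few records); those checks are omitted here.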
Objective:
The first objective of this study was to deploy a federated architecture in a real-world setting. The oncology study used for this deployment compared the medical management of patients with metastatic non-small cell lung cancer before and during/after the first wave of COVID-19. The second objective was to test differential privacy in this real-world scenario to assess its practicality and utility as a privacy-enhancing technology.
Methods:
A federated architecture platform was set up in the Toulouse, Reims, and Foch centers. After the data were harmonized in each center, statistical analyses were performed using DataSHIELD, a federated analysis R library. In addition, a new open-source differential privacy package for DataSHIELD, dsPrivacy, was implemented.
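To show how differential privacy can sit on top of a federated query, the sketch below has each site add Laplace noise to its own patient count before releasing it. The abstract does not say which mechanisms or parameters dsPrivacy uses, so the Laplace mechanism on a count query is an assumption chosen for illustration, and all function names are hypothetical. It also assumes each patient is enrolled at only one center, so that parallel composition applies. Stdlib floating-point noise is not hardened against known floating-point attacks on the Laplace mechanism; this is a teaching sketch, not production code.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values: list, epsilon: float, rng: random.Random) -> float:
    # Adding or removing one patient changes a count by at most 1, so the
    # L1 sensitivity is 1 and the Laplace scale is 1 / epsilon.
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return len(values) + laplace_noise(1.0 / epsilon, rng)


def federated_dp_count(site_data: dict, epsilon: float, seed: int = 0) -> float:
    # Each (hypothetical) site perturbs its own count before release. Because
    # sites hold disjoint patients, parallel composition makes the combined
    # release epsilon-DP. One seeded RNG simulates the sites' independent
    # noise so that runs are reproducible.
    rng = random.Random(seed)
    return sum(dp_count(v, epsilon, rng) for v in site_data.values())


if __name__ == "__main__":
    # Hypothetical per-site records; only their counts (true total: 100) are used.
    site_data = {"site_a": [1] * 40, "site_b": [1] * 35, "site_c": [1] * 25}
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(federated_dp_count(site_data, eps, seed=42), 2))
```

Smaller values of epsilon give stronger privacy but noisier released totals, which is the privacy/accuracy trade-off the study set out to assess.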
Results:
Fifty patients were enrolled in the Toulouse and Reims centers and 49 in the Foch center. We showed that, once a secure network between the hospitals had been configured, DataSHIELD was a practical tool for efficiently conducting the study across all 3 centers without exposing raw data on a central node. All planned aggregated results were successfully generated. We also observed that differential privacy can be implemented in practice with promising trade-offs between privacy and accuracy, and we built a library, dsPrivacy, that should prove useful for future work.
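The privacy/accuracy trade-off can be made concrete for the Laplace mechanism, which is assumed here for illustration only (the abstract does not state which mechanisms dsPrivacy implements). For a query f with L1 sensitivity Δf, a dataset D, and a privacy budget ε, the released value and its error are:

```latex
\tilde{f}(D) = f(D) + Z, \qquad Z \sim \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad
\mathbb{E}\,\bigl\lvert \tilde{f}(D) - f(D) \bigr\rvert = \frac{\Delta f}{\varepsilon},
\qquad
\operatorname{Var}\bigl[\tilde{f}(D)\bigr] = \frac{2\,(\Delta f)^2}{\varepsilon^2}.
```

Halving ε therefore doubles the expected absolute error. For a patient count (Δf = 1) released by each of 3 sites with ε = 1 and independent noise, the summed count has variance 3 × 2 = 6, that is, a standard deviation of about 2.45 patients.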
Conclusions:
The federated architecture platform enabled a multicenter study to be conducted on real-world oncology data with strong privacy guarantees thanks to differential privacy.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.