Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jan 16, 2025
Open Peer Review Period: Jan 15, 2025 - Mar 12, 2025
Date Accepted: Jun 23, 2025

The final, peer-reviewed published version of this preprint can be found here:

Privacy-by-Design Approach to Generate Two Virtual Clinical Trials for Multiple Sclerosis and Release Them as Open Datasets: Evaluation Study

Demuth S, Rousseau O, Faddeenkov I, Paris J, De Sèze J, Baciotti B, Payet M, Guillaudeux M, Barreteau AF, Laplaud D, Edan G, Gourraud PA

J Med Internet Res 2025;27:e71297

DOI: 10.2196/71297

PMID: 41032725

PMCID: 12488035

Privacy-By-Design Generation of Two Virtual Clinical Trials in Multiple Sclerosis and their Release as Open Datasets: Evaluation Study

  • Stanislas Demuth; 
  • Olivia Rousseau; 
  • Igor Faddeenkov; 
  • Julien Paris; 
  • Jérôme De Sèze; 
  • Béatrice Baciotti; 
  • Marianne Payet; 
  • Morgan Guillaudeux; 
  • Alban-Félix Barreteau; 
  • David Laplaud; 
  • Gilles Edan; 
  • Pierre-Antoine Gourraud

ABSTRACT

Background:

Sharing individual patient data is restricted by regulatory frameworks because of privacy concerns. Generative artificial intelligence could produce shareable virtual patient populations that serve as proxies for sensitive reference datasets, but their privacy must be explicitly demonstrated.

Objective:

Here, we determined whether a privacy-by-design technique called “avatars” can generate synthetic datasets that replicate all information reported from randomized clinical trials (RCTs).

Methods:

We generated 2160 synthetic datasets from two phase 3 RCTs in multiple sclerosis (NCT00213135 and NCT00906399) using different generation configurations, then selected, for each RCT, the one synthetic dataset with optimal privacy and utility. We computed several privacy metrics, including protection against distance-based membership inference attacks. We assessed utility by comparing variable distributions and by checking that every endpoint reported in the publications had the same effect direction, fell within the reported 95% confidence interval, and had the same statistical significance.
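The three endpoint-level utility criteria can be expressed as a single pass/fail check. The sketch below is a minimal illustration, not the authors' code: it assumes each endpoint is summarized by a point estimate and a p-value, plus a 95% confidence interval for the published result. The function name `replicates_endpoint`, its parameters, and the 0.05 significance level are hypothetical choices. The `null_value` parameter sets the no-effect reference (1.0 for ratios such as hazard or rate ratios, 0.0 for differences).

```python
from __future__ import annotations


def replicates_endpoint(ref_estimate: float,
                        ref_ci: tuple[float, float],
                        ref_p: float,
                        syn_estimate: float,
                        syn_p: float,
                        null_value: float = 1.0,
                        alpha: float = 0.05) -> bool:
    """Check whether a synthetic-data estimate replicates a published endpoint.

    Hypothetical helper applying three criteria: same effect direction,
    estimate inside the published 95% CI, and same statistical significance.
    """
    ci_low, ci_high = ref_ci
    # 1. Same effect direction relative to the no-effect value
    same_direction = (ref_estimate - null_value) * (syn_estimate - null_value) > 0
    # 2. Synthetic point estimate lies within the published 95% CI
    within_ci = ci_low <= syn_estimate <= ci_high
    # 3. Same conclusion about significance at level alpha (assumed 0.05)
    same_significance = (ref_p < alpha) == (syn_p < alpha)
    return same_direction and within_ci and same_significance


if __name__ == "__main__":
    # Illustrative numbers only, not taken from either trial
    print(replicates_endpoint(0.60, (0.45, 0.80), 0.001, 0.66, 0.01))  # True
    print(replicates_endpoint(0.60, (0.45, 0.80), 0.001, 0.85, 0.01))  # False: outside CI
```

Requiring all three criteria at once is deliberately strict: a synthetic estimate that lands inside the confidence interval but loses significance still counts as a failed replication.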

Results:

Protection against membership inference attacks was the hardest privacy metric to optimize (85.0% and 93.2%), but the technique yielded robust privacy and replicated the primary endpoints. Utility was uneven across variables and endpoints, so information about some endpoints could not be captured. With optimized generation configurations, we could select one dataset from each RCT that replicated all efficacy endpoints of the placebo and commercial treatment arms with satisfactory privacy.
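The abstract does not define how the protection percentages are computed. As a generic illustration only, a distance-based membership inference attack can be sketched as follows: the attacker flags a real record as a member of the source data when its nearest synthetic record lies within a distance threshold, and the attack is scored on a mix of true members and non-members. The function names, the Euclidean distance, the fixed threshold, and the use of balanced accuracy as the score are all assumptions; the metric used in the study may be defined differently.

```python
import math
from typing import Sequence

Record = Sequence[float]  # numeric feature vector (assumes preprocessed, scaled data)


def nearest_synthetic_distance(record: Record, synthetic: Sequence[Record]) -> float:
    """Euclidean distance from a real record to its closest synthetic record."""
    return min(math.dist(record, s) for s in synthetic)


def mia_balanced_accuracy(members: Sequence[Record],
                          non_members: Sequence[Record],
                          synthetic: Sequence[Record],
                          threshold: float) -> float:
    """Balanced accuracy of a hypothetical threshold-based membership inference attack.

    The attacker guesses "member" when the nearest synthetic record is within
    `threshold`. A value near 0.5 means the attack does no better than chance.
    """
    # True positive rate: members correctly flagged as members
    tpr = sum(nearest_synthetic_distance(r, synthetic) <= threshold
              for r in members) / len(members)
    # True negative rate: non-members correctly left unflagged
    tnr = sum(nearest_synthetic_distance(r, synthetic) > threshold
              for r in non_members) / len(non_members)
    return (tpr + tnr) / 2


if __name__ == "__main__":
    # Toy 2-D data, purely illustrative
    members = [(0.0, 0.0), (1.0, 1.0)]
    non_members = [(10.0, 10.0), (20.0, 20.0)]
    leaky = [(0.1, 0.0), (1.0, 1.1)]   # synthetic records sit very close to members
    distant = [(100.0, 100.0)]         # synthetic record far from every real record
    print(mia_balanced_accuracy(members, non_members, leaky, threshold=0.5))    # 1.0
    print(mia_balanced_accuracy(members, non_members, distant, threshold=0.5))  # 0.5
```

Balanced accuracy is used here so that unequal numbers of members and non-members do not inflate the score; a generator that copies source records too closely pushes it toward 1.0, while one that exposes nothing leaves the attacker at chance.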

Conclusions:

Generating synthetic RCT datasets that replicate primary and secondary efficacy endpoints is possible while achieving a satisfactory and explicitly demonstrated level of privacy. To show the potential of this approach to unlock health data sharing, we released both placebo arms as open datasets.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.