Privacy-By-Design Generation of Two Virtual Clinical Trials in Multiple Sclerosis and their Release as Open Datasets: Evaluation Study
Stanislas Demuth;
Olivia Rousseau;
Igor Faddeenkov;
Julien Paris;
Jérôme De Sèze;
Béatrice Baciotti;
Marianne Payet;
Morgan Guillaudeux;
Alban-Félix Barreteau;
David Laplaud;
Gilles Edan;
Pierre-Antoine Gourraud
ABSTRACT
Background:
Regulatory frameworks restrict the sharing of individual patient data because of privacy concerns. Generative artificial intelligence could produce shareable virtual patient populations as proxies for sensitive reference datasets, but an explicit demonstration of their privacy is required.
Objective:
We aimed to determine whether a privacy-by-design technique called “avatars” can generate synthetic datasets that replicate all reported information from randomized clinical trials (RCTs).
Methods:
We generated 2160 synthetic datasets from two phase 3 RCTs in multiple sclerosis (NCT00213135 and NCT00906399) using different generation configurations, and selected for each trial the one synthetic dataset with optimal privacy and utility. We computed several privacy metrics, including protection against distance-based membership inference attacks. We assessed utility by comparing variable distributions and by checking that every endpoint reported in the publications had the same effect direction, fell within the reported 95% confidence interval, and had the same statistical significance.
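The three endpoint-level utility criteria named above (same effect direction, estimate within the reported 95% confidence interval, same statistical significance) can be sketched as a single check. This is a minimal illustration, not the authors' code: the `Endpoint` class, the `replicates` function, the `null_value` and `alpha` parameters, and all numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Summary of one endpoint (hypothetical structure for illustration)."""
    estimate: float  # effect estimate, e.g. a difference between arms
    ci_low: float    # lower bound of the 95% confidence interval
    ci_high: float   # upper bound of the 95% confidence interval
    p_value: float


def replicates(reported: Endpoint, synthetic: Endpoint,
               null_value: float = 0.0, alpha: float = 0.05) -> bool:
    """Return True if the synthetic endpoint meets all three criteria."""
    # 1. Same effect direction: both estimates lie on the same side of the
    #    null value (use null_value=1.0 for ratios). An estimate exactly at
    #    the null value counts as not replicated in this sketch.
    same_direction = (
        (reported.estimate - null_value) * (synthetic.estimate - null_value) > 0
    )
    # 2. The synthetic estimate falls within the reported 95% CI.
    within_ci = reported.ci_low <= synthetic.estimate <= reported.ci_high
    # 3. Both results agree on statistical significance at level alpha.
    same_significance = (reported.p_value < alpha) == (synthetic.p_value < alpha)
    return same_direction and within_ci and same_significance


# Usage with made-up numbers (not taken from either trial):
reported = Endpoint(estimate=-0.30, ci_low=-0.50, ci_high=-0.10, p_value=0.003)
candidate = Endpoint(estimate=-0.25, ci_low=-0.48, ci_high=-0.02, p_value=0.03)
print(replicates(reported, candidate))
```

Requiring all three conditions at once is stricter than any one alone: a synthetic estimate can sit inside the reported interval and still fail because it loses statistical significance.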
Results:
Protection against membership inference attacks was the hardest privacy metric to optimize (85.0% and 93.2%), but the technique yielded robust privacy and replicated the primary endpoints. Utility was uneven across variables and endpoints, and information about some endpoints could not be captured. With optimized generation configurations, we could select one dataset from each RCT that replicated all efficacy endpoints of the placebo and commercial treatment arms with satisfactory privacy.
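For readers unfamiliar with distance-based membership inference, the sketch below shows one common formulation: an attacker guesses that a record was part of the original dataset when it lies close to some synthetic record. It is an assumption-laden illustration, not the metric definition behind the percentages above. The function names, the Euclidean distance, the fixed `threshold`, and the toy data are all hypothetical.

```python
import math


def nearest_distance(record, synthetic):
    """Euclidean distance from a record to its closest synthetic record."""
    return min(math.dist(record, s) for s in synthetic)


def membership_attack_accuracy(members, non_members, synthetic, threshold):
    """Fraction of correct guesses made by a simple distance-based attacker.

    The attacker labels a record "member" when its nearest synthetic
    neighbor is within `threshold`. Accuracy near 0.5 on balanced data
    means the attacker does no better than chance.
    """
    guessed_in = [nearest_distance(r, synthetic) <= threshold for r in members]
    guessed_out = [nearest_distance(r, synthetic) <= threshold for r in non_members]
    # Correct guesses: members flagged as members, non-members not flagged.
    correct = sum(guessed_in) + sum(not g for g in guessed_out)
    return correct / (len(members) + len(non_members))


# Toy two-variable example with made-up values:
synthetic = [(0.0, 0.0), (1.0, 1.0)]
members = [(0.1, 0.0), (1.0, 0.9)]        # records used to build `synthetic`
non_members = [(5.0, 5.0), (0.05, 0.0)]   # records never seen by the generator
print(membership_attack_accuracy(members, non_members, synthetic, threshold=0.2))
```

In practice the variables would need to be scaled to comparable ranges before computing distances, and the threshold would be chosen by the attacker rather than fixed in advance.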
Conclusions:
Generating synthetic RCT datasets that replicate primary and secondary efficacy endpoints is possible while achieving a satisfactory and explicit level of privacy. To show the potential of this approach to unlock health data sharing, we released both placebo arms as open datasets.
Citation
Please cite as:
Demuth S, Rousseau O, Faddeenkov I, Paris J, De Sèze J, Baciotti B, Payet M, Guillaudeux M, Barreteau AF, Laplaud D, Edan G, Gourraud PA
Privacy-by-Design Approach to Generate Two Virtual Clinical Trials for Multiple Sclerosis and Release Them as Open Datasets: Evaluation Study