Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 30, 2023
Open Peer Review Period: May 29, 2023 - Jul 24, 2023
Date Accepted: Feb 13, 2024

The final, peer-reviewed published version of this preprint can be found here:

The Costs of Anonymization: Case Study Using Clinical Data

Pilgram L, Meurers T, Malin B, GCKD Investigators , Schaeffner E, Eckardt KU, Prasser F

The Costs of Anonymization: Case Study Using Clinical Data

J Med Internet Res 2024;26:e49445

DOI: 10.2196/49445

PMID: 38657232

PMCID: 11079766

The Costs of Anonymization: A Case Study Using Clinical Data

  • Lisa Pilgram
  • Thierry Meurers
  • Bradley Malin
  • GCKD Investigators
  • Elke Schaeffner
  • Kai-Uwe Eckardt
  • Fabian Prasser

ABSTRACT

Background:

Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns can be addressed through the application of privacy-enhancing technologies, such as anonymization, whereby data are altered so that they can no longer reasonably be related to a person. Yet such alterations have the potential to influence the dataset’s statistical properties; hence, there is a privacy-utility trade-off that must be considered.
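The trade-off can be illustrated with a minimal sketch of two common anonymization operations, generalization and suppression. Everything in it is an assumption made for illustration: the attributes (age, sex), the function names, and the risk model, in which a record's re-identification risk is taken as 1 divided by the number of records sharing its generalized attribute values. Under that assumed model, a threshold of 0.02 corresponds to groups of at least 50 records, and a threshold of 1 imposes no restriction.

```python
from collections import Counter

def generalize_age(age, width):
    """Replace an exact age by an interval of the given width (generalization)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def anonymize(records, width, threshold):
    """Generalize (age, sex) records, then drop those that remain too risky.

    Assumed risk model: risk of a record = 1 / size of its group of
    records with identical generalized values. Records whose risk
    exceeds the threshold are removed (suppression).
    """
    generalized = [(generalize_age(age, width), sex) for age, sex in records]
    group_sizes = Counter(generalized)
    return [rec for rec in generalized if 1 / group_sizes[rec] <= threshold]

# Hypothetical toy data, not taken from any real study.
records = [(41, "f"), (43, "f"), (47, "f"), (62, "m")]
coarse = anonymize(records, width=10, threshold=0.5)  # wide intervals, few losses
fine = anonymize(records, width=1, threshold=0.5)     # exact ages, all unique
```

With 10-year intervals, three of the four toy records survive but carry only an age range; with exact ages, every record is unique and all are suppressed. Lower risk is bought with coarser or fewer data, which is the trade-off described above.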

Objective:

The goal of this study is to comprehensively evaluate the privacy-utility trade-off of anonymized data in a real-world application using data and scientific results from the German Chronic Kidney Disease (GCKD) study.

Methods:

The GCKD dataset extract for this study consists of 5,217 records and 70 variables. We followed a two-step procedure to determine variables with re-identification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. We then transformed the data via generalization and suppression, and varied the anonymization process using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, we applied general-purpose metrics representing data granularity and entropy, as well as the reproducibility of a previously published analysis. Reproducibility was assessed by measuring the overlap of the 95% confidence interval (CI) lengths between anonymized and original results. The 95% CI overlap was assessed at the individual estimate-level and compiled into table- and dataset-level by averaging.
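A minimal sketch of how such a CI overlap could be computed is given here. The abstract does not state the exact formula, so the sketch assumes one common formulation: the length of the intersection of the two intervals, expressed as a fraction of each interval's own length, with the two fractions averaged. Treating disjoint intervals as an overlap of 0, and taking the dataset-level value as the mean of the table-level values, are also assumptions. All function names are hypothetical.

```python
from statistics import mean

def ci_overlap(original, anonymized):
    """Overlap of two confidence intervals, each given as (lower, upper).

    Assumed formulation: intersection length divided by each interval's
    length, averaged over the two intervals. Disjoint or merely touching
    intervals are reported as 0.0 (an assumption of this sketch).
    """
    (lo_o, hi_o), (lo_a, hi_a) = original, anonymized
    intersection = min(hi_o, hi_a) - max(lo_o, lo_a)
    if intersection <= 0:
        return 0.0
    return 0.5 * (intersection / (hi_o - lo_o) + intersection / (hi_a - lo_a))

def table_overlap(pairs):
    """Mean overlap across all (original, anonymized) estimate pairs of a table."""
    return mean(ci_overlap(orig, anon) for orig, anon in pairs)

def dataset_overlap(tables):
    """Mean of table-level overlaps (assumed aggregation scheme)."""
    return mean(table_overlap(pairs) for pairs in tables)

# Hypothetical intervals for illustration only.
table_1 = [((0.0, 10.0), (0.0, 10.0)), ((0.0, 10.0), (5.0, 15.0))]
table_2 = [((0.0, 1.0), (2.0, 3.0))]
```

In this toy input, identical intervals score 1.0, intervals shifted by half their length score 0.5, and disjoint intervals score 0.0, so `table_overlap(table_1)` is 0.75 and `dataset_overlap([table_1, table_2])` is 0.375.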

Results:

We observed higher utility in terms of the 95% CI overlap than according to general-purpose metrics. For example, granularity varied between 68.2% and 87.6% and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. At the individual estimate-level, a non-overlapping 95% CI was detected six times across all analyses, but the overwhelming majority of estimates exhibited an overlap of more than 50%. The use case-specific configuration outperformed the generic configuration in terms of replicating scientific results at the same level of privacy.

Conclusions:

The benefits of use case-specific anonymization with preserved utility in the GCKD application indicate that anonymization can be highly context-specific. Anonymization processes that are tailored to specific anticipated use cases may, more generally, be a good tool to overcome the privacy-utility trade-off and can result in data from which reliable evidence is more likely to be generated.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.