Accepted for/Published in: JMIR Bioinformatics and Biotechnology

Date Submitted: Nov 6, 2023
Date Accepted: Mar 29, 2024

The final, peer-reviewed published version of this preprint can be found here:

Thomas M, Mackes N, Preuss-Dodhy A, Wieland T, Bundschus M

Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review

JMIR Bioinform Biotech 2024;5:e54332

DOI: 10.2196/54332

PMID: 38935957

PMCID: 11165293

Assessing privacy vulnerabilities in genetic datasets: A scoping review

  • Mara Thomas
  • Nuria Mackes
  • Asad Preuss-Dodhy
  • Thomas Wieland
  • Markus Bundschus

ABSTRACT

Background:

Genetic data is widely considered inherently identifiable. However, genetic datasets come in many shapes and sizes, and the feasibility of privacy attacks depends on their specific content. Assessing the re-identification risk of genetic data is complex, yet data processors lack guidelines or recommendations to support them in performing such an evaluation.

Objective:

We aimed to gain a comprehensive understanding of the privacy vulnerabilities of genetic data and to create a summary that can guide data processors in assessing the privacy risk of genetic datasets.

Methods:

We conducted a two-step search: we first identified 21 reviews on genomic privacy published between 2017 and 2023, and then evaluated all references cited in those reviews (N=1645) to identify unique original research studies (N=42) that demonstrate a privacy attack on genetic data. We then evaluated the type and components of genetic data exploited in these attacks, as well as the effort and resources needed to implement them and their probability of success.

Results:

From our literature review, we derived nine features of genetic datasets (not mutually exclusive) that are both inherent to any genetic data and informative about privacy risk: single nucleotide polymorphism (SNP) content, short tandem repeat (STR) content, biological modality/type of data, analysis method, data format/level of processing, germline vs. somatic variation content, structural variation content, single nucleotide variant (SNV) content, and aggregated sample measure content.

Conclusions:

Based on our literature review, evaluating these nine features covers the great majority of privacy-critical aspects of genetic data and thus provides a foundation and guidance for assessing the privacy risk of genetic datasets.
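
As an illustration only, the nine features named in the Results could be recorded per dataset in a simple checklist structure. The sketch below is not the authors' tool: the class name, field names, and the helper `undocumented_features` are hypothetical, and it only tracks which features have been described. It does not compute or imply any risk score.

```python
from dataclasses import dataclass, fields


@dataclass
class GeneticDatasetFeatures:
    """Free-text description of a dataset, one field per feature in the abstract.

    Field names are hypothetical; an empty string means "not yet documented".
    """
    snp_content: str = ""                   # single nucleotide polymorphism content
    str_content: str = ""                   # short tandem repeat content
    biological_modality: str = ""           # biological modality / type of data
    analysis_method: str = ""               # analysis method
    data_format: str = ""                   # data format / level of processing
    germline_vs_somatic: str = ""           # germline vs. somatic variation content
    structural_variation_content: str = ""  # structural variation content
    snv_content: str = ""                   # single nucleotide variant content
    aggregated_sample_measures: str = ""    # aggregated sample measure content


def undocumented_features(record: GeneticDatasetFeatures) -> list[str]:
    """Return the names of features that still have no description."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Example with made-up descriptions for a hypothetical dataset.
example = GeneticDatasetFeatures(
    snp_content="genotyping array SNPs (hypothetical)",
    data_format="VCF",
)
print(undocumented_features(example))
```

A reviewer could fill in one such record per dataset and use the list of undocumented features as a reminder of which aspects still need to be examined before a privacy assessment is considered complete.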



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.