Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 23, 2021
Date Accepted: Aug 2, 2021

The final, peer-reviewed published version of this preprint can be found here:

Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study

Al Moubayed N, Zuo Z, Watson M, Hall R, Kennelly C, Budgen D

JMIR Med Inform 2021;9(10):e29871

DOI: 10.2196/29871

PMID: 34652278

PMCID: 8556642

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Data Anonymization for Pervasive Healthcare: A Systematic Mapping Study

  • Noura Al Moubayed; 
  • Zheming Zuo; 
  • Matthew Watson; 
  • Robert Hall; 
  • Chris Kennelly; 
  • David Budgen

ABSTRACT

Background:

Data science offers an unparalleled opportunity to gain new insight into many aspects of human life, with recent advances in healthcare. Using data science in digital health raises significant challenges regarding data privacy, transparency, and trustworthiness. Recent regulations, such as the General Data Protection Regulation (GDPR) and the UK Data Protection Act (DPA) 2018, enforce the need for a clear legal basis to collect, process, and share data. For healthcare providers, the legal basis for using the electronic health record (EHR) is strictly clinical care. Any other use of the data requires careful consideration of the legal context and direct patient consent. Identifiable personal and sensitive information must be sufficiently anonymized. Raw data are commonly anonymized for research purposes, with an assessment of both re-identification risk and utility. Whilst healthcare organizations have internal policies defined for information governance, there is a significant lack of practical tools and intuitive guidance about the use of data for research and modelling. Off-the-shelf data anonymization tools are developed frequently, but their privacy-related functionalities are often incomparable across different problem domains. Additionally, tools exist to support measuring the re-identification risk of anonymized data against its usefulness, but their efficacy can be unclear.

Objective:

In this systematic literature mapping (SLM) study, we aim to alleviate these issues by reviewing the landscape of data anonymization for digital healthcare.

Methods:

We use Google Scholar, Web of Science, Elsevier Scopus, and PubMed to retrieve academic studies published in English up to June 2020. Notably, grey literature is also used to initialize the search. We focus on review questions covering five bottom-up aspects: 1) basic anonymization operations; 2) privacy models; 3) re-identification risk and usability metrics; 4) off-the-shelf anonymization tools; 5) lawful basis for EHR data anonymization.

Results:

We identified 239 eligible studies, of which 60 articles provide general background, 16 papers cover 7 basic anonymization operations, 104 studies cover 72 conventional and machine-learning-based privacy models, 4 and 19 papers cover 7 and 15 metrics for measuring re-identification risk and degree of usability, respectively, and 36 publications explore 20 data anonymization software tools. We also evaluate the practical feasibility of performing anonymization on EHR data with reference to its usability in medical decision-making. Furthermore, we summarize the lawful basis to deliver guidance for practical EHR anonymization.

Conclusions:

This SLM study indicates that anonymization of EHR data is theoretically achievable, yet more research on practical implementations is required to balance privacy preservation and usability, and thus to ensure more reliable healthcare applications.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.