Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Apr 23, 2021
Date Accepted: Aug 2, 2021
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Data Anonymization for Pervasive Healthcare: A Systematic Mapping Study
ABSTRACT
Background:
Data science offers an unparalleled opportunity to identify new insights into many aspects of human life, with recent advances in healthcare. Using data science in digital health raises significant challenges in data privacy, transparency, and trustworthiness. Recent regulations enforce the need for a clear legal basis to collect, process, and share data, for example under the General Data Protection Regulation (GDPR) and the UK Data Protection Act (DPA) 2018. For healthcare providers, the legal basis for using the electronic health record (EHR) is strictly for clinical care. Any other use of the data requires thoughtful consideration of the legal context and direct patient consent. Identifiable personal and sensitive information must be sufficiently anonymized. Raw data are commonly anonymized for research purposes, with an assessment of re-identification risk and utility. Whilst healthcare organizations have internal policies defined for information governance, there is a significant lack of practical tools and intuitive guidance about the use of data for research and modelling. Off-the-shelf data anonymization tools are developed frequently, but their privacy-related functionalities are often incomparable across different problem domains. Additionally, tools exist to support measuring the re-identification risk of anonymized data against its usefulness, but their efficacy can be unclear.
Objective:
In this systematic literature mapping (SLM) study, we aim to alleviate those issues by reviewing the landscape of data anonymization for digital healthcare.
Methods:
We use Google Scholar, Web of Science, Elsevier Scopus, and PubMed to retrieve academic studies published in English up to June 2020. Notably, grey literature is also used to initialize the search. We focus on review questions covering five bottom-up aspects: 1) basic anonymization operations; 2) privacy models; 3) re-identification risk and usability metrics; 4) off-the-shelf anonymization tools; 5) lawful basis for EHR data anonymization.
Results:
We identified 239 eligible studies, of which 60 articles relate to general background introduction, 16 papers cover seven basic anonymization operations, 104 studies cover seventy-two conventional and machine-learning-based privacy models, seven re-identification risk metrics and fifteen usability metrics are drawn from 4 and 19 papers, respectively, and twenty data anonymization software tools are explored in 36 publications. In addition, we evaluate the practical feasibility of performing anonymization on EHR data with reference to its usability for medical decision-making. Furthermore, we summarize the lawful basis to deliver guidance for practical EHR anonymization.
Conclusions:
This SLM study indicates that anonymizing EHR data is theoretically achievable, yet practical implementations require more research effort to balance privacy preservation and usability, and thus to ensure more reliable healthcare applications.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.