Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 26, 2020
Date Accepted: Nov 11, 2020
Date Submitted to PubMed: Nov 12, 2020
Proposal and Assessment of De-identification Strategy to Enhance Anonymity of Observational Medical Outcomes Partnership Common Data Model in Public Cloud Computing Environment: Study for Medical Data Anonymity
ABSTRACT
Background:
The Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) defined by the non-profit organization, Observational Health Data Sciences and Informatics (OHDSI), is gaining attention for its use in the analysis of patient-level clinical data from various medical institutions. To analyze these data in a public environment, such as a cloud system, an appropriate de-identification strategy is required.
Objective:
This study proposes a de-identification strategy composed of several rules used together with the k-anonymity, l-diversity, and t-closeness privacy models. The proposed strategy is then evaluated on an actual CDM database.
Methods:
The CDM database used in this study was constructed by the Anam Hospital of Korea University. For analysis and evaluation, the ARX anonymizing framework was used with the k-anonymity, l-diversity, and t-closeness models.
Results:
The CDM database, constructed according to the rules established by OHDSI, exhibited a low risk of re-identification. The DRUG_EXPOSURE table exhibited the highest re-identifiable record rate in the dataset (11.3%) with a re-identification success rate of 0.03%. However, because all tables include at least one ‘highest risk’ value of 100%, suitable anonymizing techniques are needed. Because the CDM database preserves the ‘source values’ (raw data), and the combination of source values increases the risk of re-identification, this study proposes an enhanced de-identification strategy for the source values. When applying this strategy, the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models is significantly reduced, and the overall possibility of re-identification is also reduced.
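To make the 'highest risk' figure concrete, the following is a minimal Python sketch of the usual equivalence-class reading of k-anonymity and distinct l-diversity. It is not the ARX implementation and not the strategy proposed in this study. The toy records, the column names (`birth_year`, `gender`, `drug`), and the decade generalization are all hypothetical and chosen only for illustration. Under this reading, a record that is unique on its quasi-identifiers sits in a class of size 1, so its risk is 1/1 = 100%.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share identical quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        classes[key].append(rec)
    return classes


def k_anonymity(classes):
    """k is the size of the smallest equivalence class."""
    return min(len(group) for group in classes.values())


def highest_risk(classes):
    """Worst-case re-identification risk: 1 / (smallest class size)."""
    return 1.0 / k_anonymity(classes)


def distinct_l_diversity(classes, sensitive):
    """Smallest number of distinct sensitive values found in any class."""
    return min(len({rec[sensitive] for rec in group})
               for group in classes.values())


def generalize_decade(records, field):
    """Illustrative generalization: truncate a year to its decade."""
    return [{**rec, field: rec[field] // 10 * 10} for rec in records]


# Hypothetical toy data, not taken from the study's CDM database.
records = [
    {"birth_year": 1980, "gender": "F", "drug": "A"},
    {"birth_year": 1984, "gender": "F", "drug": "B"},
    {"birth_year": 1975, "gender": "M", "drug": "A"},
    {"birth_year": 1978, "gender": "M", "drug": "C"},
]
qi = ["birth_year", "gender"]

raw_classes = equivalence_classes(records, qi)
raw_k = k_anonymity(raw_classes)
raw_risk = highest_risk(raw_classes)

gen_classes = equivalence_classes(generalize_decade(records, "birth_year"), qi)
gen_k = k_anonymity(gen_classes)
gen_risk = highest_risk(gen_classes)
gen_l = distinct_l_diversity(gen_classes, "drug")

print(f"raw: k={raw_k}, highest risk={raw_risk:.0%}")
print(f"generalized: k={gen_k}, highest risk={gen_risk:.0%}, l={gen_l}")
```

In this toy case every raw record is unique on its quasi-identifiers, so the highest risk is 100%. Coarsening the birth year merges records into classes of two and halves the worst-case risk.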
Conclusions:
De-identification using the proposed strategy can improve the privacy of the CDM database. This enhanced privacy is expected to encourage clinical research involving multiple centers.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.