Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Dec 30, 2019
Date Accepted: Mar 11, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Re-examination on Rule Based Method in De-identification of Electronic Health Records
ABSTRACT
Background:
De-identification of clinical records is a critical step before data can be made publicly available to the research community. This task is usually treated as a sequence labeling problem, and ensemble learning is one of the best-performing solutions. Whether the classical rule-based method is still useful as a candidate learner in such an ensemble remains an open question.
Objective:
The main objective of this study is to investigate whether a rule-based learner is useful in a hybrid de-identification system and to offer suggestions on how to build and integrate such a learner.
Methods:
We chose a data-driven rule learner named TBED and integrated it into the best-performing hybrid system for this task.
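To make the idea of a data-driven rule learner concrete, here is a minimal sketch of greedy, error-driven rule learning, assuming TBED follows the general transformation-based scheme: start from a baseline labeling, then repeatedly add the rule that fixes the most remaining errors. The single rule template (previous token triggers a relabel), the function names, and the `NAME`/`DATE` labels are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def apply_rule(tokens, labels, rule):
    """Relabel tokens tagged `frm` as `to` when the previous token equals `prev`."""
    frm, to, prev = rule
    out = list(labels)
    for i in range(1, len(tokens)):
        if labels[i] == frm and tokens[i - 1] == prev:
            out[i] = to
    return out


def learn_rules(tokens, gold, initial, min_gain=1, max_rules=10):
    """Greedily learn (from_label, to_label, prev_token) rules from labeling errors.

    Hypothetical helper: one sequence and one rule template, for illustration only.
    """
    current = list(initial)
    rules = []
    while len(rules) < max_rules:
        gain = Counter()
        # Propose a candidate rule at every position that is still wrong.
        for i in range(1, len(tokens)):
            if current[i] != gold[i]:
                gain[(current[i], gold[i], tokens[i - 1])] += 1
        # Penalize each candidate for correct labels it would break.
        for frm, to, prev in list(gain):
            for i in range(1, len(tokens)):
                if tokens[i - 1] == prev and current[i] == frm and gold[i] == frm:
                    gain[(frm, to, prev)] -= 1
        if not gain:
            break
        best, best_gain = max(gain.items(), key=lambda kv: kv[1])
        if best_gain < min_gain:
            break
        rules.append(best)
        current = apply_rule(tokens, current, best)
    return rules, current


tokens = ["Dr.", "Smith", "met", "Dr.", "Jones", "on", "Monday"]
gold = ["O", "NAME", "O", "O", "NAME", "O", "DATE"]
rules, final = learn_rules(tokens, gold, initial=["O"] * len(tokens))
print(rules)
```

On this toy sequence the learner first picks the rule triggered by "Dr." (it fixes two errors) and then the rule triggered by "on". A practical system would use many rule templates and a full training corpus.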
Results:
On the popular i2b2 de-identification data set, experiments show that TBED achieves high performance with the rules it learns. Integrating the rule-based model into an ensemble framework reached an F1 score of 96.76%, the best performance reported in the community.
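For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it over exact-match labeled spans; the span representation `(start, end, label)` and the function name are assumptions for illustration, and the evaluation protocol actually used in the study may differ.

```python
def f1_score(gold_spans, predicted_spans):
    """Exact-match F1 over sets of (start, end, label) spans."""
    gold, pred = set(gold_spans), set(predicted_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: two of three predicted spans match two of three gold spans.
gold = [(0, 2, "NAME"), (5, 6, "DATE"), (9, 10, "AGE")]
pred = [(0, 2, "NAME"), (5, 6, "DATE"), (7, 8, "NAME")]
print(f1_score(gold, pred))
```

Here precision and recall are both 2/3, so the F1 score is also 2/3.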
Conclusions:
We not only demonstrate the contribution of the rule-based method to the current ensemble learning approach for the de-identification of clinical records, but also validate that such a rule system can be learned automatically by the TBED mechanism, avoiding the high cost and low reliability of manual rule development. In particular, adding rules boosts the ensemble model to the top reported performance on the de-identification of clinical records.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.