Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Aug 1, 2020
Date Accepted: Nov 18, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching
ABSTRACT
Clinical terms mentioned in clinical text are often not in the standardized forms listed in clinical terminologies because of linguistic and stylistic variation, which necessitates normalization. This paper presents a system for clinical term normalization that uses patterns to convert clinical terms into their normalized forms. These patterns are learned automatically from the UMLS as well as from a given training corpus. The patterns are generalized sequences of edits derived from edit distance computation. They are both character-based and word-based, and are learned separately for different semantic types. Besides these patterns, the system also normalizes clinical terms through the subterms mentioned in them. The system was evaluated on the MCN corpus as part of the 2019 n2c2 Track 3 shared task on clinical term normalization, where it obtained 80.79% accuracy on the standard test data. The paper includes an ablation study to evaluate the contributions of the system's components. A challenging part of the task, which accounted for a loss of 5% in absolute accuracy, was disambiguation when a clinical term could be normalized to multiple concepts. Because the system is based on patterns, it is human-interpretable and can give insight into the common forms in which clinical terms appear in clinical text that differ from their standardized forms.
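As an illustration of the idea of deriving edit sequences from edit distance computation, the following is a minimal sketch and not the system's actual implementation. It computes a standard Levenshtein alignment between a term and its normalized form and reads off the sequence of edits. The function names `edit_script` and `generalize`, the operation labels, and the wildcard generalization (collapsing unchanged tokens to `*`) are all assumptions made for this example; the abstract does not specify how the patterns are generalized.

```python
from typing import List, Sequence, Tuple

# (operation, source token, target token); labels are illustrative only.
Edit = Tuple[str, str, str]


def edit_script(source: Sequence[str], target: Sequence[str]) -> List[Edit]:
    """Return one minimal edit sequence turning `source` into `target`.

    Works on any token sequence: pass a list of words for word-based
    edits or a list of characters for character-based edits.
    """
    m, n = len(source), len(target)
    # dist[i][j] = Levenshtein distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i
    for j in range(1, n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a source token
                dist[i][j - 1] + 1,         # insert a target token
                dist[i - 1][j - 1] + cost,  # keep or replace
            )

    # Walk back from the bottom-right cell to recover the edits.
    edits: List[Edit] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = source[i - 1] == target[j - 1]
            if dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
                op = "keep" if same else "replace"
                edits.append((op, source[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("delete", source[i - 1], ""))
            i -= 1
        else:
            edits.append(("insert", "", target[j - 1]))
            j -= 1
    edits.reverse()
    return edits


def generalize(edits: List[Edit]) -> Tuple[Edit, ...]:
    """Hypothetical generalization: collapse runs of unchanged tokens
    into a single wildcard so the pattern can apply to other terms."""
    pattern: List[Edit] = []
    for op, src, tgt in edits:
        if op == "keep":
            if not pattern or pattern[-1][0] != "keep":
                pattern.append(("keep", "*", "*"))
        else:
            pattern.append((op, src, tgt))
    return tuple(pattern)


word_edits = edit_script("heart attacks".split(), "heart attack".split())
char_edits = edit_script(list("cats"), list("cat"))
word_pattern = generalize(word_edits)
```

In this sketch the same routine serves both granularities: a term split into words yields word-based edits, and a term split into characters yields character-based edits. Grouping the resulting patterns by semantic type, as the abstract describes, would be a separate bookkeeping step not shown here.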
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.