JMIR Preprints #23104: Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching


Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching

Rohit Kate

ABSTRACT

Clinical terms mentioned in clinical text are often not in the standardized forms listed in clinical terminologies because of linguistic and stylistic variations, which necessitates the task of normalization. This paper presents a system for clinical term normalization that uses patterns to convert clinical terms into their normalized forms. These patterns are learned automatically from the UMLS as well as from a given training corpus. The patterns are generalized sequences of edits derived from edit distance computation. They are both character-based and word-based, and they are learned separately for different semantic types. Besides these patterns, the system also normalizes clinical terms through the subterms mentioned in them. The system was evaluated on the MCN corpus as part of the 2019 n2c2 Track 3 shared task on clinical term normalization, and it obtained 80.79% accuracy on the standard test data. The paper includes an ablation study that evaluates the contributions of the system's various components. A challenging part of the task, which accounted for a loss of 5% in absolute accuracy, was disambiguation when a clinical term could be normalized to multiple concepts. Because the system is based on patterns, it is human-interpretable and can also give insights into the common forms in which clinical terms appear in clinical text when those forms differ from the standardized ones.
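The abstract describes the patterns as generalized sequences of edits derived from edit distance computation. The Python sketch below illustrates that idea at the word level only; it is not the author's implementation. It computes a Levenshtein alignment between a variant term and its standardized form, then generalizes the resulting edit sequence in one simple way that is assumed here: unchanged words are dropped, and each run of changed words becomes a context-free rewrite. The functions `edit_script`, `learn_rewrites` and `apply_rewrites`, and the example terms, are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One alignment step: (operation, source token or None, target token or None)
Op = Tuple[str, Optional[str], Optional[str]]
# A learned rewrite (hypothetical representation): (source words, replacement words)
Rewrite = Tuple[Tuple[str, ...], Tuple[str, ...]]


def edit_script(src: Sequence[str], tgt: Sequence[str]) -> List[Op]:
    """Levenshtein DP over tokens, then backtrace into keep/sub/del/ins operations."""
    n, m = len(src), len(tgt)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = src[i - 1] == tgt[j - 1]
            if d[i][j] == d[i - 1][j - 1] + (0 if same else 1):
                ops.append(("keep" if same else "sub", src[i - 1], tgt[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", src[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, tgt[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def learn_rewrites(src: Sequence[str], tgt: Sequence[str]) -> List[Rewrite]:
    """Generalize an edit script: drop unchanged words, merge each run of edits into one rewrite."""
    rewrites: List[Rewrite] = []
    cur_src: List[str] = []
    cur_tgt: List[str] = []
    # A trailing "keep" sentinel flushes the last run of edits.
    for op, a, b in edit_script(src, tgt) + [("keep", None, None)]:
        if op == "keep":
            if cur_src or cur_tgt:
                rewrites.append((tuple(cur_src), tuple(cur_tgt)))
                cur_src, cur_tgt = [], []
        else:
            if a is not None:
                cur_src.append(a)
            if b is not None:
                cur_tgt.append(b)
    return rewrites


def apply_rewrites(tokens: Sequence[str], rewrites: List[Rewrite]) -> List[str]:
    """Apply each rewrite in turn wherever its source words occur contiguously."""
    out = list(tokens)
    for src, tgt in rewrites:
        if not src:  # a pure insertion has no anchor once context is dropped; skip it
            continue
        k, i, result = len(src), 0, []
        while i < len(out):
            if tuple(out[i:i + k]) == src:
                result.extend(tgt)
                i += k
            else:
                result.append(out[i])
                i += 1
        out = result
    return out


rules = learn_rewrites("l knee oa".split(), "left knee osteoarthritis".split())
print(rules)                                      # [(('l',), ('left',)), (('oa',), ('osteoarthritis',))]
print(apply_rewrites("l hip oa".split(), rules))  # ['left', 'hip', 'osteoarthritis']
```

Dropping all context keeps the sketch short but over-generalizes; a fuller system would likely keep some surrounding context and, as the abstract notes, also learn character-level patterns and separate pattern sets per semantic type.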
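The abstract also mentions normalizing clinical terms through the subterms mentioned in them, without giving details. One simple reading, assumed here and not taken from the paper, is to replace each known subterm with a concept identifier (longest match first) and then look the resulting sequence up in an index built the same way from dictionary synonyms. The lexicon, the placeholder IDs such as `C_KIDNEY`, and the functions `to_subconcepts`, `build_index` and `candidates` are hypothetical; the IDs are not real UMLS CUIs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def to_subconcepts(tokens: List[str], lexicon: Dict[str, str], max_len: int = 4) -> Tuple[str, ...]:
    """Greedy longest match: replace each known subterm with its concept ID."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for k in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + k])
            if span in lexicon:
                out.append(lexicon[span])
                i += k
                break
        else:  # no known subterm starts here; keep the word itself
            out.append(tokens[i])
            i += 1
    return tuple(out)


def build_index(dictionary: Dict[str, List[str]], lexicon: Dict[str, str]) -> Dict[Tuple[str, ...], List[str]]:
    """Map each synonym's subconcept sequence to the concept ID(s) that list it."""
    index: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for concept, synonyms in dictionary.items():
        for syn in synonyms:
            key = to_subconcepts(syn.lower().split(), lexicon)
            if concept not in index[key]:
                index[key].append(concept)
    return index


def candidates(mention: str, index: Dict[Tuple[str, ...], List[str]], lexicon: Dict[str, str]) -> List[str]:
    """Return every concept whose synonym shares the mention's subconcept sequence."""
    return list(index.get(to_subconcepts(mention.lower().split(), lexicon), []))


lexicon = {"kidney": "C_KIDNEY", "renal": "C_KIDNEY"}  # hypothetical subterm -> concept map
dictionary = {"C_RENAL_FAILURE": ["renal failure"]}     # hypothetical concept -> synonyms
index = build_index(dictionary, lexicon)
print(candidates("Kidney failure", index, lexicon))     # ['C_RENAL_FAILURE']
```

Because "kidney" and "renal" map to the same placeholder subconcept, "kidney failure" matches the dictionary synonym "renal failure". Returning a list rather than a single ID leaves room for the case the abstract highlights, where a term matches several concepts and must be disambiguated.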

Citation

Please cite as:

Kate R

Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching: System Development and Evaluation

JMIR Med Inform 2021;9(1):e23104

DOI: 10.2196/23104

PMID: 33443483

PMCID: 7843202


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 1, 2020

Date Accepted: Nov 18, 2020
