Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 1, 2020
Date Accepted: Nov 18, 2020

The final, peer-reviewed published version of this preprint can be found here:

Kate R

Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching: System Development and Evaluation

JMIR Med Inform 2021;9(1):e23104

DOI: 10.2196/23104

PMID: 33443483

PMCID: 7843202

Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching

  • Rohit Kate

ABSTRACT

Clinical terms mentioned in clinical text often differ from the standardized forms listed in clinical terminologies because of linguistic and stylistic variation, which necessitates normalization. This paper presents a clinical term normalization system that uses patterns to convert clinical terms into their normalized forms. The patterns are learned automatically from UMLS and from a given training corpus. They are generalized sequences of edits derived from edit distance computation, are both character-based and word-based, and are learned separately for different semantic types. In addition to these patterns, the system normalizes clinical terms through the subterms mentioned in them. The system was evaluated on the MCN corpus as part of the 2019 n2c2 Track 3 shared task on clinical term normalization, where it obtained 80.79% accuracy on the standard test data. The paper includes an ablation study that evaluates the contributions of the system's components. A challenging part of the task, which accounted for a loss of 5% in absolute accuracy, was disambiguation when a clinical term could be normalized to multiple concepts. Because the system is based on patterns, it is human-interpretable and can also give insight into the common forms in which clinical terms appear in clinical text that differ from their standardized forms.
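
To make the idea of "generalized sequences of edits derived from edit distance computation" concrete, the following is a minimal character-level sketch. It is not the system described in the paper: the function names `edit_script` and `generalize`, the operation labels, the wildcard scheme, and the example term pair are all illustrative assumptions. It computes a standard Levenshtein table, backtraces one minimal edit sequence, and collapses runs of unchanged characters into a wildcard.

```python
from typing import List, Tuple

# (operation, source_char, target_char); the labels are illustrative assumptions
Edit = Tuple[str, str, str]


def edit_script(source: str, target: str) -> List[Edit]:
    """Return one minimal sequence of character edits turning source into target."""
    m, n = len(source), len(target)
    # dist[i][j] = Levenshtein distance between source[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a source character
                dist[i][j - 1] + 1,         # insert a target character
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )

    # Backtrace from the bottom-right cell to recover the edits.
    edits: List[Edit] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if source[i - 1] == target[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                op = "keep" if cost == 0 else "sub"
                edits.append((op, source[i - 1], target[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("del", source[i - 1], ""))
            i -= 1
        else:
            edits.append(("ins", "", target[j - 1]))
            j -= 1
    edits.reverse()
    return edits


def generalize(edits: List[Edit]) -> List[Edit]:
    """Collapse each run of unchanged characters into a single wildcard step.

    This wildcard scheme is an assumption made for illustration only.
    """
    wildcard: Edit = ("keep", "*", "*")
    pattern: List[Edit] = []
    for edit in edits:
        if edit[0] == "keep":
            if not pattern or pattern[-1] != wildcard:
                pattern.append(wildcard)
        else:
            pattern.append(edit)
    return pattern


# Hypothetical term pair: a plural mention and its singular standardized form.
pattern = generalize(edit_script("fractures", "fracture"))
print(pattern)
```

Under these assumptions, a pattern of the form "keep anything, then delete a trailing s" could be reused on other plural mentions. A word-based variant would run the same computation over token lists instead of characters.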


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.