Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Dec 31, 2019
Date Accepted: Mar 13, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
TNorm: A Pattern Learning Method for Temporal Expression Classification and Normalization from Chinese Narrative Clinical Texts
ABSTRACT
Background:
Temporal information appears frequently in narrative clinical text, for example in descriptions of disease progression, prescriptions, medications, surgical procedures, and discharge summaries. Accurate extraction and normalization of temporal expressions can improve the analysis and understanding of narrative clinical texts and thereby support clinical research and practice.
Objective:
This study proposes a novel approach for extracting and normalizing temporal expressions from Chinese narrative clinical text.
Methods:
TNorm, a rule-based and pattern learning-based approach, was developed for automatic temporal expression extraction and normalization from unstructured Chinese clinical text. TNorm consists of three stages: extraction, classification, and normalization. First, it applies a set of heuristic rules and automatically generated patterns to identify and extract temporal expressions from clinical texts. Then, it collects features of the extracted temporal expressions and uses machine learning algorithms to predict and classify their temporal types. Finally, the features are combined with the rule-based and pattern learning-based approaches to normalize the extracted temporal expressions.
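The three stages can be illustrated with a minimal Python sketch. It is not the TNorm implementation: the regular expression, the function names (extract, classify, normalize, tnorm_sketch), the two temporal types (DATE, DURATION), and the ISO 8601-style output are all assumptions made for illustration, and the classifier is a simple rule standing in for the machine learning step described above.

```python
import re

# Hypothetical extraction pattern: an absolute date such as 2019年12月31日,
# or a duration such as 3天. The actual TNorm rules and learned patterns
# are not given in the abstract.
TIMEX = re.compile(
    r"(?P<date>(?P<y>\d{4})年(?P<m>\d{1,2})月(?P<d>\d{1,2})日)"
    r"|(?P<duration>(?P<n>\d+)(?P<unit>天|周|月|年))"
)

# Chinese duration units mapped to ISO 8601 duration designators.
UNIT_CODES = {"天": "D", "周": "W", "月": "M", "年": "Y"}


def extract(text):
    """Stage 1: find candidate temporal expressions in the text."""
    return list(TIMEX.finditer(text))


def classify(match):
    """Stage 2: assign a temporal type.

    Stand-in rule only; TNorm predicts the type from collected
    features with machine learning algorithms.
    """
    return "DATE" if match.group("date") else "DURATION"


def normalize(match, timex_type):
    """Stage 3: map the expression to a normalized value."""
    if timex_type == "DATE":
        year = int(match.group("y"))
        month = int(match.group("m"))
        day = int(match.group("d"))
        return f"{year:04d}-{month:02d}-{day:02d}"
    return f"P{int(match.group('n'))}{UNIT_CODES[match.group('unit')]}"


def tnorm_sketch(text):
    """Run the three stages and return (expression, type, value) tuples."""
    results = []
    for match in extract(text):
        timex_type = classify(match)
        results.append((match.group(0), timex_type, normalize(match, timex_type)))
    return results


if __name__ == "__main__":
    # "The patient was admitted on 2019-12-31 with fever for 3 days."
    for item in tnorm_sketch("患者于2019年12月31日入院，发热3天。"):
        print(item)
```

In this sketch the date alternative is tried before the duration alternative, so the year inside a full date is not mistaken for a duration.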
Results:
The evaluation dataset is a set of Chinese narrative clinical texts comprising 1,459 discharge summaries from a domestic Grade-A Class-three hospital. The results show that TNorm, combined with temporal expression extraction and temporal type prediction, achieves a precision of 0.8491, a recall of 0.8328, and an F1 score of 0.8409 in temporal expression normalization.
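As a consistency check on the reported figures, the F1 score can be recomputed as the harmonic mean of precision and recall. The snippet below assumes this standard definition, which the abstract does not state explicitly; the function name f1_score is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the Results paragraph.
reported_precision = 0.8491
reported_recall = 0.8328

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # prints 0.8409
```

The recomputed value matches the reported F1 score of 0.8409 to four decimal places.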
Conclusions:
This study presents TNorm, an automatic approach that extracts and normalizes temporal expressions from Chinese narrative clinical texts. TNorm was evaluated on discharge summaries, and the experimental results demonstrate its effectiveness in temporal expression normalization.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.