JMIR Preprints #9965: Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods

Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods

Yu Zhang;
Xuwen Wang;
Zhen Hou;
Jiao Li

ABSTRACT

Background:

The electronic health record (EHR) is an important data resource for clinical studies and applications. Physicians and other clinicians describe patients' disorders and treatment procedures in free text in EHRs. This narrative information plays an important role in patient treatment and clinical research. However, making machines understand clinical narratives is challenging.

Objective:

This study aimed to automatically identify Chinese clinical entities in free text in EHRs so that machines can semantically interpret entity types such as diagnosis, test, body part, symptom, and treatment.

Methods:

The dataset used in this study is the benchmark dataset of human-annotated Chinese EHRs released by the China Conference on Knowledge Graph and Semantic Computing (CCKS) 2017 clinical named entity recognition (CNER) challenge task. Two machine learning models, a conditional random fields (CRF) model and a bidirectional long short-term memory (LSTM)-CRF model, were applied to recognize clinical entities in Chinese EHR data. To train the CRF-based model, we used bag of Chinese characters, part-of-speech tags, character types, and character positions as features. For the bidirectional LSTM-CRF-based model, character embeddings and segmentation information were used as features. We also employed a dictionary-based approach as the baseline for performance evaluation. Precision, recall, and their harmonic mean (F1 score) were used to evaluate the methods.
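The character-level features listed for the CRF-based model can be illustrated with a minimal sketch. The abstract does not give the actual feature templates, so the function names, the character classes, the context window, and the `<PAD>` marker below are all assumptions made for illustration; part-of-speech tags are omitted because they would require an external tagger.

```python
import unicodedata


def char_type(ch):
    """Return a coarse character class (the classes are illustrative assumptions)."""
    if ch.isdigit():
        return "DIGIT"
    if ch.isascii() and ch.isalpha():
        return "LATIN"
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "CJK"
    if unicodedata.category(ch).startswith("P"):
        return "PUNCT"
    return "OTHER"


def char_features(sentence, i, window=1):
    """Build a feature dict for the character at index i of the sentence."""
    ch = sentence[i]
    feats = {
        "char": ch,                 # the character itself
        "type": char_type(ch),      # coarse character type
        "position": i,              # position of the character in the sentence
        "is_first": i == 0,
        "is_last": i == len(sentence) - 1,
    }
    # Neighbouring characters within the window (hypothetical context features).
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(sentence)
        feats[f"char[{offset:+d}]"] = sentence[j] if inside else "<PAD>"
    return feats


def sentence_features(sentence, window=1):
    """Return one feature dict per character, the usual input shape for a CRF toolkit."""
    return [char_features(sentence, i, window) for i in range(len(sentence))]


if __name__ == "__main__":
    for feats in sentence_features("腹痛3天。"):
        print(feats)
```

A list of per-character dictionaries like this is the input shape that common CRF toolkits accept; the labels would be per-character entity tags taken from the annotated corpus.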

Results:

Experiments on the test set showed that our methods could automatically identify several types of Chinese clinical entities, such as diagnosis, test, symptom, body part, and treatment, in a single run. Overall, the CRF and bidirectional LSTM-CRF models achieved precision of 0.9203 and 0.9112, recall of 0.8709 and 0.8974, and F1 scores of 0.8949 and 0.9043, respectively. The results also indicated that our methods performed well on each type of clinical entity; the "symptom" type performed best, with an F1 score over 0.96. Moreover, as the number of features increased, the F1 score of the CRF model increased from 0.8547 to 0.8949.
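Because the F1 score is the harmonic mean of precision and recall, the reported overall figures can be checked for internal consistency. The following is a small sketch; the function name `f1_score` and the `reported` dictionary are illustrative, and the numbers are copied from the results above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
reported = {
    "CRF": (0.9203, 0.8709, 0.8949),
    "BiLSTM-CRF": (0.9112, 0.8974, 0.9043),
}

if __name__ == "__main__":
    for name, (p, r, f1) in reported.items():
        print(f"{name}: computed F1 = {f1_score(p, r):.4f}, reported F1 = {f1:.4f}")
```

The recomputed values agree with the reported F1 scores to within the rounding of the published precision and recall.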

Conclusions:

In this study, we employed 2 computational methods to simultaneously identify multiple types of Chinese clinical entities in free text in EHRs. After training, the models can effectively identify various types of clinical entities (eg, symptom and treatment) with high accuracy. The deep learning model, bidirectional LSTM-CRF, can achieve better performance than the CRF model with little feature engineering. This study contributes to translating human-readable health information into a machine-readable form.

Citation

Please cite as:

Zhang Y, Wang X, Hou Z, Li J

Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods

JMIR Med Inform 2018;6(4):e50

DOI: 10.2196/medinform.9965

PMID: 30559093

PMCID: 6315256

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 30, 2018

Open Peer Review Period: Jan 30, 2018 - Jun 28, 2018

Date Accepted: Oct 27, 2018

Per the author's request the PDF is not available.