Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Feb 25, 2021
Date Accepted: Apr 19, 2021

The final, peer-reviewed published version of this preprint can be found here:

Jia Q, Zhang D, Xu H, Xie Y

Extraction of Traditional Chinese Medicine Entity: Design of a Novel Span-Level Named Entity Recognition Method With Distant Supervision

JMIR Med Inform 2021;9(6):e28219

DOI: 10.2196/28219

PMID: 34125076

PMCID: 8240806

Traditional Chinese Medicine Named Entity Extraction: Span-Level Method With Distant Supervision

  • Qi Jia
  • Dezheng Zhang
  • Haifeng Xu
  • Yonghong Xie

ABSTRACT

Background:

Traditional Chinese medicine (TCM) clinical records contain patients' symptoms, diagnoses, and the treatments subsequently prescribed by doctors. These records are important resources for the research and analysis of TCM diagnostic knowledge. However, most TCM clinical records are unstructured text, so a method for automatically extracting medical entities from them is indispensable.

Objective:

Training a medical entity extraction model requires a large annotated corpus. However, annotation is very costly, and gold standard datasets for supervised learning methods are lacking. Therefore, we use distantly supervised named entity recognition (NER) to address this challenge.

Methods:

We propose a span-level distantly supervised named entity recognition (NER) approach to extract TCM medical entities. It uses a pretrained language model and a simple multilayer neural network as the classifier to detect and classify entities. We also design a negative sampling strategy for the span-level model. The strategy randomly selects negative samples in every epoch and periodically filters out possible false-negative samples, which reduces their harmful influence on training.
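The sampling strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how candidate spans are enumerated or how possible false negatives are detected, so the maximum span length, the score threshold, and the names `enumerate_spans`, `sample_negatives`, and `update_suspected` are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def enumerate_spans(n_tokens: int, max_len: int) -> List[Span]:
    """List every candidate span of at most max_len tokens (assumed cap)."""
    return [
        (i, j)
        for i in range(n_tokens)
        for j in range(i + 1, min(i + max_len, n_tokens) + 1)
    ]


def sample_negatives(
    candidates: List[Span],
    positives: Set[Span],
    suspected: Set[Span],
    k: int,
    rng: random.Random,
) -> List[Span]:
    """Draw up to k random negatives for one epoch.

    Spans labeled as entities by distant supervision (positives) and spans
    currently suspected of being false negatives are excluded from the pool.
    """
    pool = [s for s in candidates if s not in positives and s not in suspected]
    return rng.sample(pool, min(k, len(pool)))


def update_suspected(
    candidates: List[Span],
    positives: Set[Span],
    score_fn: Callable[[Span], float],
    threshold: float,
) -> Set[Span]:
    """Periodic filter (assumed rule): an unlabeled span that the current
    model scores as an entity with confidence >= threshold is treated as a
    possible false negative and withheld from negative sampling."""
    return {
        s for s in candidates
        if s not in positives and score_fn(s) >= threshold
    }
```

In a training loop under these assumptions, `sample_negatives` would be called once per epoch with a fresh random draw, and `update_suspected` would be called every few epochs with the current classifier's entity probability as `score_fn`.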

Results:

We compare our method with baseline methods on a gold standard dataset to illustrate its effectiveness. Our method achieves an F1-score of 77.34 and remarkably outperforms the baselines.
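For readers unfamiliar with the metric, the sketch below shows how an entity-level F1-score is commonly computed. It assumes exact matching of span boundaries and entity type, which the abstract does not confirm. The function name `span_f1` and the entity types in the usage note are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start, end, entity type)


def span_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring (assumed)."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with gold entities `{(0, 2, "symptom"), (3, 5, "herb")}` and predictions `{(0, 2, "symptom"), (3, 4, "herb")}`, only one prediction matches exactly, so precision, recall, and F1 are all 0.5. Reported scores such as 77.34 are this value expressed as a percentage.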

Conclusions:

We developed a distantly supervised NER approach to extract medical entities from TCM clinical records and evaluated it on a TCM clinical record dataset. The experimental results indicate that our approach achieves better performance than the baselines.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.