Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Feb 25, 2021
Date Accepted: Apr 19, 2021
Traditional Chinese Medicine Named Entity Extraction: A Span-Level Method With Distant Supervision
ABSTRACT
Background:
Traditional Chinese medicine (TCM) clinical records contain patients' symptoms, diagnoses, and the treatments subsequently prescribed by doctors. These records are important resources for research and analysis of TCM diagnostic knowledge. However, most TCM clinical records are unstructured text, so a method for automatically extracting medical entities from them is indispensable.
Objective:
Training a medical entity extraction model requires a large annotated corpus. However, annotation is very costly, and gold standard datasets for supervised learning methods are lacking. We therefore use distantly supervised named entity recognition (NER) to address this challenge.
Methods:
We propose a span-level distantly supervised named entity recognition (NER) approach to extract TCM medical entities. It uses a pre-trained language model and a simple multi-layer neural network as a classifier to detect and classify entities. We also design a negative sampling strategy for the span-level model. The strategy randomly selects negative samples in every epoch and periodically filters out possible false negative samples, which reduces the harmful influence of false negatives on training.
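The span-level labeling and negative sampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the lexicon-based labeling, the `max_len` limit, and in particular the rule for flagging possible false negatives (an unlabeled span that the current model scores above a threshold) are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def enumerate_spans(n_tokens, max_len):
    """All (start, end) spans, end exclusive, with length <= max_len."""
    return [(i, j)
            for i in range(n_tokens)
            for j in range(i + 1, min(i + max_len, n_tokens) + 1)]


def distant_labels(tokens, lexicon, max_len):
    """Label each span by exact lexicon match; unmatched spans get 'O'.

    Assumption: distant supervision comes from a dictionary lookup.
    """
    labels = {}
    for i, j in enumerate_spans(len(tokens), max_len):
        text = "".join(tokens[i:j])
        labels[(i, j)] = lexicon.get(text, "O")
    return labels


def flag_false_negatives(labels, entity_prob, threshold):
    """Flag 'O' spans that the current model scores as likely entities.

    Assumption: entity_prob is a hypothetical callable returning the
    model's probability that a span is an entity.
    """
    return {span for span, label in labels.items()
            if label == "O" and entity_prob(span) >= threshold}


def sample_negatives(labels, excluded, k, rng):
    """Draw up to k 'O' spans at random, skipping flagged spans."""
    pool = sorted(span for span, label in labels.items()
                  if label == "O" and span not in excluded)
    return rng.sample(pool, min(k, len(pool)))


# Toy sentence: the lexicon knows one symptom but misses another,
# so the span (4, 6) is a false negative under distant supervision.
tokens = list("患者头痛发热")
lexicon = {"头痛": "symptom"}
labels = distant_labels(tokens, lexicon, max_len=2)

# Stand-in for a trained model: confident only about the missed symptom.
def toy_entity_prob(span):
    return 0.9 if span == (4, 6) else 0.1

flagged = flag_false_negatives(labels, toy_entity_prob, threshold=0.5)
negatives = sample_negatives(labels, flagged, k=3, rng=random.Random(0))
```

In a training loop, `sample_negatives` would be called once per epoch and `flag_false_negatives` only every few epochs, mirroring the "randomly select every epoch, filter periodically" schedule in the text.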
Results:
We compare our method with other baseline methods on a gold standard dataset to illustrate its effectiveness. Our method achieves an F1-score of 77.34 and markedly outperforms the other baselines.
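For readers unfamiliar with the metric, the following sketch shows how an entity-level F1-score is commonly computed for NER. The exact matching criterion used in this work is not stated in the abstract; exact match on (start, end, type) is an assumption here, and `span_f1` is a hypothetical helper name.

```python
def span_f1(gold, pred):
    """Entity-level F1 with exact match on (start, end, type) triples."""
    gold, pred = set(gold), set(pred)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one of two predictions matches one of two gold entities.
gold = {(2, 4, "symptom"), (4, 6, "symptom")}
pred = {(2, 4, "symptom"), (0, 2, "symptom")}
score = span_f1(gold, pred)
```

Reported scores such as 77.34 are typically this value multiplied by 100.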
Conclusions:
We develop a distantly supervised NER approach to extract medical entities from TCM clinical records. We evaluate our approach on a TCM clinical records dataset. The experimental results indicate that our approach achieves better performance than the other baselines.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.