Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jan 15, 2020
Date Accepted: Apr 11, 2020
End-to-End Traditional Chinese Medicine Syndrome Differentiation: A Case Study of Lung Cancer
ABSTRACT
Background:
Traditional Chinese medicine (TCM) has been shown to manage advanced lung cancer effectively, and accurate syndrome differentiation is crucial to determining treatment. The accumulation of verified TCM treatment cases and the progress of artificial intelligence technology have paved the way for intelligent TCM syndrome differentiation, which is expected to extend these benefits to more patients with lung cancer.
Objective:
This work aims to establish end-to-end TCM diagnostic models that imitate lung cancer syndrome differentiation. In contrast to approaches that rely on structured TCM datasets, the proposed models take unstructured medical records as input, making more effective use of the practical TCM treatment cases collected from lung cancer experts.
Methods:
We formulate lung cancer TCM syndrome differentiation as a multi-label text classification problem. First, a Bidirectional Encoder Representations from Transformers (BERT) and Conditional Random Fields (CRF) model is used for entity representation. Then, five deep learning-based text classification models are used to construct a multi-label classifier for medical records, and two data augmentation strategies are adopted to cope with overfitting. Finally, a model fusion approach is used to further improve performance.
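The fusion step can be illustrated with a minimal sketch. The abstract does not state which fusion rule is used, so the probability averaging (soft voting) below is an assumption, as are the 0.5 threshold, the syndrome label names, and the per-model probabilities, all of which are hypothetical values chosen for illustration.

```python
from typing import List


def fuse_probabilities(model_probs: List[List[float]]) -> List[float]:
    """Average the per-label probabilities of several models (soft voting).

    Assumption: simple averaging; the fusion rule used in the study
    is not specified in the abstract.
    """
    n_models = len(model_probs)
    n_labels = len(model_probs[0])
    return [sum(p[j] for p in model_probs) / n_models for j in range(n_labels)]


def predict_labels(probs: List[float], labels: List[str],
                   threshold: float = 0.5) -> List[str]:
    """Multi-label decision: keep every label whose probability passes the threshold."""
    return [name for name, p in zip(labels, probs) if p >= threshold]


# Hypothetical syndrome labels and per-model outputs for one medical record.
labels = ["qi_deficiency", "yin_deficiency", "phlegm_dampness"]
text_cnn = [0.9, 0.4, 0.2]
text_rnn = [0.8, 0.6, 0.1]
text_han = [0.7, 0.8, 0.3]

fused = fuse_probabilities([text_cnn, text_rnn, text_han])
predicted = predict_labels(fused, labels)
print(predicted)
```

Because each label is thresholded independently, one record can receive several syndromes at once, which is what distinguishes multi-label from single-label classification.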
Results:
With data augmentation, the F1 of RCNN is 0.8650, 2.41% higher than without augmentation. The Hamming Loss of RCNN with augmentation is 0.0987, 1.8% lower than that of the same model without augmentation. Among the individual models, Text-HAN achieved the highest F1 scores, 0.8676 and 0.8751. Compared with character encoding-based representation, the MAP of the word encoding-based RCNN is 10% higher. The fusion model combining Text-CNN, Text-RNN, and Text-HAN achieved an F1 of 0.8884, the best performance among all models.
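For readers unfamiliar with the multi-label metrics reported above, the following sketch shows how Hamming Loss and F1 can be computed from binary label matrices. The abstract does not say which F1 averaging is used, so the micro-averaged variant here is an assumption, and the toy label matrices are hypothetical.

```python
from typing import List

Matrix = List[List[int]]  # rows = records, columns = syndrome labels (0 or 1)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label decisions that are wrong (lower is better)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Micro-averaged F1: pool true/false positives over all labels and records.

    Assumption: micro averaging; the abstract does not specify the variant.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical gold and predicted labels for two records and three syndromes.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]

hl = hamming_loss(y_true, y_pred)   # 2 wrong decisions out of 6
f1 = micro_f1(y_true, y_pred)       # tp=2, fp=1, fn=1
print(round(hl, 4), round(f1, 4))
```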
Conclusions:
Medical records can be used more productively by constructing end-to-end models, which are more feasible for assisting TCM diagnosis. With the help of entity-level representation, data augmentation, and model fusion, deep learning-based multi-label classification approaches can better imitate the TCM syndrome differentiation process in complex cases such as advanced lung cancer.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.