Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Dec 10, 2021
Open Peer Review Period: Dec 10, 2021 - Feb 4, 2022
Date Accepted: Feb 25, 2022
Date Submitted to PubMed: Apr 22, 2022
Multi-Label Classification in Patient-Doctor Dialogues: Named Entity Study with Chinese RoBERTa-WWM-ext-CNN
ABSTRACT
Background:
With the prevalence of online consultation, a large volume of dialogue information has accumulated. Produced in an authentic language environment, these dialogues are of significant value to the research and development of intelligent question answering and auxiliary diagnosis. Contextual data containing patient-doctor dialogues has been utilized in recent natural language processing (NLP) studies.
Objective:
This study aims to find a simpler and more effective model for automatic named entity classification in patient-doctor dialogues, based on Chinese RoBERTa-WWM-ext-CNN.
Methods:
In this paper, our task is the automatic annotation and classification of named entities in patient-doctor dialogues. We adapt the downstream architecture of Chinese RoBERTa-WWM-ext by combining it with a text convolutional neural network (CNN). We use RoBERTa-WWM-ext to express sentence semantics as a text vector and then extract the local features of the sentence through the CNN; this combination is our new fusion model. To verify its knowledge learning ability, we apply Enhanced Representation through kNowledge IntEgration (ERNIE), the original Bidirectional Encoder Representations from Transformers (BERT), and Chinese BERT with Whole Word Masking (WWM) to the same task and compare the results of these models.
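The pipeline described above (encoder output, then convolution over local token windows, then one independent decision per label) can be illustrated with a minimal sketch. It is not the authors' implementation: random vectors stand in for the RoBERTa-WWM-ext token embeddings, and the names `TextCNNHead` and `conv1d_max_pool`, the kernel sizes, the filter count, the label count, and the 0.5 threshold are all assumptions chosen for illustration. The weights are untrained, so the outputs show only the data flow.

```python
import math
import random


def conv1d_max_pool(token_vecs, kernel, bias):
    """Slide one filter over the token sequence, apply ReLU, then max-pool over time.

    token_vecs: list of token vectors (seq_len x hidden_size)
    kernel:     list of weight vectors (kernel_width x hidden_size)
    Returns 0.0 if the sequence is shorter than the filter width.
    """
    width = len(kernel)
    best = 0.0  # starting at 0 folds the ReLU into the max
    for start in range(len(token_vecs) - width + 1):
        score = bias
        for offset in range(width):
            score += sum(w * x for w, x in zip(kernel[offset], token_vecs[start + offset]))
        best = max(best, score)
    return best


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TextCNNHead:
    """Hypothetical TextCNN-style head placed on top of encoder token vectors."""

    def __init__(self, hidden_size, num_labels, kernel_sizes=(2, 3, 4),
                 filters_per_size=2, seed=0):
        rng = random.Random(seed)
        # One (kernel, bias) pair per filter; several widths capture n-gram-like features.
        self.filters = [
            ([[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in range(k)], 0.0)
            for k in kernel_sizes
            for _ in range(filters_per_size)
        ]
        n_features = len(self.filters)
        self.out_w = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                      for _ in range(num_labels)]
        self.out_b = [0.0] * num_labels

    def predict_proba(self, token_vecs):
        # Pooled feature vector: one value per filter.
        features = [conv1d_max_pool(token_vecs, kern, b) for kern, b in self.filters]
        # Multi-label output: an independent sigmoid per label, not a softmax.
        return [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
                for row, b in zip(self.out_w, self.out_b)]

    def predict(self, token_vecs, threshold=0.5):
        return [int(p >= threshold) for p in self.predict_proba(token_vecs)]


# Stand-in for encoder output: 6 tokens, 8 dimensions (a BERT-base-sized
# encoder would produce 768-dimensional vectors instead).
rng = random.Random(42)
hidden_size, seq_len, num_labels = 8, 6, 5
token_vecs = [[rng.gauss(0, 1) for _ in range(hidden_size)] for _ in range(seq_len)]

head = TextCNNHead(hidden_size, num_labels)
probs = head.predict_proba(token_vecs)
labels = head.predict(token_vecs)
print(probs, labels)
```

In a trained model the same structure would typically be optimized with a per-label binary cross-entropy loss, which is what allows a single utterance to receive several labels at once.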
Results:
The scoring results show that our model outperforms the other models on this task.
Conclusions:
Both the fine-tuning of the model on the downstream task and the integration of multiple models can be effectively optimized.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.