JMIR Preprints #35606: Multi-Label Classification in Patient-Doctor Dialogues: Named Entity Study with Chinese RoBERTa-WWM-ext-CNN


Multi-Label Classification in Patient-Doctor Dialogues: Named Entity Study with Chinese RoBERTa-WWM-ext-CNN

Yuanyuan Sun;
Dongping Gao;
Xifeng Shen;
Meiting Li;
Jiale Nan;
Weining Zhang

ABSTRACT

Background:

With the prevalence of online consultation, a large volume of dialogue data has accumulated. Because these dialogues come from an authentic language environment, they are of significant value to the research and development of intelligent question answering and auxiliary diagnosis. Contextual data containing patient-doctor dialogues have been used in recent natural language processing (NLP) studies.

Objective:

This study aims to find a simpler and more effective model for the automatic classification of named entities in patient-doctor dialogues, using Chinese RoBERTa-WWM-ext-CNN.

Methods:

In this paper, our task is the automatic annotation and classification of named entities in patient-doctor dialogues. We adapt the downstream architecture of Chinese RoBERTa-WWM-ext by combining it with a text convolutional neural network (CNN). RoBERTa-WWM-ext encodes the semantics of each sentence as a text vector, and the CNN then extracts the local features of the sentence; together they form our new fusion model. To verify its knowledge learning ability, we apply Enhanced Representation through kNowledge IntEgration (ERNIE), the original Bidirectional Encoder Representations from Transformers (BERT), and Chinese BERT with Whole Word Masking (WWM) to the same task and compare the results of these models.
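The encode-then-convolve pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the pretrained encoder is replaced by random stand-in token vectors, and the filter widths, dimensions, weights, decision threshold, and all function names are assumptions chosen only for illustration. It shows how convolution filters with max-over-time pooling extract local features from per-token vectors, and how independent sigmoid outputs allow several labels to be assigned at once.

```python
import math
import random


def conv_relu_maxpool(token_vectors, kernel, bias=0.0):
    """One convolution filter over the token axis, then ReLU and max-over-time pooling."""
    width = len(kernel)
    pooled = 0.0  # ReLU floors activations at 0, so 0 is a safe starting maximum
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        activation = bias + sum(
            w * x
            for kernel_row, vector in zip(kernel, window)
            for w, x in zip(kernel_row, vector)
        )
        pooled = max(pooled, activation)
    return pooled


def text_cnn_features(token_vectors, filters):
    """Concatenate the pooled output of every filter into one sentence feature vector."""
    return [conv_relu_maxpool(token_vectors, kernel) for kernel in filters]


def multi_label_predict(features, weights, biases, threshold=0.5):
    """Independent sigmoid per label, so several labels can be active at once."""
    probabilities = []
    for weight_row, bias in zip(weights, biases):
        logit = bias + sum(w * f for w, f in zip(weight_row, features))
        probabilities.append(1.0 / (1.0 + math.exp(-logit)))
    labels = [int(p >= threshold) for p in probabilities]
    return probabilities, labels


# Toy dimensions; every value below is an assumption for illustration only.
rng = random.Random(0)
HIDDEN, SEQ_LEN, NUM_LABELS = 8, 6, 3
FILTER_WIDTHS = (2, 3, 4)  # hypothetical n-gram widths
FILTERS_PER_WIDTH = 2

# Stand-in for the encoder output: one HIDDEN-dimensional vector per token.
token_vectors = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(SEQ_LEN)]

# Each filter is a (width x HIDDEN) matrix of untrained random weights.
filters = [
    [[rng.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(width)]
    for width in FILTER_WIDTHS
    for _ in range(FILTERS_PER_WIDTH)
]
weights = [[rng.gauss(0, 0.1) for _ in filters] for _ in range(NUM_LABELS)]
biases = [0.0] * NUM_LABELS

features = text_cnn_features(token_vectors, filters)
probabilities, labels = multi_label_predict(features, weights, biases)
print(len(features), labels)
```

In a real system the stand-in vectors would come from the fine-tuned encoder, and the filters and output weights would be learned jointly rather than sampled at random.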

Results:

The scoring results show that our model outperforms the other models on this task.

Conclusions:

Both the fine-tuning of the model on the downstream task and the integration of multiple models can be well optimized.

Citation

Please cite as:

Sun Y, Gao D, Shen X, Li M, Nan J, Zhang W

Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study

JMIR Med Inform 2022;10(4):e35606

DOI: 10.2196/35606

PMID: 35451969

PMCID: 9073616


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 10, 2021

Open Peer Review Period: Dec 10, 2021 - Feb 4, 2022

Date Accepted: Feb 25, 2022

Date Submitted to PubMed: Apr 22, 2022

