Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 10, 2021
Open Peer Review Period: Dec 10, 2021 - Feb 4, 2022
Date Accepted: Feb 25, 2022
Date Submitted to PubMed: Apr 22, 2022

The final, peer-reviewed published version of this preprint can be found here:

Multi-Label Classification in Patient-Doctor Dialogues With the RoBERTa-WWM-ext + CNN (Robustly Optimized Bidirectional Encoder Representations From Transformers Pretraining Approach With Whole Word Masking Extended Combining a Convolutional Neural Network) Model: Named Entity Study

Sun Y, Gao D, Shen X, Li M, Nan J, Zhang W

JMIR Med Inform 2022;10(4):e35606

DOI: 10.2196/35606

PMID: 35451969

PMCID: 9073616

Multi-Label Classification in Patient-Doctor Dialogues: Named Entity Study with Chinese RoBERTa-WWM-ext-CNN

  • Yuanyuan Sun; 
  • Dongping Gao; 
  • Xifeng Shen; 
  • Meiting Li; 
  • Jiale Nan; 
  • Weining Zhang

ABSTRACT

Background:

With the growing prevalence of online consultation, a large amount of dialogue information has accumulated. Because it arises in an authentic language environment, it is of significant value to the research and development of intelligent question answering and auxiliary diagnosis. Contextual data containing patient-doctor dialogues have been used in recent natural language processing (NLP) studies.

Objective:

To find a simpler and more effective model for the automatic classification of named entities in patient-doctor dialogues, using Chinese RoBERTa-WWM-ext-CNN.

Methods:

Our task is the automatic annotation and classification of named entities in patient-doctor dialogues. We adapt the downstream architecture of Chinese RoBERTa-WWM-ext by combining it with a text convolutional neural network (CNN). RoBERTa-WWM-ext encodes the semantics of each sentence as a text vector, and the CNN then extracts the local features of the sentence; together these form our new fusion model. To verify its knowledge learning ability, we apply Enhanced Representation through kNowledge IntEgration (ERNIE), the original Bidirectional Encoder Representations from Transformers (BERT), and Chinese BERT with Whole Word Masking (WWM) to the same task and compare the results of these models.
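The encoder-plus-CNN fusion described above can be pictured with a minimal, framework-free sketch. It assumes the encoder has already produced one vector per token (small hand-made or random numbers stand in for RoBERTa-WWM-ext output), and that the CNN is a standard text CNN with max-over-time pooling followed by one independent sigmoid output per label. The function names, filter widths, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def conv1d_max_pool(hidden, kernel, bias):
    """Slide one filter over windows of consecutive token vectors,
    apply ReLU, and keep the maximum response (max-over-time pooling).

    hidden: list of token vectors (stand-in for encoder output).
    kernel: list of `width` weight vectors, one per position in the window.
    """
    width = len(kernel)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe floor
    for start in range(len(hidden) - width + 1):
        total = bias
        for offset in range(width):
            token = hidden[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], token))
        best = max(best, max(0.0, total))
    return best


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def text_cnn_multilabel(hidden, filters, out_weights, out_bias, threshold=0.5):
    """Pool one feature per filter, then score every label independently.

    Each label gets its own sigmoid, so several labels can be active at
    once (multi-label), unlike a softmax over mutually exclusive classes.
    """
    features = [conv1d_max_pool(hidden, k, b) for k, b in filters]
    probs = [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(out_weights, out_bias)
    ]
    labels = [int(p >= threshold) for p in probs]
    return probs, labels


# Hypothetical toy input: 6 tokens, 8-dimensional vectors, 3 labels.
rng = random.Random(0)
dim, seq_len, num_labels = 8, 6, 3
hidden = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
filters = [
    ([[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)], 0.0)
    for width in (2, 3, 4)  # assumed window sizes
]
out_weights = [[rng.uniform(-1, 1) for _ in filters] for _ in range(num_labels)]
out_bias = [0.0] * num_labels

probs, labels = text_cnn_multilabel(hidden, filters, out_weights, out_bias)
print(probs, labels)
```

In a real system the random vectors would be replaced by the encoder's token-level hidden states and the weights would be learned during fine-tuning, typically with a deep learning framework rather than plain Python lists.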

Results:

The scoring results show that our model outperforms the other models on this task.

Conclusions:

Both the fine-tuning of the model on the downstream task and the integration of multiple models can be effectively optimized.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.