
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 27, 2020
Date Accepted: Oct 26, 2020

The final, peer-reviewed published version of this preprint can be found here:

Yang X, Ma Y, He X, Zhang H, Bian J, Wu Y

Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models

JMIR Med Inform 2020;8(11):e19735

DOI: 10.2196/19735

PMID: 33226350

PMCID: 7721552

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Measuring Semantic Textual Similarity in Clinical Text: A Study of Transformer-based Models

  • Xi Yang; 
  • Yinghan Ma; 
  • Xing He; 
  • Hansi Zhang; 
  • Jiang Bian; 
  • Yonghui Wu

ABSTRACT

Background:

Semantic textual similarity (STS) is one of the fundamental tasks in natural language processing (NLP). Many STS shared tasks and corpora have been developed for the general English domain, yet such resources remain limited in the biomedical domain. In 2019, the n2c2 challenge developed a comprehensive clinical STS dataset and called for a community effort to solicit state-of-the-art solutions for clinical STS.

Objective:

This study presents the transformer-based clinical STS models we developed during our participation in the 2019 n2c2/OHNLP shared task on clinical STS, as well as new models we explored after the challenge.

Methods:

In this study, we explored three transformer-based models for clinical STS: BERT, XLNet, and RoBERTa. We examined transformer models pretrained on both general English text and clinical text. We also explored using a general English STS dataset as a supplementary corpus in addition to the clinical training set developed in this challenge. Furthermore, we investigated various ensemble methods to combine the different transformer models.
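As a minimal illustration of the ensembling and evaluation described above, the sketch below averages the sentence-pair similarity scores produced by several models and measures the Pearson correlation against gold-standard scores. The scores, weights, and function names here are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def ensemble_predictions(model_scores, weights=None):
    """Combine per-model similarity scores (one row per model) into one prediction."""
    scores = np.asarray(model_scores, dtype=float)
    if weights is None:
        return scores.mean(axis=0)           # simple mean ensemble
    w = np.asarray(weights, dtype=float)
    return w @ scores / w.sum()              # weighted mean ensemble

# Hypothetical similarity scores on a 0-5 scale, as used in STS tasks.
bert_scores    = [4.2, 1.1, 3.5, 0.4]
xlnet_scores   = [4.0, 0.9, 3.8, 0.6]
roberta_scores = [4.5, 1.0, 3.6, 0.3]
gold           = [4.5, 1.0, 4.0, 0.5]

ensembled = ensemble_predictions([bert_scores, xlnet_scores, roberta_scores])

# Pearson correlation between ensembled predictions and gold scores,
# the evaluation metric used in the n2c2 clinical STS challenge.
r = np.corrcoef(ensembled, gold)[0, 1]
print(f"Pearson correlation: {r:.4f}")
```

In practice each row of scores would come from a fine-tuned transformer scoring the same sentence pairs; the ensemble step itself is model-agnostic.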

Results:

Our best submission, based on the XLNet model, achieved the third-best performance (Pearson correlation of 0.8864) in this challenge. After the challenge, we explored other transformer models and improved the performance to a correlation of 0.9065 using a RoBERTa model, which outperformed the best-performing system developed in this challenge (correlation of 0.9010).

Conclusions:

This study demonstrated the effectiveness of transformer-based models for measuring semantic similarity in clinical text. Our models can support clinical applications such as clinical text deduplication and summarization.



Per the author's request the PDF is not available.