JMIR Preprints #59782: Evaluating Medical Entity Recognition in Healthcare: A Comprehensive Analysis of BERT-Based Models


Evaluating Medical Entity Recognition in Healthcare: A Comprehensive Analysis of BERT-Based Models

Shengyu Liu;
Anran Wang;
Xiaolei Xiu;
Ming Zhong;
Sizhu Wu

ABSTRACT

Background:

Named Entity Recognition (NER) models play a pivotal role in extracting information from unstructured medical texts by identifying entities such as diseases, treatments, and conditions, thereby supporting clinical decision-making and research. Advances in machine learning, especially deep learning, have substantially improved NER capabilities. However, NER performance remains inconsistent across medical datasets because of complex medical terminology and linguistic variation. Prior studies have predominantly analyzed general NER performance, overlooking specific applications in medical scenarios and the challenges they pose. An in-depth analysis of how leading models and macro-factors, such as linguistic composition, affect NER accuracy is also lacking. This gap hinders the refinement of NER models for medical applications, which is vital for improving patient outcomes and the efficiency of health care services.

Objective:

This study aims to rigorously evaluate the performance of the BioBERT, RoBERTa, BigBird, and DeBERTa NER models in medical text analysis across varied medical datasets, in order to determine how complex medical terminology and linguistic diversity affect entity recognition accuracy. It also examines how macro-factors, including the lexical composition of entity phrases, influence the efficacy of specific models. The goal is to close the current research gap by offering insights that help refine NER models for medical use, ultimately advancing patient care and health care service efficiency.

Methods:

This study conducts a thorough evaluation of four prominent NER models: BioBERT, RoBERTa, BigBird, and DeBERTa. The evaluation focuses on prediction accuracy, training efficiency, and computational resource use (CPU and GPU), among other metrics. We used three diverse medical datasets (Revised JNLPBA, BC5CDR, and AnatEM), selected for their relevance to the medical field. The study further explores the impact of significant macro-factors, such as the number of words in an entity phrase, on model performance. We systematically analyzed how these factors influence prediction accuracy across the datasets to gain an in-depth understanding of their effect on medical NER models.
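The abstract does not state how prediction accuracy was stratified by entity-phrase length. The sketch below illustrates one plausible approach and is not the authors' evaluation code. It assumes word-level BIO tags and exact span matching, and it measures recall of gold entities grouped by the number of words each entity spans. The helper names `bio_spans` and `recall_by_length` are hypothetical.

```python
from collections import defaultdict


def bio_spans(tags):
    """Decode a word-level BIO tag list into a set of (type, start, end) spans.

    `end` is exclusive. An I- tag that does not continue the current entity
    type is treated leniently as the start of a new entity (an assumption).
    """
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        opens_or_closes = (
            tag.startswith("B-")
            or tag == "O"
            or (tag.startswith("I-") and tag[2:] != etype)
        )
        if opens_or_closes:
            if etype is not None:
                spans.add((etype, start, i))
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
        # an I- tag continuing the current entity type simply extends the span
    return spans


def recall_by_length(gold_seqs, pred_seqs):
    """Exact-match recall of gold entities, bucketed by entity length in words."""
    hits, totals = defaultdict(int), defaultdict(int)
    for gold, pred in zip(gold_seqs, pred_seqs):
        predicted = bio_spans(pred)
        for span in bio_spans(gold):
            n_words = span[2] - span[1]
            totals[n_words] += 1
            hits[n_words] += span in predicted  # True counts as 1
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy example with BC5CDR-style entity types (Chemical, Disease):
gold = [["B-Disease", "I-Disease", "O", "B-Chemical"]]
pred = [["B-Disease", "O", "O", "B-Chemical"]]
print(recall_by_length(gold, pred))  # {1: 1.0, 2: 0.0}
```

In this toy case, the single-word chemical is recovered but the two-word disease is truncated. A length-stratified breakdown surfaces exactly this kind of error, which a single aggregate score would hide.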

Results:

The analysis shows that BioBERT achieved higher prediction accuracy than the other models on the Revised JNLPBA, BC5CDR, and AnatEM medical datasets, highlighting its strong proficiency in identifying medical entities. However, it did not outperform the other models on every entity type. The analysis also confirmed that macro-factors, such as the number of words in an entity phrase, markedly affect the models' prediction accuracy.
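A model can lead on aggregate accuracy while trailing on individual entity types. The minimal sketch below shows how this can surface in span-level precision, recall, and F1 computed separately for each type. It assumes gold and predicted entities are already available as sets of hypothetical `(sentence_id, type, start, end)` tuples and uses exact-match scoring. It is not the authors' metric; libraries such as seqeval produce similar per-type reports.

```python
from collections import Counter


def per_type_f1(gold_spans, pred_spans):
    """Span-level (precision, recall, F1) per entity type, exact match.

    gold_spans, pred_spans: sets of (sentence_id, type, start, end) tuples.
    """
    tp, n_gold, n_pred = Counter(), Counter(), Counter()
    for span in gold_spans:
        n_gold[span[1]] += 1
    for span in pred_spans:
        n_pred[span[1]] += 1
        if span in gold_spans:  # exact boundaries and type must match
            tp[span[1]] += 1
    scores = {}
    for etype in sorted(set(n_gold) | set(n_pred)):
        p = tp[etype] / n_pred[etype] if n_pred[etype] else 0.0
        r = tp[etype] / n_gold[etype] if n_gold[etype] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[etype] = (p, r, f1)
    return scores


gold = {(0, "Chemical", 3, 4), (0, "Disease", 0, 2), (1, "Disease", 5, 7)}
pred = {(0, "Chemical", 3, 4), (0, "Disease", 0, 1), (1, "Disease", 5, 7)}
print(per_type_f1(gold, pred))
# {'Chemical': (1.0, 1.0, 1.0), 'Disease': (0.5, 0.5, 0.5)}
```

Reporting scores per type rather than one pooled figure makes it visible when a model that wins overall falls behind on a particular entity category.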

Conclusions:

This study highlights the essential role of NER models in medical informatics and underscores the need to optimize them through precise data targeting and fine-tuning. Its insights can help improve clinical decision-making and guide the development of more sophisticated and effective medical NER models.

Citation

Please cite as:

Liu S, Wang A, Xiu X, Zhong M, Wu S

Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study

JMIR Med Inform 2024;12:e59782

DOI: 10.2196/59782

PMID: 39419501

PMCID: 11528166


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 23, 2024

Open Peer Review Period: May 16, 2024 - Jul 11, 2024

Date Accepted: Sep 15, 2024

