Accepted for/Published in: JMIR Formative Research

Date Submitted: Jan 30, 2025
Date Accepted: Jul 23, 2025

The final, peer-reviewed published version of this preprint can be found here:

Identifying Adverse Drug Events in Clinical Text Using Fine-Tuned Clinical Language Models: Machine Learning Study

Kopacheva E, Henriksson A, Dalianis H, Hammar T, Lincke A

JMIR Form Res 2025;9:e71949

DOI: 10.2196/71949

PMID: 40934508

PMCID: 12425423

Fine-tuning Clinical Language Models to Identify Adverse Drug Events in Clinical Text: Machine Learning Approach

  • Elizaveta Kopacheva
  • Aron Henriksson
  • Hercules Dalianis
  • Tora Hammar
  • Alisa Lincke

ABSTRACT

Background:

Medications are essential for health care but can cause adverse drug events (ADEs), which are harmful and sometimes fatal. Detecting ADEs is a challenging task because they are often not documented in the structured data of electronic health records (EHRs) or explicitly written in clinical notes.

Objective:

This study aims to fine-tune the pre-trained clinical language model, SweDeClin-BERT, for medical named entity recognition (NER) and relation extraction (RE) tasks, and to implement an integrated NER-RE approach to more effectively identify ADEs in clinical notes from clinical units in Sweden. The performance of this approach will be compared to our previous machine learning method, which utilized conditional random fields (CRFs) and Random Forest (RF).

Methods:

We fine-tuned the SweDeClin-BERT model for the NER and RE tasks and implemented an integrated NER-RE pipeline to extract entities and relationships from clinical notes. The models were evaluated using 400 clinical notes from clinical units in Sweden. The NER-RE pipeline was then applied to classify the clinical notes as containing or not containing ADEs. Additionally, we conducted an error analysis to better understand the model’s behavior and to identify potential areas for improvement.
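The final classification step described above can be illustrated with a minimal sketch. It assumes that the RE stage returns typed relations between extracted entities and that a note counts as containing an ADE when at least one relation links a drug entity to an ADE entity. The label names (`Drug`, `ADE`, `causes`) and the `Entity`/`Relation` types are hypothetical and are not taken from the study's annotation scheme.

```python
from dataclasses import dataclass

# Hypothetical label names; the abstract does not list the annotation scheme.
DRUG, ADE = "Drug", "ADE"


@dataclass(frozen=True)
class Entity:
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends (exclusive)
    label: str   # entity type predicted by the NER model


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str   # relation type predicted by the RE model


def note_contains_ade(relations, relation_label="causes"):
    """Return True if any extracted relation links a drug to an ADE."""
    return any(
        r.label == relation_label
        and {r.head.label, r.tail.label} == {DRUG, ADE}
        for r in relations
    )
```

Under these assumptions, a note with no extracted drug-ADE relation is classified as not containing an ADE, so errors from the NER stage propagate directly into the note-level decision.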

Results:

The fine-tuned SweDeClin-BERT model achieved an F1-score of 0.845 for the NER task and 0.81 for the RE task, outperforming the baseline models (CRFs for NER and Random Forests for RE). In particular, the RE task showed a 53% improvement in macro-average F1-score compared to the baseline. The integrated NER-RE pipeline achieved an overall F1-score of 0.81 in relaxed mode.
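The abstract does not define the strict and relaxed evaluation modes. The sketch below assumes the common convention for span-based evaluation: strict mode requires identical span boundaries and label, while relaxed mode accepts any overlapping span with the same label. The function name and the `(start, end, label)` tuple layout are illustrative choices, not the study's evaluation code.

```python
def span_f1(gold, pred, relaxed=False):
    """F1 over (start, end, label) spans with half-open character offsets.

    Assumed convention: strict = exact boundaries and label;
    relaxed = overlapping boundaries and same label.
    """
    def match(a, b):
        if a[2] != b[2]:
            return False
        if relaxed:
            return a[0] < b[1] and b[0] < a[1]  # spans overlap
        return a[:2] == b[:2]                   # spans are identical

    # Predictions that match some gold span, and gold spans that are matched.
    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)

    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With this convention, a prediction that captures only part of an annotated ADE mention counts as an error in strict mode but as a hit in relaxed mode, which is one way a relaxed score can exceed the strict one.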

Conclusions:

Using a domain-specific language model such as SweDeClin-BERT to detect ADEs in clinical notes improves classification performance (0.77 in strict and 0.81 in relaxed mode) compared with conventional machine learning models such as CRFs and RF. However, the proposed fine-tuned ADE model requires further refinement and evaluation on annotated clinical notes from another hospital to assess its generalizability.

Clinical Trial:

This research has been approved by the Regional Ethical Review Board (Etikprövningsnämnden), permission number 2012/834-31/5 and permission number 2023-06920-01.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.