Accepted for/Published in: JMIR Formative Research
Date Submitted: Jan 30, 2025
Date Accepted: Jul 23, 2025
Fine-tuning Clinical Language Models to Identify Adverse Drug Events in Clinical Text: Machine Learning Approach
ABSTRACT
Background:
Medications are essential for health care but can cause adverse drug events (ADEs), which are harmful and sometimes fatal. Detecting ADEs is a challenging task because they are often not documented in the structured data of electronic health records (EHRs) or explicitly written in clinical notes.
Objective:
This study aims to fine-tune the pretrained clinical language model SweDeClin-BERT for medical named entity recognition (NER) and relation extraction (RE) tasks, and to implement an integrated NER-RE approach to more effectively identify ADEs in clinical notes from clinical units in Sweden. The performance of this approach is compared with that of our previous machine learning method, which used conditional random fields (CRFs) and random forest (RF).
Methods:
We fine-tuned the SweDeClin-BERT model for the NER and RE tasks and implemented an integrated NER-RE pipeline to extract entities and relationships from clinical notes. The models were evaluated using 400 clinical notes from clinical units in Sweden. The NER-RE pipeline was then applied to classify the clinical notes as containing or not containing ADEs. Additionally, we conducted an error analysis to better understand the model’s behavior and to identify potential areas for improvement.
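The note-level classification step described above can be illustrated with a minimal sketch. It assumes that a note is labeled as containing an ADE when the RE step links at least one drug entity to at least one ADE entity; the abstract does not state the exact decision rule. The functions `toy_ner` and `toy_re`, the label names, and the English example notes are hypothetical stand-ins for the fine-tuned models and the Swedish data, not the authors' implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # "Drug" or "ADE" (hypothetical label names)
    start: int  # character offset of the mention in the note


def toy_ner(note: str) -> List[Entity]:
    """Stand-in for the fine-tuned NER model: a plain dictionary lookup."""
    lexicon = {"warfarin": "Drug", "bleeding": "ADE", "rash": "ADE"}
    lowered = note.lower()
    entities = []
    for term, label in lexicon.items():
        pos = lowered.find(term)
        if pos != -1:
            entities.append(Entity(term, label, pos))
    return entities


def toy_re(note: str, drug: Entity, ade: Entity) -> bool:
    """Stand-in for the fine-tuned RE model: link mentions within 60 characters."""
    return abs(drug.start - ade.start) <= 60


def note_has_ade(
    note: str,
    ner: Callable[[str], List[Entity]] = toy_ner,
    rel: Callable[[str, Entity, Entity], bool] = toy_re,
) -> bool:
    """Run NER, then RE on every drug-ADE pair; flag the note if any pair is linked."""
    entities = ner(note)
    drugs = [e for e in entities if e.label == "Drug"]
    ades = [e for e in entities if e.label == "ADE"]
    return any(rel(note, d, a) for d, a in product(drugs, ades))


positive = note_has_ade("Patient on warfarin developed bleeding from the gums.")
negative = note_has_ade("Patient reports a rash; no medication changes.")
```

In this sketch the second note is not flagged because no drug entity is found, so no candidate pair reaches the RE step. This shows how NER errors can propagate through an integrated pipeline.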
Results:
The fine-tuned SweDeClin-BERT model achieved an F1-score of 0.845 for the NER task and 0.81 for the RE task, outperforming the baseline models (CRFs for NER and RF for RE). In particular, the RE task showed a 53% improvement in macro-average F1-score compared to the baseline. The integrated NER-RE pipeline achieved an overall F1-score of 0.81 in relaxed mode.
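As a rough illustration of how strict and relaxed evaluation modes can differ, the sketch below computes an entity-level F1-score under two matching rules. It assumes that strict mode requires identical span boundaries and labels, while relaxed mode accepts any overlapping span with the same label. This is a common convention, but the abstract does not define the modes, so the rule, the `Span` layout, and the example spans are assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), with end exclusive


def spans_match(gold: Span, pred: Span, relaxed: bool) -> bool:
    """Labels must agree; boundaries must be equal (strict) or overlap (relaxed)."""
    if gold[2] != pred[2]:
        return False
    if relaxed:
        return gold[0] < pred[1] and pred[0] < gold[1]
    return gold[0] == pred[0] and gold[1] == pred[1]


def f1_score(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """Entity-level F1, matching each gold span to at most one predicted span."""
    matched_gold = set()
    true_positives = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i not in matched_gold and spans_match(g, p, relaxed):
                matched_gold.add(i)
                true_positives += 1
                break
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: the predicted ADE span starts two characters late.
gold = [(0, 9, "Drug"), (20, 28, "ADE")]
pred = [(0, 9, "Drug"), (22, 28, "ADE")]

strict_f1 = f1_score(gold, pred, relaxed=False)
relaxed_f1 = f1_score(gold, pred, relaxed=True)
```

With these toy spans the boundary error counts as a miss in strict mode but as a hit in relaxed mode, which is why relaxed scores are never lower than strict scores under this matching rule.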
Conclusions:
Using a domain-specific language model such as SweDeClin-BERT to detect ADEs in clinical notes improves classification performance (F1-scores of 0.77 in strict mode and 0.81 in relaxed mode) compared to conventional machine learning models such as CRFs and RF. However, the proposed fine-tuned ADE model requires further refinement and evaluation on annotated clinical notes from another hospital to assess its generalizability.
Clinical Trial:
This research has been approved by the Regional Ethical Review Board (Etikprövningsnämnden), permission numbers 2012/834-31/5 and 2023-06920-01.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.