Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 26, 2025
Date Accepted: Jan 29, 2026
Improving Retrieval Augmented Generation for Healthcare by Fine-tuning Clinical Embedding Models: Development and Evaluation Study
ABSTRACT
Background:
Embedding models can be integrated into Retrieval Augmented Generation systems to search and retrieve unstructured data. However, these models are typically trained on publicly available English data, which limits their effectiveness in non-English healthcare settings. More importantly, they are not trained on real-world clinical data, leading to inaccurate retrieval when integrated into Retrieval Augmented Generation systems for healthcare use cases.
Objective:
This retrospective study addresses this gap by developing embedding models specifically trained on real-world clinical documents for medical information retrieval.
Methods:
Embedding models were fine-tuned using eleven million question-answer pairs generated from 400,000 clinical documents from a large German hospital, including radiology reports, discharge letters, pathology reports, and operation notes. All datasets were additionally translated into English and pseudonymized so that the resulting models could be published for use by other healthcare institutions. A Large Language Model generated medically relevant questions for each document section, creating training data intended to reflect real-world clinical scenarios. Evaluation was performed in two scenarios: information retrieval and Retrieval Augmented Generation.
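In the information retrieval scenario described above, an embedding model maps a clinician's question and each document section to vectors, and sections are ranked by cosine similarity to the query. A minimal sketch of that ranking step, assuming embeddings are plain Python lists and using hypothetical section identifiers (the actual models and document structure are as described in the study, not shown here):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query_vec, section_vecs, top_k=3):
    """Rank document sections by similarity to the query embedding
    and return the IDs of the top-k sections."""
    ranked = sorted(section_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [section_id for section_id, _ in ranked[:top_k]]

# Toy 2-dimensional embeddings for three document sections (illustrative only).
sections = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [0.7, 0.7]}
print(retrieve([1.0, 0.1], sections))  # → ['s1', 's3', 's2']
```

In a Retrieval Augmented Generation pipeline, the top-ranked sections would then be passed as context to a language model answering the question.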
Results:
The fine-tuned models demonstrated superior performance on real-world German and translated English evaluation datasets, surpassing the state-of-the-art multilingual-e5-large, bge-m3, and gte-multilingual-base models in both evaluation scenarios. In the information retrieval evaluation, the fine-tuned model achieved a mAP@100 of 0.268, compared to 0.135 for the next best model, multilingual-e5-large. In the Retrieval Augmented Generation evaluation, the fine-tuned model achieved a BERTScore F1-score of 0.769 compared to 0.756.
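The mAP@100 figures above average, over all evaluation queries, the precision at each rank (up to 100) where a relevant document appears. A minimal pure-Python sketch of this standard metric (function names are illustrative, not from the study's code):

```python
def average_precision_at_k(retrieved, relevant, k=100):
    """Average precision over the top-k retrieved document IDs.

    retrieved: ranked list of document IDs returned for one query.
    relevant:  set of document IDs judged relevant for that query.
    """
    hits, score = 0, 0.0
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            hits += 1
            score += hits / rank  # precision at this relevant rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0

def mean_average_precision_at_k(all_retrieved, all_relevant, k=100):
    """mAP@k: mean of per-query average precision values."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(all_retrieved, all_relevant)]
    return sum(aps) / len(aps)

# Two toy queries: AP = (1/1 + 2/3)/2 and AP = 1/2, so mAP = 2/3.
print(mean_average_precision_at_k(
    [["a", "b", "c"], ["x", "y"]],
    [{"a", "c"}, {"y"}]))  # → 0.666...
```

The BERTScore F1 reported for the Retrieval Augmented Generation evaluation is a separate, model-based metric comparing generated answers to references and is not reproduced here.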
Conclusions:
By using a real-world dataset consisting of reports from different medical specialties and incorporating a Large Language Model to generate questions based on these reports, a large training dataset was created and used to fine-tune an embedding model. This model surpassed the performance of state-of-the-art models and holds promise for improving Retrieval Augmented Generation in the healthcare domain.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.