JMIR Preprints #82545: Clinical Note Generation from Doctor-Patient Conversations Using Parameter-Efficient Fine-Tuning Large Language Models: Comparative Study

Clinical Note Generation from Doctor-Patient Conversations Using Parameter-Efficient Fine-Tuning Large Language Models: Comparative Study

Saib Ahmed; Farig Yousuf Sadeque

ABSTRACT

Background:

Clinical note documentation is a vital yet time-intensive aspect of healthcare. While advancements in natural language processing (NLP) have transformed many domains, generating accurate summaries of doctor-patient conversations remains underexplored due to the limited availability of open-source datasets. Large Language Models (LLMs), with their training on vast datasets, present a promising solution to this challenge.

Objective:

Precision in clinical summarization is crucial as it directly impacts patient care and safety. This study evaluates the effectiveness of decoder-only LLMs compared to traditional encoder-decoder architectures in generating clinical notes from doctor-patient dialogues, focusing on maintaining medical accuracy and complying with healthcare privacy standards.

Methods:

We utilized the MTS-DIALOG dataset, containing 1,700 doctor-patient conversations paired with clinical notes. Our experiments involved fine-tuning several decoder-only LLMs, including Mistral, Meditron, and Llama, using a parameter-efficient fine-tuning approach.
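To make the idea of parameter-efficient fine-tuning concrete, the following is a minimal sketch of why such methods train only a small fraction of a model's weights. The abstract does not name the specific method used, so the low-rank adapter (LoRA-style) formulation here is an assumption, and the layer width and rank are hypothetical values chosen only for illustration.

```python
def adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one low-rank adapter.

    Assumes a LoRA-style update W + B @ A, where A has shape (rank, d_in)
    and B has shape (d_out, rank). The frozen weight W is not counted.
    """
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix W of shape (d_out, d_in)."""
    return d_in * d_out


# Hypothetical sizes for illustration only (not taken from the study).
d_model = 4096
rank = 8

trainable = adapter_params(d_model, d_model, rank)
frozen = full_params(d_model, d_model)
fraction = trainable / frozen

print(f"trainable: {trainable}, frozen: {frozen}, fraction: {fraction:.4%}")
```

Under these assumed sizes, the adapter adds 65,536 trainable parameters against 16,777,216 frozen ones per projection matrix, which is under half a percent.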

Results:

Model performance was evaluated using ROUGE and BERTScore, demonstrating that Meditron-7B and Llama3-8B achieved state-of-the-art results, with Mistral-7B also performing competitively. The findings indicate that decoder-only LLMs, particularly Llama variants, outperform traditional encoder-decoder models. Moreover, fine-tuning with higher quantization has the potential to further enhance performance.
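As a rough illustration of what a ROUGE score measures, here is a simplified ROUGE-1 F1 computation based on unigram overlap. It is a sketch only: it assumes lowercase whitespace tokenization with no stemming, so its values will differ from those of standard ROUGE packages, and the example sentences are invented rather than drawn from the dataset.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Invented example texts, for illustration only.
reference_note = "patient reports mild chest pain"
generated_note = "patient reports chest pain today"
score = rouge1_f1(reference_note, generated_note)
print(round(score, 3))
```

Four of the five tokens in each text match here, so precision, recall, and F1 are all 0.8.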

Conclusions:

This study underscores the potential of decoder-only LLMs to transform clinical workflows by streamlining medical documentation, thereby enabling healthcare professionals to dedicate more time to patient care.

Citation

Please cite as:

Ahmed S, Yousuf Sadeque F

Clinical Note Generation From Doctor-Patient Conversations Using Parameter-Efficient Fine-Tuning Large Language Models: Comparative Study

JMIR Med Inform 2026;14:e82545

DOI: 10.2196/82545

PMID: 42234930

PMCID: 13232911

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 17, 2025

Open Peer Review Period: Sep 3, 2025 - Oct 29, 2025

Date Accepted: Mar 13, 2026
