Accepted for/Published in: JMIR Formative Research
Date Submitted: Jan 4, 2026
Date Accepted: Apr 24, 2026
Automatic Speech Recognition and Large Language Models for Multilingual Pathology Report Generation: Proof-of-Concept Study
ABSTRACT
Background:
Accurate transcription of medical records is critical for clinical decision-making and patient care, particularly in high-stakes fields like pathology. This challenge is further amplified in multilingual environments.
Objective:
This study aimed to evaluate whether integrating advanced technologies such as Automatic Speech Recognition (ASR) and Large Language Models (LLMs) can enhance the accuracy and efficiency of generating pathology reports in multilingual settings.
Methods:
We assessed the performance of the Whisper ASR system combined with LLMs in generating clinically relevant pathology reports from 125 simulated multilingual audio recordings. System messages were used to guide the transcription process. The primary outcome was the reduction in Character Error Rate (CER). Secondary analyses compared different LLMs using BLEU, ROUGE, and METEOR metrics, expert pathologists' rankings of the generated pathology reports, and a comprehensive error-type analysis.
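For reference, CER is the character-level edit (Levenshtein) distance between the ASR transcript and the reference text, normalized by the reference length. A minimal sketch of the computation (the function names are illustrative, not taken from the study's code):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance over characters.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    # Character Error Rate: edit distance / length of the reference.
    return levenshtein(hypothesis, reference) / len(reference)
```

By this definition, a CER of 0.066 means roughly one character-level error per 15 reference characters.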
Results:
The use of system messages within the Whisper ASR system significantly reduced the CER from 0.344 to 0.066. The Qwen2:72b model performed best on all evaluated metrics (BLEU, ROUGE-1, ROUGE-2, ROUGE-L, and METEOR), indicating close alignment with the reference texts and comprehensive content coverage. The Llama3.1:70b model showed moderate performance with greater variability, while the Gemma2:27b model had the lowest scores and the highest variability. Qwen2:72b also maintained efficient inference speeds, with a mean of 5.2 seconds and a narrow 95% confidence interval, demonstrating stable and reliable performance for clinical use.
Conclusions:
The integration of ASR with LLM technologies significantly improves the accuracy of pathology report generation in multilingual settings. This enhancement has the potential to streamline clinical workflows and support the transition to fully digital medical records. In this study, the audio recordings were simulated by board-certified pathologists based on real-world gross examination procedures to reflect typical clinical speech patterns. Further validation in actual clinical environments remains necessary to confirm generalizability.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.