Accepted for/Published in: JMIR Formative Research
Date Submitted: Jan 4, 2026
Date Accepted: Apr 24, 2026
Automatic Speech Recognition and Large Language Models for Multilingual Pathology Report Generation: Proof-of-Concept Study
ABSTRACT
Background:
Accurate transcription of medical records is critical for clinical decision-making and patient care, particularly in high-stakes fields like pathology. This challenge is further amplified in multilingual environments.
Objective:
This study aimed to evaluate whether integrating advanced technologies such as Automatic Speech Recognition (ASR) and Large Language Models (LLMs) can enhance the accuracy and efficiency of generating pathology reports in multilingual settings.
Methods:
We assessed the performance of the Whisper ASR system combined with LLMs in generating clinically relevant pathology reports from 125 simulated multilingual audio recordings. System messages were used to guide the transcription process. The primary outcome was the reduction in Character Error Rate (CER). Secondary analyses compared different LLMs using BLEU, ROUGE, and METEOR metrics, expert pathologists' rankings of the generated pathology reports, and a comprehensive error-type analysis.
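For reference, CER is the character-level edit (Levenshtein) distance between the ASR transcript and the reference text, normalized by the reference length. A minimal sketch of the computation (the function names are illustrative, not taken from the study's code):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance over characters.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    # Character Error Rate: edit distance / length of the reference.
    return levenshtein(hypothesis, reference) / len(reference)
```

By this definition, a CER of 0.066 means roughly one character-level error per 15 reference characters.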
Results:
The use of system messages within the Whisper ASR system significantly reduced the CER from 0.344 to 0.066. The Qwen2:72b model performed best on all evaluated metrics (BLEU, ROUGE-1, ROUGE-2, ROUGE-L, and METEOR), indicating close alignment with the reference texts and comprehensive content coverage. The Llama3.1:70b model showed moderate performance with greater variability, while the Gemma2:27b model had the lowest scores and the highest variability. Qwen2:72b also maintained efficient inference speeds, with a mean of 5.2 seconds and a narrow 95% confidence interval, demonstrating stable and reliable performance for clinical use.
Conclusions:
The integration of ASR with LLM technologies significantly improves the accuracy of pathology report generation in multilingual settings. This enhancement has the potential to streamline clinical workflows and support the transition to fully digital medical records. In this study, the audio recordings were simulated by board-certified pathologists based on real-world gross examination procedures to reflect typical clinical speech patterns. Further validation in actual clinical environments remains necessary to confirm generalizability.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.