Accepted for/Published in: JMIR Formative Research

Date Submitted: Jan 4, 2026
Date Accepted: Apr 24, 2026

The final, peer-reviewed published version of this preprint can be found here:

Automatic Speech Recognition and Large Language Models for Multilingual Pathology Report Generation: Proof-of-Concept Study

Lin KH, Chang CP, Kuo CT, Hsu CY, Hung SH, Lien CY, Lee SH, Yeh YC, Chu YC

JMIR Form Res 2026;10:e90814

DOI: 10.2196/90814

PMID: 42127277

Automatic Speech Recognition and Large Language Models for Multilingual Pathology Report Generation: Proof-of-Concept Study

  • Kuan-Hsun Lin
  • Chia-Ping Chang
  • Chen-Tsung Kuo
  • Chien-Yeh Hsu
  • Shih-Hsin Hung
  • Chung-Yueh Lien
  • Siang-Hao Lee
  • Yi-Chen Yeh
  • Yuan-Chia Chu

ABSTRACT

Background:

Accurate transcription of medical records is critical for clinical decision-making and patient care, particularly in high-stakes fields like pathology. Achieving this accuracy is even more difficult in multilingual environments.

Objective:

This study aimed to evaluate whether integrating Automatic Speech Recognition (ASR) and Large Language Models (LLMs) can enhance the accuracy and efficiency of generating pathology reports.

Methods:

We assessed the performance of the Whisper ASR system combined with LLMs in generating clinically relevant pathology reports from 125 simulated multilingual audio recordings. System messages were used to guide the transcription process. The primary outcome was the reduction in Character Error Rate (CER). Secondary analyses compared different LLMs using BLEU, ROUGE, and METEOR metrics, expert pathologists' rankings of the generated pathology reports, and a comprehensive error type analysis.
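As an illustration of the primary outcome, CER is commonly defined as the character-level edit distance between a reference transcript and the ASR output, divided by the reference length. The sketch below follows that common definition; the abstract does not state how the authors computed CER (for example, whether text was normalized first), so the function names and the example strings are assumptions made for illustration only.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Character-level edit distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the first i-1 reference characters
    # and the first j hypothesis characters.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical strings, not taken from the study data.
    print(cer("margin is free", "margin is tree"))
```

Under this definition a CER of 0 means the transcript matches the reference exactly, and the value can exceed 1 when the hypothesis contains many inserted characters.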

Results:

The use of system messages within the Whisper ASR system significantly reduced the Character Error Rate (CER) from 0.344 to 0.066. The Qwen2:72b model exhibited superior performance across all metrics evaluated, including BLEU, ROUGE-1, ROUGE-2, ROUGE-L, and METEOR scores, indicating high alignment with reference texts and comprehensive content coverage. In contrast, the Llama3.1:70b model showed moderate performance with greater variability, while the Gemma2:27b model had the lowest scores and highest variability. Qwen2:72b also maintained efficient inference speeds with a mean of 5.2 seconds and a narrow 95% confidence interval, demonstrating stable and reliable performance for clinical use.
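To put the reported CER change in perspective, the short calculation below derives the absolute and relative reduction from the two values given above (0.344 and 0.066). The variable names are illustrative, and the relative figure is simple arithmetic on the reported numbers rather than a result stated by the authors.

```python
# CER values as reported in the Results paragraph.
cer_without_system_messages = 0.344
cer_with_system_messages = 0.066

# Absolute drop in CER, and that drop as a share of the starting value.
absolute_reduction = cer_without_system_messages - cer_with_system_messages
relative_reduction = absolute_reduction / cer_without_system_messages

print(f"absolute reduction: {absolute_reduction:.3f}")
print(f"relative reduction: {relative_reduction:.1%}")
```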

Conclusions:

The integration of ASR with LLM technologies significantly improves the accuracy of pathology report generation in multilingual settings. This enhancement has the potential to streamline clinical workflows and support the transition to fully digital medical records. In this study, the audio recordings were simulated by board-certified pathologists based on real-world gross examination procedures, in order to reflect typical clinical speech patterns. Further validation in actual clinical environments is still necessary to confirm generalizability.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.