Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 25, 2025
Open Peer Review Period: Aug 25, 2025 - Oct 20, 2025
Date Accepted: Feb 24, 2026
Initial Insights into an Institutional Secure Large Language model for MRI Examination Requests: Retrospective Review
ABSTRACT
Background:
Incomplete clinical details on MRI examination requests (MERs) can lead to suboptimal protocol selection. An institutional secure large language model (sLLM) with access to the electronic medical record (EMR) may improve request completeness and protocol accuracy across multiple MRI subspecialties.
Objective:
To compare clinician MERs with sLLM-augmented MERs for information quality and to evaluate protocoling accuracy of the sLLM versus board-certified radiologists across body, musculoskeletal, and neuroradiology MRI.
Methods:
This retrospective study included 608 consecutive MRI examinations performed between September 2023 and July 2024 (body, 206; musculoskeletal, 203; neuroradiology, 199). The cohort comprised 528 patients (mean age 51.2 years ± 19.2 [SD], range 4–93; 279 women [52.8%], 249 men [47.2%]). MERs without EMR access were excluded. A privately hosted Anthropic Claude 3.5 model (temperature 0) augmented each MER with salient EMR data and, via rule-based parsing, recommended region/coverage and contrast use. Two experienced radiologists established a consensus reference standard, against which two board-certified general radiologists (Rad 3, Rad 4) and the sLLM were compared. Clinical-information quality was graded using the Reason-for-Exam Imaging Reporting and Data System (RI-RADS). Inter-rater reliability was quantified with Gwet's AC1, and paired accuracies were compared with McNemar tests.
Results:
Inter-reader agreement for RI-RADS was almost perfect for sLLM-augmented MERs (AC1 0.97; 95% CI 0.94–0.99) and moderate for clinician MERs (AC1 0.43; 95% CI 0.34–0.52). Limited or deficient clinical information (RI-RADS C/D) fell to 0–0.7% with sLLM augmentation versus 5.2–20.4% for clinician MERs. Overall protocol accuracy was 566/608 (93.1%; 95% CI 89.6–96.6) for the sLLM, 556/608 (91.4%; 95% CI 87.6–95.3) for Rad 3, and 560/608 (92.1%; 95% CI 88.4–95.8) for Rad 4 (sLLM vs Rad 3, p=.23; vs Rad 4, p=.40). Region/coverage accuracy was similar (sLLM 95.2%, Rad 3 96.2%, Rad 4 94.2%; p=.46 and p=.36). Contrast decisions were more accurate with the sLLM, at 574/608 (94.4%; 95% CI 91.3–97.5), than with Rad 3, at 560/608 (92.1%; 95% CI 88.4–95.8; p=.027), and not significantly different from Rad 4, at 565/608 (92.9%; 95% CI 89.4–96.4; p=.16). Subspecialty analyses showed similar patterns, with the sLLM outperforming Rad 4 for musculoskeletal MRI contrast decisions (96.6% vs 91.1%; p=.006) and matching readers elsewhere. Manual review indicated that sLLM improvements arose from EMR details not listed on the MER (infection/inflammation, tumor history, prior surgery). No clinically significant hallucinations were identified.
Conclusions:
Across body, musculoskeletal, and neuroradiology MRI, secure LLM-augmented examination requests had improved clinical context and enhanced contrast selection while matching general radiologists for region/coverage. Integrating secure LLMs into routine vetting workflows may reduce manual workload and standardize protocoling.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.