Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jun 2, 2025
Open Peer Review Period: Jun 3, 2025 - Jul 29, 2025
Date Accepted: Oct 12, 2025
(closed for review but you can still tweet)
Performance of Retrieval-Augmented-Generation large language models in guideline-concordant Prostate Specific Antigen (PSA) testing: A comparative study against junior clinicians
ABSTRACT
Background:
Society guidelines for prostate cancer screening via PSA testing serve to standardize patient care, and are often utilized by trainees, junior staff, or generalist medical practitioners to guide medical decision-making. Adherence to guidelines is a time-consuming and challenging task and rates of inappropriate PSA testing are high.
Objective:
This study evaluates a retrieval-augmented generation (RAG) enhanced large language model (LLM), grounded in current EAU and AUA guidelines, to assess its effectiveness in providing guideline-concordant PSA screening recommendations compared to junior clinicians.
Methods:
A retrieval-augmented generation (RAG) pipeline was developed and used to process a series of 44 fictional case scenarios. Five junior clinicians were tasked to provide PSA testing recommendations for the same scenarios, in closed-book and open-book formats. Answers were compared for accuracy in a binomial fashion.
Results:
The RAG-LLM tool provided guideline-concordant recommendations in 95.5% of case scenarios, compared to junior clinicians, who were correct in 62.3% of scenarios in a closed-book format, and 74.1% of scenarios in an open book format. The difference was statistically significant for both closed-book (p <0.001) and open-book (p <0.001) formats.
Conclusions:
Use of RAG techniques allows LLMs to integrate complex guidelines into day-to-day medical decision-making. RAG-LLM tools in Urology have the capability to enhance clinical decision-making by providing guideline-concordant recommendations for PSA testing, potentially improving the consistency of healthcare delivery, reducing cognitive load on clinicians, and reducing unnecessary investigations and costs.
Citation
Request queued. Please wait while the file is being generated. It may take some time.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.