JMIR Preprints #78393: Performance of Retrieval-Augmented-Generation large language models in guideline-concordant Prostate Specific Antigen (PSA) testing: A comparative study against junior clinicians

Joshua Yi Min Tung;
Quan Le;
Jinxuan Yao;
Yifei Huang;
Daniel Yan Zheng Lim;
Gerald Gui Ren Sng;
Rachel Shu En Lau;
Yu Guang Tan;
Kenneth Chen;
Kae Jack Tay;
Jen Hong Tan;
John Shyi Peng Yuen;
Christopher Wai Sam Cheng;
Henry Sun Sien Ho

ABSTRACT

Background:

Society guidelines for prostate cancer screening via prostate-specific antigen (PSA) testing serve to standardize patient care and are often used by trainees, junior staff, or generalist medical practitioners to guide medical decision-making. However, adhering to guidelines is time-consuming and challenging, and rates of inappropriate PSA testing are high.

Objective:

This study evaluates a retrieval-augmented generation (RAG) enhanced large language model (LLM), grounded in current European Association of Urology (EAU) and American Urological Association (AUA) guidelines, to assess its effectiveness in providing guideline-concordant PSA screening recommendations compared with junior clinicians.

Methods:

A RAG pipeline was developed and used to process a series of 44 fictional case scenarios. Five junior clinicians were asked to provide PSA testing recommendations for the same scenarios in closed-book and open-book formats. Answers were scored as correct or incorrect and compared for accuracy.
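
The accuracy comparison described above can be sketched as follows, assuming each answer is scored as either guideline-concordant or not. This is a minimal illustration rather than the authors' code: the function name `accuracy`, the labels "test" and "no test", and the toy answer lists are hypothetical and are not taken from the study's scenarios.

```python
from typing import Sequence


def accuracy(recommendations: Sequence[str], reference: Sequence[str]) -> float:
    """Return the fraction of recommendations that match the reference answers."""
    if len(recommendations) != len(reference):
        raise ValueError("one recommendation is needed per scenario")
    if not reference:
        raise ValueError("at least one scenario is required")
    # Each scenario is scored 1 (concordant) or 0 (not concordant).
    scores = [int(given == expected)
              for given, expected in zip(recommendations, reference)]
    return sum(scores) / len(scores)


# Hypothetical toy data: four scenarios, not drawn from the study.
reference = ["test", "no test", "test", "no test"]
responder = ["test", "test", "test", "no test"]
print(accuracy(responder, reference))  # 3 of 4 answers match
```

The same function would apply unchanged to the RAG-LLM tool's answers and to each clinician's closed-book and open-book answers, since all responders are scored against the same reference.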

Results:

The RAG-LLM tool provided guideline-concordant recommendations in 95.5% of case scenarios, whereas junior clinicians were correct in 62.3% of scenarios in the closed-book format and 74.1% of scenarios in the open-book format. The difference was statistically significant for both the closed-book (p < 0.001) and open-book (p < 0.001) formats.
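
The reported percentages can be converted back into approximate counts. The sketch below assumes that each of the five clinicians answered all 44 scenarios in each format, giving 220 pooled responses per format. The abstract does not state these denominators, so the counts are inferred rather than reported, and the helper name `implied_count` is hypothetical.

```python
SCENARIOS = 44    # case scenarios, from the abstract
CLINICIANS = 5    # junior clinicians, from the abstract


def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of correct answers implied by a percentage."""
    return round(percent / 100 * total)


# Assumption: every clinician answered every scenario in each format.
pooled = SCENARIOS * CLINICIANS

rag_correct = implied_count(95.5, SCENARIOS)
closed_book_correct = implied_count(62.3, pooled)
open_book_correct = implied_count(74.1, pooled)

for label, correct, total in [
    ("RAG-LLM", rag_correct, SCENARIOS),
    ("Clinicians, closed-book", closed_book_correct, pooled),
    ("Clinicians, open-book", open_book_correct, pooled),
]:
    # Recompute the percentage as a consistency check against the abstract.
    print(f"{label}: {correct}/{total} = {correct / total * 100:.1f}%")
```

Under these assumptions the figures correspond to 42/44, 137/220, and 163/220 correct answers, each of which rounds back to the percentage quoted in the abstract.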

Conclusions:

RAG techniques allow LLMs to integrate complex guidelines into day-to-day medical decision-making. RAG-LLM tools in urology can enhance clinical decision-making by providing guideline-concordant recommendations for PSA testing, potentially improving the consistency of healthcare delivery, reducing cognitive load on clinicians, and reducing unnecessary investigations and costs.

Citation

Please cite as:

Tung JYM, Le Q, Yao J, Huang Y, Lim DYZ, Sng GGR, Lau RSE, Tan YG, Chen K, Tay KJ, Tan JH, Yuen JSP, Cheng CWS, Ho HSS

Performance of Retrieval-Augmented Generation Large Language Models in Guideline-Concordant Prostate-Specific Antigen Testing: Comparative Study With Junior Clinicians

J Med Internet Res 2025;27:e78393

DOI: 10.2196/78393

PMID: 41259800

PMCID: 12629621

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jun 2, 2025

Open Peer Review Period: Jun 3, 2025 - Jul 29, 2025

Date Accepted: Oct 12, 2025