Previously submitted to: JMIR Medical Education (no longer under consideration since Dec 24, 2025)
Date Submitted: Jul 3, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluating the Performance of Perplexity Artificial Intelligence on the Taiwan Urology Certification Examination: Accuracy, Consistency, and Educational Potential
ABSTRACT
Background:
Large language models (LLMs), such as ChatGPT, are increasingly used in medical education. However, they often underperform on specialty board examinations. Perplexity artificial intelligence (AI) distinguishes itself by providing citation-backed responses; however, its effectiveness on specialty board exams—particularly in urology—has not been systematically evaluated.
Objective:
To evaluate the performance of Perplexity AI (Pro Search mode) on the Taiwan Urology Certification Examination and examine its potential as a training tool for urology residents.
Methods:
We submitted 742 single-choice questions from the 2019–2023 Taiwan Urology Certification Examinations to Perplexity AI. Two independent urology residents evaluated the accuracy, consistency, and comprehensiveness of each response. A board-certified urologist resolved any discordant ratings. Correct-response rates were compared across years and difficulty levels. Consistency and comprehensiveness were evaluated based on the model’s explanations. Statistical analysis included chi-square testing and univariate logistic regression.
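The year-by-year accuracy comparison described above can be sketched as a chi-square test of independence on a 5×2 table of correct/incorrect counts. The per-year counts below are an assumption: they are reconstructed so that they reproduce the reported per-year percentages and the 742-question total, since the paper does not list per-year question counts. A minimal stdlib-only sketch:

```python
# Hedged sketch of the chi-square comparison across exam years.
# Counts are RECONSTRUCTED from the reported per-year accuracy rates
# and the 742-question total; they are not taken from the paper's data.
observed = {
    2019: (78, 71),   # 52.3% of 149
    2020: (88, 62),   # 58.7% of 150
    2021: (71, 76),   # 48.3% of 147
    2022: (77, 71),   # 52.0% of 148
    2023: (90, 58),   # 60.8% of 148
}

rows = list(observed.values())
col_totals = [sum(r[j] for r in rows) for j in range(2)]
grand = sum(col_totals)  # 742 questions in total

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E,
# with E = row_total * column_total / grand_total.
chi2 = 0.0
for correct, incorrect in rows:
    row_total = correct + incorrect
    for j, obs in enumerate((correct, incorrect)):
        exp = row_total * col_totals[j] / grand
        chi2 += (obs - exp) ** 2 / exp

dof = (len(rows) - 1) * (2 - 1)  # (5 rows - 1) * (2 columns - 1) = 4
# Critical value for df = 4 at alpha = 0.05 is 9.488.
print(f"chi2 = {chi2:.2f}, dof = {dof}, significant: {chi2 > 9.488}")
```

Under these reconstructed counts, the year-to-year differences fall below the 5% significance threshold; the published analysis would of course use the actual per-year data.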
Results:
Perplexity AI achieved an overall accuracy of 54.4%, with year-specific rates of 52.3% (2019), 58.7% (2020), 48.3% (2021), 52.0% (2022), and 60.8% (2023). Accuracy was highest for basic-level questions (70.0%). Explanations were highly consistent (88.4%), and fully comprehensive responses were more likely to be correct than partially comprehensive ones (61.9% vs. 41.7%).
Conclusions:
Perplexity AI provides coherent, citation-supported responses with 54.4% overall accuracy and 88.4% consistency. Its real-time search function and detailed rationales make it a promising supplemental tool in urology education. However, its domain-specific strengths and weaknesses suggest that combining multiple LLMs may offer the most robust study strategy.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.