Previously submitted to: JMIR Medical Education (no longer under consideration since Dec 24, 2025)
Date Submitted: Jul 3, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluating the Performance of Perplexity Artificial Intelligence on the Taiwan Urology Certification Examination: Accuracy, Consistency, and Educational Potential
ABSTRACT
Background:
Large language models (LLMs), such as ChatGPT, are increasingly used in medical education. However, they often underperform on specialty board examinations. Perplexity artificial intelligence (AI) distinguishes itself by providing citation-backed responses; however, its effectiveness on specialty board exams—particularly in urology—has not been systematically evaluated.
Objective:
To evaluate the performance of Perplexity AI (Pro Search mode) on the Taiwan Urology Certification Examination and examine its potential as a training tool for urology residents.
Methods:
We submitted 742 single-choice questions from the 2019–2023 Taiwan Urology Certification Examinations to Perplexity AI. Two independent urology residents evaluated the accuracy, consistency, and comprehensiveness of each response. A board-certified urologist resolved any discordant ratings. Correct-response rates were compared across years and difficulty levels. Consistency and comprehensiveness were evaluated based on the model’s explanations. Statistical analysis included chi-square testing and univariate logistic regression.
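The year-by-year accuracy comparison described above can be sketched as a chi-square test of independence on a 5×2 table of correct/incorrect counts. The per-year counts below are an assumption: they are reconstructed so that they reproduce the reported per-year percentages and the 742-question total, since the paper does not list per-year question counts. A minimal stdlib-only sketch:

```python
# Hedged sketch of the chi-square comparison across exam years.
# Counts are RECONSTRUCTED from the reported per-year accuracy rates
# and the 742-question total; they are not taken from the paper's data.
observed = {
    2019: (78, 71),   # 52.3% of 149
    2020: (88, 62),   # 58.7% of 150
    2021: (71, 76),   # 48.3% of 147
    2022: (77, 71),   # 52.0% of 148
    2023: (90, 58),   # 60.8% of 148
}

rows = list(observed.values())
col_totals = [sum(r[j] for r in rows) for j in range(2)]
grand = sum(col_totals)  # 742 questions in total

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E,
# with E = row_total * column_total / grand_total.
chi2 = 0.0
for correct, incorrect in rows:
    row_total = correct + incorrect
    for j, obs in enumerate((correct, incorrect)):
        exp = row_total * col_totals[j] / grand
        chi2 += (obs - exp) ** 2 / exp

dof = (len(rows) - 1) * (2 - 1)  # (5 rows - 1) * (2 columns - 1) = 4
# Critical value for df = 4 at alpha = 0.05 is 9.488.
print(f"chi2 = {chi2:.2f}, dof = {dof}, significant: {chi2 > 9.488}")
```

Under these reconstructed counts, the year-to-year differences fall below the 5% significance threshold; the published analysis would of course use the actual per-year data.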
Results:
Perplexity AI achieved an overall accuracy of 54.4%, with year-specific rates of 52.3% (2019), 58.7% (2020), 48.3% (2021), 52.0% (2022), and 60.8% (2023). Accuracy was highest for basic-level questions (70.0%). Explanations were highly consistent (88.4%), and fully comprehensive responses were more likely to be correct than partially comprehensive ones (61.9% vs. 41.7%).
Conclusions:
Perplexity AI provides coherent, citation-supported responses with 54.4% overall accuracy and 88.4% consistency. Its real-time search function and detailed rationales make it a promising supplemental tool in urology education. However, its domain-specific strengths and weaknesses suggest that combining multiple LLMs may offer the most robust study strategy.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.