
Previously submitted to: JMIR Medical Education (no longer under consideration since Dec 24, 2025)

Date Submitted: Jul 3, 2025

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Evaluating the Performance of Perplexity Artificial Intelligence on the Taiwan Urology Certification Examination: Accuracy, Consistency, and Educational Potential

  • Hsu-Cheng Ko; 
  • Jhe-Yuan Hsu; 
  • Yi-Sheng Lin; 
  • Kuan-Chun Hsueh; 
  • Chao-Yu Hsu; 
  • Yen-Chuan Ou; 
  • Min-Che Tung; 
  • Yi-Yu Chen; 
  • Pei-Chi Tsai; 
  • I-Yen Lee

ABSTRACT

Background:

Large language models (LLMs), such as ChatGPT, are increasingly used in medical education. However, they often underperform on specialty board examinations. Perplexity artificial intelligence (AI) distinguishes itself by providing citation-backed responses; however, its effectiveness on specialty board exams—particularly in urology—has not been systematically evaluated.

Objective:

To evaluate the performance of Perplexity AI (Pro Search mode) on the Taiwan Urology Certification Examination and examine its potential as a training tool for urology residents.

Methods:

We submitted 742 single-choice questions from the 2019–2023 Taiwan Urology Certification Examinations to Perplexity AI. Two independent urology residents evaluated the accuracy, consistency, and comprehensiveness of each response. A board-certified urologist resolved any discordant ratings. Correct-response rates were compared across years and difficulty levels. Consistency and comprehensiveness were evaluated based on the model’s explanations. Statistical analysis included chi-square testing and univariate logistic regression.
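The chi-square comparison of correct-response rates across years can be sketched as follows. This is a minimal illustration only: the per-year counts below are invented (chosen to total 742 questions) because the abstract does not report the actual year-by-year breakdown.

```python
# Hypothetical 2x5 table: correct / incorrect answers per exam year (2019-2023).
# Counts are invented for illustration; they are NOT the study's data.
correct   = [78, 88, 72, 78, 90]
incorrect = [71, 62, 73, 72, 58]

rows = [correct, incorrect]
row_totals = [sum(r) for r in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand = sum(row_totals)

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E,
# where the expected count E = row_total * col_total / grand_total.
chi2 = sum(
    (obs - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i, row in enumerate(rows)
    for j, obs in enumerate(row)
)

dof = (len(rows) - 1) * (len(col_totals) - 1)  # (2-1)*(5-1) = 4
print(f"chi2 = {chi2:.2f}, dof = {dof}")
```

Comparing the statistic against the critical value for 4 degrees of freedom (9.49 at alpha = 0.05) indicates whether accuracy differed significantly across years; in practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value directly.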

Results:

Perplexity AI achieved an overall accuracy of 54.4%, with year-specific rates of 52.3% (2019), 58.7% (2020), 48.3% (2021), 52.0% (2022), and 60.8% (2023). Accuracy was highest for basic-level questions (70.0%). Explanations were highly consistent (88.4%), and fully comprehensive responses were more likely to be correct than partially comprehensive ones (61.9% vs. 41.7%).

Conclusions:

Perplexity AI provides coherent, citation-supported responses with 54.4% overall accuracy and 88.4% consistency. Its real-time search function and detailed rationales make it a promising supplemental tool in urology education. However, its domain-specific strengths and weaknesses suggest that combining multiple LLMs may offer the most robust study strategy.


Citation

Please cite as:

Ko H, Hsu JY, Lin YS, Hsueh KC, Hsu CY, Ou YC, Tung MC, Chen YY, Tsai PC, Lee IY

Evaluating the Performance of Perplexity Artificial Intelligence on the Taiwan Urology Certification Examination: Accuracy, Consistency, and Educational Potential

JMIR Preprints. 03/07/2025:80050

DOI: 10.2196/preprints.80050

URL: https://preprints.jmir.org/preprint/80050


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.