Accepted for/Published in: JMIR Rehabilitation and Assistive Technologies
Date Submitted: Jan 7, 2026
Date Accepted: Apr 9, 2026
Can GPT-5 Support Licensing Exam Preparation? Analysis of Accuracy, Reasoning, and Semantic Similarity Across Rehabilitation Disciplines
ABSTRACT
Background:
As artificial intelligence tools become more common in health professions education, students are increasingly turning to large language models (LLMs) such as ChatGPT (GPT-5) to support studying for high-stakes licensing exams. Although these models can generate accurate factual responses, their ability to mirror expert reasoning and provide conceptually sound explanations remains uncertain. This study examined GPT-5’s accuracy, reasoning patterns, and semantic similarity to expert rationales from validated board-preparation materials in three rehabilitation disciplines: physical therapy (PT), occupational therapy (OT), and speech-language pathology (SLP).
Methods:
Three hundred multiple-choice questions (100 per discipline) from verified board-preparation sources were entered into GPT-5 without hints or additional prompting. Each response was scored as correct or incorrect. The board-preparation sources classified each question by reasoning type (inductive, deductive, analytical, evaluative, or inferential), and these classifications were used to calculate GPT-5 accuracy by reasoning type. Semantic similarity between GPT-5 and expert rationales was calculated using cosine similarity. Descriptive statistics summarized performance across disciplines. Incorrect responses underwent qualitative content analysis to identify shared conceptual challenges, with dual-coder review to establish agreement.
Results:
GPT-5 demonstrated high factual accuracy overall (252/300, 84%), with discipline-specific variation: PT 91%, SLP 83%, and OT 78%. Deductive reasoning questions had the highest accuracy across disciplines, reaching 100% in PT. Mean semantic similarity between GPT-5 and expert rationales was 0.707, highest for deductive (0.712) and analytical (0.708) reasoning. Qualitative review of incorrect responses identified consistent difficulties with advanced reasoning tasks.
Conclusions:
GPT-5 reproduced substantial domain knowledge from rehabilitation board-preparation materials but showed persistent deficits in higher-order reasoning. Although semantic similarity to expert explanations was high, inconsistencies in inferential and evaluative logic limit its reliability as an unsupervised study tool. Findings highlight the need for guided use of LLMs in health-professions education, further research across specialties and exam formats, and clearer standards for integrating AI-based study aids to ensure educational quality and patient safety.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.