Currently submitted to: JMIR Medical Education
Date Submitted: Mar 16, 2026
Open Peer Review Period: Mar 18, 2026 - May 13, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Using Generative Artificial Intelligence to Aid in Surgery Resident Selection: A Retrospective Comparative Study
ABSTRACT
Background:
Surgery resident selection is a resource-intensive process. The advent of generative artificial intelligence (GAI) offers a new possibility to aid in resident selection, increasing the efficiency of file review without the burden of creating a customized machine-learning algorithm.
Objective:
Our study aimed to compare file review of general surgery applicants by GAI to file review by our program’s residency selection committee (RSC).
Methods:
GPT-4o, a publicly accessible GAI model, was used to score deidentified 2023-2024 Canadian Resident Matching Service (CaRMS) application files submitted to our program, using our RSC's file review scoresheet. GAI scores were compared with RSC-assigned scores for each application element, including CVs, personal letters, and reference letters. Rank lists generated from the two sets of scores were compared using Spearman's rank correlation. GPT-4o was then used to create ten generic application files; these were scored by GAI and compared with GAI scores for the 2023-2024 CaRMS applicants using the Wilcoxon rank-sum test.
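For illustration, the two statistical comparisons described above can be sketched in Python with SciPy; the score values below are hypothetical placeholders, not study data.

```python
# Sketch of the Methods' two comparisons (illustrative data only).
from scipy.stats import ranksums, spearmanr

# Hypothetical total file-review scores for the same set of applicants.
gai_scores = [26.1, 24.5, 25.0, 23.8, 27.2, 24.9]
rsc_scores = [19.0, 14.2, 21.5, 12.8, 22.0, 16.1]

# Spearman's rank correlation compares the two rank orderings.
rho, p_rho = spearmanr(gai_scores, rsc_scores)

# Wilcoxon rank-sum test compares two independent groups of scores,
# e.g., GAI scores for GAI-created files vs. real applicant files.
generic_scores = [27.8, 27.1, 28.0, 26.9, 27.5]
stat, p_ranksum = ranksums(generic_scores, gai_scores)

print(f"Spearman rho={rho:.2f} (p={p_rho:.3f}); rank-sum p={p_ranksum:.3f}")
```

Spearman's correlation is used because rank lists are ordinal; the rank-sum test is appropriate for the generic-file comparison because the two groups of files are independent.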
Results:
A total of 124 application files were included. Median GAI file review scores were consistently higher than RSC-assigned scores (24.46 vs. 17.54, p<0.05) and showed less variance between applicants (6.96 vs. 20.80, p<0.05). Interrater reliability between GAI and RSC scores was poor across all application elements (0.16). Rank lists generated from GAI and RSC scores showed a weakly positive correlation for each application element (0.25 to 0.37, p<0.05), and rank lists based on total file review scores showed a moderately positive correlation (0.44, p<0.05). Median GAI scores for GAI-created files did not differ significantly from those for CaRMS applicant files on application CVs (6.88, p=0.25) but were significantly higher for the other application elements and for global scores (27.51 vs. 24.46, p<0.05).
Conclusions:
GAI in its current form cannot reliably replicate human file review. Further research is needed to determine the potential role for GAI in residency selection.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.