Accepted for/Published in: JMIR AI
Date Submitted: Jul 2, 2023
Open Peer Review Period: Jun 30, 2023 - Aug 25, 2023
Date Accepted: Nov 19, 2023
ChatGPT Becomes an Oncologist: The Performance of Artificial Intelligence in the American Society of Clinical Oncology Evaluation Program
ABSTRACT
Background:
ChatGPT is a state-of-the-art large language model that uses artificial intelligence (AI) to address questions across diverse topics. The American Society of Clinical Oncology Self-Evaluation Program (ASCO-SEP) is a comprehensive educational program that helps physicians keep up to date with the many rapid advances in the field. Its question bank consists of multiple-choice questions (MCQs) addressing the many facets of cancer care, including diagnosis, treatment, and supportive care.
Objective:
As ChatGPT applications rapidly expand, we sought to investigate its performance in the field of medical oncology by using questions from ASCO-SEP.
Methods:
We conducted a systematic assessment of the performance of ChatGPT-3 on the American Society of Clinical Oncology Self-Evaluation Program (ASCO-SEP), the leading educational and assessment tool for medical oncologists in training and practice. Over 1000 multiple-choice questions covering the spectrum of cancer care were extracted. Questions were categorized by cancer type/discipline and subcategorized as treatment, diagnosis, or other. An answer was scored as correct if ChatGPT selected the answer defined as correct by ASCO-SEP.
Results:
Overall, ChatGPT answered 56% of questions correctly (583/1040). Accuracy varied across cancer types/disciplines: it was highest for questions on developmental therapeutics (8/10; 80% correct) and lowest for questions on gastrointestinal cancer (102/209; 49% correct). There was no significant difference in performance across the predefined subcategories of diagnosis, treatment, and other (P=.16).
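The accuracies reported above are simple proportions of correct answers per category. A minimal sketch of how such per-category scores could be tallied (the counts are taken from the results above; the helper function and dictionary layout are illustrative, not the authors' actual analysis code):

```python
# Tally per-category accuracy from (correct, total) answer counts.
# Counts come from the reported results; names here are illustrative.
def accuracy_pct(correct: int, total: int) -> int:
    """Return the percentage of correct answers, rounded to a whole percent."""
    return round(100 * correct / total)

results = {
    "overall": (583, 1040),
    "developmental therapeutics": (8, 10),
    "gastrointestinal cancer": (102, 209),
}

for category, (correct, total) in results.items():
    print(f"{category}: {correct}/{total} = {accuracy_pct(correct, total)}%")
```

Running this reproduces the reported figures of 56%, 80%, and 49% for the three categories listed.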
Conclusions:
Although below the required passing rate, ChatGPT’s performance on the ASCO-SEP shows promise for future applications in cancer care and medical education. Current limitations of the technology include training data that do not extend beyond 2021 and an inability to process or interpret data tables or images. As the technology continues to evolve, however, these limitations are expected to be overcome, allowing for improved capabilities. Clinical Trial: Not applicable
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.