
Accepted for/Published in: JMIR AI

Date Submitted: Jul 2, 2023
Open Peer Review Period: Jun 30, 2023 - Aug 25, 2023
Date Accepted: Nov 19, 2023

The final, peer-reviewed published version of this preprint can be found here:

Assessment of ChatGPT-3.5's Knowledge in Oncology: Comparative Study with ASCO-SEP Benchmarks


ChatGPT Becomes an Oncologist: The Performance of Artificial Intelligence in the American Society of Clinical Oncology Evaluation Program

  • Roupen Odabashian; 
  • Donald Bastin; 
  • Maria Manzoor; 
  • Sina Tangestaniapour; 
  • Malke Assad; 
  • Sunita Lakhani; 
  • Georden Jones; 
  • Maritsa Odabashian; 
  • Sharon McGee

ABSTRACT

Background:

ChatGPT is a state-of-the-art large language model that uses artificial intelligence (AI) to answer questions across diverse topics. The American Society of Clinical Oncology Self-Evaluation Program (ASCO-SEP) is a comprehensive educational resource designed to help physicians keep up to date with the many rapid advances in the field. Its question bank consists of multiple-choice questions (MCQs) addressing the many facets of cancer care, including diagnosis, treatment, and supportive care.

Objective:

As applications of ChatGPT rapidly expand, we sought to investigate its performance in the field of medical oncology using questions from the ASCO-SEP.

Methods:

We conducted a systematic assessment of the performance of ChatGPT-3.5 on the American Society of Clinical Oncology Self-Evaluation Program (ASCO-SEP), the leading educational and assessment tool for medical oncologists in training and practice. Over 1000 multiple-choice questions covering the spectrum of cancer care were extracted. Questions were categorized by cancer type/discipline and subcategorized as treatment, diagnosis, or other. Answers were scored as correct if ChatGPT selected the answer defined as correct by ASCO-SEP.
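As a rough illustration of the scoring procedure described above, the sketch below grades multiple-choice responses against an answer key and tallies accuracy per category. The data structures and field names (Question, chatgpt_choices, etc.) are hypothetical and are not taken from the study's materials.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Question:
    qid: str
    category: str        # cancer type/discipline, e.g., "gastrointestinal" (illustrative)
    subcategory: str     # "diagnosis", "treatment", or "other"
    correct_choice: str  # answer letter defined by ASCO-SEP, e.g., "B"

def score_responses(questions, chatgpt_choices):
    """Count correct answers overall and per category.

    `chatgpt_choices` maps question id -> the letter ChatGPT selected.
    An answer is scored correct only if it matches the ASCO-SEP key.
    """
    per_category = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for q in questions:
        correct, total = per_category[q.category]
        total += 1
        if chatgpt_choices.get(q.qid) == q.correct_choice:
            correct += 1
        per_category[q.category] = [correct, total]
    overall_correct = sum(c for c, _ in per_category.values())
    overall_total = sum(t for _, t in per_category.values())
    return overall_correct, overall_total, dict(per_category)
```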

Results:

Overall, ChatGPT answered 56% of questions correctly (583/1040). Accuracy varied across cancer types/disciplines: it was highest for questions on developmental therapeutics (8/10, 80% correct) and lowest for questions on gastrointestinal cancer (102/209, 49% correct). There was no significant difference in performance across the predefined subcategories of diagnosis, treatment, and other (P=.16).
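For reference, the headline figures reduce to simple proportions, and a subcategory comparison of this kind can be run as a chi-square test of independence. In the sketch below, the contingency counts are placeholder values, since the abstract reports only the resulting P value (.16), not the per-subcategory correct/incorrect totals.

```python
from scipy.stats import chi2_contingency

# Proportions reported in the abstract
print(583 / 1040)   # overall accuracy, ~0.56
print(8 / 10)       # developmental therapeutics, 0.80
print(102 / 209)    # gastrointestinal cancer, ~0.49

# Chi-square test of independence across subcategories
# (diagnosis, treatment, other). Counts here are ILLUSTRATIVE
# placeholders that sum to 583/1040; the abstract does not report them.
table = [
    [150, 120],  # diagnosis: [correct, incorrect]
    [330, 270],  # treatment
    [103, 67],   # other
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3f}")  # compare with the reported P=.16
```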

Conclusions:

Although below the required passing rate, ChatGPT's performance on the ASCO-SEP showed promise for future applications in cancer care and medical education. Current limitations of the technology include training data that does not extend beyond 2021 and an inability to process or interpret data tables or images. However, as the technology continues to evolve, these limitations are expected to be overcome, allowing for improved capabilities. Clinical Trial: Not applicable


 Citation

Please cite as:

Odabashian R, Bastin D, Manzoor M, Tangestaniapour S, Assad M, Lakhani S, Jones G, Odabashian M, McGee S

Assessment of ChatGPT-3.5's Knowledge in Oncology: Comparative Study with ASCO-SEP Benchmarks

JMIR AI 2024;3:e50442

DOI: 10.2196/50442

PMID: 38875575

PMCID: 11041475


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.