Accepted for/Published in: JMIR Formative Research

Date Submitted: Aug 14, 2023
Date Accepted: Dec 4, 2023

The final, peer-reviewed published version of this preprint can be found here:

Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study

Ćirković A, Katz T

JMIR Form Res 2023;7:e51798

DOI: 10.2196/51798

PMID: 38153777

PMCID: 10784977

Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: A Comparative Study

  • Aleksandar Ćirković
  • Toam Katz

ABSTRACT

Background:

Refractive surgery research aims to optimize patient categorization so that each patient receives the most suitable procedure, minimizing risks while maximizing outcomes. Recent advances have produced artificial intelligence (AI)-powered algorithms, including machine learning (ML) approaches, that assess risks and enhance clinical workflow. Large language models (LLMs) such as ChatGPT-4 have emerged as potential general-purpose AI tools that can assist across various disciplines, including refractive surgery decision-making. However, their ability to pre-categorize refractive surgery patients based on real-world clinical parameters remains unexplored.

Objective:

This exploratory study aimed to examine ChatGPT-4's ability to pre-categorize refractive surgery patients based on commonly used clinical parameters. The goal was to assess whether ChatGPT-4 could provide meaningful categorizations from batch-processed inputs, comparable to those made by a refractive surgeon.

Methods:

Data from 100 consecutive patients from a refractive clinic were anonymized and analyzed. Parameters included age, sex, manifest refraction, visual acuity, and various corneal measurements and indices from Scheimpflug imaging. The study compared ChatGPT-4's categorizations with a clinician's using Cohen's kappa coefficient, a confusion matrix, and descriptive statistics.
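
To illustrate the agreement statistic named above, the following minimal Python sketch builds a confusion matrix from two raters' labels and computes unweighted Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e is the chance agreement implied by the marginal totals. The function names and the toy labels "A", "B", "C" are hypothetical; they are not the study's six categories or its data, and the abstract does not state which software the authors used.

```python
def confusion_matrix(rater_a, rater_b, categories):
    """Count items: rows are rater_a's category, columns are rater_b's."""
    index = {cat: k for k, cat in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for a, b in zip(rater_a, rater_b):
        matrix[index[a]][index[b]] += 1
    return matrix


def cohens_kappa(matrix):
    """Unweighted Cohen's kappa from a square confusion matrix.

    With n items, D items on the diagonal and S = sum(row_i * col_i),
    p_o = D / n and p_e = S / n**2, so kappa = (n*D - S) / (n**2 - S).
    Using integer totals keeps the arithmetic exact until the final division.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(k))
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    chance = sum(row_totals[i] * col_totals[i] for i in range(k))
    if n * n == chance:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (n * diagonal - chance) / (n * n - chance)


# Hypothetical toy labels (not study data): clinician vs. model.
categories = ["A", "B", "C"]
clinician = ["A", "A", "B", "B", "C", "C"]
model = ["A", "B", "B", "B", "C", "A"]

m = confusion_matrix(clinician, model, categories)
print(m)                # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(cohens_kappa(m))  # 0.5
```

As a cross-check, scikit-learn's `sklearn.metrics.cohen_kappa_score(y1, y2)` computes the same unweighted statistic directly from the two label lists.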

Results:

ChatGPT-4's categorizations agreed with the clinician's significantly beyond chance: Cohen's kappa was 0.399 for six categories (confidence interval [0.256; 0.537]) and 0.610 for binary categorization (confidence interval [0.372; 0.792]). The model showed temporal instability and response variability. The chi-square test showed significant differences in categorization distributions (χ²=94.7, p<0.01), and Fisher's exact test for binary categorizations yielded an odds ratio of 27.9 (p<0.01).
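
The contingency-table statistics reported above can be illustrated with the following minimal Python sketch (standard library only, Python 3.8+ for `math.comb`). It computes the sample (cross-product) odds ratio of a 2×2 table, a two-sided Fisher's exact p-value from the hypergeometric distribution, and Pearson's chi-square statistic with its degrees of freedom for an r×c table. The function names and the toy table are hypothetical illustrations, not the study's data, and the abstract does not say which odds ratio estimator or chi-square variant (for example, with or without continuity correction) the authors used.

```python
from math import comb


def odds_ratio(table):
    """Sample odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table.

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution; the p-value sums the probabilities of
    all tables that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Small relative tolerance so ties with the observed table survive rounding.
    p = sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and dof.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical 2x2 table (not study data):
# rows = clinician category, columns = model category.
toy = [[8, 2], [1, 5]]
print(odds_ratio(toy))  # 20.0
print(fisher_exact_two_sided(toy))
print(pearson_chi_square(toy))
```

If SciPy is available, `scipy.stats.fisher_exact` returns this same sample odds ratio together with the p-value. `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when there is one degree of freedom (`correction=True`), so for a 2×2 input its statistic will differ from the uncorrected value computed here.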

Conclusions:

The study revealed that ChatGPT-4 shows potential as a pre-categorization tool in refractive surgery, with promising agreement with clinician categorizations. However, limitations such as temporal instability and variability between iterations indicate room for improvement. The results encourage further exploration of LLMs such as ChatGPT-4 in health care, particularly in decision-making processes that require interpreting large amounts of clinical data. Future research should focus on refining the model's accuracy, expanding the variables used for classification, and exploring the boundaries of its limitations to pave the way for large-scale validation and real-world implementation.

Clinical Trial: none


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.