Maintenance Notice

Due to necessary scheduled maintenance, the JMIR Publications website will be unavailable from Wednesday, July 01, 2020 at 8:00 PM to 10:00 PM EST. We apologize in advance for any inconvenience this may cause you.

Who will be affected?

Accepted for/Published in: JMIR AI

Date Submitted: Mar 13, 2024
Date Accepted: May 7, 2024
(closed for review but you can still tweet)

The final, peer-reviewed published version of this preprint can be found here:

Feasibility of Multimodal Artificial Intelligence Using GPT-4 Vision for the Classification of Middle Ear Disease: Qualitative Study and Validation

Noda M, Yoshimura H, Okubo T, Koshu R, Uchiyama Y, Nomura A, Ito M, Takumi Y

Feasibility of Multimodal Artificial Intelligence Using GPT-4 Vision for the Classification of Middle Ear Disease: Qualitative Study and Validation

JMIR AI 2024;3:e58342

DOI: 10.2196/58342

PMID: 38875669

PMCID: 11179042

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Feasibility of Multimodal Artificial Intelligence using Generative Pre-Trained Transformer 4-Vision for the Classification of Middle Ear Disease

  • Masao Noda; 
  • Hidekane Yoshimura; 
  • Takuya Okubo; 
  • Ryota Koshu; 
  • Yuki Uchiyama; 
  • Akihiro Nomura; 
  • Makoto Ito; 
  • Yutaka Takumi

ABSTRACT

Background:

The integration of artificial intelligence (AI), particularly deep learning models, has transformed the landscape of medical technology, especially in the field of diagnosis utilizing imaging and physiological data. In otolaryngology, AI has shown promise in image classification for middle ear diseases. However, existing models often lack patient-specific data and clinical context, limiting their universal applicability. The emergence of Generative Pre-trained Transformer 4 Vision (GPT-4V) has enabled a multimodal diagnostic approach, integrating language processing with image analysis.

Objective:

In this study, we investigated the effectiveness of GPT-4V in diagnosing middle ear diseases by integrating patient-specific data with otoscopic images of the tympanic membrane.

Methods:

The study design was divided into two phases: (1) establishing a model with appropriate prompts and (2) validating the ability of the optimal prompt model to classify images. 305 otoscopic images of four middle ear disease (acute otitis media (AOM), middle ear cholesteatoma (Chole), chronic otitis media (COM), and otitis media with effusion (OME)) were obtained from patients who visited Shinshu University or Jichi Medical University between April 2010 and December 2023. The optimized GPT-4V settings were established using prompts and patients’ data, and the model with the optimal prompt created was used to verify the diagnostic accuracy of GPT-4V on 190 images. To compare the diagnostic accuracy of GPT-4V with that of physicians, 30 clinicians completed a web-based questionnaire consisting of 190 images.

Results:

The multimodal AI approach achieved an accuracy of 82.1%, which is superior to that of certified pediatricians, 70.6%, but trailing behind that of otolaryngologists, more than 95%. The model's disease-specific accuracy rates were 89.19% for AOM, 76.5% for COM, 79.3% for cholesteatoma, and 85.7% for OME, which highlights the need for disease-specific optimization. Comparisons with physicians revealed promising results, suggesting the potential of GPT-4V to augment clinical decision-making.

Conclusions:

Despite its advantages, challenges such as data privacy and ethical considerations must be addressed. Overall, this study underscores the potential of multimodal AI for enhancing diagnostic accuracy and improving patient care in otolaryngology. Further research is warranted to optimize and validate this approach in diverse clinical settings


 Citation

Please cite as:

Noda M, Yoshimura H, Okubo T, Koshu R, Uchiyama Y, Nomura A, Ito M, Takumi Y

Feasibility of Multimodal Artificial Intelligence Using GPT-4 Vision for the Classification of Middle Ear Disease: Qualitative Study and Validation

JMIR AI 2024;3:e58342

DOI: 10.2196/58342

PMID: 38875669

PMCID: 11179042

Per the author's request the PDF is not available.