Accepted for/Published in: JMIR Formative Research

Date Submitted: Dec 14, 2024
Date Accepted: Apr 7, 2025

The final, peer-reviewed published version of this preprint can be found here:

Exploring Generative Pre-Trained Transformer-4-Vision for Nystagmus Classification: Development and Validation of a Pupil-Tracking Process

Noda M, Koshu R, Tsunoda R, Tsunoda R, Ogihara H, Kamo T, Ito M, Fushiki H

Exploring Generative Pre-Trained Transformer-4-Vision for Nystagmus Classification: Development and Validation of a Pupil-Tracking Process

JMIR Form Res 2025;9:e70070

DOI: 10.2196/70070

PMID: 40478723

PMCID: 12164947

Exploring Generative Pre-Trained Transformer-4-Vision for Nystagmus Classification: A Performance Study

  • Masao Noda; 
  • Ryota Koshu; 
  • Reiko Tsunoda; 
  • Reiko Tsunoda; 
  • Hirofumi Ogihara; 
  • Tomohiko Kamo; 
  • Makoto Ito; 
  • Hiroaki Fushiki

ABSTRACT

Background:

Conventional nystagmus classification methods often rely on subjective observation by specialists, which is time-consuming and varies among clinicians. Recently, deep learning techniques employing convolutional and recurrent neural networks have been used to automate nystagmus classification and can accurately classify nystagmus patterns from video data. However, these approaches face challenges: they require large datasets for model creation, address only specific image conditions, and are complex to deploy.

Objective:

This study aimed to evaluate a novel approach to nystagmus classification using the Generative Pre-trained Transformer-4-Vision (GPT-4V) model, a state-of-the-art large multimodal model with powerful image recognition capabilities.

Methods:

We developed a pupil-tracking process for nystagmus-recording videos and verified the accuracy of the resulting model using GPT-4V classification of the recordings. We tested whether the model could classify nystagmus into six categories: right horizontal, left horizontal, upward, downward, right torsional, and left torsional. The traced trajectory was input either as two-dimensional coordinate data or as an image, and multiple in-context learning methods were evaluated.
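
The pipeline described above can be illustrated with a minimal sketch: extract a per-frame pupil centroid from grayscale video frames, then assemble the traced coordinates into a few-shot (in-context) prompt for a vision-language model. The threshold value, function names, and prompt wording below are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np


def pupil_center(gray_frame, dark_thresh=60):
    """Centroid of pixels darker than the threshold (assumed to be the pupil).

    A real pipeline would refine this (e.g., blob filtering), but a simple
    intensity threshold suffices to show the per-frame tracking step.
    """
    ys, xs = np.nonzero(gray_frame < dark_thresh)
    if xs.size == 0:
        return None  # no dark region found in this frame
    return (float(xs.mean()), float(ys.mean()))


def build_classification_prompt(trajectory, examples):
    """Few-shot in-context prompt: labeled example trajectories, then the query.

    `examples` is a list of (coordinate_list, label) pairs; the model is asked
    to complete the label for the final, unlabeled trajectory.
    """
    lines = [
        "Classify the nystagmus direction from the pupil trajectory.",
        "Categories: right horizontal, left horizontal, upward, downward, "
        "right torsional, left torsional.",
        "",
    ]
    for coords, label in examples:
        lines.append(f"Trajectory: {coords} -> {label}")
    lines.append(f"Trajectory: {trajectory} ->")
    return "\n".join(lines)
```

The same trajectory could alternatively be rendered as an image and attached to the prompt, matching the two input modalities compared in the study.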

Results:

The developed model achieved an overall classification accuracy of 37% with pupil-traced images as input and a maximum accuracy of 24.6% with pupil coordinates as input. By orientation, classification of horizontal nystagmus patterns reached a maximum accuracy of 69%, whereas accuracy was lower for the vertical and torsional components.

Conclusions:

We demonstrated the potential of a generative artificial intelligence model for versatile vertigo management by improving the accuracy and efficiency of nystagmus classification. We also highlighted areas for further improvement, such as expanding the dataset and enhancing input modalities, to improve classification performance across all nystagmus types. Although GPT-4V has been validated only for still-image recognition, linking it to video classification is proposed as a novel method.


Citation

Please cite as:

Noda M, Koshu R, Tsunoda R, Tsunoda R, Ogihara H, Kamo T, Ito M, Fushiki H

Exploring Generative Pre-Trained Transformer-4-Vision for Nystagmus Classification: Development and Validation of a Pupil-Tracking Process

JMIR Form Res 2025;9:e70070

DOI: 10.2196/70070

PMID: 40478723

PMCID: 12164947


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.