Accepted for/Published in: JMIR Formative Research
Date Submitted: Dec 14, 2024
Date Accepted: Apr 7, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Exploring Generative Pre-Trained Transformer-4-Vision for Nystagmus Classification: A Performance Study
ABSTRACT
Background:
Conventional nystagmus classification methods often rely on subjective observation by specialists, which is time-consuming and variable among clinicians. Recently, deep learning techniques employing convolutional and recurrent neural networks have been used to automate nystagmus classification, and they can accurately classify nystagmus patterns from video data. However, these approaches face challenges: they require large datasets for model creation, they address only specific image conditions, and the models are complex to use.
Objective:
To evaluate a novel approach to nystagmus classification that uses the Generative Pre-trained Transformer-4-Vision (GPT-4V) model, a state-of-the-art large language model with powerful image recognition capabilities.
Methods:
We developed a pupil-tracking process for nystagmus-recording videos and verified the accuracy of the optimized GPT-4V classification against the recordings. We tested whether the optimized model could classify nystagmus into six categories: right horizontal, left horizontal, upward, downward, right torsional, and left torsional. The traced trajectory was input either as two-dimensional coordinate data or as an image, and multiple in-context learning methods were evaluated.
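The coordinate-input pathway can be illustrated with a toy heuristic; this is a minimal sketch, not the authors' actual GPT-4V pipeline, and the `classify_horizontal` helper is hypothetical. Given a traced pupil trajectory as (x, y) points, jerk nystagmus shows a slow drift in one direction and fast beats in the other, and the direction of the fast phase names the nystagmus:

```python
def classify_horizontal(track):
    """Label the fast-phase direction of a horizontal pupil trajectory.

    track: list of (x, y) pupil-center coordinates, one per video frame.
    Returns "right", "left", or "none".
    """
    # Frame-to-frame horizontal displacements.
    dx = [b[0] - a[0] for a, b in zip(track, track[1:])]
    if not dx:
        return "none"
    # Use the median |dx| to separate slow-phase drift from fast-phase beats.
    thresh = sorted(abs(d) for d in dx)[len(dx) // 2]
    fast = [d for d in dx if abs(d) > thresh]
    if not fast:
        return "none"
    # The dominant sign of the fast-phase beats gives the label.
    s = sum(1 if d > 0 else -1 for d in fast)
    return "right" if s > 0 else "left"
```

On a synthetic sawtooth trajectory (slow drift left, fast beat right) this returns "right"; a real pipeline would first need robust pupil detection and would also handle the vertical and torsional axes.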
Results:
The developed model achieved 37% overall classification accuracy with pupil-traced images as input and a maximum accuracy of 24.6% with pupil coordinates as input. By orientation, classification of horizontal nystagmus patterns reached a maximum accuracy of 69%, whereas accuracy was lower for the vertical and torsional components.
Conclusions:
We demonstrated the potential of a generative artificial intelligence model for versatile vertigo management by improving the accuracy and efficiency of nystagmus classification. We also highlighted areas for further improvement, such as expanding the dataset size and enhancing input modalities, to improve classification performance across all nystagmus types. Although the GPT-4V model has been validated only for recognizing still images, our approach links it to video classification and is proposed as a novel method.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.