Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 23, 2024
Date Accepted: Mar 10, 2025

The final, peer-reviewed published version of this preprint can be found here:

Ding JE, Liu F

Clinical Value of ChatGPT for Epilepsy Presurgical Decision-Making: Systematic Evaluation of Seizure Semiology Interpretation

J Med Internet Res 2025;27:e69173

DOI: 10.2196/69173

PMID: 40354107

PMCID: 12107199

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Is ChatGPT Better Than Epileptologists at Interpreting Seizure Semiology?

  • Jun-En Ding; 
  • Feng Liu

ABSTRACT

Background:

This study evaluates the clinical utility of ChatGPT in interpreting seizure semiology for epileptogenic zone (EZ) localization in focal epilepsy presurgical assessment. We analyzed two datasets: 852 semiology-EZ pairs from 193 peer-reviewed publications and 184 pairs from Far Eastern Memorial Hospital (FEMH), Taiwan. ChatGPT's performance was tested using zero-shot and few-shot prompting methods, and compared against eight epileptologists' interpretations of 100 randomly selected cases. Performance was measured using regional sensitivity (RSens), weighted sensitivity (WSens), and net positive inference rate (NPIR). Results showed ChatGPT achieved >80% sensitivity for frontal and temporal lobes, ~40% for occipital lobe, 20-30% for parietal lobe, 20% for insular cortex, and 0% for cingulate cortex across both datasets. Compared to epileptologists, ChatGPT demonstrated superior performance in frontal and temporal lobe localization, comparable accuracy in occipital and parietal regions, but underperformed in insular and cingulate cortices. Both ChatGPT and epileptologists showed similar WSens and NPIR values. These findings suggest ChatGPT could serve as a valuable clinical tool in epilepsy presurgical workup, with potential for further improvement as language model technology advances.

Objective:

This study aims to evaluate the clinical value of a representative large language model (LLM), ChatGPT, in interpreting seizure semiology to localize epileptogenic zones (EZs) for presurgical assessment in patients with focal epilepsy.

Methods:

We compiled two data cohorts, one from public sources and one from a private database. The public cohort consists of 852 semiology-EZ pairs derived from 193 peer-reviewed journal publications. The private cohort includes 184 semiology-EZ pairs collected from the Far Eastern Memorial Hospital (FEMH) in Taiwan. ChatGPT was asked to generate the most likely EZ locations from the semiology records in both cohorts using two prompting methods: zero-shot prompting (ZSP) and few-shot prompting (FSP). To compare ChatGPT's performance with that of epileptologists, a panel of eight epileptologists was recruited for an online survey to interpret 100 randomly selected semiology records. The responses from ChatGPT and the epileptologists were compared using three metrics: regional sensitivity (RSens), weighted sensitivity (WSens), and net positive inference rate (NPIR).
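As an illustration of the first metric only, the following is a minimal sketch of how a per-region sensitivity could be computed. The abstract does not define RSens, so this assumes it is, for each brain region, the fraction of records whose true EZ includes that region and for which the response also names that region. The function name `regional_sensitivity` and the toy records are hypothetical and are not taken from the study data. WSens and NPIR are not sketched because their definitions are not given here.

```python
from collections import defaultdict


def regional_sensitivity(cases):
    """Per-region sensitivity under the assumed definition.

    cases: iterable of (true_regions, predicted_regions) pairs, each a set
    of region names. For each region, returns hits / total, where total is
    the number of cases whose true EZ includes the region and hits is the
    number of those cases whose prediction also includes it.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for true_regions, predicted_regions in cases:
        for region in true_regions:
            totals[region] += 1
            if region in predicted_regions:
                hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


# Made-up toy records (NOT study data): (true EZ regions, predicted regions)
toy_cases = [
    ({"temporal"}, {"temporal", "frontal"}),
    ({"temporal"}, {"temporal"}),
    ({"frontal"}, {"frontal"}),
    ({"insular"}, {"temporal"}),
    ({"parietal"}, {"parietal", "occipital"}),
    ({"parietal"}, {"frontal"}),
]

rsens = regional_sensitivity(toy_cases)
for region, value in sorted(rsens.items()):
    print(f"{region}: {value:.2f}")
```

Under this assumed definition, a response that lists extra regions is not penalized, so sensitivity alone cannot distinguish a precise answer from one that names many regions.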

Results:

In interpreting seizure semiology, ChatGPT achieved over 80% sensitivity for the frontal and temporal lobes, approximately 40% for the occipital lobe, 20-30% for the parietal lobe, 20% for the insular cortex, and 0% for the cingulate cortex, consistently across both data cohorts. Compared with the epileptologists' responses, ChatGPT-4 outperformed epileptologists in localizing EZs in the frontal and temporal lobes, showed similar accuracy for the occipital and parietal lobes, and underperformed in the insular and cingulate cortices. ChatGPT and the epileptologists had comparable WSens and mean NPIR values.

Conclusions:

In this cross-sectional study of seizure semiology interpretation, ChatGPT-generated responses outperformed or matched the responses from epileptologists in regions where EZs are commonly located, including the frontal lobe and the temporal lobe. However, epileptologists provided more accurate responses in regions where EZs are rarely located, such as the insula and the cingulate cortex. Overall, our results demonstrate that ChatGPT might serve as a valuable tool to assist in the preoperative assessment for epilepsy surgery. However, it must be acknowledged that the information provided by ChatGPT may not always be backed by reliable sources, posing a challenge to the verification of ChatGPT-generated responses. Furthermore, medical professionals, including epileptologists and epilepsy surgeons, must fully recognize the limitations of ChatGPT and exercise caution when utilizing its responses. This study serves as an important reference for employing ChatGPT in seizure semiology interpretation while underscoring its present constraints.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.