
Accepted for/Published in: JMIR Mental Health

Date Submitted: Jul 25, 2023
Open Peer Review Period: Jul 25, 2023 - Aug 9, 2023
Date Accepted: Aug 24, 2023

The final, peer-reviewed published version of this preprint can be found here:

Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study

Levkovich I, Elyoseph Z

JMIR Ment Health 2023;10:e51232

DOI: 10.2196/51232

PMID: 37728984

PMCID: 10551796

“Between Lines of Code”: Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 vs ChatGPT-4 – A Vignette Study

  • Inbar Levkovich; 
  • Zohar Elyoseph

ABSTRACT

Background:

ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although its theoretical implications are significant, ChatGPT’s practical capabilities, particularly regarding suicide prevention, have not yet been substantiated. A previous study found that ChatGPT-3.5 (March 14, 2023) underestimated suicide risk and associated factors. Given the rapid technological progress in the AI field and the launch of ChatGPT-4, we aimed to re-evaluate ChatGPT’s ability to perform risk assessments of suicidal behavior and its relevant risk factors.

Objective:

The study’s aim was to evaluate ChatGPT’s ability to assess suicide risk, taking into consideration two discernible factors – perceived burdensomeness and thwarted belongingness – over a two-month period. In addition, we evaluated whether ChatGPT-4 assessed suicide risk more accurately than did ChatGPT-3.5.

Methods:

ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were then contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed three evaluative procedures in June-July 2023. Our intent was to scrutinize ChatGPT-4’s proficiency in assessing various facets of suicide risk relative to the evaluations of mental health professionals and of the earlier ChatGPT-3.5 (March 14 version).
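The abstract does not specify how the vignette was submitted to the models, so the following is only a minimal sketch of how such a head-to-head comparison could be scripted against the OpenAI chat completions API. The prompt wording, rating scale, placeholder vignette, and model identifiers are illustrative assumptions, not the authors’ actual protocol.

# Minimal sketch: send the same vignette to two models and collect their
# risk ratings for comparison. Prompt text and scale are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

VIGNETTE = (
    "Hypothetical patient description with a given level of perceived "
    "burdensomeness and thwarted belongingness goes here."
)

PROMPT = (
    "Read the following vignette and rate, on a scale of 1 (very low) to "
    "7 (very high), the likelihood that the person will attempt suicide, "
    "as well as their level of suicidal ideation, psychache, and resilience.\n\n"
    f"Vignette: {VIGNETTE}"
)

def assess(model: str) -> str:
    """Submit the identical prompt to a given model and return its free-text rating."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # keep output as repeatable as possible across runs
    )
    return response.choices[0].message.content

# Query both model generations on the identical vignette, mirroring the study's
# ChatGPT-3.5 vs ChatGPT-4 comparison (model names here are assumptions).
for model_name in ("gpt-3.5-turbo", "gpt-4"):
    print(model_name, "->", assess(model_name))

In the study itself the models’ ratings were then compared against the reference evaluations of mental health professionals; the snippet above only covers the data-collection step.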

Results:

We found a notable alignment between the likelihood of suicide attempts as evaluated by ChatGPT-4 and as evaluated by mental health professionals under all conditions tested. Nonetheless, a pronounced discrepancy was observed in the assessments performed by ChatGPT-3.5 (both versions), which markedly underestimated the potential for suicide attempts in comparison with the assessments carried out by the mental health professionals. The evidence further suggests that the levels of suicidal ideation and psychache, as evaluated by ChatGPT-4, surpassed the estimations of the mental health professionals. Conversely, the level of resilience assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was lower than in the assessments offered by the mental health professionals.

Conclusions:

The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, ChatGPT-4 overestimated psychache, indicating a need for further research. These results have implications for ChatGPT-4’s potential to support decision-making by gatekeepers, patients, and even mental health professionals. Despite this clinical potential, intensive follow-up studies are necessary before ChatGPT-4’s capabilities can be applied in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling: it suggests that ChatGPT may downplay a person’s actual suicide risk level.


 Citation

Please cite as:

Levkovich I, Elyoseph Z

Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study

JMIR Ment Health 2023;10:e51232

DOI: 10.2196/51232

PMID: 37728984

PMCID: 10551796


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.