
Accepted for/Published in: JMIR Mental Health

Date Submitted: Jul 25, 2023
Open Peer Review Period: Jul 25, 2023 - Aug 9, 2023
Date Accepted: Aug 24, 2023

The final, peer-reviewed published version of this preprint can be found here:

Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study

Levkovich I, Elyoseph Z

JMIR Ment Health 2023;10:e51232

DOI: 10.2196/51232

PMID: 37728984

PMCID: 10551796

“Between Lines of Code”: Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 vs ChatGPT-4 – A Vignette Study

  • Inbar Levkovich; 
  • Zohar Elyoseph

ABSTRACT

Background:

ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although its theoretical implications are significant, ChatGPT’s practical capabilities, particularly regarding suicide prevention, have not yet been substantiated. A previous study found that ChatGPT-3.5 (March 14, 2023) underestimated suicide risk and associated factors. Given the rapid technological progress in the AI field and the launch of ChatGPT-4, we aimed to re-evaluate ChatGPT’s ability to perform risk assessments of suicidal behavior and its relevant risk factors.

Objective:

The study’s aim was to evaluate ChatGPT’s ability to assess suicide risk, taking into consideration two discernible factors – perceived burdensomeness and thwarted belongingness – over a two-month period. In addition, we evaluated whether ChatGPT-4 assessed suicide risk more accurately than did ChatGPT-3.5.

Methods:

ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were then contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed three evaluative procedures in June-July 2023. Our intent was to scrutinize ChatGPT-4’s proficiency in assessing various facets of suicide risk relative to the evaluations of mental health professionals and of the earlier ChatGPT-3.5 (March 14 version).
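The abstract does not specify how the vignette was submitted to the models, so the following is only a minimal sketch of how such a head-to-head comparison could be scripted against the OpenAI chat completions API. The prompt wording, rating scale, placeholder vignette, and model identifiers are illustrative assumptions, not the authors’ actual protocol.

# Minimal sketch: send the same vignette to two models and collect their
# risk ratings for comparison. Prompt text and scale are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

VIGNETTE = (
    "Hypothetical patient description with a given level of perceived "
    "burdensomeness and thwarted belongingness goes here."
)

PROMPT = (
    "Read the following vignette and rate, on a scale of 1 (very low) to "
    "7 (very high), the likelihood that the person will attempt suicide, "
    "as well as their level of suicidal ideation, psychache, and resilience.\n\n"
    f"Vignette: {VIGNETTE}"
)

def assess(model: str) -> str:
    """Submit the identical prompt to a given model and return its free-text rating."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # keep output as repeatable as possible across runs
    )
    return response.choices[0].message.content

# Query both model generations on the identical vignette, mirroring the study's
# ChatGPT-3.5 vs ChatGPT-4 comparison (model names here are assumptions).
for model_name in ("gpt-3.5-turbo", "gpt-4"):
    print(model_name, "->", assess(model_name))

In the study itself the models’ ratings were then compared against the reference evaluations of mental health professionals; the snippet above only covers the data-collection step.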

Results:

We found a notable alignment between the likelihood of suicide attempts as evaluated by ChatGPT-4 and as evaluated by mental health professionals under all conditions tested. Nonetheless, a pronounced discrepancy was observed in the assessments performed by ChatGPT-3.5 (both versions), which markedly underestimated the potential for suicide attempts in comparison with the assessments carried out by the mental health professionals. The evidence further suggests that the levels of suicidal ideation and psychache, as evaluated by ChatGPT-4, surpassed the estimations of the mental health professionals. Conversely, the level of resilience assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was lower than in the assessments offered by the mental health professionals.

Conclusions:

The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, ChatGPT-4 overestimated psychache, indicating a need for further research. These results have implications for ChatGPT-4’s potential to support decision-making by gatekeepers, patients, and even mental health professionals. Despite this clinical potential, intensive follow-up studies are necessary before ChatGPT-4’s capabilities can be applied in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling: it suggests that ChatGPT may downplay a person’s actual suicide risk level.


 Citation

Please cite as:

Levkovich I, Elyoseph Z

Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study

JMIR Ment Health 2023;10:e51232

DOI: 10.2196/51232

PMID: 37728984

PMCID: 10551796


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.