Accepted for/Published in: JMIR Formative Research
Date Submitted: Jul 31, 2023
Date Accepted: Jun 4, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Optimizing ChatGPT’s Interpretation and Reporting of Delirium Assessment Outcomes: An Exploratory Study
ABSTRACT
Background:
Generative artificial intelligence (AI) and large language models, such as OpenAI's ChatGPT, have shown promising potential in supporting medical education and clinical decision-making, given their vast knowledge base and natural language processing capabilities. As a general-purpose AI, ChatGPT can complete a wide range of tasks, including differential diagnosis, without additional training. However, the specific application of ChatGPT to learning and performing a series of specialized, context-specific tasks that mimic the workflow of a human assessor (administering a standardized assessment questionnaire, entering the assessment results in a standardized form, and interpreting those results strictly according to credible, published scoring criteria) has not been thoroughly studied.
Objective:
This exploratory study aimed to (1) evaluate ChatGPT's ability to learn and administer a standardized informant-based delirium assessment tool, specifically the Sour Seven Questionnaire, via context-specific training; and (2) optimize ChatGPT's interpretation and reporting of the assessment results through prompt engineering.
Methods:
Using prompt engineering, we provided context-specific training to ChatGPT-3.5 and ChatGPT-4, guiding the models to learn the assessment tool and subsequently identify and score delirium symptoms in clinical vignettes. Performance was compared with human expert scores, followed by iterative prompt optimization to minimize inconsistencies and errors.
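For illustration, the context-specific training step can be sketched as a single chat completion call. This is a minimal sketch assuming the OpenAI Python SDK; the SOUR_SEVEN_TEXT and VIGNETTE placeholders, the prompt wording, and the model name are ours for illustration and are not the authors' exact materials.

```python
# Hypothetical sketch of the prompt-based "training" step, assuming the
# OpenAI Python SDK; placeholders stand in for the study's actual prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SOUR_SEVEN_TEXT = "..."  # full published questionnaire items and scoring criteria
VIGNETTE = "..."         # one clinical vignette describing the patient

messages = [
    {
        "role": "system",
        "content": (
            "You are a clinical assessor. Learn the following standardized "
            "informant-based delirium assessment tool and apply it strictly "
            "as published:\n" + SOUR_SEVEN_TEXT
        ),
    },
    {
        "role": "user",
        "content": (
            "Assess the patient in this vignette using the tool, identifying "
            "and scoring each of the seven symptoms:\n" + VIGNETTE
        ),
    },
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)  # output compared against human expert scores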
Results:
Both ChatGPT models demonstrated promising proficiency in applying the Sour Seven Questionnaire to the vignettes, despite initial inconsistencies and errors. Performance improved notably through iterative prompt engineering, which enhanced the models' capacity to detect delirium symptoms and assign scores. Prompt optimizations included restricting the scoring methodology to definitive "Yes" or "No" responses, revising the evaluation prompt to mandate responses in a tabular format, and guiding the models to adhere to the two recommended actions specified in the Sour Seven Questionnaire.
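These three optimizations can be illustrated as one consolidated evaluation prompt. The wording below is a hypothetical paraphrase, not the authors' verbatim prompt, and it assumes the published Sour Seven scoring scheme of weighted item scores summed to a total.

```python
# Illustrative consolidation of the three prompt optimizations described
# above; the phrasing is hypothetical, not the authors' verbatim prompt.
OPTIMIZED_EVALUATION_PROMPT = """
For each of the seven Sour Seven items, answer only 'Yes' or 'No'
(do not use 'Possibly', 'Unclear', or similar qualifiers), then assign
the item's published score.

Report your assessment as a table with the columns:
| Item | Response (Yes/No) | Score |

End with the total score and exactly one of the two recommended actions
specified in the Sour Seven Questionnaire for that total.
"""
```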
Conclusions:
Our findings provide preliminary evidence supporting the potential utility of AI models such as ChatGPT in administering standardized clinical assessment tools. The results highlight the importance of context-specific training and prompt engineering in harnessing the full potential of these AI models for healthcare applications. Despite these encouraging results, additional research is needed to establish broader generalizability and to validate the approach in real-world settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.