JMIR Preprints #56413: Performance of Large Language Models in Patient Complaint Resolution: A single-blind comparative evaluation

Performance of Large Language Models in Patient Complaint Resolution: A single-blind comparative evaluation

Lorraine Pei Xian Yong;
Joshua Yi Min Tung;
Zi Yao Lee;
Win Sen Kuan;
Mui Teng Chua

ABSTRACT

Background:

Patient complaints are a perennial challenge faced by healthcare institutions globally, requiring extensive time and effort from healthcare workers. Despite these efforts, patient dissatisfaction remains high. Recent studies on the utility of large language models (LLMs), such as the GPT models developed by OpenAI, in the healthcare sector have shown great promise, with the ability to provide more detailed and empathetic responses than physicians. LLMs could potentially be used to respond to patient complaints, improving patient satisfaction and complaint response time.

Objective:

This study aimed to evaluate the performance of LLMs in addressing patient complaints received by a tertiary healthcare institution, with the goal of enhancing patient satisfaction.

Methods:

Anonymized patient complaint emails and the associated responses from the Patient Relations Department (PRD) were obtained. ChatGPT-4.0 was provided with the same complaint emails and tasked with generating responses. The complaints and the respective responses were uploaded onto a web-based questionnaire. Respondents were asked to rate both responses on a 10-point Likert scale for 4 items: appropriateness, completeness, empathy, and satisfaction. They were also asked to choose a preferred response at the end of each scenario.

Results:

There were 188 respondents, of whom 61.2% were healthcare workers. A majority of respondents, both healthcare and non-healthcare workers, preferred the replies from ChatGPT (87.2% to 97.3%). ChatGPT-4.0 responses were rated higher on all 4 assessed items (P<.001) and had higher average word counts than human responses (238 vs 76 words). Regression analyses showed that a higher word count was a statistically significant predictor (P<.001) of higher scores on all 4 items. However, on subgroup analysis by authorship, this held true only for responses written by PRD staff and not for those generated by ChatGPT, which received consistently high scores irrespective of response length.
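
The subgroup pattern described above (word count predicting score overall, but only within PRD-written responses when split by authorship) can be illustrated with a minimal sketch. This is not the authors' analysis: the abstract does not specify the regression model, and the records below are hypothetical toy values chosen only to show the mechanics, not study data. The function and variable names are likewise assumptions.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical toy records: (author, word_count, rating). NOT study data.
records = [
    ("PRD", 40, 4.0), ("PRD", 80, 5.0), ("PRD", 120, 6.0), ("PRD", 160, 7.0),
    ("ChatGPT", 180, 9.0), ("ChatGPT", 220, 9.0),
    ("ChatGPT", 260, 9.0), ("ChatGPT", 300, 9.0),
]


def slope_by_author(rows):
    """Fit a separate word-count slope for each author subgroup."""
    groups = {}
    for author, words, rating in rows:
        xs, ys = groups.setdefault(author, ([], []))
        xs.append(words)
        ys.append(rating)
    return {a: ols_slope_intercept(xs, ys)[0] for a, (xs, ys) in groups.items()}


# Pooled fit ignores authorship; subgroup fits split by it.
pooled_slope, _ = ols_slope_intercept(
    [r[1] for r in records], [r[2] for r in records]
)
subgroup_slopes = slope_by_author(records)

print(f"pooled slope: {pooled_slope:.4f}")
for author, slope in subgroup_slopes.items():
    print(f"{author} slope: {slope:.4f}")
```

In this toy data the pooled slope is positive, yet the ChatGPT subgroup slope is zero because its ratings do not vary with length. A real analysis would also need significance testing and some handling of repeated ratings from the same respondent, which this sketch omits.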

Conclusions:

This study provides significant evidence supporting the effectiveness of LLMs in patient complaint resolution. ChatGPT demonstrated superiority in terms of response appropriateness, empathy, quality, and overall satisfaction when compared against actual human responses to patient complaints. Future research could measure the degree of improvement that artificial intelligence (AI)-generated responses can bring in terms of time savings, patient satisfaction, and stress reduction for healthcare workers.

Citation

Please cite as:

Yong LPX, Tung JYM, Lee ZY, Kuan WS, Chua MT

Performance of Large Language Models in Patient Complaint Resolution: Web-Based Cross-Sectional Survey

J Med Internet Res 2024;26:e56413

DOI: 10.2196/56413

PMID: 39121468

PMCID: 11344182

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jan 17, 2024

Open Peer Review Period: Jan 18, 2024 - Mar 14, 2024

Date Accepted: Jul 5, 2024
