Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 20, 2025
Date Accepted: May 16, 2025

The final, peer-reviewed published version of this preprint can be found here:

Classifying Patient Complaints Using Artificial Intelligence–Powered Large Language Models: Cross-Sectional Study

Koh SWC, Wong ERN, Tan JC, van der Lubbe SCC, Goh JC, Ching ESY, Chia IWY, Low SH, Ang PY, Quek Q, Motani M, Valderas JM

J Med Internet Res 2025;27:e74231

DOI: 10.2196/74231

PMID: 40768757

PMCID: 12327907

Patient Complaints Classification using Artificial Intelligence-Powered Large Language Models: An Analytical Cross-Sectional Study

  • Sky Wei Chee Koh
  • Eunice Rui Ning Wong
  • John Chongmin Tan
  • Stephanie C. C. van der Lubbe
  • Jun Cong Goh
  • Ethan Sheng Yong Ching
  • Ian Wen Yih Chia
  • Si Hui Low
  • Ping Young Ang
  • Queenie Quek
  • Mehul Motani
  • Jose Maria Valderas

ABSTRACT

Background:

Patient complaints offer actionable insights for quality improvement and safety. Artificial intelligence (AI) can facilitate the analysis of complaints, but its accuracy in categorizing complaints requires further evaluation.

Objective:

To categorize patient complaints in primary care using the Healthcare Complaint Analysis Tool (HCAT) General Practice (GP) taxonomy and to evaluate the accuracy of AI-powered categorization of these complaints.

Methods:

This analytical cross-sectional study analysed 1,816 anonymous patient complaints from seven public primary care clinics in Singapore. Complaints were first coded by trained human coders using the HCAT (GP) taxonomy. Three large language models (LLMs), namely GPT-3.5 turbo (GPT: Generative Pre-trained Transformer), GPT-4o mini, and Claude 3.5 Sonnet, were then employed to validate the manual classification and identify complaint themes. LLM classifications were assessed against the human coding using accuracy, sensitivity, specificity, and F-scores. Cohen's kappa and McNemar's test were used to evaluate AI-human agreement and to compare concordance between the AI models.
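To make the evaluation metrics named above concrete, the following is a minimal sketch of how accuracy, sensitivity, specificity, the F1-score, and Cohen's kappa could be computed for a single binary HCAT (GP) field, treating the human codes as the reference. The function name `binary_metrics` and the ten toy labels are hypothetical illustrations, not the study's code or data, and the abstract does not state how multi-category fields were handled.

```python
from collections import Counter

def binary_metrics(human, model):
    """Compare model labels (0/1) against human reference labels (0/1)."""
    pairs = Counter(zip(human, model))
    tp = pairs[(1, 1)]  # both say the category applies
    tn = pairs[(0, 0)]  # both say it does not apply
    fp = pairs[(0, 1)]  # model says yes, human says no
    fn = pairs[(1, 0)]  # model says no, human says yes
    n = tp + tn + fp + fn
    nan = float("nan")

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else nan

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }

# Hypothetical toy labels: 1 = complaint coded under a given category.
human_codes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_codes = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(human_codes, model_codes)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example accuracy is 0.80 while kappa is about 0.58, which illustrates why a chance-corrected agreement statistic can look noticeably lower than raw accuracy.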

Results:

Most complaints were related to management (59.4%) and institutional processes (45.7%), were of medium severity (54.7%), occurred within the practice (34.5%), and resulted in minimal harm (75.4%). The LLMs achieved moderate to good accuracy (60.4%–95.5%) in HCAT (GP) field classifications, with GPT-4o mini generally outperforming GPT-3.5 turbo, except in severity classification. All three LLMs demonstrated moderate concordance rates (average 61.9%–68.8%) in complaint classification, with varying levels of agreement (κ = 0.114–0.623). GPT-4o mini and Claude 3.5 Sonnet significantly outperformed GPT-3.5 turbo in several fields (p < 0.05). Claude’s thematic analysis identified long wait times (21.6%), staff attitudes (15.8%), and appointment booking issues (10.5%) as the top concerns, together accounting for nearly half of all complaints.
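As an illustration of how two models can be compared on the same complaints, the sketch below applies an exact (binomial) McNemar's test to paired correctness indicators. It is only a sketch under stated assumptions: the abstract does not say which variant of McNemar's test the authors used, and the function name `mcnemar_exact` and the twelve toy observations are hypothetical.

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar's test on paired 0/1 correctness indicators.

    correct_a[i] is 1 if model A matched the human code on complaint i,
    and likewise correct_b[i] for model B. Only discordant pairs matter.
    """
    b = sum(1 for a, m in zip(correct_a, correct_b) if a == 1 and m == 0)
    c = sum(1 for a, m in zip(correct_a, correct_b) if a == 0 and m == 1)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, no evidence of a difference
    # Under the null hypothesis, discordant pairs split 50/50 (binomial).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    p_value = min(1.0, 2 * tail)
    return b, c, p_value

# Hypothetical toy data: 1 = model agreed with the human coder.
model_a_correct = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_b_correct = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0]
b, c, p_value = mcnemar_exact(model_a_correct, model_b_correct)
print(f"A-only correct: {b}, B-only correct: {c}, p = {p_value:.3f}")
```

Here model B is correct on six complaints that model A misses and model A on one that model B misses, yet with so few discordant pairs the difference does not reach p < 0.05.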

Conclusions:

While GPT-4o mini and Claude 3.5 Sonnet demonstrated promising results, further fine-tuning and model training are required to improve accuracy. Integrating AI into complaint analysis can facilitate proactive identification of systemic issues, ultimately enhancing quality improvement and patient safety.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.