Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 10, 2025
Date Accepted: Dec 29, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluating ChatGPT Responses on Scar or Keloid for Patient Education
ABSTRACT
Background:
Scars and keloids impose significant physical and psychological burdens on patients, often leading to functional limitations, cosmetic concerns, and mental health issues such as anxiety and depression. Patients increasingly turn to online platforms for information, yet existing web-based resources on scars and keloids are frequently unreliable, fragmented, or difficult to understand. Large language models (LLMs) such as ChatGPT-4 show promise for delivering medical information, but their accuracy, readability, and propensity to generate hallucinated content require validation before use in patient education.
Objective:
To systematically evaluate ChatGPT-4’s performance in providing patient education on scars and keloids, focusing on its accuracy, reliability, readability, and reference quality.
Methods:
We collected 354 questions from Reddit communities (r/Keloids, r/SCAR, r/PlasticSurgery), covering topics including treatment options, preoperative and postoperative care, and psychological impacts. Each question was submitted to ChatGPT-4 in a separate, independent session to mimic real-world patient interactions. Responses were evaluated with multiple instruments: the Patient Education Materials Assessment Tool adapted for AI (PEMAT-AI) for understandability and actionability, the DISCERN-AI for treatment information quality, the Global Quality Scale (GQS) for overall information quality, and standard readability metrics (Flesch Reading Ease, Gunning Fog Index, and others). Three plastic surgeons rated accuracy, safety, and clinical appropriateness using the NLAT-AI tool, while the REF-AI tool assessed cited references for hallucination, relevance, and source quality.
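The readability metrics named above follow published formulas: Flesch Reading Ease = 206.835 − 1.015 × (words/sentence) − 84.6 × (syllables/word), and Gunning Fog Index = 0.4 × [(words/sentence) + 100 × (complex words/total words)]. Below is a minimal Python sketch of this scoring step, not the authors' actual pipeline (which the abstract does not specify); the syllable counter is a rough heuristic, and a validated library such as textstat would normally be preferred.

```python
import re

def count_syllables(word: str) -> int:
    """Heuristic syllable count: vowel groups, minus a common silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    """Flesch Reading Ease and Gunning Fog Index from their standard formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Gunning Fog treats words of three or more syllables as "complex"
    # (the full definition also excludes proper nouns and familiar jargon).
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word

    flesch = 206.835 - 1.015 * wps - 84.6 * spw
    fog = 0.4 * (wps + 100 * complex_words / len(words))
    return {"flesch_reading_ease": flesch, "gunning_fog": fog}

print(readability("Keloids are raised scars that grow beyond the original wound. "
                  "Treatment options include corticosteroid injections and surgery."))
```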
Results:
ChatGPT-4 demonstrated high accuracy and reliability: PEMAT-AI showed 75.5% understandability, DISCERN-AI rated responses as "Good" (26.3/35), and the GQS score was 4.28/5. Surgeons' ratings averaged 3.94–4.43/5 across dimensions, with strong internal consistency (Cronbach's alpha = 0.81). Readability analyses indicated moderate complexity (Flesch Reading Ease 50.13, Gunning Fog Index 12.68), corresponding to approximately a 12th-grade reading level. REF-AI flagged 383 of 3250 references (11.8%) as hallucinated; the remaining 88.2% were verifiable, with 95.1% drawn from authoritative sources (e.g., government guidelines and peer-reviewed literature).
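The internal-consistency statistic quoted above has a standard closed form: alpha = k/(k−1) × (1 − Σ per-rater variance / variance of per-question totals), where k is the number of raters. A minimal sketch with made-up ratings (not the study's data; the function name cronbach_alpha is ours):

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha for a (questions x raters) score matrix."""
    k = ratings.shape[1]
    item_var = ratings.var(axis=0, ddof=1).sum()  # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of per-question totals
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 1-5 ratings from three surgeons on five questions (illustration only).
scores = np.array([[4, 4, 4],
                   [3, 3, 4],
                   [5, 5, 5],
                   [4, 4, 3],
                   [5, 4, 5]])
print(round(cronbach_alpha(scores), 2))  # prints 0.84 for this toy matrix
```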
Conclusions:
ChatGPT-4 exhibits substantial potential as a patient education tool for scars and keloids, offering reliable and accurate information. However, improving readability (to align with 6th- to 8th-grade standards) and reducing reference hallucinations are essential to enhance accessibility and trustworthiness. Future LLM optimizations should prioritize simplifying medical language and strengthening reference validation mechanisms to maximize clinical utility.
Clinical Trial: not applicable
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.