Accepted for/Published in: JMIR Formative Research

Date Submitted: Jul 16, 2025
Open Peer Review Period: Jul 17, 2025 - Sep 11, 2025
Date Accepted: Oct 15, 2025

The final, peer-reviewed published version of this preprint can be found here:

Quality Assessment of Large Language Model–Generated Medical Dialogue for Clinical Vignettes: Evaluation Study

Yanagita Y, Yokokawa D, Ihara S, Yoshida R, Okano Y, Uehara T

JMIR Form Res 2025;9:e80752

DOI: 10.2196/80752

PMID: 41183323

PMCID: 12624296

Can GPT Generate Medical Dialogue for Clinical Vignettes: An Evaluation

  • Yasutaka Yanagita; 
  • Daiki Yokokawa; 
  • Shiichi Ihara; 
  • Ryo Yoshida; 
  • Yoshihide Okano; 
  • Takanori Uehara

ABSTRACT

Background:

Clinical vignettes often focus on prototypical presentations; require substantial time and effort to develop; and fail to represent patient diversity, the complexity of clinical conditions, patients’ perspectives, and the dynamic nature of physician–patient interactions.

Objective:

We evaluated the quality of physician–patient dialogues produced by generative AI in Japanese, focusing on their medical accuracy and overall appropriateness as medical interviews.

Methods:

To generate a physician–patient dialogue, we created an AI prompt that included a specific clinical history and instructed the model to simulate a cooperative patient responding to the physician’s questions. The target diseases were those covered by the Japanese National Medical Licensing Examination. Each dialogue consisted of 25 turns by the physician and 25 by the patient, reflecting the typical volume of conversation in Japanese outpatient settings. Three internists independently evaluated each generated dialogue on a 7-point Likert scale across six criteria: coherence of the conversation, medical accuracy of the patient’s responses, medical accuracy of the physician’s responses, content of the medical history, communication skills, and professionalism. In addition, the composite score for each dialogue was calculated as the overall mean of these six criteria.
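The scoring scheme described above can be sketched in a few lines of Python: three raters score each dialogue on six criteria using a 7-point Likert scale, and the composite is the overall mean of all ratings. This is a minimal illustration only; the rater and criterion identifiers below are hypothetical shorthand, not names taken from the study.

```python
from statistics import mean

# Shorthand labels for the six criteria described in the Methods
# (hypothetical identifiers, not the study's own variable names).
CRITERIA = [
    "coherence",            # coherence of the conversation
    "patient_accuracy",     # medical accuracy of the patient's responses
    "physician_accuracy",   # medical accuracy of the physician's responses
    "history_content",      # content of the medical history
    "communication",        # communication skills
    "professionalism",      # professionalism
]

def composite_score(ratings):
    """Composite score for one dialogue: the overall mean of all
    7-point Likert ratings across raters and the six criteria.

    `ratings` maps rater id -> {criterion: score in 1..7}.
    """
    scores = [r[c] for r in ratings.values() for c in CRITERIA]
    return mean(scores)

# Example: three hypothetical internists rating one dialogue.
ratings = {
    "rater_a": dict(zip(CRITERIA, [6, 6, 5, 6, 5, 5])),
    "rater_b": dict(zip(CRITERIA, [6, 7, 6, 6, 6, 6])),
    "rater_c": dict(zip(CRITERIA, [5, 6, 5, 6, 5, 5])),
}
print(round(composite_score(ratings), 1))  # prints 5.7
```

Averaging over all 18 ratings (3 raters × 6 criteria) is equivalent to first averaging each criterion across raters and then averaging those six means, since every criterion receives the same number of ratings.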

Results:

The mean scores (standard deviation) for the six criteria were as follows: coherence of the conversation: 5.9 (0.9); medical accuracy of the patient’s responses: 6.0 (0.9); medical accuracy of the physician’s responses: 5.6 (1.1); content of the medical history: 5.9 (0.9); communication skills: 5.6 (0.9); and professionalism: 5.5 (1.1). The composite score was 5.7 (1.0).

Conclusions:

While physician oversight remains essential, it is feasible to efficiently create AI-generated educational materials for medical education that overcome the limitations of traditional clinical vignettes. This approach may reduce time and financial burdens, enhancing opportunities to practice clinical interviewing in settings that closely mirror real-world encounters.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.