JMIR Preprints #68486: Virtual patients using large language models: Scalable, contextualized simulation of clinician-patient dialog with feedback

Virtual patients using large language models: Scalable, contextualized simulation of clinician-patient dialog with feedback

David A Cook;
Joshua Overgaard;
V. Shane Pankratz;
Guilherme Del Fiol;
Chris A. Aakre

ABSTRACT

Background:

Virtual patients (VPs) are computer screen-based simulations of patient-clinician encounters. VP use is limited by cost and low scalability.

Objective:

To show proof of concept that VPs powered by large language models (LLMs) can generate authentic dialogs, accurate representations of patient preferences, and personalized feedback on clinical performance; and to explore the use of LLMs for rating dialog and feedback quality.

Methods:

We conducted an intrinsic evaluation study rating 60 VP-clinician conversations. We used carefully engineered prompts to direct OpenAI Generative Pre-trained Transformer (GPT) to emulate a patient and provide feedback. Using 2 outpatient medicine topics (chronic cough [diagnosis] and diabetes [management]), each with permutations representing different patient preferences, we created 60 conversations (dialogs plus feedback): 48 with a human clinician, and 12 "self-chat" dialogs with GPT role-playing both the VP and clinician. Primary outcomes were dialog authenticity and feedback quality, rated using novel instruments grounded in empirical and conceptual work. Each conversation was rated by 3 physicians and also by GPT. Secondary outcomes included patient preferences represented in the dialogs, cost, and user experience.

Results:

The average cost per conversation was $0.51 for GPT-4.0-turbo and $0.02 for GPT-3.5-turbo. Conversation ratings (maximum 6) were mean (SD) overall authenticity 4.7 (0.7); overall user experience 4.9 (0.7); and average feedback 4.7 (0.6). For dialogs created using GPT-4.0-turbo, physician ratings of patient preferences aligned with intended preferences in 20-47 of 48 dialogs (42-98%). Subgroup comparisons revealed higher ratings for dialogs using GPT-4.0-turbo vs GPT-3.5-turbo, and for human-generated vs self-chat dialogs. Feedback ratings were similar for human-generated vs GPT-generated ratings, whereas authenticity ratings were significantly lower.

Conclusions:

LLM-powered VPs can simulate patient-clinician dialogs, demonstrably represent patient preferences, and provide personalized performance feedback. This approach is scalable, globally accessible, and inexpensive. LLM-generated ratings of feedback quality are similar to human ratings.

Clinical Trial:

None

Citation

Please cite as:

Cook DA, Overgaard J, Pankratz VS, Del Fiol G, Aakre CA

Virtual Patients Using Large Language Models: Scalable, Contextualized Simulation of Clinician-Patient Dialogue With Feedback

J Med Internet Res 2025;27:e68486

DOI: 10.2196/68486

PMID: 39854611

PMCID: 12008702

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 6, 2024

Open Peer Review Period: Nov 7, 2024 - Jan 2, 2025

Date Accepted: Jan 13, 2025

Date Submitted to PubMed: Jan 24, 2025
