Maintenance Notice

Due to necessary scheduled maintenance, the JMIR Publications website will be unavailable from Wednesday, July 01, 2020 at 8:00 PM to 10:00 PM EST. We apologize in advance for any inconvenience this may cause you.

Who will be affected?

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 6, 2024
Open Peer Review Period: Nov 7, 2024 - Jan 2, 2025
Date Accepted: Jan 13, 2025
Date Submitted to PubMed: Jan 24, 2025
(closed for review but you can still tweet)

The final, peer-reviewed published version of this preprint can be found here:

Virtual Patients Using Large Language Models: Scalable, Contextualized Simulation of Clinician-Patient Dialogue With Feedback

Cook DA, Overgaard J, Pankratz VS, Del Fiol G, Aakre CA

Virtual Patients Using Large Language Models: Scalable, Contextualized Simulation of Clinician-Patient Dialogue With Feedback

J Med Internet Res 2025;27:e68486

DOI: 10.2196/68486

PMID: 39854611

PMCID: 12008702

Virtual patients using large language models: Scalable, contextualized simulation of clinician-patient dialog with feedback

  • David A Cook; 
  • Joshua Overgaard; 
  • V. Shane Pankratz; 
  • Guilherme Del Fiol; 
  • Chris A. Aakre

ABSTRACT

Background:

Virtual patients (VPs) are computer screen-based simulations of patient-clinician encounters. VP use is limited by cost and low scalability.

Objective:

Show proof-of-concept that VPs powered by large language models (LLMs) generate authentic dialogs, accurate representations of patient preferences, and personalized feedback on clinical performance; and explore LLMs for rating dialog and feedback quality.

Methods:

We conducted an intrinsic evaluation study rating 60 VP-clinician conversations. We used carefully engineered prompts to direct OpenAI Generative Pre-trained Transformer (GPT) to emulate a patient and provide feedback. Using 2 outpatient medicine topics (chronic cough [diagnosis] and diabetes [management]), each with permutations representing different patient preferences, we created 60 conversations (dialogs plus feedback): 48 with a human clinician, and 12 "self-chat" dialogs with GPT role-playing both the VP and clinician. Primary outcomes were dialog authenticity and feedback quality, rated using novel instruments meticulously grounded in empirical and conceptual work. Each conversation was rated by 3 physicians and also by GPT. Secondary outcomes included patient preferences represented in the dialogs, cost, and user experience.

Results:

The average cost per conversation was $0.51 for GPT-4.0-turbo and $0.02 for GPT-3.5-turbo. Conversation ratings (maximum 6) were mean (SD) overall authenticity 4.7 (0.7); overall user experience 4.9 (0.7); and average feedback 4.7 (0.6). For dialogs created using GPT-4.0-turbo, physician ratings of patient preferences aligned with intended preferences in 20-47 of 48 dialogs (42-98%). Subgroup comparisons revealed higher ratings for dialogs using GPT-4.0-turbo vs GPT-3.5-turbo, and for human-generated vs self-chat dialogs. Feedback ratings were similar for human-generated vs GPT-generated ratings, whereas authenticity ratings were significantly lower.

Conclusions:

LLM-powered VPs can simulate patient-clinician dialogs, demonstrably represent patient preferences, and provide personalized performance feedback. This approach is scalable, globally-accessible, and inexpensive. LLM-generated ratings of feedback quality are similar to human ratings. Clinical Trial: None


 Citation

Please cite as:

Cook DA, Overgaard J, Pankratz VS, Del Fiol G, Aakre CA

Virtual Patients Using Large Language Models: Scalable, Contextualized Simulation of Clinician-Patient Dialogue With Feedback

J Med Internet Res 2025;27:e68486

DOI: 10.2196/68486

PMID: 39854611

PMCID: 12008702

Download PDF


Request queued. Please wait while the file is being generated. It may take some time.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.