Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 14, 2025
Open Peer Review Period: Jul 4, 2025 - Aug 29, 2025
Date Accepted: Oct 23, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
LLM-based Virtual Patient Systems for History-Taking in Medical Education: A Comprehensive Systematic Review
ABSTRACT
Background:
Large language models (LLMs) such as GPT-3.5 and GPT-4 are transforming virtual patient systems in medical education, offering scalable, cost-effective alternatives to standardized patients. However, systematic evaluations of their performance and limitations remain scarce.
Objective:
This review evaluates LLM-based virtual patient systems for medical history-taking, focusing on patient types and disease scope (RQ1), techniques enhancing history-taking (RQ2), experimental designs and metrics (RQ3), and public dataset characteristics (RQ4).
Methods:
Following PRISMA guidelines, we analyzed 34 studies (2020–May 2025) from nine databases (PubMed, Scopus, Web of Science, IEEE Xplore, ACM Digital Library, SpringerLink, ERIC, arXiv, Springer) using predefined keywords.
Results:
RQ1: Systems simulate mental health, chronic, neurological, and emergency cases but lack multimorbidity and diverse patient profiles, limiting applicability. RQ2: Techniques rely chiefly on prompt design; few-shot learning and multi-agent frameworks have had limited impact. Knowledge graph (KG) integration boosts accuracy by 16.02%, and fine-tuning helps, but both need further exploration. RQ3: Reported metrics include 81.8% Top-1 accuracy, empathy ratings of 4.5/5, System Usability Scale (SUS) scores of 88.1, and robustness of 0.9412, but evaluations lack standardization and rely on small samples (10–50 students, 3–5 experts). RQ4: Datasets (e.g., MIMIC-II) are restricted by privacy constraints, hindering cross-study comparisons.
Conclusions:
LLM-based virtual patient systems demonstrate significant potential but face several limitations. Current systems predominantly focus on common diseases, lacking adequate simulation of multimorbidity, cultural diversity, and complex drug interactions, thereby reducing clinical realism. Existing datasets such as MIMIC-III are biased toward single-disease scenarios, English language, and critical care, neglecting broader linguistic and cultural contexts. Methodologically, long prompts suffer from primacy and recency effects, while few-shot learning encounters challenges in maintaining dialogue coherence. To address these issues, incorporating LLM-KG embedding methods into model training can enhance contextual understanding, while combining chain-of-thought reasoning with LoRA improves inference efficiency. Multi-agent frameworks with dialogue compression offer further optimization for real-time interactions. Future research should prioritize the development of open-access, multilingual datasets through ethical data augmentation and international collaboration, supported by regular bias audits to ensure fairness. Establishing unified evaluation frameworks with standardized metrics—such as Top-K accuracy, semantic similarity scores above 0.75, and SUS scores exceeding 80—will be essential for advancing realism, accuracy, and fairness in virtual patient systems. Clinical Trial: -
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.