Currently submitted to: JMIR Formative Research
Date Submitted: Apr 1, 2026
Open Peer Review Period: Apr 15, 2026 - Jun 10, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Synthetic Content Validation of Pediatric Trust Instruments Using Persona-Driven Large Language Models
ABSTRACT
Background:
Large language models (LLMs) could streamline healthcare instrument validation by serving as scalable, systematic expert panels and qualitative researcher surrogates. This is particularly relevant because traditional instrument development is time- and resource-intensive. There is currently a significant gap in validated trust instruments for pediatric emergency and surgical contexts. Such tools are essential because trust is foundational to the relationship between patient families and physicians and is associated with improved care-seeking and treatment adherence.
Objective:
This study had two objectives: (1) to develop and validate new trust instruments through a synthetic instrument validation (SIV) approach that integrates human and LLM capabilities, and (2) to evaluate appropriate use cases for LLMs in psychometric assessment.
Methods:
Two new trust instruments were developed, one for patient families and one for physicians. In phase one, the instruments underwent a two-stage content validation process using parallel synthetic and human expert panels (16 synthetic personas and 10 human experts across validation stages). Synthetic panels consisted of three persona-prompted LLMs (Claude Sonnet 4, GPT-5, Grok 4), with human panels serving as comparators. The Scale-Content Validity Index (S-CVI) and Fleiss' kappa (κ) acceptance thresholds were set at ≥0.80. In phase two, LLM performance in quantitative research tasks was evaluated. The patient family instrument underwent Flesch-Kincaid assessment, and both instruments underwent cosine similarity analyses using three parallel methods: algorithmic, LLM-instructed, and LLM-derived.
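For readers unfamiliar with the two agreement statistics named above, the following is a minimal illustrative sketch (not the authors' analysis code) of how Fleiss' κ and the S-CVI/Ave are conventionally computed from panel ratings. The rating matrices and the 4-point relevance scale are assumptions for illustration only.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.
    counts[i][j] = number of raters assigning category j to item i;
    every row must sum to the same number of raters n."""
    N = len(counts)
    n = sum(counts[0])
    # per-item observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    # chance agreement from marginal category proportions
    k = len(counts[0])
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)


def s_cvi_ave(relevance):
    """S-CVI/Ave: mean of item-level CVIs, where each I-CVI is the share
    of experts rating the item 3 or 4 on a 4-point relevance scale."""
    i_cvis = [sum(1 for r in row if r >= 3) / len(row) for row in relevance]
    return sum(i_cvis) / len(i_cvis)
```

For example, four items rated by five experts into two categories with complete per-item consensus yields κ = 1.0, and an item rated relevant (≥3) by all experts contributes an I-CVI of 1.0 to the S-CVI/Ave.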
Results:
In phase one, human–synthetic expert panels demonstrated substantial inter-rater reliability across both instruments. Fleiss' κ values for dimensional validation were 0.84 (95% CI [0.72, 0.96]) for the patient family instrument and 0.87 (95% CI [0.72, 1.00]) for the physician instrument. For contextual validation, κ values were 0.83 (95% CI [0.73, 0.93]) and 0.88 (95% CI [0.80, 0.96]), respectively. All instrument sections exceeded the S-CVI ≥0.80 threshold across both stages. Phase two Flesch-Kincaid metrics converged across all three methods (grade level 8.1 ± 1.1; readability score 60.1 ± 5.6), meeting accessibility standards and demonstrating methodological similarity. In contrast, cosine similarity analyses revealed significant quantitative limitations in the LLMs, necessitating reliance on the algorithmic method alone; that method yielded a maximum cosine similarity of 0.83, indicating acceptable item distinctiveness overall.
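To make the "algorithmic" redundancy check concrete, here is a minimal sketch of pairwise cosine similarity between instrument items. The abstract does not specify the vectorization used, so this sketch assumes simple term-count vectors for illustration; the sample items are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def max_pairwise_similarity(items):
    """Highest cosine similarity over all item pairs. Values near 1 flag
    near-duplicate items; a lower maximum suggests distinct item content."""
    vecs = [Counter(item.lower().split()) for item in items]
    return max(cosine(vecs[i], vecs[j])
               for i in range(len(vecs))
               for j in range(i + 1, len(vecs)))
```

Under this scheme, two items sharing most of their wording score close to 1, while unrelated items score near 0, so a maximum of 0.83 across an instrument would fall below a common near-duplicate flag.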
Conclusions:
Persona-prompted LLMs effectively performed subjective psychometric assessments and reduced timelines from months to weeks, but showed limitations in quantitative computations. This suggests that LLMs currently excel in qualitative assessments, while falling short in rule-based and deterministic computations. These findings help establish task-dependent boundaries for LLM integration in psychometric research, necessitating selective human-LLM collaboration. This hybrid SIV framework shows potential to accelerate healthcare instrument development while maintaining validation rigor.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.