Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 29, 2025
Date Accepted: Dec 17, 2025
Developing a Quality Evaluation Index System for Health Conversational Artificial Intelligence: Mixed Methods Study
ABSTRACT
Background:
Effective communication is fundamental to healthcare, yet demographic transitions and a widening global health workforce gap are intensifying the imbalance between service demand and resource supply. Health Conversational Artificial Intelligence (HCAI) based on Large Language Models (LLMs) offers a potential pathway to improve accessibility and personalization of care. Nevertheless, the lack of a rigorous, user-centered evaluation framework limits systematic assessment of HCAI quality, raising concerns about safety, reliability, and clinical applicability.
Objective:
To establish a scientific and systematic quality evaluation index system for HCAI, providing a theoretical foundation and practical tool for the assessment and optimization of HCAI.
Methods:
A preliminary framework of the index system was established from a literature review, industry standards, and expert group discussions. Two rounds of Delphi expert consultation were then conducted to collect expert opinions. The Analytic Hierarchy Process (AHP) was applied to assign weights to indicators at each level, and the final content and structure of the index system were determined.
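As an illustration of the AHP weighting step, the following minimal Python sketch derives weights from a reciprocal pairwise comparison matrix using the row geometric mean approximation and checks judgment consistency. The abstract does not state which AHP variant the authors used, so the method choice, the function name `ahp_weights`, and the example matrix are all assumptions made for illustration; the judgments shown are hypothetical and are not the study's data. The consistency ratio computed here is a separate quantity from the expert authority coefficient (Cr) reported in the Results.

```python
import math

# Saaty's random consistency index (RI) by matrix order.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}


def ahp_weights(matrix):
    """Return (weights, consistency_ratio) for a reciprocal pairwise matrix."""
    n = len(matrix)
    # Normalized row geometric means approximate the principal eigenvector.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]
    # Estimate lambda_max as the mean of (A w)_i / w_i over all rows.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    consistency_ratio = consistency_index / ri if ri > 0 else 0.0
    return weights, consistency_ratio


# Hypothetical judgments for three criteria (NOT the study's data):
# criteria 1 and 2 are judged equally important, each 4 times as
# important as criterion 3.
example = [
    [1.0, 1.0, 4.0],
    [1.0, 1.0, 4.0],
    [0.25, 0.25, 1.0],
]
weights, consistency_ratio = ahp_weights(example)
```

A consistency ratio below 0.10 is the conventional threshold for accepting a set of pairwise judgments; the hypothetical matrix above is perfectly consistent, so its ratio is zero up to floating-point error.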
Results:
Both rounds of expert consultation achieved a 100% response rate. The authority coefficient (Cr) of the experts was 0.84 in both rounds. The Kendall’s W coefficient ranged from 0.14 to 0.20 in the first round and from 0.13 to 0.17 in the second round, both showing statistical significance (P < 0.05). The final HCAI quality evaluation index system consisted of 3 primary indicators, 7 secondary indicators, and 28 tertiary indicators. According to AHP weight calculations, the primary indicators were ranked in descending order as follows: ethics and compliance (0.4781), health consultation capability (0.4112), and user experience (0.1107).
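For readers unfamiliar with Kendall's W, the sketch below shows how the coefficient of concordance and its chi-square statistic are commonly computed. It is a simplified sketch that assumes each expert supplies a complete ranking without ties; Delphi importance ratings usually contain ties and need a tie correction, which is omitted here. The function name `kendalls_w` and the sample rankings are hypothetical and do not reproduce the study's data or its exact procedure.

```python
def kendalls_w(rankings):
    """Return (W, chi_square) for m raters ranking n items, no ties.

    rankings: list of m lists, each a permutation of the ranks 1..n.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Sum of the ranks each item received across all raters.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    # S: squared deviations of the rank sums from their mean.
    s = sum((rs - mean_rank_sum) ** 2 for rs in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    # Under the null hypothesis of no agreement, m(n-1)W is approximately
    # chi-square distributed with n-1 degrees of freedom.
    chi_square = m * (n - 1) * w
    return w, chi_square


# Hypothetical rankings (NOT the study's data).
full_agreement = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
opposed = [[1, 2, 3], [3, 2, 1]]

w_agree, chi_agree = kendalls_w(full_agreement)
w_opposed, _ = kendalls_w(opposed)
```

W ranges from 0 (no agreement) to 1 (complete agreement), and the significance test asks only whether agreement exceeds what chance would produce.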
Conclusions:
The evaluation index system constructed in this study demonstrates scientific validity and practical relevance. It provides a valuable reference for the quality assessment, model optimization, and regulatory oversight of HCAI systems.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.