Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Aug 29, 2025
Date Accepted: Dec 17, 2025

The final, peer-reviewed published version of this preprint can be found here:

Developing a Quality Evaluation Index System for Health Conversational Artificial Intelligence: Mixed Methods Study

Liao W, Li M, Ma C, Han Y, Wang D, Liu H, Wang Y, Feng Z, Wang H, Guan Y

Developing a Quality Evaluation Index System for Health Conversational Artificial Intelligence: Mixed Methods Study

J Med Internet Res 2026;28:e83188

DOI: 10.2196/83188

PMID: 41554116

PMCID: 12865354

Developing a Quality Evaluation Index System for Health Conversational Artificial Intelligence: Mixed Methods Study

  • Weizhen Liao
  • Meng Li
  • Chengyu Ma
  • Youli Han
  • Dan Wang
  • Haopeng Liu
  • Yi Wang
  • Zijie Feng
  • Huichao Wang
  • Yiru Guan

ABSTRACT

Background:

Effective communication is fundamental to healthcare, yet demographic transitions and a widening global health workforce gap are intensifying the imbalance between service demand and resource supply. Health Conversational Artificial Intelligence (HCAI) based on Large Language Models (LLMs) offers a potential pathway to improve accessibility and personalization of care. Nevertheless, the lack of a rigorous, user-centered evaluation framework limits systematic assessment of HCAI quality, raising concerns about safety, reliability, and clinical applicability.

Objective:

To establish a scientific and systematic quality evaluation index system for HCAI, providing a theoretical foundation and practical tool for the assessment and optimization of HCAI.

Methods:

Based on literature review, industry standards, and expert group discussions, a preliminary framework of the index system was established. Two rounds of Delphi expert consultations were then conducted to collect expert opinions. The Analytic Hierarchy Process (AHP) was applied to assign weights to indicators at each level, and the final content and structure of the index system were determined.
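As a rough illustration of the AHP weighting step, the sketch below derives weights from a pairwise comparison matrix and checks its consistency. It is not the authors' implementation: the abstract does not say how the weights were computed, so the row geometric mean approximation is used here in place of the principal eigenvector, and the function names (`ahp_weights`, `consistency_ratio`) and the 3 x 3 matrix are hypothetical, not taken from the study.

```python
import math

# Saaty's random consistency index (RI) for matrices of size 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate AHP weights with the row geometric mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate lambda_max as the mean of (A w)_i / w_i over all rows.
    weighted = [sum(matrix[i][j] * weights[j] for j in range(n))
                for i in range(n)]
    lambda_max = sum(weighted[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical pairwise comparisons among three indicators (illustrative
# only, NOT data from the study): entry [i][j] states how much more
# important indicator i is judged to be than indicator j.
example_matrix = [
    [1.0, 1.0, 4.0],
    [1.0, 1.0, 4.0],
    [0.25, 0.25, 1.0],
]

if __name__ == "__main__":
    w = ahp_weights(example_matrix)
    print("weights:", [round(x, 4) for x in w])
    print("CR:", round(consistency_ratio(example_matrix, w), 4))
```

By convention, a consistency ratio below 0.10 is treated as acceptable; judgments with a higher ratio are usually revisited before the weights are used.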

Results:

Both rounds of expert consultation achieved a 100% response rate. The expert authority coefficient (Cr) was 0.84 in both rounds. Kendall’s W ranged from 0.14 to 0.20 in the first round and from 0.13 to 0.17 in the second round, and was statistically significant in both rounds (P < 0.05). The final HCAI quality evaluation index system consisted of 3 primary indicators, 7 secondary indicators, and 28 tertiary indicators. According to the AHP weight calculations, the primary indicators ranked in descending order of weight were ethics and compliance (0.4781), health consultation capability (0.4112), and user experience (0.1107).
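For readers unfamiliar with Kendall’s W, the sketch below shows how the coefficient of concordance can be computed from a set of expert rankings. It is a minimal version under stated assumptions: each expert supplies a complete ranking without ties (Delphi ratings on a Likert scale would first need converting to ranks, with a tie correction that is omitted here), and the function names and sample rankings are hypothetical rather than drawn from the study.

```python
def kendalls_w(rankings):
    """Kendall's W for m raters ranking n items, without tie correction.

    rankings: list of m lists, each a permutation of the ranks 1..n.
    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared
    deviations of the per-item rank sums from their mean.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((x - mean_rank_sum) ** 2 for x in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def concordance_chi_square(rankings):
    """Chi-square statistic m * (n - 1) * W, with n - 1 degrees of freedom."""
    m = len(rankings)
    n = len(rankings[0])
    return m * (n - 1) * kendalls_w(rankings)


# Hypothetical rankings from four experts over four indicators
# (illustrative only, NOT data from the study).
hypothetical_rankings = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
    [4, 1, 2, 3],
]

if __name__ == "__main__":
    print("W:", round(kendalls_w(hypothetical_rankings), 3))
    print("chi-square:", round(concordance_chi_square(hypothetical_rankings), 3))
```

W ranges from 0 (no agreement) to 1 (identical rankings). The chi-square statistic is what a significance test such as the reported P < 0.05 is typically based on.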

Conclusions:

The evaluation index system constructed in this study demonstrates scientific validity and practical relevance. It provides a valuable reference for the quality assessment, model optimization, and regulatory oversight of HCAI systems.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.