Advancing Clinical Chatbot Validation using AI-Powered Evaluation with a New Three-Bot Evaluation System
ABSTRACT
Background:
The healthcare sector faces a projected shortfall of 10 million workers by 2030. AI automation in patient education and initial therapy screening presents a strategic response to mitigate this shortage and reallocate medical staff to higher-priority tasks.
Objective:
This study introduces a novel three-bot method for efficiently testing and validating early-stage AI healthcare provider chatbots. To test AI provider chatbots extensively without involving real patients or researchers, we developed several AI patient bots and an evaluator bot.
Methods:
Provider bots interacted with AI patient bots embodying frustrated, anxious, or depressed personas. An evaluator bot reviewed interaction transcripts based on specific criteria. Human experts then reviewed each interaction transcript, and the evaluator bot’s results were compared to human evaluation results to ensure accuracy.
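The three-bot loop described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the bots here are deterministic stand-ins for whatever language models the authors used, and every name (`make_patient_bot`, `provider_bot`, `evaluator_bot`, `CRITERIA`) and the scoring rule are hypothetical assumptions introduced only to show the data flow.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]             # (speaker, utterance)
Bot = Callable[[List[Turn]], str]  # conversation so far -> next utterance

# Personas named in the abstract.
PERSONAS = ["frustrated", "anxious", "depressed"]
# Assumption: illustrative criteria only; the study's rubric is not given here.
CRITERIA = ["accuracy", "clarity", "empathy"]


def make_patient_bot(persona: str) -> Bot:
    """Hypothetical stand-in for a model-backed patient bot with a persona."""
    def patient(history: List[Turn]) -> str:
        turn_no = len(history) // 2 + 1
        return f"[{persona} patient, turn {turn_no}] I have a question about my treatment."
    return patient


def provider_bot(history: List[Turn]) -> str:
    """Hypothetical stand-in for the provider bot under test."""
    last = history[-1][1] if history else ""
    return f"[provider] Thank you for sharing. Responding to: {last!r}"


def run_conversation(provider: Bot, patient: Bot, n_exchanges: int = 3) -> List[Turn]:
    """Alternate patient and provider turns and return the transcript."""
    transcript: List[Turn] = []
    for _ in range(n_exchanges):
        transcript.append(("patient", patient(transcript)))
        transcript.append(("provider", provider(transcript)))
    return transcript


def evaluator_bot(transcript: List[Turn]) -> Dict[str, int]:
    """Placeholder rubric: scores 1 per criterion if the provider spoke at all.

    A real evaluator bot would prompt a model with the transcript and criteria.
    """
    provider_spoke = any(speaker == "provider" for speaker, _ in transcript)
    return {criterion: int(provider_spoke) for criterion in CRITERIA}


def evaluate_all(provider: Bot, n_exchanges: int = 3) -> Dict[str, Dict[str, int]]:
    """Run the provider bot against every persona and score each transcript."""
    results = {}
    for persona in PERSONAS:
        transcript = run_conversation(provider, make_patient_bot(persona), n_exchanges)
        results[persona] = evaluator_bot(transcript)
    return results
```

In this sketch, `evaluate_all(provider_bot)` returns one score dictionary per persona; the same transcripts would then be handed to human reviewers for comparison.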
Results:
The patient-education bot demonstrated high competency in delivering accurate medical information, easy-to-understand explanations, and empathy. The screening bot excelled in maintaining effective communication, building relationships, and exploring emotions. Statistical analysis confirmed the reliability and accuracy of the AI evaluations.
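The abstract does not say which statistic was used to compare evaluator-bot and human ratings. As one plausible illustration, the sketch below computes raw percent agreement and Cohen's kappa for two raters; choosing kappa is an assumption, the function names are hypothetical, and the example ratings in the usage note are invented for demonstration and are not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of items on which the two raters gave the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label independently.
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)
```

For example, with invented pass/fail ratings `human = [1, 1, 0, 1, 0, 0, 1, 1]` and `ai = [1, 1, 0, 0, 0, 0, 1, 1]`, `percent_agreement(human, ai)` is 0.875 and `cohens_kappa(human, ai)` is 0.75.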
Conclusions:
The proposed evaluation method provides a safe and effective means to test and refine early versions of healthcare provider chatbots without risking patient safety or demanding excessive time and effort from researchers. It allows rapid testing and validation of healthcare chatbots intended to automate basic medical tasks.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.