Accepted for/Published in: JMIR Medical Education
Date Submitted: Jul 27, 2023
Date Accepted: Dec 11, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Comprehensiveness, Accuracy, and Readability of Exercise Recommendations Provided by an Artificial Intelligence-Based Chatbot
ABSTRACT
Background:
Regular physical activity is critical for health and disease prevention. Yet healthcare providers and patients face barriers to implementing evidence-based lifestyle recommendations. The potential to augment care with the increased availability of artificial intelligence (AI) technologies is limitless; however, the suitability of AI-generated exercise recommendations has yet to be explored.
Objective:
The purpose of this study was to assess the comprehensiveness, accuracy, and readability of individualized exercise recommendations generated by a novel AI chatbot.
Methods:
A coding scheme was developed to score AI-generated exercise recommendations across ten categories informed by gold-standard exercise recommendations: 1) health condition-specific benefits of exercise, 2) exercise pre-participation health screening, 3) frequency, 4) intensity, 5) time, 6) type, 7) volume, 8) progression, 9) special considerations, and 10) references to primary literature. The AI chatbot was prompted to provide individualized exercise recommendations for 26 clinical populations using an open-source application programming interface. Two independent reviewers coded the AI-generated content for each category and calculated comprehensiveness and factual accuracy, each on a scale of 0-100%. Readability was assessed using the Flesch-Kincaid formula. Qualitative analysis identified and categorized themes from the AI-generated output.
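For readers unfamiliar with the readability measures named above, the following is a minimal sketch of the standard Flesch reading ease and Flesch-Kincaid grade level formulas. It is not the tooling used in the study, which this abstract does not specify. The function names and the vowel-group syllable counter are assumptions made for illustration; the counter is a rough heuristic, so scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumed heuristic): count vowel groups."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) for a passage of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch reading ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the study's chatbot output.
    sample = (
        "Walk briskly for thirty minutes on most days. "
        "Add strength training twice a week."
    )
    w, s, syl = text_counts(sample)
    print(f"Reading ease: {flesch_reading_ease(w, s, syl):.1f}")
    print(f"Grade level:  {flesch_kincaid_grade(w, s, syl):.1f}")
```

On the reading ease scale, scores of roughly 30 to 50 are conventionally interpreted as college-level text, and higher scores indicate easier reading.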
Results:
AI-generated exercise recommendations were 41% comprehensive and 91% accurate, with the majority (53%) of inaccuracies related to the need for exercise pre-participation medical clearance. The average readability of AI-generated exercise recommendations was at the college level, with an average Flesch reading ease score of 31.1. Recurring themes and observations in the AI-generated output included concern for liability and safety; preference for aerobic exercise; and potential bias and direct discrimination against certain age-based populations and individuals with disabilities.
Conclusions:
There were notable gaps in comprehensiveness, accuracy, and readability of AI-generated exercise recommendations. Exercise and healthcare professionals should be aware of these limitations when using and/or endorsing AI-based technologies as a tool to support lifestyle change involving exercise.
Citation