JMIR Preprints #67879: Chatbots in healthcare: A study of readability and response accuracy in answers to questions about hypertension.

Chatbots in healthcare: A study of readability and response accuracy in answers to questions about hypertension.

Robert Olszewski;
Jakub Brzeziński;
Klaudia Watros;
Małgorzata Mańczak;
Jakub Owoc;
Krzysztof Jeziorski

ABSTRACT

Background:

AI-powered chatbots, using Large Language Models, may effectively answer questions from patients with hypertension, providing responses that are accurate, empathetic, and easy to read.

Objective:

This study evaluates the performance of three such chatbots in delivering quality responses.

Methods:

One hundred questions were randomly selected from the Reddit forum r/hypertension and submitted to three publicly available chatbots (ChatGPT-3.5, Microsoft Copilot, Gemini), anonymized as A, B, and C. Two independent medical professionals assessed the accuracy and empathy of the responses using Likert scales. Additionally, all 300 responses were analyzed with the WebFX readability tool to measure various readability indices.

Results:

In total, 300 responses were evaluated. Chatbot A generated the most extensive responses, with an average of 13 sentences per reply, while Chatbot B had the shortest replies. Chatbot C achieved the highest score on the Flesch Reading Ease Scale, indicating better readability, while Chatbot A scored the lowest. Other readability metrics, including the Flesch-Kincaid Grade Level and Gunning Fog Score, also differed significantly among the chatbots.
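
For readers unfamiliar with the indices named above, the following is a minimal sketch of how the Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog scores are commonly computed from sentence, word, and syllable counts. It is not the WebFX tool's implementation, which the abstract does not describe. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration, so scores may differ from those of any production tool.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Compute three standard readability indices for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    # Gunning Fog treats words of three or more syllables as "complex".
    complex_ratio = sum(1 for s in syllables if s >= 3) / len(words)

    return {
        # Higher score = easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade needed to understand the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_ratio),
    }


if __name__ == "__main__":
    # Illustrative sentences only; these are not responses from the study.
    for reply in (
        "Eat less salt. Walk each day.",
        "Antihypertensive pharmacotherapy necessitates individualized cardiovascular evaluation.",
    ):
        print(reply, readability(reply))
```

Short sentences built from short words raise the Flesch Reading Ease score and lower the two grade-level scores, which is the direction of difference the results describe between the chatbots.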

Conclusions:

The study indicates that while all chatbots can produce professional responses, their readability varies significantly. These findings underscore the potential of AI chatbots in patient education. However, they also highlight the urgent need for further optimization to enhance the comprehensibility of their outputs.

Citation

Please cite as:

Olszewski R, Brzeziński J, Watros K, Mańczak M, Owoc J, Jeziorski K

Chatbots in healthcare: A study of readability and response accuracy in answers to questions about hypertension.

JMIR Preprints. 23/10/2024:67879

DOI: 10.2196/preprints.67879

URL: https://preprints.jmir.org/preprint/67879

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Previously submitted to: Journal of Medical Internet Research (no longer under consideration since Feb 07, 2025)

Date Submitted: Oct 23, 2024

Open Peer Review Period: Oct 30, 2024 - Dec 25, 2024

NOTE: This is an unreviewed Preprint
