Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Mar 27, 2024
Date Accepted: Aug 1, 2024
“Doctor ChatGPT, can you help me?” The patient’s perspective: A cross-sectional study
ABSTRACT
Background:
Artificial intelligence (AI) and the language models derived from it, such as ChatGPT, offer immense possibilities, particularly in the field of medicine. It is already evident that ChatGPT can provide adequate and, in some cases, expert-level responses to health-related queries and advice for patients. However, it is currently unknown how patients perceive these capabilities, whether they can derive benefit from them, and if potential risks, such as harmful suggestions, are detected by patients.
Objective:
To clarify whether patients can obtain useful and safe health care advice from an AI chatbot assistant.
Methods:
This cross-sectional study used 100 publicly available health-related questions from five medical specialties (trauma surgery, general surgery, otolaryngology, pediatrics, and internal medicine), drawn from an online platform for patients. Responses generated by ChatGPT-4 and by an expert panel (EP) of experienced physicians from the same online platform were compiled into 10 sets of 10 questions each. Patients and other non-medical participants performed a blinded evaluation of empathy and quality (assessed through the question "Would this answer have helped you?") on a scale from 1 to 5. As a control, the evaluation was also performed by three doctors from each respective medical specialty, who were additionally asked to rate the correctness of each response and its potential for harm.
Results:
In total, 200 sets of questions were submitted by 64 participants (mean age 45.7 years, SD 15.9; 46.8% male), resulting in 2000 evaluated answers each for ChatGPT and the EP. ChatGPT scored higher in terms of empathy (4.18 vs. 2.70; P<.001) and quality (4.04 vs. 2.98; P<.001). Subanalysis showed slightly higher empathy ratings given by women than by men (4.46 vs. 4.14; P=.049). Ratings of ChatGPT were high regardless of participant age. The specialist doctors' evaluations yielded the same highly significant results. ChatGPT also significantly outperformed the EP in correctness (4.51 vs. 3.55; P<.001). Specialists rated the quality (3.93 vs. 4.59) and correctness (4.62 vs. 3.84) significantly lower for potentially harmful responses from ChatGPT (P<.001). This was not the case among non-medical participants (empathy: 4.22 vs. 4.17; P=.63; quality: 4.16 vs. 4.03; P=.27).
Conclusions:
The results indicate that ChatGPT can support patients with health-related queries better than physicians can, at least with regard to written advice delivered through an online platform. ChatGPT produced a low percentage of potentially harmful advice, even lower than that of the online physicians. Alarmingly, patients were unable to independently recognize these potential dangers.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.