Accepted for/Published in: JMIR Formative Research
Date Submitted: Aug 7, 2024
Date Accepted: Jan 20, 2025
Comparison of ChatGPT and internet research for clinical research and decision-making in occupational medicine: a randomised controlled trial
ABSTRACT
Background:
Artificial intelligence (AI) is becoming part of everyday life through its implementation in algorithms and technology, and its use is also being tested in medicine. Generative AI in the form of large language models (LLMs), such as GPT-4 and the ChatGPT product based on it, is being used increasingly as performance and reliability improve. However, the use of LLMs in specific medical fields such as occupational medicine remains largely unexplored.
Objective:
The objective of this study was to assess the potential suitability of generative LLMs, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany.
Methods:
In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Physicians and medical students (n = 56) were asked to work on three cases of occupational lung diseases and answer case-related questions. They were divided into two groups: one group researched the cases using an integrated chat application similar to ChatGPT, based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss or DocCheck. The responses were compared quantitatively. Before and after case processing, participants were asked to self-assess their occupational medicine expertise. The conversations of the ChatGPT group were logged and also entered into other LLMs to compare their outputs.
Results:
Participants in the ChatGPT group performed better in specific research tasks, e.g. identifying potentially hazardous substances or activities (Case 1: p = .01, Cohen’s r = -.38), and showed an increase in self-assessed specialist knowledge (p = .047). However, clinical decisions, e.g. whether an occupational disease report should be filed, were more often made correctly by participants who used their own research methods (Case 1: p = .007, OR (95% CI) = 6.00 (1.54–23.36)).
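For readers less familiar with the odds ratio (OR) and 95% confidence interval reported above, the following is a minimal sketch of how such a figure can be derived from a 2x2 table of correct and incorrect decisions per group. It assumes a Wald interval on the log-odds scale; the abstract does not state which interval method the authors used. The function name and the cell counts are hypothetical and are not the study's data.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval for a 2x2 table.

    a, b: correct / incorrect decisions in group 1
    c, d: correct / incorrect decisions in group 2
    z:    normal quantile (1.96 corresponds to a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (NOT taken from the study).
or_value, ci_low, ci_high = odds_ratio_wald_ci(20, 8, 10, 18)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 indicates that the odds of a correct decision differ between the groups at the chosen confidence level. With small cell counts, an exact method may be preferable to the Wald approximation.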
Conclusions:
ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, clinical decisions should currently only be supported and not made by the LLM. Future systems should be critically assessed, even if initial results are promising.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.