Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 29, 2024
Open Peer Review Period: Aug 30, 2024 - Oct 25, 2024
Date Accepted: Jan 17, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Can we use Large Language Models (LLMs) to assess the chronic pain experience?
ABSTRACT
Background:
Chronic pain is a prevalent problem in society with an enormous impact. Tools that enhance assessment and improve understanding of the experiences of people with pain are essential to provide the care they need. In this regard, a qualitative approach based on written narratives (WNs) from people suffering from chronic pain can be very useful, as supported by several studies. However, assessment from this perspective can be time-consuming.
Objective:
This study explores the feasibility of employing LLMs to assess WNs of people with chronic pain. Ultimately, we aim to evaluate the potential of applying LLMs to assist clinicians in assessing patients' pain.
Methods:
We performed an experiment based on a set of pain narratives written by people with fibromyalgia and qualitatively evaluated in Serrat et al. [17]. Focusing on pain severity and disability, we prompted GPT-4 to assign scores, together with explanations of those scores, to these narratives. We then quantitatively compared GPT-4's scores with experts' scores of the same narratives, using statistical measures such as Pearson correlation, root mean squared error (RMSE), Gwet's AC2, and Krippendorff's α. Additionally, experts specialized in chronic pain conducted a qualitative analysis of the score explanations to assess their accuracy and the potential applicability of GPT-4's analysis for future pain narrative evaluations.
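To illustrate the kind of quantitative comparison described above, the following is a minimal sketch (not the authors' actual analysis code) of computing Pearson correlation and RMSE between two rating vectors; the scores shown are hypothetical examples on an assumed 0–10 severity scale. In practice, Gwet's AC2 and Krippendorff's α would typically be computed with dedicated statistical packages.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rmse(xs, ys):
    """Root mean squared error between paired scores."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical severity scores for five narratives (0-10 scale)
expert_scores = [7, 5, 8, 6, 9]
gpt4_scores = [6, 5, 8, 7, 9]

print(f"Pearson r: {pearson(expert_scores, gpt4_scores):.3f}")
print(f"RMSE: {rmse(expert_scores, gpt4_scores):.3f}")
```

A high correlation combined with a low RMSE would indicate that the model's ratings track the experts' ratings both in ranking and in absolute value.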
Results:
Our analysis reveals that GPT-4's performance in assessing pain narratives yielded promising results. GPT-4's ratings were comparable to the experts' in terms of inter-rater agreement, correlations with standardized measurements, and error rates. Moreover, experts generally deemed the ratings provided by GPT-4, as well as the score explanations, to be adequate.
Conclusions:
These findings underline the potential of LLMs in facilitating the assessment of pain narratives, offering a novel approach to understanding and evaluating patient pain experiences. The integration of automated assessments through LLMs presents opportunities for streamlining and enhancing the evaluation process, paving the way for improved patient care and tailored interventions in the realm of chronic pain management.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.