Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jun 20, 2024
Date Accepted: Jan 26, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Exploring Inflammatory Bowel Disease Discourse on Reddit throughout the COVID-19 Pandemic using OpenAI’s GPT 3.5 Turbo Model: Classification Model Validation and Case Study
ABSTRACT
Background:
Inflammatory bowel disease (IBD) is a chronic autoimmune disorder with an increasing prevalence. Online communities have become vital for communication among IBD patients, especially throughout the COVID-19 pandemic. However, these interactions remain largely underexplored.
Objective:
This study aims to analyze community posts from three of the largest IBD support groups on Reddit between March 1, 2020, and December 31, 2022, using a pre-trained transformer model, and to validate the classification system's results via comparison to human scoring.
Methods:
We collected 53,333 posts and classified them using OpenAI's GPT-3.5 Turbo model to determine sentiment, categorize topics, and identify demographic information and COVID-19 mentions. Manual validation was performed on a subset of 397 posts to measure inter-rater agreement between human raters and the GPT-3.5 model.
Results:
Fleiss’ kappa and Gwet’s AC1 coefficients indicated a high level of agreement between raters, with values ranging from 0.53 to 0.91. Medication (n = 14,909) and Symptoms (n = 14,939) emerged as the most discussed topics. Most posts conveyed a neutral sentiment. While most users did not disclose their age, those who did primarily fell into the 20-29 (n = 2,392) and 30-39 (n = 859) age ranges. After an initial spike in posts within the first month, most posts did not reference the COVID-19 pandemic.
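For readers unfamiliar with the two agreement coefficients named above, the following is a minimal sketch of how Fleiss' kappa and Gwet's AC1 can be computed for categorical labels. It assumes every post is scored by the same number of raters (for example, several human raters plus the model). The function names, category labels, and toy ratings are illustrative assumptions, not the study's code or data.

```python
from collections import Counter


def _observed_and_shares(ratings, categories):
    """Return (observed agreement, overall share of each category).

    `ratings` is a list of posts; each post is a list with one label per rater.
    Assumes every post has the same number of raters (at least two) and that
    every label appears in `categories`.
    """
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each post needs the same number (>= 2) of ratings")

    # Per-post counts of how many raters chose each category.
    counts = [[Counter(row)[c] for c in categories] for row in ratings]
    n_posts = len(counts)

    # Share of agreeing rater pairs within each post, averaged over posts.
    per_post = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_post) / n_posts

    # Overall proportion of all ratings that fall in each category.
    shares = [
        sum(row[j] for row in counts) / (n_posts * n_raters)
        for j in range(len(categories))
    ]
    return observed, shares


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa: chance agreement is the sum of squared category shares.

    Undefined (division by zero) if all ratings fall in a single category.
    """
    observed, shares = _observed_and_shares(ratings, categories)
    expected = sum(p * p for p in shares)
    return (observed - expected) / (1 - expected)


def gwet_ac1(ratings, categories):
    """Gwet's AC1: chance agreement is sum(p * (1 - p)) / (k - 1).

    Requires at least two categories.
    """
    observed, shares = _observed_and_shares(ratings, categories)
    expected = sum(p * (1 - p) for p in shares) / (len(categories) - 1)
    return (observed - expected) / (1 - expected)


# Hypothetical sentiment labels for four posts, three raters each.
toy_categories = ["positive", "neutral", "negative"]
toy_ratings = [
    ["neutral", "neutral", "neutral"],
    ["positive", "positive", "neutral"],
    ["negative", "negative", "negative"],
    ["neutral", "neutral", "neutral"],
]

print("Fleiss' kappa:", round(fleiss_kappa(toy_ratings, toy_categories), 3))
print("Gwet's AC1:  ", round(gwet_ac1(toy_ratings, toy_categories), 3))
```

Both coefficients share the same observed agreement and differ only in how chance agreement is estimated. AC1 tends to be more stable than kappa when one category dominates, as is plausible for a mostly neutral sentiment distribution.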
Conclusions:
Our study showcases the potential of generative pre-trained transformer models in processing and extracting insights from medical social media data. Future research could pursue further sub-analyses of our validated dataset or use OpenAI’s model to analyze social media data for other conditions, particularly those where patient experiences are challenging to collect.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.