JMIR Preprints #53332: Exploring Inflammatory Bowel Disease Discourse on Reddit throughout the COVID-19 Pandemic using OpenAI’s GPT 3.5 Turbo Model: Classification Model Validation and Case Study

Current Preprint Settings

(as selected by the authors)

1. When the manuscript is submitted, allow peer review from:

(a) Anybody (open community peer review)
(b) Editor-selected reviewers (closed peer review)

2. When the manuscript is submitted, display the preprint PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

3. When the manuscript is accepted, display the accepted manuscript PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

Exploring Inflammatory Bowel Disease Discourse on Reddit throughout the COVID-19 Pandemic using OpenAI’s GPT 3.5 Turbo Model: Classification Model Validation and Case Study

Tyler Babinski;
Sara Karley;
Marita Cooper;
Salma Shaik;
Y. Ken Wang

ABSTRACT

Background:

Inflammatory bowel disease (IBD) is a chronic autoimmune disorder with an increasing prevalence. Online communities have become vital for communication among IBD patients, especially throughout the COVID-19 pandemic. However, these interactions remain largely underexplored.

Objective:

This study aims to analyze community posts from three of the largest IBD support groups on Reddit between March 1, 2020, and December 31, 2022, using a pre-trained transformer model, and to validate the classification system's results via comparison to human scoring.

Methods:

We collected 53,333 posts and classified them using OpenAI's GPT-3.5 Turbo model to determine sentiment, categorize topics, and identify demographic information and COVID-19 mentions. Manual validation was performed on a subset of 397 posts to measure inter-rater agreement between human raters and the GPT-3.5 model.

Results:

Fleiss’ kappa and Gwet’s AC1 coefficients indicated a high level of agreement between raters, with values ranging from 0.53 to 0.91. Medication (n = 14,909) and Symptoms (n = 14,939) emerged as the most discussed topics. Most posts conveyed a neutral sentiment. While most users did not disclose their age, those who did primarily fell into the 20-29 (n = 2,392) and 30-39 (n = 859) age ranges. After an initial spike in posts within the first month, most posts did not reference the COVID-19 pandemic.

Conclusions:

Our study showcases the potential of generative pre-trained transformer models in processing and extracting insights from medical social media data. Future research can benefit from further sub-analyses of our validated dataset or utilize OpenAI’s model to analyze social media data for other conditions, particularly those where patient experiences are challenging to collect.

Citation

Please cite as:

Babinski T, Karley S, Cooper M, Shaik S, Wang YK

Exploring Inflammatory Bowel Disease Discourse on Reddit Throughout the COVID-19 Pandemic Using OpenAI’s GPT-3.5 Turbo Model: Classification Model Validation and Case Study

J Med Internet Res 2025;27:e53332

DOI: 10.2196/53332

PMID: 40607732

PMCID: 12271966

Download PDF

Request queued. Please wait while the file is being generated. It may take some time.

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jun 20, 2024

Date Accepted: Jan 26, 2025

Exploring Inflammatory Bowel Disease Discourse on Reddit throughout the COVID-19 Pandemic using OpenAI’s GPT 3.5 Turbo Model: Classification Model Validation and Case Study

ABSTRACT

Citation

Copyright