Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jun 20, 2024
Date Accepted: Jan 26, 2025

The final, peer-reviewed published version of this preprint can be found here:

Babinski T, Karley S, Cooper M, Shaik S, Wang YK

Exploring Inflammatory Bowel Disease Discourse on Reddit Throughout the COVID-19 Pandemic Using OpenAI’s GPT-3.5 Turbo Model: Classification Model Validation and Case Study

J Med Internet Res 2025;27:e53332

DOI: 10.2196/53332

PMID: 40607732

PMCID: 12271966

Exploring Inflammatory Bowel Disease Discourse on Reddit throughout the COVID-19 Pandemic using OpenAI’s GPT-3.5 Turbo Model: Classification Model Validation and Case Study

  • Tyler Babinski
  • Sara Karley
  • Marita Cooper
  • Salma Shaik
  • Y. Ken Wang

ABSTRACT

Background:

Inflammatory bowel disease (IBD) is a chronic autoimmune disorder with an increasing prevalence. Online communities have become vital for communication among IBD patients, especially throughout the COVID-19 pandemic. However, these interactions remain largely underexplored.

Objective:

This study aims to analyze community posts from three of the largest IBD support groups on Reddit between March 1, 2020, and December 31, 2022, using a pre-trained transformer model, and to validate the classification system's results via comparison to human scoring.

Methods:

We collected 53,333 posts and classified them using OpenAI's GPT-3.5 Turbo model to determine sentiment, categorize topics, and identify demographic information and COVID-19 mentions. Manual validation was performed on a subset of 397 posts to measure inter-rater agreement between human raters and the GPT-3.5 model.

Results:

Fleiss’ kappa and Gwet’s AC1 coefficients indicated a high level of agreement between raters, with values ranging from 0.53 to 0.91. Medication (n = 14,909) and Symptoms (n = 14,939) emerged as the most discussed topics. Most posts conveyed a neutral sentiment. While most users did not disclose their age, those who did primarily fell into the 20-29 (n = 2,392) and 30-39 (n = 859) age ranges. After an initial spike in posts within the first month, most posts did not reference the COVID-19 pandemic.
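For readers unfamiliar with the two agreement coefficients named above, the following is a minimal sketch of their standard multi-rater formulas. It is not the authors' code: the abstract does not describe how the coefficients were computed, and the function names, category labels, and toy ratings below are illustrative assumptions. The sketch assumes every post is labeled by the same number of raters.

```python
from collections import Counter

def category_counts(ratings, categories):
    """Convert per-item rater labels into per-item category counts."""
    return [[Counter(item)[c] for c in categories] for item in ratings]

def _observed_and_proportions(counts):
    """Return observed agreement and overall category proportions."""
    n_items = len(counts)
    n_raters = sum(counts[0])  # assumes the same number of raters per item
    # Observed agreement: mean share of agreeing rater pairs per item.
    p_obs = sum(
        sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    n_cats = len(counts[0])
    props = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    return p_obs, props

def fleiss_kappa(counts):
    """Fleiss' kappa: chance agreement is the sum of squared proportions."""
    p_obs, props = _observed_and_proportions(counts)
    p_exp = sum(p * p for p in props)
    return (p_obs - p_exp) / (1 - p_exp)

def gwet_ac1(counts):
    """Gwet's AC1: chance agreement is sum(p * (1 - p)) / (q - 1)."""
    p_obs, props = _observed_and_proportions(counts)
    p_exp = sum(p * (1 - p) for p in props) / (len(props) - 1)
    return (p_obs - p_exp) / (1 - p_exp)

# Toy data (invented for illustration): 4 posts, 3 raters each,
# for example two human raters plus the model.
categories = ["positive", "neutral", "negative"]
ratings = [
    ["neutral", "neutral", "neutral"],
    ["positive", "positive", "neutral"],
    ["negative", "negative", "negative"],
    ["neutral", "neutral", "positive"],
]
counts = category_counts(ratings, categories)
kappa = fleiss_kappa(counts)
ac1 = gwet_ac1(counts)
```

The two coefficients share the same observed agreement and differ only in the chance-agreement term, which is why AC1 tends to be less sensitive than kappa when one category (such as neutral sentiment) dominates.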

Conclusions:

Our study showcases the potential of generative pre-trained transformer models in processing and extracting insights from medical social media data. Future research could conduct further sub-analyses of our validated dataset or use OpenAI’s model to analyze social media data for other conditions, particularly those where patient experiences are challenging to collect.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.