Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 30, 2022
Date Accepted: Mar 16, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
A Case Study on Using Reddit Comments as a Source of Information on Patients’ Experiences and Concerns: Text Analyses on r/CysticFibrosis
ABSTRACT
Background:
Social media use rose significantly during the COVID-19 pandemic, as people used it to communicate and share information amid pandemic disruptions. Patients with rare diseases were using social media as an information network even before the pandemic, which provides valuable insight into their everyday experiences. One platform that has proved relevant is Reddit. We therefore examine the experiences of patients with cystic fibrosis (CF), who are more vulnerable during the pandemic given the overlap of symptoms between CF and COVID-19. We look at the impact of COVID-19 on the discussion topics of CF patients.
Objective:
This study aims to identify the effect of COVID-19 on the discussion topics of the r/CysticFibrosis subreddit. We applied BERTopic models to posts and comments and performed a time series analysis to identify topics affected by COVID-19 pandemic disruption.
Methods:
We used the Pushshift Reddit API to scrape all comments from the subreddit r/CysticFibrosis until 31 August 2022. We removed duplicate comments, links, tags, and mentions of other users before applying a BERTopic model. We reduced the number of topics to a more manageable size of 22. We fitted an Autoregressive Integrated Moving Average (ARIMA) model to the denoised dataset as a whole, without considering the topics, and to the subset of data for each of the 22 topics. We assigned a dummy variable to indicate the COVID-19 pandemic period, which we specified as the months of 2020, and controlled for the number of authors to examine topical changes before and after this time point.
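The data preparation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the regular expressions, the function names (`clean_comment`, `deduplicate`, `monthly_design`), the reading of "tags" as HTML-like tags, and the choice to deduplicate after cleaning are all assumptions. The field names `body`, `author`, and `created_utc` are assumed to follow the usual Reddit comment layout.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical patterns; the exact cleaning rules are not stated in the abstract.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")                     # assumes "tags" means HTML-like tags
MENTION_RE = re.compile(r"(?<!\w)/?u/[A-Za-z0-9_-]+")


def clean_comment(text):
    """Strip links, tags, and user mentions, then collapse whitespace."""
    for pattern in (URL_RE, TAG_RE, MENTION_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def deduplicate(comments):
    """Keep the first occurrence of each cleaned, non-empty comment body."""
    seen, unique = set(), []
    for c in comments:
        body = clean_comment(c["body"])
        if body and body not in seen:
            seen.add(body)
            unique.append({**c, "body": body})
    return unique


def monthly_design(comments):
    """One row per month: comment count, unique authors, and a COVID-19 dummy."""
    counts = defaultdict(int)
    authors = defaultdict(set)
    for c in comments:
        ts = datetime.fromtimestamp(c["created_utc"], tz=timezone.utc)
        key = (ts.year, ts.month)
        counts[key] += 1
        authors[key].add(c["author"])
    return [
        {
            "month": f"{y}-{m:02d}",
            "n_comments": counts[(y, m)],
            "n_authors": len(authors[(y, m)]),
            "covid": 1 if y == 2020 else 0,  # dummy marks the months of 2020
        }
        for (y, m) in sorted(counts)
    ]
```

Rows of this shape could then serve as input to an ARIMA model with exogenous regressors, with `n_comments` as the series and `covid` and `n_authors` as covariates. The per-topic models would repeat the same aggregation on the comments assigned to each topic.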
Results:
We collected 120,738 comments from 5,827 unique user IDs from 24 March 2011 until 31 August 2022. After fitting the BERTopic model and excluding outliers and noise, we were left with 42,060 comments categorized into 22 topics. The significance testing of the COVID-19 dummy variable resulted in a mix of positive and negative effects for the various topics.
Conclusions:
COVID-19, overall, had a negative effect on the number of comments in the subreddit r/CysticFibrosis. The mix of positive and negative effects of the COVID-19 dummy variable among the different BERTopic topics indicates a shift in discussion topics. We found that topics discussing medications like Trikafta and Tobramycin, lung transplants and respiration, gratitude, sweat testing, mutations, medical facilities, and inheritance of CF had decreased activity, while the topic discussing marijuana had increased activity during the COVID-19 pandemic.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.