Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Nov 6, 2023
Date Accepted: Sep 27, 2024
Combining topic modelling, sentiment analysis and corpus linguistics to analyse unstructured online patient experience data: A case study of Modafinil experiences
ABSTRACT
Background:
Patient experience data gathered from social media platforms is a rich data source and can offer a truly patient-centred perspective on disease, treatments, the outcomes that patients value and health service delivery. Current guidelines are often based on population-level evidence, while health-based qualitative studies have often been seen as anecdotal, unrepresentative and not generalisable across populations. This novel study examines how personal evidence of a health effect, gathered from sufficient numbers of people, could be combined to the point where it can be generalised and added to the existing population-level evidence.
Objective:
The two main aims of this study were to explore how combining unsupervised natural language processing (NLP) with corpus linguistics on a large unstructured dataset of Modafinil experiences could be used to explore these perspectives, and then to compare the findings with Cochrane meta-analyses of the effectiveness of the same drug. By comparing techniques, we also aimed to develop a methodology for this type of data.
Methods:
Using a CSV dataset of 69,022 posts, gathered from 790 different sources, we used a variety of NLP and corpus techniques to analyse the data. Data cleaning comprised both expansion and contraction of the fields, while maximising the context of each post. Combining Python for NLP techniques and SketchEngine for linguistic analysis, we compared topic-mining (TM) packages using latent Dirichlet allocation (LDA), non-negative matrix factorisation (NMF) and word-embedding methods, and TextBlob and VADER for sentiment analysis. Corpus methods included collocation, concordance and 2-6 word n-gram generation to show perceived causal inference. Topic-mining was used to map the posts to the themes/codes identified in previous work. These included the health condition or reason for taking Modafinil, impact of conditions/symptoms, dosage, reported side-effects, effectiveness, outcomes, and comparison with other interventions.
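The 2-6 word n-gram generation mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library rather than SketchEngine, so it is not the study's pipeline; the tokenisation rule, the function names and the sample posts are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: a plain regex tokeniser; the study's own tokenisation
    is not described in the abstract.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(posts, n_min=2, n_max=6):
    """Count every contiguous n-gram of n_min..n_max words across all posts."""
    counts = Counter()
    for post in posts:
        tokens = tokenize(post)
        for n in range(n_min, n_max + 1):
            # Posts shorter than n words contribute no n-grams of that size.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


# Invented example posts, not taken from the study dataset.
sample_posts = [
    "Modafinil helped me stay awake at work",
    "It helped me stay awake but gave me a headache",
]
counts = ngram_counts(sample_posts)

# Show the most frequent n-grams as plain strings.
for gram, freq in counts.most_common(5):
    print(freq, " ".join(gram))
```

Recurring sequences such as "helped me stay awake" surface as frequent n-grams, which is the kind of pattern a frequency table of perceived cause and effect would be built from.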
Results:
Posts were from 790 different sources. Post lengths had an interquartile range of 34-142 words. Parsing showed that Modafinil was used for 166 health conditions, with the most frequent being narcolepsy, multiple sclerosis, ADD, anxiety, sleep apnoea, depression, bipolar, ME/CFS, fibromyalgia and chronic disease. Word-embedding-based topic modelling was most useful, with 70% of posts mapping to the themes/codes. Sentiment analysis reflecting the effectiveness of Modafinil returned 65% positive, 6% neutral and 28% negative. N-gram frequencies were tabulated, with the tense and indicators of possible belief identified.
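As a rough illustration of how a positive/neutral/negative breakdown like the one above can be derived, the sketch below buckets per-post polarity scores in the range -1 to 1 and reports percentage shares. The ±0.05 cut-off is the convention commonly used with VADER compound scores; the abstract does not state which thresholds were applied, so the threshold, the function names and the sample scores are assumptions.

```python
from collections import Counter


def label_score(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label.

    Assumption: a symmetric +/-0.05 cut-off, as commonly used for
    VADER compound scores; not confirmed by the source.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(scores, threshold=0.05):
    """Return the percentage of scores falling under each label."""
    if not scores:
        raise ValueError("scores must not be empty")
    labels = Counter(label_score(s, threshold) for s in scores)
    total = sum(labels.values())
    return {
        name: 100.0 * labels[name] / total
        for name in ("positive", "neutral", "negative")
    }


# Hypothetical per-post scores, invented for illustration only.
sample_scores = [0.62, 0.31, -0.44, 0.0, 0.78, -0.12, 0.09, 0.55]
shares = sentiment_shares(sample_scores)
print(shares)
```

In practice the scores would come from a sentiment tool such as VADER or TextBlob applied to each post; they are hard-coded here so the sketch runs without third-party packages.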
Conclusions:
The effectiveness of Modafinil for a wide range of conditions suggested by this study contrasts with the existing RCT and systematic review evidence used to determine treatment pathway options for clinicians, all of which concludes that there is either insufficient or low-quality evidence of effectiveness.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.