Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 6, 2023
Date Accepted: Sep 27, 2024

The final, peer-reviewed published version of this preprint can be found here:

Combining Topic Modeling, Sentiment Analysis, and Corpus Linguistics to Analyze Unstructured Web-Based Patient Experience Data: Case Study of Modafinil Experiences

Walsh J, Cave J, Griffiths F

J Med Internet Res 2024;26:e54321

DOI: 10.2196/54321

PMID: 39662896

PMCID: 11669883

Combining topic modelling, sentiment analysis and corpus linguistics to analyse unstructured online patient experience data: A case study of Modafinil experiences

  • Julia Walsh; 
  • Jonathan Cave; 
  • Frances Griffiths

ABSTRACT

Background:

Patient experience data gathered from social media platforms is a rich data source and can offer a truly patient-centred perspective on disease, treatments, the outcomes patients value and health service delivery. Current guidelines are often based on population-level evidence, while health-based qualitative studies have often been seen as anecdotal, unrepresentative and not generalisable across populations. This novel study examines how personal evidence of a health effect can be combined across a sufficient number of people for it to be generalised and added to the existing population-level evidence.

Objective:

The two main aims of this study were, first, to investigate how combining unsupervised natural language processing (NLP) with corpus linguistics on a large unstructured dataset of Modafinil experiences could be used to explore these perspectives and, second, to compare the findings with Cochrane meta-analyses of the effectiveness of the same drug. By comparing techniques, we also aimed to develop a methodology for this type of data.

Methods:

Using a CSV dataset of 69,022 posts gathered from 790 different sources, we applied a variety of NLP and corpus techniques to analyse the data. Data cleaning comprised both expansion and contraction of the fields while maximising the context of each post. Combining Python for NLP techniques and Sketch Engine for linguistic analysis, we compared topic-mining packages based on latent Dirichlet allocation (LDA), non-negative matrix factorisation (NMF) and word-embedding methods, and compared TextBlob and VADER for sentiment analysis. Corpus methods included collocation, concordance and 2- to 6-word n-gram generation to show perceived causal inference. Topic mining was used to map the posts to the themes/codes identified in previous work. These included the health condition or reason for taking Modafinil, the impact of conditions/symptoms, dosage, reported side effects, effectiveness, outcomes and comparison with other interventions.
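The n-gram and concordance steps named above are simple enough to reproduce outside Sketch Engine. The following is a minimal Python sketch using only the standard library, assuming each post is a plain string. The regex tokenizer, the helper names (`tokenize`, `ngram_counts`, `concordance`), the two example posts and the choice of "helped" as a search term are hypothetical illustrations, not the authors' pipeline; Sketch Engine's own tokenization and collocation statistics will differ.

```python
import re
from collections import Counter

# Hypothetical helpers with a simple regex tokenizer (an assumption;
# Sketch Engine applies its own tokenization rules).
def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return every contiguous n-token sequence as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(posts, min_n=2, max_n=6):
    """Count all 2- to 6-word n-grams across the posts."""
    counts = Counter()
    for post in posts:
        tokens = tokenize(post)
        for n in range(min_n, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts

def concordance(posts, keyword, width=4):
    """Key-word-in-context lines: up to `width` tokens either side of each hit."""
    key = keyword.lower()
    lines = []
    for post in posts:
        tokens = tokenize(post)
        for i, tok in enumerate(tokens):
            if tok == key:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{tok}] {right}")
    return lines

# Illustrative toy posts only; not data from the study.
posts = [
    "Modafinil helped me stay awake at work.",
    "It helped me focus but gave me headaches.",
]
counts = ngram_counts(posts)
print(counts[("helped", "me")])  # 2
for line in concordance(posts, "helped"):
    print(line)
```

Calling `counts.most_common()` yields a ranked frequency table; which phrases count as markers of perceived causality remains an analytic judgement rather than something the code decides.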

Results:

Posts came from 790 different sources, and post lengths had an interquartile range of 34-142 words. Parsing showed that Modafinil was used for 166 health conditions, the most frequent being narcolepsy, multiple sclerosis, ADD, anxiety, sleep apnoea, depression, bipolar disorder, ME/CFS, fibromyalgia and chronic disease. Word-embedding-based topic modelling was the most useful approach, with 70% of posts mapping to the themes/codes. Sentiment analysis of posts on Modafinil's effectiveness returned 65% positive, 6% neutral and 28% negative. N-gram frequencies were tabulated, and the tense and indicators of possible belief were identified.
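Two of the figures above, the interquartile range of post lengths and the share of positive, neutral and negative posts, come from simple aggregation steps. The sketch below shows one way to compute them in Python, assuming each post is a string and that a sentiment tool such as VADER has already produced a compound score per post (for example via vaderSentiment's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`). The function names, the whitespace word count and the +/-0.05 cut-offs are illustrative assumptions; the abstract does not state which thresholds or quartile method the authors used.

```python
import statistics
from collections import Counter

def word_count_quartiles(posts):
    """Return (Q1, Q3) of post lengths, counting whitespace-separated words.

    statistics.quantiles uses the 'exclusive' method by default; other
    quartile conventions can give slightly different cut points.
    """
    lengths = [len(post.split()) for post in posts]
    q1, _median, q3 = statistics.quantiles(lengths, n=4)
    return q1, q3

def label_compound(score, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment class.

    The +/-0.05 cut-offs follow the convention in the VADER documentation
    (an assumption here, not a detail reported in the abstract).
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def sentiment_shares(compound_scores):
    """Percentage of posts in each class, each share rounded independently."""
    if not compound_scores:
        raise ValueError("need at least one score")
    labels = Counter(label_compound(s) for s in compound_scores)
    total = len(compound_scores)
    return {k: round(100 * labels[k] / total)
            for k in ("positive", "neutral", "negative")}

# Illustrative toy inputs only; not data from the study.
posts = ["word " * k for k in (10, 20, 30, 40, 50, 60, 70)]
print(word_count_quartiles(posts))  # (20.0, 60.0)
print(sentiment_shares([0.8, 0.4, -0.6, 0.0, 0.3]))
# {'positive': 60, 'neutral': 20, 'negative': 20}
```

Because each share is rounded independently, the three percentages need not sum to exactly 100.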

Conclusions:

The effectiveness of Modafinil for a wide range of conditions suggested by this study contrasts with the existing randomised controlled trial (RCT) and systematic review evidence used to determine clinicians' treatment pathway options, all of which conclude that there is either insufficient or low-quality evidence of effectiveness.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.