Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Feb 15, 2020
Date Accepted: Sep 6, 2020

The final, peer-reviewed published version of this preprint can be found here:

Zhou S, Zhao Y, Bian J, Haynos AF, Zhang R

Exploring Eating Disorder Topics on Twitter: Machine Learning Approach

JMIR Med Inform 2020;8(10):e18273

DOI: 10.2196/18273

PMID: 33124997

PMCID: 7665945

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Exploring Eating Disorder Topics from Twitter

  • Sicheng Zhou
  • Yunpeng Zhao
  • Jiang Bian
  • Ann F. Haynos
  • Rui Zhang

ABSTRACT

Background:

Eating disorders (EDs) are a group of mental illnesses that severely damage people's health. Social media has become an important data source for public health research. Earlier studies explored the discussion of ED on Twitter and found it a promising source for discovering factors related to ED, which helps in understanding this group of diseases. An efficient method is needed to further identify and analyze tweets relevant to ED.

Objective:

This study aims (1) to develop and validate a machine learning-based classifier that identifies tweets related to ED, and (2) to explore the topics of ED-related tweets using topic modeling.

Methods:

We collected potentially ED-relevant tweets using keywords established in previous studies and annotated the tweets with different labels. Several supervised machine learning methods, including Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Support Vector Machine (SVM), and Naïve Bayes (NB), were developed and evaluated on the annotated data. We used the best-performing classifier to identify ED-relevant tweets and applied a topic model to analyze the content of the identified tweets. The results of the Correlation Explanation (CorEx) topic model were reviewed and evaluated by a domain expert.
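To make the supervised classification step concrete, here is a minimal sketch of one of the method families named above, a multinomial Naïve Bayes text classifier with add-one smoothing, written in plain Python. It is not the authors' implementation: the function names (train_nb, predict_nb), the whitespace tokenization, the label names, and the four toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect the counts needed for multinomial Naive Bayes."""
    class_counts = Counter(labels)       # number of documents per label
    word_counts = defaultdict(Counter)   # token counts per label
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()    # naive whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior for `text`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            # add-one (Laplace) smoothed log likelihood
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy examples, NOT data from the study.
docs = [
    "skipping meals again to lose weight",
    "purging after dinner feels out of control",
    "great pizza with friends tonight",
    "new phone arrived today",
]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
model = train_nb(docs, labels)
```

Calling predict_nb(model, "lose weight by skipping dinner") scores the text against both labels and returns the more probable one. A real comparison like the one described above would train each model on the annotated tweets and evaluate it on held-out data.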

Results:

We developed a CNN-LSTM classifier that identifies ED-relevant tweets in two steps, with F-scores of 0.90 and 0.86, respectively. In total, 33,017 ED-related tweets were identified using the CNN-LSTM classifier. Another set of tweets posted by potential ED users was identified by manually specified rules (17,632 ED-relevant and 83,557 ED-irrelevant). The topic model identified 162 topics. The overall coherence rate for the topic modeling was 78.6%, which indicates that the produced topics were of high quality. The topics were further reviewed and analyzed by the domain expert.
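For readers unfamiliar with the metric, the sketch below shows how an F-score can be computed from gold labels and predictions. It assumes the balanced F1 variant (the harmonic mean of precision and recall); the abstract does not say which F-measure was used. The function name f_score and the toy label vectors are hypothetical.

```python
def f_score(y_true, y_pred, positive=1):
    """Balanced F-score (F1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only: 1 = ED-relevant, 0 = ED-irrelevant.
gold = [1, 1, 1, 0, 0]
pred = [1, 1, 0, 1, 0]
toy_f1 = f_score(gold, pred)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F-score is also 2/3.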

Conclusions:

The CNN-LSTM classifier we developed could identify ED-relevant tweets more efficiently than the traditional manual method. The CorEx topic model was applied separately to the tweets identified by the classifier and to those identified by the manual method, and the two sets of topics overlapped heavily. The domain expert's review of the topics identified some features of potential ED users on Twitter, which can help improve understanding of ED in the public.

Per the author's request the PDF is not available.