Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Dec 3, 2021
Open Peer Review Period: Feb 6, 2022 - Apr 6, 2022
Date Accepted: Feb 1, 2022
Automatic Classification of Tweets about Eating Disorders: Traditional Machine Learning Techniques and BERT Models
ABSTRACT
Background:
Eating disorders (EDs) are diseases that affect an increasing number of people. Social networks provide information that can help address problems in this domain. In this study, we apply artificial intelligence to social media data on EDs.
Objective:
We set out to find efficient models capable of classifying tweets in four different binary classification tasks in the ED domain. We sought the most accurate models with the lowest computational cost.
Methods:
Over three consecutive months, we collected 1,058,957 tweets related to EDs. The data were preprocessed, and a subset of tweets was labeled for four binary categories: (i) messages written by people suffering from an ED, (ii) messages promoting EDs, (iii) informative messages, and (iv) scientific versus non-scientific messages. Traditional machine learning and deep learning text classification models were then applied to the labeled data.
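The abstract does not name the traditional techniques or preprocessing steps that were used. Purely as an illustration, the sketch below shows what one traditional baseline for a single binary task could look like: a multinomial Naive Bayes classifier with Laplace smoothing, written in plain Python. The tokenizer, the class `NaiveBayesBinary`, and the toy tweets are hypothetical and are not taken from the study. Under this setup, each of the four categories could be handled by training a separate binary classifier of this kind.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical preprocessing: lowercase, drop URLs and @mentions, keep words and hashtags."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"#?\w+", text)


class NaiveBayesBinary:
    """Multinomial Naive Bayes with Laplace smoothing for ONE binary task (labels 0/1)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # smoothing constant added to every word count

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayesBinary":
        self.doc_counts = Counter(labels)
        if self.doc_counts[0] == 0 or self.doc_counts[1] == 0:
            raise ValueError("both classes must appear in the training data")
        self.word_counts = {0: Counter(), 1: Counter()}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def predict(self, text: str) -> int:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in (0, 1):
            # log prior + sum of smoothed log likelihoods of known tokens
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[c][tok] + self.alpha) / (self.totals[c] + self.alpha * v)
                score += math.log(p)
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy(model: NaiveBayesBinary, texts: list[str], labels: list[int]) -> float:
    """Fraction of texts whose predicted label matches the gold label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy, invented examples for a task like (iv): 1 = scientific, 0 = non-scientific.
train_texts = [
    "New study published on anorexia treatment outcomes",
    "Clinical trial results for bulimia therapy",
    "Researchers report eating disorder prevalence data",
    "feeling so tired of everything today",
    "cannot stop thinking about food lol",
    "my day was awful honestly",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayesBinary().fit(train_texts, train_labels)
print(model.predict("new clinical study on eating disorder treatment"))
print(model.predict("so tired today honestly"))
```

Naive Bayes is used here only because it is a compact, dependency-free example of a traditional text classifier; the transformer models reported in the abstract (BERT, RoBERTa, DistilBERT) would instead be fine-tuned on the labeled tweets with a deep learning library.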
Results:
Accuracies between 87.5% and 94.7% were obtained across the four classification tasks, with BERT models achieving the best scores among all the machine learning and deep learning techniques applied. RoBERTa and DistilBERT were the highest-scoring models.
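Assuming the reported accuracy is the standard metric computed separately for each binary task, it is the fraction of evaluated tweets that were classified correctly:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives on the evaluation set.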
Conclusions:
The use of machine learning and deep learning techniques to classify tweets related to eating disorders has shown promise for detecting people who may suffer from some type of ED, as well as for the three other classification tasks. BERT models performed best, although their computational cost is significantly higher than that of traditional techniques.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.