Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 3, 2021
Open Peer Review Period: Feb 6, 2022 - Apr 6, 2022
Date Accepted: Feb 1, 2022

The final, peer-reviewed published version of this preprint can be found here:

Traditional Machine Learning Models and Bidirectional Encoder Representations From Transformer (BERT)–Based Automatic Classification of Tweets About Eating Disorders: Algorithm Development and Validation Study

Benítez-Andrades JA, Alija-Pérez JM, Vidal ME, Pastor-Vargas R, García-Ordás MT

JMIR Med Inform 2022;10(2):e34492

DOI: 10.2196/34492

PMID: 35200156

PMCID: 8914746

Automatic Classification of Tweets about Eating Disorders: Traditional Machine Learning techniques and BERT models

  • José Alberto Benítez-Andrades
  • José Manuel Alija-Pérez
  • Maria-Esther Vidal
  • Rafael Pastor-Vargas
  • María Teresa García-Ordás

ABSTRACT

Background:

Eating disorders (ED) are diseases that affect an increasing number of people. Social networks provide information that can help to address problems in this domain. In our study, we apply artificial intelligence to social media data on ED.

Objective:

We set out to find efficient models capable of classifying tweets according to four different binary categorizations in the ED domain. We sought the most accurate model at the lowest computational cost.

Methods:

Over three consecutive months, 1,058,957 tweets related to eating disorders were collected. The data were preprocessed, and a subset of tweets was labeled according to four categories: (i) messages written by people suffering from ED, (ii) messages promoting ED, (iii) informative messages, and (iv) scientific or non-scientific messages. Traditional machine learning and deep learning models capable of classifying text were then applied to the labeled data.
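As an illustration of the preprocess-then-classify workflow described above, the following is a minimal sketch of one binary categorization (for example, whether a tweet was written by someone suffering from ED). It is not the authors' pipeline: the multinomial naive Bayes classifier is a stand-in chosen only because it fits in a few lines of standard-library Python, the `preprocess` rules are assumptions, and the example tweets and labels are made up.

```python
import math
import re
from collections import Counter


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and split it into word tokens."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", tweet)


class BinaryNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing for a single binary label.

    Hypothetical stand-in for a "traditional" text classifier; the training
    data must contain at least one example of each class (0 and 1).
    """

    def fit(self, tweets, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(preprocess(tweet))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, tweet):
        # Words never seen during training carry no evidence and are skipped.
        tokens = [t for t in preprocess(tweet) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (0, 1):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training examples (not data from the study): 1 = written by a
# person suffering from ED, 0 = any other message.
train_tweets = [
    "I have struggled with anorexia for years and recovery is hard",
    "my bulimia relapse this week has been so exhausting",
    "New study reviews treatment options for eating disorders https://example.org",
    "Clinic publishes guide on recognising anorexia symptoms",
]
train_labels = [1, 1, 0, 0]

model = BinaryNaiveBayes().fit(train_tweets, train_labels)
prediction = model.predict("my recovery from anorexia has been hard")
```

Under this sketch, each of the four categorizations would get its own model trained on its own set of binary labels.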

Results:

Accuracies between 87.5% and 94.7% were obtained across the four categorizations, with the BERT models achieving the best scores among the machine learning and deep learning techniques applied. In particular, RoBERTa and DistilBERT were the highest-scoring models.
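For reference, accuracy in a binary categorization is the fraction of tweets whose predicted label matches the annotated label. The short sketch below computes it on made-up labels; the function name and the example values are illustrative assumptions and do not reproduce any figure from the study.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true binary labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels for ten tweets in one categorization (not data from the study).
true_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted_labels = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

# Nine of the ten predictions match the true labels.
score = accuracy(true_labels, predicted_labels)
```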

Conclusions:

The use of machine learning and deep learning techniques to classify tweets related to eating disorders has shown promise for detecting people who may suffer from some type of ED, as well as for the three other categorizations. BERT models perform better, although their computational cost is significantly higher than that of traditional techniques.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.