JMIR Preprints #34492: Automatic Classification of Tweets about Eating Disorders: Traditional Machine Learning techniques and BERT models

Automatic Classification of Tweets about Eating Disorders: Traditional Machine Learning techniques and BERT models

José Alberto Benítez-Andrades;
José Manuel Alija-Pérez;
Maria-Esther Vidal;
Rafael Pastor-Vargas;
María Teresa García-Ordás

ABSTRACT

Background:

Eating disorders (ED) are diseases that affect an increasing number of people. Social networks provide information that can help address problems in this domain. In this study, we apply artificial intelligence to social media data on ED.

Objective:

We set out to find efficient models capable of classifying tweets in the ED domain according to four binary categorizations. We sought the most accurate model at the lowest computational cost.

Methods:

Over three consecutive months, 1,058,957 tweets related to eating disorders were collected. The data were preprocessed, and a subset of tweets was labeled according to four categories: (i) messages written by people suffering from ED, (ii) messages promoting suffering from ED, (iii) informative messages, and (iv) scientific or non-scientific messages. Traditional machine learning and deep learning text classification models were then applied to the labeled data.
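As a rough illustration of what a traditional machine learning text classifier for one of these binary categorizations might look like, the sketch below implements a small multinomial Naive Bayes model in plain Python. The abstract does not name the specific traditional models used, so the choice of Naive Bayes is an assumption, as are the tokenizer, the class and function names, and the toy messages and labels (1 = informative, 0 = not informative), which are invented for illustration and are not taken from the study's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into word tokens (simplistic preprocessing)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, NOT from the study: 1 = informative, 0 = not informative.
train_texts = [
    "new study reports rising eating disorder diagnoses",
    "clinic publishes guide to eating disorder treatment options",
    "helpline information for eating disorder support",
    "had a great lunch with friends today",
    "cannot wait for the weekend",
    "my cat is sleeping all day again",
]
train_labels = [1, 1, 1, 0, 0, 0]

classifier = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(classifier.predict("guide to treatment and support information"))
print(classifier.predict("great weekend with my cat"))
```

Under this setup, each of the four categorizations would get its own binary classifier trained on its own label column. A BERT-based model would replace the bag-of-words counts with learned contextual representations, at a higher computational cost.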

Results:

Accuracies between 87.5% and 94.7% were obtained across the four categorizations. The BERT-based models achieved the best scores among the machine learning and deep learning techniques applied, with RoBERTa and DistilBERT scoring highest.
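Accuracy is the fraction of messages whose predicted label matches the true label, computed separately for each binary categorization. The sketch below shows that calculation. The task names and the label vectors are hypothetical placeholders chosen for illustration. They do not reproduce the study's data or its reported figures.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical (true labels, predicted labels) pairs, one per binary categorization.
toy_predictions = {
    "written_by_person_with_ed": ([1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
                                  [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]),
    "promotes_ed": ([0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]),
    "informative": ([1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
                    [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]),
    "scientific": ([0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
                   [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]),
}

accuracy_by_task = {
    task: accuracy(y_true, y_pred)
    for task, (y_true, y_pred) in toy_predictions.items()
}

for task, value in accuracy_by_task.items():
    print(f"{task}: {value:.1%}")
```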

Conclusions:

Machine learning and deep learning techniques for classifying tweets related to eating disorders show promise for detecting people who may suffer from some type of ED, as well as for the three other categorizations. BERT models perform better, although their computational cost is significantly higher than that of traditional techniques.

Citation

Please cite as:

Benítez-Andrades JA, Alija-Pérez JM, Vidal ME, Pastor-Vargas R, García-Ordás MT

Traditional Machine Learning Models and Bidirectional Encoder Representations From Transformer (BERT)–Based Automatic Classification of Tweets About Eating Disorders: Algorithm Development and Validation Study

JMIR Med Inform 2022;10(2):e34492

DOI: 10.2196/34492

PMID: 35200156

PMCID: 8914746

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 3, 2021

Open Peer Review Period: Feb 6, 2022 - Apr 6, 2022

Date Accepted: Feb 1, 2022
