Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Development of Philippine depression datasets and language model for depression detection
ABSTRACT
Background:
Depression detection in social media has gained attention in recent years with the help of Natural Language Processing (NLP) techniques.
Objective:
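Solutions that identify depression patterns through NLP and machine learning require valid datasets. This study aimed to construct such datasets, together with a language model, for depression detection in the Philippine setting.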
Methods:
The proposed process applied clinical screening methods, with clinical psychologists assisting in the recruitment of study participants. A total of 76 participants were assessed by clinical psychologists and provided their Twitter data: 61 with depression and 15 without depression. The resulting dataset consists of tweets annotated for depression symptoms across 13 depression categories. The annotations were made manually in a process constructed, guided, and validated by clinical psychologists.
Results:
Three annotators completed the process for a total of 86,163 tweets, yielding substantial inter-annotator agreement (Fleiss' kappa = 0.736) and a psychologist validation score of 95.71%. A word2vec language model was developed using Filipino and English datasets to create a 300-dimensional word embedding that can be used in various machine learning techniques for NLP.
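For readers unfamiliar with the agreement statistic reported here, the following is a minimal sketch of how Fleiss' kappa is computed from per-item category counts. It is not the authors' code: the function names `fleiss_kappa` and `labels_to_counts` and the toy labels are hypothetical, and the example uses two categories rather than the study's 13.

```python
from typing import List, Sequence


def labels_to_counts(labels: Sequence[Sequence[int]], n_categories: int) -> List[List[int]]:
    """Turn per-item annotator labels into per-item category counts.

    labels[i] holds the category index chosen by each annotator for item i.
    """
    counts = []
    for item_labels in labels:
        row = [0] * n_categories
        for label in item_labels:
            row[label] += 1
        counts.append(row)
    return counts


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa, where counts[i][j] is the number of raters who put item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all assignments that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return float("nan")  # undefined when every rating is in one category
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (hypothetical): 4 items, 3 annotators, 2 categories.
toy_labels = [[0, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
toy_counts = labels_to_counts(toy_labels, n_categories=2)
toy_kappa = fleiss_kappa(toy_counts)
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are described as substantial agreement, which is the band the reported 0.736 falls in.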
Conclusions:
This study contributes to depression research by constructing depression datasets from social media to aid NLP in the Philippine setting.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.