JMIR Preprints #17650: An automatic construction of depressing-domain lexicon based on microblogs

An automatic construction of depressing-domain lexicon based on microblogs

Genghao Li; Bing Li; Langlin Huang; Sibing Hou

ABSTRACT

Background:

According to a 2017 WHO report, almost 1 in every 20 people in China is affected by depression. In clinical practice, however, diagnosing depression is often difficult: observation is slow, costs are high, and patients may resist assessment. The rapid rise of social media is changing this situation. People frequently share their daily lives and disclose their inner feelings online, and this rich text makes effective mental health detection possible.

Objective:

However, in most studies so far, the lack of an effective depression-domain lexicon has led to poor detection results. To improve online depression detection, we aim to construct a depression-domain lexicon from the microblogs we collected and to develop effective methods for building it automatically.

Methods:

We automatically construct a depression-domain lexicon, for use in subsequent detection, using Word2Vec, a semantic relationship graph, and the Label Propagation Algorithm (LPA). Combining these methods lets the construction draw on both a prior knowledge base and the specific corpus. The lexicon is built from 111,052 microblogs posted by 1,868 depressed and non-depressed users. No effective lexicon of this kind exists in previous studies, and our construction method aims to make a substantial contribution to the depression domain.
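
The abstract does not give the algorithmic details, so the following Python sketch only illustrates the general idea under stated assumptions. Words are nodes in a semantic relationship graph whose edges connect words with similar embeddings (toy 2-D vectors stand in for trained Word2Vec vectors), and a soft label-propagation variant spreads depression scores outward from a few seed words. The vectors, seed words, similarity threshold, and 0.5 cut-off are all hypothetical, and this is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(vectors, threshold=0.9):
    """Semantic relationship graph: link word pairs whose embedding
    similarity exceeds `threshold` (an assumed cut-off)."""
    words = list(vectors)
    graph = {w: {} for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim > threshold:
                graph[a][b] = sim
                graph[b][a] = sim
    return graph

def propagate(graph, seeds, iterations=50):
    """Soft label propagation: each non-seed word takes the similarity-
    weighted mean score of its neighbours; seed scores stay clamped."""
    scores = {w: seeds.get(w, 0.5) for w in graph}  # 0.5 = undecided
    for _ in range(iterations):
        updated = {}
        for w, nbrs in graph.items():
            if w in seeds or not nbrs:
                updated[w] = scores[w]
            else:
                total = sum(nbrs.values())
                updated[w] = sum(s * scores[n] for n, s in nbrs.items()) / total
        scores = updated
    return scores

def expand_lexicon(scores, cutoff=0.5):
    """Keep words whose propagated depression score exceeds `cutoff`."""
    return {w for w, s in scores.items() if s > cutoff}

# Hypothetical toy embeddings; real ones would come from Word2Vec.
vectors = {
    "sad": [1.0, 0.1],
    "hopeless": [0.9, 0.2],
    "insomnia": [0.8, 0.3],
    "happy": [0.1, 1.0],
    "sunny": [0.2, 0.9],
}
seeds = {"sad": 1.0, "happy": 0.0}  # hypothetical seed labels
graph = build_graph(vectors)
scores = propagate(graph, seeds)
lexicon = expand_lexicon(scores)
print(sorted(lexicon))
```

Clamping the seed scores on every iteration keeps the labelled words as fixed anchors, so unlabelled words inherit a score according to how strongly they are connected to depression seeds versus neutral seeds.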

Results:

In particular, we established a well-labeled benchmark dataset of depressed and non-depressed users. Experimental results show that, in terms of F1 score, our automatic construction method performs 5% better than the baselines and is more effective and more stable. When applied to detection models such as Naive Bayes, Logistic Regression, and Random Forest, our lexicon improves model performance by 3%-8% and can raise the final accuracy of depression diagnosis in advanced detection.
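
To illustrate how a lexicon might feed such classifiers and how the F1 score is computed, here is a minimal Python sketch. It assumes posts are already word-segmented and space-separated (Chinese microblogs would first need a word segmenter), uses a hypothetical lexicon, and derives two illustrative per-user features: the number of lexicon hits and their share of all tokens. These features would be appended to a model's other inputs; the abstract does not specify which lexicon features the authors actually used.

```python
def lexicon_features(posts, lexicon):
    """Return [hit_count, hit_ratio] for one user's posts.
    Assumes whitespace-tokenized text (a simplification)."""
    tokens = [tok for post in posts for tok in post.split()]
    hits = sum(1 for tok in tokens if tok in lexicon)
    ratio = hits / len(tokens) if tokens else 0.0
    return [hits, ratio]

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical lexicon and user posts, for illustration only.
lexicon = {"sad", "hopeless", "insomnia"}
user_posts = ["feeling sad again", "insomnia all night"]
features = lexicon_features(user_posts, lexicon)
print(features)

# 1 = depressed, 0 = non-depressed (toy labels and predictions).
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```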

Conclusions:

Many studies ignore depression-domain words on social media, even though these words can contribute greatly to diagnosis. Our lexicon proved to be a meaningful input to classification algorithms, providing insight into the depressive status of test subjects and thereby improving final accuracy.

Citation

Please cite as:

Li G, Li B, Huang L, Hou S. Automatic Construction of a Depression-Domain Lexicon Based on Microblogs: Text Mining Study. JMIR Med Inform 2020;8(6):e17650

DOI: 10.2196/17650

PMID: 32574151

PMCID: 7381008

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 31, 2019

Date Accepted: May 5, 2020