Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 8, 2020
Date Accepted: Jul 7, 2020
Date Submitted to PubMed: Jul 14, 2020
Topic modeling of social networking service data on occupational accidents: LDA analysis
ABSTRACT
Background:
In most industrialized societies, regulations, inspections, insurance, and legal options exist to support workers who suffer injury, disease, or death in relation to their work. In practice, however, these resources are imperfect or even unavailable because of obstruction by workplaces or employers. As a result, it is difficult to identify unmet needs for occupational safety and health information through these channels alone.
Objective:
This study aimed to explore hidden issues related to occupational accidents in social networking service (SNS) data using topic modeling.
Methods:
We collected 15,244 documents from SNSs in response to queries related to occupational accidents, covering the period from 2002 to 2018. To transform the unstructured text into structured data, we applied natural language processing for the Korean language. We then fitted latent Dirichlet allocation (LDA) as a topic model using a Python library. A time-series linear regression analysis was also conducted to identify yearly trends in the collected documents.
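The abstract does not name the Python library used, so the following is only a minimal sketch of what LDA computes, assuming the documents have already been tokenized (for Korean text, typically by a morphological analyzer). It swaps in a from-scratch collapsed Gibbs sampler in pure Python rather than any particular library; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are hypothetical illustrations, not the authors' settings or data.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not the study's code).

    docs: list of token lists (already tokenized).
    Returns (doc_topic, topic_word): per-document topic proportions and
    per-topic word distributions (dict word -> probability).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word w assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates of theta (document-topic) and phi (topic-word).
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in range(K)
    ]
    return doc_topic, topic_word


if __name__ == "__main__":
    # Hypothetical toy corpus with English stand-in tokens; not data from the study.
    docs = [
        ["compensation", "benefit", "claim", "compensation"],
        ["fall", "fracture", "construction", "fall"],
        ["benefit", "claim", "compensation"],
        ["construction", "fracture", "fall"],
    ]
    doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
    for k, dist in enumerate(topic_word):
        top = sorted(dist, key=dist.get, reverse=True)[:3]
        print(f"Topic {k}: {top}")
```

Each token's topic is resampled in proportion to how often that topic appears in its document times how often the word appears in that topic; the rows of `doc_topic` are the per-document topic proportions that a yearly trend analysis can aggregate. A library implementation would add convergence checks and model selection for the number of topics, which this sketch omits.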
Results:
The LDA model yielded 14 topics grouped into three themes: Theme 1, workers’ compensation benefits (Topic 1); Theme 2, illicit agreements with the employer (Topics 2-3); and Theme 3, fatal and non-fatal injuries and vulnerable workers (Topics 4-14). In the yearly trend, Theme 1 gradually decreased, whereas the other themes showed an overall increasing pattern. Increases were particularly prominent over time for Topics 3 (physical trauma), 5 (fatal injury), 6 (lower extremity injury), 7 (restaurant workers), 8 (construction workers), 9 (fracture), 10 (labor-management conflict), 11 (vulnerable jobs), 12 (vulnerable jobs), and 14 (others).
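The abstract reports increasing and decreasing yearly trends but does not give the exact regression specification. The following is a minimal sketch, assuming (1) a topic's yearly value is the mean proportion of that topic across the documents posted in that year, taken from a per-document topic-proportion matrix such as any LDA implementation produces, and (2) the trend is the ordinary least-squares slope of that value on calendar year. Both assumptions, the helper names `yearly_topic_share` and `yearly_trend`, and the toy numbers are hypothetical.

```python
def yearly_topic_share(doc_years, doc_topic, k):
    """Mean proportion of topic k per year (assumed aggregation, not the study's)."""
    totals, counts = {}, {}
    for year, theta in zip(doc_years, doc_topic):
        totals[year] = totals.get(year, 0.0) + theta[k]
        counts[year] = counts.get(year, 0) + 1
    years = sorted(totals)
    return years, [totals[y] / counts[y] for y in years]


def yearly_trend(years, shares):
    """Ordinary least-squares slope and intercept of share on year."""
    n = len(years)
    if n < 2:
        raise ValueError("need at least two distinct years")
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("need at least two distinct years")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical per-document years and topic proportions; not data from the study.
    doc_years = [2002, 2002, 2003, 2004, 2004]
    doc_topic = [[0.7, 0.3], [0.6, 0.4], [0.5, 0.5], [0.4, 0.6], [0.3, 0.7]]
    for k in range(2):
        years, shares = yearly_topic_share(doc_years, doc_topic, k)
        slope, _ = yearly_trend(years, shares)
        direction = "increasing" if slope > 0 else "decreasing or flat"
        print(f"Topic {k}: slope per year = {slope:.3f} ({direction})")
```

Under these assumptions, a positive slope corresponds to an increasing pattern and a negative slope to a gradual decrease; a full analysis would also report the slope's uncertainty (for example, a standard error or p value), which this sketch leaves out.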
Conclusions:
We explored hidden issues of occupational accidents in SNS data, specifically workers’ compensation benefits, illicit agreements, and fatal and non-fatal injuries and vulnerable workers. While traditional systems focus mainly on quantitative monitoring of occupational accidents, the qualitative aspects identified through topic modeling of unstructured SNS data may be valuable for tackling inequality and improving occupational health and safety.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.