Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 8, 2020
Date Accepted: Jul 7, 2020
Date Submitted to PubMed: Jul 14, 2020

The final, peer-reviewed published version of this preprint can be found here:

Topic Modeling of Social Networking Service Data on Occupational Accidents in Korea: Latent Dirichlet Allocation Analysis

Min KB, Song SH, Min JY

J Med Internet Res 2020;22(8):e19222

DOI: 10.2196/19222

PMID: 32663156

PMCID: 7453332

Topic modeling of social networking service data on occupational accidents: LDA analysis

  • Kyoung-Bok Min
  • Sung-Hee Song
  • Jin-Young Min

ABSTRACT

Background:

In most industrialized societies, regulations, inspections, insurance, and legal options exist to support workers who suffer injury, disease, or death in relation to their work. In practice, these resources are imperfect or even unavailable because of workplace or employer obstruction. As a result, unmet needs in occupational safety and health information are difficult to identify.

Objective:

This study aimed to explore hidden issues in occupational accidents from social networking service (SNS) data using topic modeling.

Methods:

We collected 15,244 documents from SNSs containing occupational accident-related queries posted between 2002 and 2018. To transform the unstructured text into structured data, we applied natural language processing for the Korean language. We then fitted a latent Dirichlet allocation (LDA) topic model using a Python library. A time-series linear regression analysis was also conducted to identify yearly trends in the collected documents.
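To make the LDA step concrete, the following is a minimal sketch of collapsed Gibbs sampling for LDA in pure Python. It is not the authors' implementation: the abstract does not name the Python library used, and the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the English toy documents are all assumptions made for illustration. A real analysis would first tokenize the Korean text with a morphological analyzer and would more likely rely on an established topic-modeling library.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (phi, theta): per-topic word distributions and
    per-document topic distributions. Illustrative sketch only.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # total tokens in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample from the conditional distribution over topics.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed estimates computed from the final counts.
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta


# Hypothetical, already-tokenized toy documents (not study data).
docs = [
    ["compensation", "benefit", "claim", "insurance"],
    ["claim", "insurance", "benefit", "compensation", "claim"],
    ["fracture", "injury", "construction", "fall"],
    ["injury", "fall", "construction", "fracture", "injury"],
]
phi, theta = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(phi):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top_words}")
```

Each row of `theta` gives the estimated topic mixture of one document, and each entry of `phi` gives a topic's word probabilities; labeling topics, as in the themes reported in this abstract, is a manual interpretive step that the sampler does not perform.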

Results:

The LDA model yielded 14 topics grouped into three themes: workers’ compensation benefits (Theme 1; Topic 1), illicit agreements with the employer (Theme 2; Topics 2-3), and fatal and non-fatal injuries and vulnerable workers (Theme 3; Topics 4-14). In the yearly trend, Theme 1 gradually decreased, whereas the other themes showed an overall increasing pattern. Increases in Topics 3 (physical trauma), 5 (fatal injury), 6 (lower extremity injury), 7 (restaurant workers), 8 (construction workers), 9 (fracture), 10 (labor-management conflict), 11 (vulnerable jobs), 12 (vulnerable jobs), and 14 (others) were particularly pronounced over time.
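As a rough illustration of how a yearly trend of this kind can be estimated, the sketch below fits an ordinary least-squares line to a topic's yearly share of documents. The function name `linear_trend` and the yearly shares are hypothetical and are not taken from the study; the abstract does not specify the exact regression setup, so this is only one plausible reading of "time-series linear regression".

```python
def linear_trend(years, shares):
    """Return (slope, intercept) of the least-squares line shares ~ years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical yearly shares of documents assigned to one topic (not study data).
years = [2014, 2015, 2016, 2017, 2018]
shares = [0.30, 0.28, 0.25, 0.24, 0.20]

slope, intercept = linear_trend(years, shares)
direction = "decreasing" if slope < 0 else "increasing"
print(f"slope per year: {slope:.4f} ({direction})")
```

A negative slope corresponds to a topic whose share declines over the years, and a positive slope to one that grows; a full analysis would also report a significance test for the slope.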

Conclusions:

We explored hidden issues in occupational accidents using SNS data, specifically workers’ compensation benefits, illicit agreements with employers, and fatal and non-fatal injuries and vulnerable workers. While traditional systems focus mainly on quantitative monitoring of occupational accidents, the qualitative aspects revealed by topic modeling of unstructured SNS queries may be valuable for tackling inequality and improving occupational health and safety.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.