Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 19, 2019
Date Accepted: Aug 30, 2019

The final, peer-reviewed published version of this preprint can be found here:

Mining Hidden Knowledge About Illegal Compensation for Occupational Injury: Topic Model Approach

Min JY, Song SH, Kim H, Min KB

JMIR Med Inform 2019;7(3):e14763

DOI: 10.2196/14763

PMID: 31573948

PMCID: 6787526

Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted," should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Mining Hidden Knowledge About Illegal Compensation for Occupational Injury: Topic Model Approach

  • Jin-Young Min
  • Sung-Hee Song
  • HyeJin Kim
  • Kyoung-Bok Min

Background:

Although injured employees in South Korea are legally covered by workers’ compensation insurance, some employers make agreements with injured employees to keep them from claiming that compensation, which leads to underreporting in occupational injury statistics. This illegal compensation (called gong-sang in Korean) is a key method used to underreport or cover up occupational injuries. Because gong-sang cases are not counted in the official occupational injury statistics, gong-sang–related issues cannot be identified from those statistics.

Objective:

This study aimed to analyze social media data using topic modeling to explore hidden knowledge about illegal compensation (gong-sang) for occupational injury in South Korea.

Methods:

We collected 2210 documents from social media by filtering on the keyword gong-sang, covering the period from January 1, 2006, to December 31, 2017. After processing the Korean text with a morphological analyzer, we performed topic modeling using latent Dirichlet allocation (LDA) with the Python library Gensim. A 10-topic model was selected and fitted with 3000 Gibbs sampling iterations.
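
The Methods name LDA fitted with Gibbs sampling. As a rough illustration of what those sampling iterations do, the following is a minimal from-scratch collapsed Gibbs sampler in plain Python. It is not the authors' code and not Gensim's API (Gensim's built-in LdaModel is trained with online variational Bayes rather than Gibbs sampling). It assumes documents are already tokenized (for Korean, by a morphological analyzer), and it uses symmetric priors alpha and beta whose default values, like the function names and the toy corpus, are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    alpha and beta are symmetric Dirichlet priors; the defaults are
    illustrative assumptions, not values from the study.
    Returns (doc_topic, topic_word, vocab): doc_topic[d][k] estimates the
    share of topic k in document d; topic_word[k][v] estimates the
    probability of vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates of theta (doc-topic) and phi (topic-word).
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n highest-probability words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Hypothetical toy corpus (romanized stand-ins for analyzed Korean tokens).
docs = [
    ["gongsang", "insurance", "claim", "employer"],
    ["insurance", "claim", "benefit", "approval"],
    ["gongsang", "money", "settlement", "employer"],
    ["hand", "finger", "construction", "gongsang"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2, iterations=200)
print(top_words(topic_word, vocab))
```

On a real corpus the settings reported in the abstract would correspond to num_topics=10 and iterations=3000; the priors remain an assumption. A pure-Python loop like this is slow on thousands of documents and is meant only to show the sampling update.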

Results:

The LDA model was used to classify gong-sang–related documents into 4 categories drawn from a total of 10 topics. Topic 1 was the greatest concern (60.5%); it included keywords on the choice between illegal compensation and a legal insurance claim, suggesting that workers injured in industrial accidents were worried about this choice. In topic 2, keywords were associated with claims for industrial accident insurance benefits. Topics 3 and 4, the second-highest concern (19%), contained keywords related to monetary compensation under gong-sang. Topics 5 to 10 contained keywords linking gong-sang to vulnerable jobs (ie, workers in the construction and defense industries, delivery riders, and foreign workers) and to injured body parts (ie, the hands, face, teeth, lower limbs, and back).
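
The abstract reports topic prevalence as percentages (60.5% for topic 1, 19% for topics 3 and 4) without stating how they were computed. One common convention, assumed here, is to assign each document to its highest-probability topic and report the fraction of documents per topic; averaging the document-topic proportions is another. The sketch below follows the first convention and then sums topic shares into categories, mirroring how the abstract groups topic 1, topic 2, topics 3 and 4, and topics 5 to 10. The function names, category labels, and the 4-document, 4-topic input are hypothetical.

```python
from collections import Counter


def dominant_topic_shares(doc_topic):
    """Fraction of documents whose highest-probability topic is k (ties go to the lower k)."""
    if not doc_topic:
        raise ValueError("need at least one document")
    num_topics = len(doc_topic[0])
    dominant = Counter(max(range(num_topics), key=row.__getitem__) for row in doc_topic)
    return [dominant[k] / len(doc_topic) for k in range(num_topics)]


def category_shares(topic_shares, categories):
    """Sum topic shares into categories; topics are numbered from 1 as in the abstract."""
    return {
        name: sum(topic_shares[t - 1] for t in topics)
        for name, topics in categories.items()
    }


# Hypothetical document-topic proportions for 4 documents and 4 topics.
doc_topic = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
shares = dominant_topic_shares(doc_topic)  # [0.5, 0.25, 0.0, 0.25]
print(category_shares(shares, {"choice": [1], "claims": [2], "money": [3, 4]}))
# {'choice': 0.5, 'claims': 0.25, 'money': 0.25}
```

With a 10-topic model, the abstract's grouping would map to a dictionary such as {"choice": [1], "claims": [2], "money": [3, 4], "jobs and body parts": [5, 6, 7, 8, 9, 10]}; the labels are placeholders, not the authors' category names.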

Conclusions:

Using the LDA model, we explored hidden knowledge to identify the salient issues surrounding gong-sang. These topics may provide valuable information for making South Korea’s occupational health and safety administration more efficient and for protecting vulnerable workers from illegal gong-sang compensation practices.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.