Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 6, 2018
Open Peer Review Period: Oct 6, 2018 - Oct 17, 2018
Date Accepted: Jan 5, 2019

The final, peer-reviewed published version of this preprint can be found here:

Liu Q, Chen Q, Shen J, Wu H, Sun Y, Ming WK

Data Analysis and Visualization of Newspaper Articles on Thirdhand Smoke: A Topic Modeling Approach

JMIR Med Inform 2019;7(1):e12414

DOI: 10.2196/12414

PMID: 30694199

PMCID: 6371067

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Data Analysis and Visualization of Newspaper Articles on Thirdhand Smoke: A Topic Modeling Approach

  • Qian Liu
  • Qiuyi Chen
  • Jiayi Shen
  • Huailiang Wu
  • Yimeng Sun
  • Wai-Kit Ming

Background:

Thirdhand smoke (THS) has been a topic of growing interest in China for years. THS consists of residual tobacco smoke pollutants that remain on surfaces and in dust. These pollutants are re-emitted as a gas or react with oxidants and other compounds in the environment to yield secondary pollutants.

Objective:

This study aimed to collect reports on THS from major media outlets and analyze them using topic modeling, in order to better understand the role that the media plays in communicating this health issue to the public.

Methods:

The data were retrieved from the Wiser and Factiva news databases. A preliminary investigation focused on articles dated between January 1, 2013, and December 31, 2017. Latent Dirichlet Allocation (LDA) was used to identify the top 10 topics about THS. A modified version of the LDAvis tool provided an overall view of the topic model, visualizing the topics as circles. Multidimensional scaling was used to represent the intertopic distances on a two-dimensional plane.
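The two computational steps named above, fitting a topic model and measuring how far apart the topics are, can be illustrated with a minimal sketch. It is not the authors' pipeline: the four toy documents are invented, the model uses 2 topics rather than 10, inference is done by collapsed Gibbs sampling (one common way to fit Latent Dirichlet Allocation), and the function names `fit_lda` and `js_distance` and the hyperparameter values are assumptions made for illustration. The intertopic distance is taken here as the square root of the Jensen-Shannon divergence between topic-word distributions; a multidimensional scaling step would take the resulting matrix as its input.

```python
import math
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, topic-word distributions)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens per (document, topic)
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                      # tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimate of each topic's distribution over the vocabulary.
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(num_topics)
    ]
    return vocab, phi


def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence between p and q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Invented toy corpus, already tokenized; real input would be news articles.
docs = [
    "smoke cancer risk children cancer".split(),
    "cancer risk residue smoke".split(),
    "ban law public smoking ban".split(),
    "law smoking ban restaurant".split(),
]

vocab, phi = fit_lda(docs, num_topics=2)
distances = [[js_distance(p, q) for q in phi] for p in phi]

for k, row in enumerate(phi):
    top_words = [w for _, w in sorted(zip(row, vocab), reverse=True)[:3]]
    print(f"Topic {k + 1}: {top_words}")
print("Intertopic distance:", round(distances[0][1], 3))
```

With only two topics there is a single intertopic distance; with 10 topics the same code yields a 10 x 10 symmetric matrix, which is the kind of input a two-dimensional scaling layout is computed from.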

Results:

We found 745 articles dated between January 1, 2013, and December 31, 2017. The United States ranked first in terms of publications (152 articles on THS from 2013-2017). We found 279 news reports about THS from the Chinese media over the same period and 363 news reports from the United States. In our analysis of the share of THS-related news in China, Topic 1 (Cancer) was the most prevalent topic and was mentioned in 31.9% of all news stories. Topic 2 (Control of quitting smoking) was related to roughly 15% of news items on THS.

Conclusions:

Data analysis and visualization of news articles can generate useful information. Our study shows that topic modeling can offer insights into news reports related to THS. This analysis of media trends indicated that related diseases, air and particulate matter (PM2.5), and control and restrictions are the major concerns of the Chinese media reporting on THS. The Chinese press still needs to provide fuller reports on THS that are based on scientific evidence and rely less on sensational headlines. We recommend additional studies that apply sentiment analysis to news data to verify and measure the influence of THS-related topics.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.