Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 19, 2023
Date Accepted: Jan 3, 2024

The final, peer-reviewed published version of this preprint can be found here:

Identifying the Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data Using a Topic-Enhanced Word-Embedding Model: Mixed Method Study and Cluster Analysis

Gu D, Wang Q, Chai Y, Yang X, Xu Z, Zhao W, Zolotarev O

J Med Internet Res 2024;26:e48324

DOI: 10.2196/48324

PMID: 38386404

PMCID: 10921335

Identifying Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data: Using a Topic-Enhanced Word Embedding Model

  • Dongxiao Gu
  • Qin Wang
  • Yidong Chai
  • Xuejie Yang
  • Zhengfei Xu
  • Wang Zhao
  • Oleg Zolotarev

ABSTRACT

Background:

Allergic rhinitis (AR) is a chronic disease, and several risk factors in daily life predispose individuals to it, including exposure to allergens and inhaled irritants. Analyzing the risk factors that can trigger AR can give patients reference material for reducing its occurrence in their daily lives. Social media use is now part of daily life, with more and more people using at least one platform regularly. Social media enables users to share experiences among large groups of people who have the same interests and the same afflictions. Notably, these channels make it easier to share health information.

Objective:

This study aims to construct an intelligent method (TopicS-ClusterREV) for identifying the risk factors of AR from social media comments. The main questions were as follows: How many comments contain AR risk factor information? Into how many categories can these risk factors be summarized? How do these risk factors trigger AR?

Methods:

We crawled all data posted under the topic of “allergic rhinitis” on Zhihu from May 2012 to May 2022, obtaining a total of 9,628 posts and 33,747 comments. We improved the Skip-gram model to train topic-enhanced word vector representations (TopicS) and then vectorized annotated text items to train the risk factor classifier. Furthermore, cluster analysis enabled a closer look at the opinions expressed within each category, giving insight into how risk factors trigger AR.
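
As background for the Skip-gram model named above: standard Skip-gram learns word vectors by predicting the words that surround each center word within a fixed window. The sketch below shows only that pair-generation step. It is not the authors' TopicS model, since the abstract does not say how topic information is added; the function name, the window size, and the sample tokens are all assumptions made for illustration.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for standard Skip-gram training.

    Every token is paired with each neighbor at most `window` positions away.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # left edge of the window
        hi = min(len(tokens), i + window + 1)   # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                          # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized comment, invented for illustration (not study data).
comment = ["pollen", "season", "triggers", "sneezing"]
pairs = skipgram_pairs(comment, window=1)
for center, context in pairs:
    print(center, "->", context)
```

A topic-enhanced variant would need some way to feed topic information into these training examples; that design is described in the full paper rather than in this abstract.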

Results:

Our classifier identified more comments containing risk factors than the other classification models, with an accuracy of 96.10% and a recall of 96.30%. Overall, we clustered the texts containing risk factors into 28 categories, with season, region, and mites being the most common. We also gained insight into the risk factors expressed within each category; for example, seasonal changes and larger temperature differences between day and night can disrupt the body's immune system and lead to the development of allergies.
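
For readers less familiar with the two metrics reported above, the following minimal sketch shows how accuracy and recall are computed from the four confusion-matrix counts of a binary classifier. The counts used here are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data and do not reproduce its reported figures.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of true risk-factor comments that the classifier found."""
    positives = tp + fn
    return tp / positives if positives else 0.0


# Hypothetical counts for 200 labeled comments (illustrative only).
tp, fn = 90, 10   # comments with a risk factor: found vs. missed
tn, fp = 85, 15   # comments without one: correctly rejected vs. false alarms

print(f"accuracy = {accuracy(tp, fp, tn, fn):.3f}")  # 0.875
print(f"recall   = {recall(tp, fn):.3f}")            # 0.900
```

Recall matters in this setting because a missed comment is a risk factor mention that never reaches the clustering step.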

Conclusions:

Our approach can handle large volumes of data and extract risk factors effectively. Moreover, the summary of risk factors can serve as a reference for patients seeking to reduce AR in their daily lives. The experimental data also point to a potential pathway by which AR is triggered. These results can guide the development of management plans and interventions for AR.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.