Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 19, 2023
Date Accepted: Jan 3, 2024
Identifying Risk Factors of Allergic Rhinitis Based on Zhihu Comment Data: Using a Topic-Enhanced Word Embedding Model
ABSTRACT
Background:
Allergic rhinitis (AR) is a chronic disease, and several risk factors predispose individuals to the condition in their daily lives, including exposure to allergens and inhaled irritants. Analyzing the potential risk factors that can trigger AR can provide reference material that patients can use to reduce its occurrence in their daily lives. Social media use is now part of daily life, with more and more people using at least one platform regularly. Social media enables users to share experiences among large groups of people who share the same interests and suffer from the same afflictions. Notably, these channels make it easier to share health information.
Objective:
This study aims to construct an intelligent method (TopicS-ClusterREV) for identifying the risk factors of AR from social media comments. The main questions were as follows: How many comments contained AR risk factor information? Into how many categories can these risk factors be summarized? How do these risk factors trigger AR?
Methods:
This study crawled all data posted from May 2012 to May 2022 under the topic of “allergic rhinitis” on Zhihu, obtaining a total of 9,628 posts and 33,747 comments. We improved the Skip-gram model to train topic-enhanced word vector representations (TopicS) and then vectorized annotated text items to train the risk factor classifier. Furthermore, cluster analysis enabled a closer look at the opinions expressed within each category, providing insight into how risk factors trigger AR.
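As a rough illustration of the Skip-gram step, the sketch below generates (center, context) training pairs from a tokenized comment. The abstract does not specify how the topic enhancement works, so `topic_augmented_pairs`, the `<topic_N>` pseudo-token, and the sample tokens are purely hypothetical assumptions for illustration and should not be read as the authors' TopicS model.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for tokens within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def topic_augmented_pairs(
    tokens: List[str], topic_id: int, window: int = 2
) -> List[Tuple[str, str]]:
    """Hypothetical topic enhancement (an assumption, not the paper's method):
    every word also predicts a pseudo-token naming the comment's topic."""
    pairs = skipgram_pairs(tokens, window)
    topic_token = f"<topic_{topic_id}>"
    pairs.extend((tok, topic_token) for tok in tokens)
    return pairs


# Made-up tokenized comment and topic id, for illustration only.
comment = ["spring", "pollen", "triggers", "sneezing"]
pairs = topic_augmented_pairs(comment, topic_id=3, window=1)
for center, context in pairs:
    print(center, "->", context)
```

In a full pipeline, pairs like these would feed a Skip-gram trainer, and the resulting word vectors would be combined into a vector per comment before classification and clustering.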
Results:
Our classifier identified more comments containing risk factors than other classification models, with an accuracy of 96.10% and a recall of 96.30%. Overall, we clustered texts containing risk factors into 28 categories, with season, region, and mites being the most common. We also gained insight into the risk factors expressed within each category; for example, seasonal changes and increased temperature differences between day and night can disrupt the body's immune system and lead to the development of allergies.
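For readers unfamiliar with the two reported metrics, the following minimal sketch shows how accuracy and recall are conventionally computed for a binary classifier. The function name and the toy labels are illustrative assumptions; they are not the study's data or evaluation code.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Accuracy = correct / total; recall = TP / (TP + FN)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    # Recall is undefined without positive examples; report 0.0 in that case.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy labels: 1 = comment mentions a risk factor, 0 = it does not.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

High recall matters here because a missed risk-factor comment (a false negative) is lost to the later clustering step.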
Conclusions:
Our approach can handle large volumes of data and extract risk factors effectively. Moreover, the summary of risk factors can serve as a reference that patients can use to reduce AR in their daily lives. The experimental data also suggest a potential pathway by which AR is triggered. This result can guide the development of management plans and interventions for AR.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.