Poster-Patients Relationships Show Differing Content and Language on Chinese Weibo: Text Classification, Sentiment Analysis, and Topic Modeling of Posts on Breast Cancer
ABSTRACT
Background:
Breast cancer affects the lives of not only those diagnosed, but also the people around them. Many of those affected share their experiences on social media. However, these narratives may differ according to who the poster is and their relationship with the patient; a patient posting about their own experiences may post different content from someone whose friend or family member has breast cancer. In China, Weibo is one of the most popular social media platforms, and breast cancer-related posts are frequently found there. We used Weibo as a resource to examine how posts differ according to the poster-patient relationship.
Objective:
With the goal of understanding the different experiences of those affected by breast cancer in China, we aim to explore how the content and language used in relevant posts differ according to who the poster is and their relationship with the patient. Specifically, we examine whether emotional expression and topic content differ depending on whether the patient is the poster themselves or a friend, family member, relative, or acquaintance.
Methods:
We scraped a total of N=10,322 relevant Weibo posts. Using a 2-step analysis method, we first fine-tuned two Chinese RoBERTa models on a dataset annotated with poster-patient relationships. These models were linked in sequence: first a binary classifier (‘no_patient’ or ‘patient’), then a multiclass classifier (‘post_user’, ‘family_members’, ‘friends_relatives’, ‘acquaintances’, ‘heard_relation’) to classify patient relationships. Next, we used the LIWC lexicon to conduct sentiment analysis on 5 emotion categories (positive and negative emotions, anger, sadness, and anxiety), followed by topic modeling (BERTopic).
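The sequential arrangement of the two classifiers can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `cascade_classify` and the keyword-based stand-in classifiers are hypothetical and used only so the example runs on its own. In practice, each callable would wrap one of the fine-tuned Chinese RoBERTa models. Only the label names are taken from the text above.

```python
from typing import Callable

# Relationship labels named in the Methods; used to validate stage-2 output.
PATIENT_LABELS = (
    "post_user",
    "family_members",
    "friends_relatives",
    "acquaintances",
    "heard_relation",
)


def cascade_classify(
    text: str,
    binary_clf: Callable[[str], str],
    relation_clf: Callable[[str], str],
) -> str:
    """Run the binary classifier first; call the multiclass one only if a patient is detected."""
    if binary_clf(text) == "no_patient":
        return "no_patient"
    label = relation_clf(text)
    if label not in PATIENT_LABELS:
        raise ValueError(f"unexpected relationship label: {label!r}")
    return label


# Hypothetical keyword rules standing in for the fine-tuned models (assumption).
def toy_binary(text: str) -> str:
    return "patient" if "diagnosed" in text else "no_patient"


def toy_relation(text: str) -> str:
    if "my mother" in text:
        return "family_members"
    if "my friend" in text:
        return "friends_relatives"
    return "post_user"


if __name__ == "__main__":
    for post in (
        "I was diagnosed last year",
        "my mother was diagnosed in May",
        "breast cancer awareness month",
    ):
        print(post, "->", cascade_classify(post, toy_binary, toy_relation))
```

One consequence of this design is that the multiclass model only ever sees posts the binary model has flagged as mentioning a patient, so errors in the first stage propagate to the second.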
Results:
Our binary model (F1 = 0.93) and multiclass model (F1 = 0.83) were largely able to classify patient relationships accurately. Subsequent sentiment analyses showed significant differences in emotion categories across all patient relationships. Notably, negative emotions and anger were higher for the ‘no_patient’ class, but sadness and anxiety were higher for the ‘family_member’ class. Focusing on the top 30 topics, we also noted that topics about fears and rage towards cancer were higher in the ‘no_patient’ class, but topics about cancer treatment were higher in the ‘family_member’ class.
Conclusions:
Chinese users posted different types of content depending on the poster-patient relationship. If the patient was family, posts were sadder and more anxious, but also contained more content on treatments. However, if no patient was detected, posts showed high levels of anger. We think that this may be the poster ranting, which may help with emotion regulation and gathering social support.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.