Accepted for/Published in: JMIR Formative Research
Date Submitted: Nov 2, 2022
Open Peer Review Period: Nov 2, 2022 - Dec 28, 2022
Date Accepted: Jul 24, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Using WeChat clinician-patient group communication data to identify symptom burdens in patients with uterine fibroids under focused ultrasound ablation surgery treatment
ABSTRACT
Background:
Unlike research project-based health data collection methods, such as questionnaires and interviews, social media platforms allow patients to freely discuss their health status and obtain peer support. Previous literature has pointed out that both public-facing websites and private Facebook groups can serve as data sources for patient-reported outcomes.
Objective:
This study aimed to use natural language processing (NLP) techniques based on machine learning to identify concerns regarding postoperative quality of life and symptom burdens in patients with uterine fibroids after focused ultrasound ablation surgery.
Methods:
Screenshots taken from clinician-patient WeChat groups were converted into free text using image text recognition technology and served as the research object of this study. Regular expressions in Python were used to search for symptom burdens in over 900,000 words of WeChat group chats associated with 408 patients diagnosed with uterine fibroids at Chongqing Haifu Hospital between 2010 and 2020. We first built a corpus of symptoms by manually coding 30% of the WeChat texts and then used regular expressions to extract symptom information from the remaining texts based on this corpus. We compared the results with a manual review (gold standard) of the same records. A mixed methods approach was used to assess the relationship between population baseline data and conceptual symptoms; quantitative and qualitative results were examined.
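The lexicon-driven extraction and gold-standard comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the lexicon entries, the English example messages, and the helper names (`compile_patterns`, `extract_symptoms`, `precision_recall`) are all hypothetical, and the study's actual corpus was built from manually coded WeChat texts that are not reproduced here.

```python
import re

# Hypothetical symptom lexicon (concept -> surface expressions).
# Illustrative only; the study's corpus came from manual coding of 30% of the texts.
SYMPTOM_LEXICON = {
    "dysmenorrhea": ["dysmenorrhea", "menstrual cramps", "period pain"],
    "abdominal_pain": ["abdominal pain", "stomach ache"],
    "vaginal_discharge": ["vaginal discharge", "discharge"],
}


def compile_patterns(lexicon):
    """Build one case-insensitive regular expression per symptom concept."""
    patterns = {}
    for concept, terms in lexicon.items():
        # Longer terms first so multi-word expressions are tried before substrings.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[concept] = re.compile(alternation, re.IGNORECASE)
    return patterns


def extract_symptoms(text, patterns):
    """Return the set of symptom concepts mentioned in one message."""
    return {concept for concept, pattern in patterns.items() if pattern.search(text)}


def precision_recall(predicted, gold):
    """Compare extracted concept sets with manually reviewed (gold) sets."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)
        fp += len(pred - ref)
        fn += len(ref - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


PATTERNS = compile_patterns(SYMPTOM_LEXICON)

# Hypothetical messages and their manual (gold standard) annotations.
messages = [
    "Doctor, I still have period pain after the procedure.",
    "There is some discharge and mild abdominal pain today.",
    "Feeling fine, thank you.",
]
gold = [{"dysmenorrhea"}, {"vaginal_discharge", "abdominal_pain"}, set()]

predicted = [extract_symptoms(message, PATTERNS) for message in messages]
precision, recall = precision_recall(predicted, gold)
```

Keeping one compiled pattern per concept makes it straightforward to extend the lexicon as new expressions are coded. A plain substring pattern like this does not handle negation ("no pain"), which a real pipeline would need to address separately.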
Results:
A total of 190,000 words of free text from patients with uterine fibroids were obtained after data cleaning, and 408 patients were included in the study. The age of the patients was 39.94±6.81 years, and their BMI was 23.47±29.37 kg/m^2. The median reporting times of the seven major symptoms were 21, 26, 57, 2, 18, 30, and 49 days. Patients with dysmenorrhea were younger and slimmer (mean (SD), P<.05), had lower fertility and parity (P<.05), and tended to stay longer in the hospital (P<.05). Logistic regression models identified menstrual duration (odds ratio (OR) (95% CI)), age at menarche (OR (95% CI)), reported symptoms before surgery (OR (95% CI)), and the number and size of fibroids as significant risk factors for postoperative symptoms.
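As a rough illustration of how a per-symptom median reporting time could be derived from extracted mentions, the sketch below groups first-report days by symptom and takes the median. The records, the symptom names, and the assumption that days are counted from the date of surgery are all hypothetical; the abstract does not state the reference point.

```python
from statistics import median

# Hypothetical records: (patient_id, symptom, day of first report).
# Assumption: days are counted from the date of surgery.
first_reports = [
    ("p01", "dysmenorrhea", 14),
    ("p02", "dysmenorrhea", 30),
    ("p03", "dysmenorrhea", 21),
    ("p01", "abdominal_pain", 2),
    ("p04", "abdominal_pain", 5),
]


def median_reporting_days(records):
    """Group first-report days by symptom and return the median for each."""
    days_by_symptom = {}
    for _patient_id, symptom, days in records:
        days_by_symptom.setdefault(symptom, []).append(days)
    return {symptom: median(days) for symptom, days in days_by_symptom.items()}


medians = median_reporting_days(first_reports)
```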
Conclusions:
Unstructured free text from social media platforms, extracted with NLP technology, can be analyzed to capture conceptual information about patients' health-related quality of life (HRQoL), screen for high-risk groups, and track the reporting time of certain symptoms, supporting personalized treatment for patients at different stages of recovery to improve their quality of life. Python-based text mining of free-text data can accurately extract symptom burden and save considerable time compared with manual review, maximizing the utility of the existing information in population-based electronic health records for comparative effectiveness research.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.