Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 4, 2023
Date Accepted: Dec 5, 2023

The final, peer-reviewed published version of this preprint can be found here:

Dolatabadi E, Moyano D, Bales M, Spasojevic S, Bhambhoria R, Bhatti J, Debnath S, Hoell N, Li X, Leng C, Nanda S, Saab J, Sahak E, Sie F, Uppal S, Vadlamudi NK, Vladimirova A, Yakimovich A, Yang X, Kocak SA, Cheung AM

Figure Correction: Using Social Media to Help Understand Patient-Reported Health Outcomes of Post–COVID-19 Condition: Natural Language Processing Approach

J Med Internet Res 2023;25:e55010

DOI: 10.2196/55010

PMID: 38064711

PMCID: 10746960

Figure Correction: Using Social Media to Help Understand Patient-Reported Health Outcomes of Post–COVID-19 Condition: Natural Language Processing Approach

  • Elham Dolatabadi
  • Diana Moyano
  • Michael Bales
  • Sofija Spasojevic
  • Rohan Bhambhoria
  • Junaid Bhatti
  • Shyamolima Debnath
  • Nicholas Hoell
  • Xin Li
  • Celine Leng
  • Sasha Nanda
  • Jaad Saab
  • Esmat Sahak
  • Fanny Sie
  • Sara Uppal
  • Nirma Khatri Vadlamudi
  • Antoaneta Vladimirova
  • Artur Yakimovich
  • Xiaoxue Yang
  • Sedef Akinli Kocak
  • Angela M Cheung

ABSTRACT

Background:

While scientific knowledge of post-COVID-19 condition (PCC) is growing, there remains significant uncertainty in the definition of the disease, its expected clinical course, and its impact on daily functioning. Social media platforms can generate valuable insights into patient-reported health outcomes as the content is produced at high resolution by patients and caregivers, representing experiences that may be unavailable to most clinicians.

Objective:

In this study, we aimed to determine the validity and effectiveness of advanced natural language processing approaches built to derive insight into PCC-related patient-reported health outcomes from the social media platforms Twitter and Reddit. We extracted PCC-related terms, including symptoms and conditions, and measured their occurrence frequency. We compared the outputs with human annotations and clinical outcomes, and we tracked symptom and condition term occurrences over time and across locations to explore the pipeline's potential as a surveillance tool.

Methods:

We used bidirectional encoder representations from transformers (BERT) models to extract and normalize PCC symptom and condition terms from English posts on Twitter and Reddit. We compared 2 named entity recognition models and implemented a 2-step normalization task to map extracted terms to unique concepts in standardized terminology. The normalization steps were done using a semantic search approach with BERT biencoders. We evaluated the effectiveness of BERT models in extracting the terms using a human-annotated corpus and a proximity-based score. We also compared the validity and reliability of the extracted and normalized terms to a web-based survey with more than 3000 participants from several countries.
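The semantic search idea behind the normalization step can be illustrated with a minimal sketch: embed the extracted term and every candidate concept name, then pick the concept with the highest cosine similarity. This is not the authors' implementation. The `embed` function below is a toy character-trigram stand-in for a BERT biencoder, the concept list is a hypothetical sample rather than a real standardized terminology, and the abstract does not detail the 2 normalization steps, so only a single nearest-concept lookup is shown.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a bag of character trigrams.

    A real pipeline would call a BERT biencoder here instead.
    """
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(term: str, concepts: list[str]) -> tuple[str, float]:
    """Return the concept name most similar to the extracted term, with its score."""
    query = embed(term)
    scored = [(name, cosine(query, embed(name))) for name in concepts]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical concept names, chosen only for illustration.
CONCEPTS = ["fatigue", "headache", "shortness of breath", "anxiety"]

if __name__ == "__main__":
    for term in ["headaches", "short of breath"]:
        concept, score = normalize(term, CONCEPTS)
        print(f"{term!r} -> {concept!r} (similarity {score:.2f})")
```

With a learned biencoder in place of `embed`, the same lookup could also match paraphrases that share no surface characters with the concept name, which the trigram stand-in cannot do.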

Results:

UmlsBERT-Clinical had the highest accuracy in predicting entities closest to those extracted by human annotators. The 3 most commonly occurring groups of PCC symptom and condition terms were systemic (such as fatigue), neuropsychiatric (such as anxiety and brain fog), and respiratory (such as shortness of breath). We also found novel symptom and condition terms that had not been categorized in previous studies, such as infection and pain. Among co-occurring symptoms, fatigue and headaches formed one of the most frequently co-occurring term pairs on both platforms. In the temporal analysis, neuropsychiatric terms were the most prevalent, followed by the systemic category, on both social media platforms. Our spatial analysis showed that 42% (10,938/26,247) of the analyzed terms included location information, with the majority coming from the United States, United Kingdom, and Canada.
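As a rough illustration of how co-occurring term pairs could be counted once each post has been reduced to a set of normalized terms, here is a minimal sketch. The function name and the sample posts are hypothetical and are not data from the study. The last lines only re-check the arithmetic of the reported location share.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts: list[list[str]]) -> Counter:
    """Count how many posts mention each unordered pair of terms."""
    counts: Counter = Counter()
    for terms in posts:
        # Deduplicate within a post and sort so each pair has one canonical order.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


# Made-up example posts (assumption), each already normalized to concept names.
SAMPLE_POSTS = [
    ["fatigue", "headache"],
    ["headache", "fatigue", "anxiety"],
    ["anxiety"],
]

# Share of analyzed terms with location information, from the reported counts.
LOCATION_SHARE_PERCENT = round(100 * 10_938 / 26_247)

if __name__ == "__main__":
    for pair, n in cooccurrence_counts(SAMPLE_POSTS).most_common():
        print(pair, n)
    print(f"terms with location: {LOCATION_SHARE_PERCENT}%")
```

Counting each pair once per post, rather than once per mention, keeps a single long post from dominating the ranking. Whether the study counted pairs this way is not stated in the abstract.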

Conclusions:

The outcome of our social media-derived pipeline is comparable with the results of peer-reviewed articles relevant to PCC symptoms. Overall, this study provides unique insights into patient-reported health outcomes of PCC and valuable information about the patient's journey that can help health care providers anticipate future needs.


Per the author's request the PDF is not available.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.