Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Apr 20, 2018
Open Peer Review Period: Apr 23, 2018 - Jun 7, 2018
Date Accepted: Jul 23, 2018

The final, peer-reviewed published version of this preprint can be found here:

Characterizing Tweet Volume and Content About Common Health Conditions Across Pennsylvania: Retrospective Analysis

Tufts C, Polsky D, Volpp KG, Groeneveld PW, Ungar L, Merchant RM, Pelullo AP

JMIR Public Health Surveill 2018;4(4):e10834

DOI: 10.2196/10834

PMID: 30522989

PMCID: 6302232

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Characterizing Tweet Volume and Content About Common Health Conditions Across Pennsylvania: Retrospective Analysis

  • Christopher Tufts; 
  • Daniel Polsky; 
  • Kevin G Volpp; 
  • Peter W Groeneveld; 
  • Lyle Ungar; 
  • Raina M Merchant; 
  • Arthur P Pelullo

Background:

Tweets can provide broad, real-time perspectives about health and medical diagnoses that can inform disease surveillance in geographic regions. Less is known, however, about how much individuals post about common health conditions or what they post about.

Objective:

We sought to collect and analyze tweets from 1 state about high-prevalence health conditions and to characterize their volume and content.

Methods:

We collected 408,296,620 tweets originating in Pennsylvania from 2012 to 2015 and compared the prevalence of 14 common diseases with the frequency of disease mentions on Twitter. We identified and corrected the bias introduced by variation in disease term specificity, and we used differential language analysis, a machine learning approach, to determine the content (words and themes) most highly correlated with each disease.
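
The core idea of correlating words with a disease can be illustrated with a toy sketch. This is not the authors' pipeline: the example messages, the `pearson` and `word_correlations` helpers, and the whitespace tokenizer are all assumptions made for illustration, and the published method is more involved (for example, it also works with themes rather than single words only).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no signal
    return cov / sqrt(var_x * var_y)


def word_correlations(tweets, labels):
    """Correlate each word's relative frequency per tweet with a 0/1 label.

    Hypothetical helper: a naive lowercase/whitespace tokenizer stands in
    for whatever tokenization a real analysis would use.
    """
    tokenized = [t.lower().split() for t in tweets]
    vocab = {w for toks in tokenized for w in toks}
    result = {}
    for word in vocab:
        freqs = [toks.count(word) / len(toks) if toks else 0.0
                 for toks in tokenized]
        result[word] = pearson(freqs, labels)
    return result


# Made-up messages, NOT data from the study.
tweets = [
    "walk for breast cancer awareness today",
    "my asthma is acting up again",
    "support breast cancer awareness month",
    "forgot my inhaler and my asthma hates me",
]
is_breast_cancer = [1, 0, 1, 0]  # 1 = tweet mentions breast cancer

corr = word_correlations(tweets, is_breast_cancer)
for word in sorted(corr, key=corr.get, reverse=True)[:5]:
    print(word, round(corr[word], 2))
```

In this toy setting, a word such as "awareness" correlates positively with the breast cancer label and "asthma" correlates negatively. In practice the disease terms themselves would be excluded, since they define the label.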

Results:

Common disease terms were included in 226,802 tweets (174,381 tweets after disease term correction). Posts about breast cancer (39,156/174,381 messages, 22.45%; 306,127/12,702,379 prevalence, 2.41%) and diabetes (40,217/174,381 messages, 23.06%; 2,189,890/12,702,379 prevalence, 17.24%) were overrepresented on Twitter relative to disease prevalence, whereas hypertension (17,245/174,381 messages, 9.89%; 4,614,776/12,702,379 prevalence, 36.33%), chronic obstructive pulmonary disease (1648/174,381 messages, 0.95%; 1,083,627/12,702,379 prevalence, 8.53%), and heart disease (13,669/174,381 messages, 7.84%; 2,461,721/12,702,379 prevalence, 19.38%) were underrepresented. The content of messages also varied by disease. Personal experience messages accounted for 12.88% (578/4487) of prostate cancer tweets and 24.17% (4046/16,742) of asthma tweets. Awareness-themed tweets were more often about breast cancer (9139/39,156 messages, 23.34%) than asthma (1040/16,742 messages, 6.21%). Tweets about risk factors were more often about heart disease (1375/13,669 messages, 10.06%) than lymphoma (105/4927 messages, 2.13%).
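
The over- and underrepresentation reported above can be checked directly from the published counts. The sketch below divides each disease's share of tweets by its share of prevalence; the "representation ratio" name and the `representation_ratio` helper are assumptions introduced here, not a metric defined by the authors.

```python
# Denominators copied from the Results paragraph.
TOTAL_TWEETS = 174_381            # disease tweets after term correction
PREVALENCE_DENOMINATOR = 12_702_379

# disease -> (tweets mentioning it, prevalence count), as reported above
reported = {
    "breast cancer": (39_156, 306_127),
    "diabetes": (40_217, 2_189_890),
    "hypertension": (17_245, 4_614_776),
    "COPD": (1_648, 1_083_627),
    "heart disease": (13_669, 2_461_721),
}


def representation_ratio(tweets, cases):
    """Tweet share divided by prevalence share (hypothetical metric).

    Values above 1 mean the disease is discussed more than its
    prevalence alone would suggest; values below 1 mean less.
    """
    tweet_share = tweets / TOTAL_TWEETS
    prevalence_share = cases / PREVALENCE_DENOMINATOR
    return tweet_share / prevalence_share


ratios = {d: representation_ratio(t, c) for d, (t, c) in reported.items()}

for disease, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    label = "over" if ratio > 1 else "under"
    print(f"{disease:15s} ratio={ratio:5.2f} ({label}represented)")
```

With these inputs, breast cancer and diabetes come out above 1 and hypertension, chronic obstructive pulmonary disease, and heart disease come out below 1, which matches the direction stated in the Results.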

Conclusions:

Twitter provides a window into the Web-based visibility of diseases and shows how the volume of Web-based content varies by condition. Further, the potential value of tweets lies in the rich content they provide about individuals’ perspectives on diseases (eg, personal experiences, awareness, and risk factors), which are not otherwise easily captured through traditional surveys or administrative data.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.