Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 16, 2018
Open Peer Review Period: Oct 25, 2018 - Dec 20, 2018
Date Accepted: Apr 14, 2019

The final, peer-reviewed published version of this preprint can be found here:

Liu S, Chen B, Kuo A

Monitoring Physical Activity Levels Using Twitter Data: Infodemiology Study

J Med Internet Res 2019;21(6):e12394

DOI: 10.2196/12394

PMID: 31162126

PMCID: 6682305

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Monitoring Physical Activity Levels Using Twitter Data: Infodemiology Study

  • Sam Liu
  • Brian Chen
  • Alex Kuo

Background:

Social media platforms such as Twitter allow users to share their thoughts, feelings, and opinions online. The growing body of social media data is becoming a central part of infodemiology research, as these data can be combined with other public health datasets (eg, physical activity levels) to provide real-time monitoring of psychological and behavioral outcomes that inform health behaviors. Currently, it is unclear whether Twitter data can be used to monitor physical activity levels.

Objective:

The aim of this study was to establish the feasibility of using Twitter data to monitor physical activity levels by assessing whether the frequency and sentiment of physical activity–related tweets were associated with physical activity levels across the United States.

Methods:

Tweets were collected through Twitter's application programming interface (API) between January 10, 2017 and January 2, 2018. We used Twitter's garden hose method of collecting tweets, which provided a random sample of approximately 1% of all tweets with location metadata falling within the United States. The sample was then filtered to retain geotagged tweets. A list of physical activity–related hashtags was compiled and used to classify these geolocated tweets. The Twitter data were merged with physical activity data collected as part of the Behavioral Risk Factor Surveillance System. Multiple linear regression models were fit to assess the relationship between physical activity–related tweets and physical activity levels by county while controlling for population and socioeconomic status measures.
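
The filtering, hashtag classification, and county-level aggregation steps described above can be illustrated with a minimal Python sketch. This is not the authors' code: the tweet fields (coordinates, hashtags, county), the hashtag list, and the function names are hypothetical, and the sketch assumes each geotagged tweet has already been assigned to a county by an upstream geocoding step.

```python
from collections import defaultdict

# Illustrative hashtags only (assumption); the study's actual list is not given here.
PA_HASHTAGS = {"workout", "running", "gym"}

def is_geotagged(tweet):
    """True when the tweet carries latitude/longitude coordinates."""
    return tweet.get("coordinates") is not None

def is_pa_related(tweet, hashtags=PA_HASHTAGS):
    """True when any of the tweet's hashtags appears in the activity list."""
    return any(tag.lower() in hashtags for tag in tweet.get("hashtags", []))

def county_pa_percentage(tweets):
    """Map county code -> percentage of geotagged tweets that are activity-related."""
    totals = defaultdict(int)
    related = defaultdict(int)
    for tweet in tweets:
        if not is_geotagged(tweet):
            continue  # drop tweets without coordinates
        county = tweet["county"]  # hypothetical field set by a geocoding step
        totals[county] += 1
        if is_pa_related(tweet):
            related[county] += 1
    return {c: 100.0 * related[c] / totals[c] for c in totals}

# Made-up tweets for demonstration; not study data.
sample = [
    {"coordinates": (34.05, -118.24), "hashtags": ["Running"], "county": "06037"},
    {"coordinates": (34.10, -118.30), "hashtags": ["lunch"], "county": "06037"},
    {"coordinates": None, "hashtags": ["gym"], "county": "06037"},
    {"coordinates": (40.71, -74.00), "hashtags": [], "county": "36061"},
]
result = county_pa_percentage(sample)
```

In this sketch the third tweet is excluded because it has no coordinates, so it does not count toward either the numerator or the denominator for its county. The resulting county-level percentages are the kind of predictor that could then be merged with survey-based physical activity data.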

Results:

During the study period, 442,959,789 unique tweets were collected, of which 64,005,336 (14.44%) were geotagged with latitude and longitude coordinates. Aggregated data were obtained for a total of 3138 counties in the United States. The mean county-level percentage of physically active individuals was 74.05% (SD 5.2) and 75.30% (SD 4.96) after adjusting for age. The model showed that the percentage of physical activity–related tweets was significantly associated with physical activity levels (beta=.11; SE 0.20; P<.001) and age-adjusted physical activity (beta=.10; SE 0.20; P<.001) on a county level while adjusting for both Gini index and education level. However, the overall explained variance of the model was low (R²=.11). The sentiment of the physical activity–related tweets was not a significant predictor of physical activity level and age-adjusted physical activity on a county level after including the Gini index and education level in the model (P>.05).
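
To make the reported quantities concrete, the following minimal sketch fits a multiple linear regression by ordinary least squares and computes the explained variance (R²). It is an illustration under stated assumptions, not the study's analysis: the data are synthetic, the coefficients used to generate them are arbitrary, the predictor order (percentage of activity-related tweets, Gini index, education level) is chosen for illustration, and standard errors and P values are not computed.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, R^2)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    pred = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

# Synthetic county rows (assumption): (% activity tweets, Gini index, % with degree).
rows = [(1.0, 0.40, 20.0), (2.0, 0.45, 25.0), (3.0, 0.38, 30.0),
        (1.5, 0.50, 18.0), (2.5, 0.42, 35.0)]
true_beta = [60.0, 2.0, -10.0, 0.5]  # arbitrary values, not the study's estimates
y = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], r)) for r in rows]
beta, r2 = fit_ols(rows, y)
```

Because the synthetic outcome is generated without noise, the fit recovers the generating coefficients and R² is essentially 1. With real county data the residual variance would be substantial, which is what a low R² such as the one reported above reflects.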

Conclusions:

Social media data may be a valuable tool for public health organizations to monitor physical activity levels, as they can overcome the time lag in the reporting of physical activity epidemiology data faced by traditional research methods (eg, surveys and observational studies). Consequently, this tool may help public health organizations better mobilize and target physical activity interventions.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.