Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 16, 2018
Open Peer Review Period: Oct 25, 2018 - Dec 20, 2018
Date Accepted: Apr 14, 2019
Monitor Physical Activity Levels using Social Media Data.
ABSTRACT
Background:
Social media technology, such as Twitter, allows users to communicate with each other by sharing short messages. Users often share their thoughts, feelings, and opinions on these platforms; as a result, social media data could be used to provide real-time monitoring of the psychological and behavioural outcomes that inform health behaviours. The growing body of social media data is becoming a central part of big data research, as these data can be combined with other datasets (e.g., physical activity levels) and used to predict outcomes from those datasets. Currently, it is unclear whether Twitter data can be used to monitor physical activity levels.
Objective:
This study seeks to establish the feasibility of using Twitter data to monitor physical activity levels by assessing whether the frequency and sentiment of physical activity-related tweets were associated with physical activity levels across the United States.
Methods:
Tweets were collected from Twitter's Application Programming Interface (API) between January 30, 2017 and October 15, 2017. We used Twitter's ‘garden hose’ method of collecting tweets, which provided a random sample of approximately 1% of all tweets. Only geo-tagged tweets were retained for analysis. A list of physical activity keywords was compiled using the guidelines for exercise testing published by the American College of Sports Medicine. A tweet was classified as physical activity-related if it contained one or more of these keywords (e.g., exercise, running). Twitter data were merged with physical activity data collected as part of the Behavioral Risk Factor Surveillance System. A beta regression model was used to assess the relationship between physical activity-related tweets and physical inactivity prevalence by county while controlling for population and socioeconomic status measures.
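The keyword classification, geo-tag filtering, county aggregation, and merge steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the keyword set is a small hypothetical subset (only "exercise" and "running" come from the abstract), each tweet is assumed to already carry a county identifier derived from its geo-tag (None when it has none), matching is simple lowercase whole-word matching, and the county IDs and survey values in the demo are toy placeholders, not real data.

```python
import re
from collections import defaultdict

# Hypothetical keyword subset; the study's full list was compiled from the
# ACSM exercise testing guidelines and is not reproduced here.
PA_KEYWORDS = {"exercise", "running", "workout", "jogging", "gym"}

# Lowercase word tokens; apostrophes are kept so "don't" stays one token.
TOKEN_RE = re.compile(r"[a-z']+")


def is_pa_related(text: str, keywords=PA_KEYWORDS) -> bool:
    """Return True if the tweet contains at least one keyword (whole-word match)."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return not tokens.isdisjoint(keywords)


def county_pa_share(tweets):
    """Aggregate (county_id, text) pairs into the percentage of
    physical activity-related tweets per county.

    Tweets whose county_id is None (i.e., not geo-tagged) are skipped.
    """
    totals = defaultdict(int)
    pa_counts = defaultdict(int)
    for county, text in tweets:
        if county is None:  # keep only geo-tagged tweets
            continue
        totals[county] += 1
        if is_pa_related(text):
            pa_counts[county] += 1
    return {c: 100.0 * pa_counts[c] / totals[c] for c in totals}


def merge_with_survey(tweet_share, survey):
    """Inner-join the tweet-derived share with a county-level survey table
    (hypothetical dict: county_id -> percent physically active)."""
    return {c: (tweet_share[c], survey[c]) for c in tweet_share.keys() & survey.keys()}


if __name__ == "__main__":
    sample = [
        ("C1", "Morning running along the beach"),
        ("C1", "Traffic is terrible today"),
        ("C2", "Skipped exercise again"),
        (None, "Gym time!"),  # no geo-tag: dropped
    ]
    shares = county_pa_share(sample)
    print(shares)  # {'C1': 50.0, 'C2': 100.0}
    print(merge_with_survey(shares, {"C1": 70.0}))  # toy survey value
```

Whole-word matching is a deliberate simplification: it avoids false hits on substrings (e.g., "runningshoes") but misses inflected forms that are not in the list (e.g., "ran"), so a production keyword list would need to enumerate such forms or apply stemming.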
Results:
We collected 442,959,789 unique tweets, of which 64,005,336 (14.4%) were geo-tagged. Aggregated data were obtained for a total of 3138 counties in the United States. The mean county-level percentage of individuals who were physically active was 74.05% (SD 5.2), or 75.30% (SD 4.96) after adjusting for age. Our models showed that the percentage of physical activity-related tweets was significantly associated with physical activity level (β=0.11; SE=0.2; p<.01) and age-adjusted physical activity (β=0.10; SE=0.20; p<.01) at the county level after adjusting for both the Gini index and education level. However, the sentiment of physical activity-related tweets was not a significant predictor of physical activity level or age-adjusted physical activity at the county level once the Gini index and education level were included in the model (p>.05).
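The β estimates reported above are coefficients of a beta regression, which models a county-level proportion lying strictly between 0 and 1. A widely used specification is the mean-precision parametrization with a logit link, shown below as a reference sketch. The abstract does not state the link function or how the covariates were coded, so the logit link and the covariate names (PA_tweets, Gini, Edu) are assumptions for illustration only.

```latex
\begin{aligned}
y_i &\sim \operatorname{Beta}\bigl(\mu_i\phi,\ (1-\mu_i)\phi\bigr), \qquad 0 < y_i < 1, \quad \phi > 0,\\
\mathbb{E}[y_i] &= \mu_i, \qquad \operatorname{Var}[y_i] = \frac{\mu_i(1-\mu_i)}{1+\phi},\\
\operatorname{logit}(\mu_i) &= \log\frac{\mu_i}{1-\mu_i}
  = \beta_0 + \beta_1\,\mathrm{PA\_tweets}_i + \beta_2\,\mathrm{Gini}_i + \beta_3\,\mathrm{Edu}_i .
\end{aligned}
```

Here y_i is the proportion of physically active adults in county i (the reported percentage divided by 100) and φ is a precision parameter: larger φ means less dispersion around the mean μ_i. Under a logit link, a one-unit increase in a covariate multiplies the odds μ_i/(1 − μ_i) by exp(β_j), holding the other covariates fixed.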
Conclusions:
Using social media data to monitor physical activity levels can be a valuable tool for public health organizations, as it can overcome the time lag in reporting physical activity epidemiology data that traditional research methods (e.g., surveys, observational studies) face. Consequently, this approach could help public health organizations better mobilize and target physical activity interventions.
Clinical Trial: Not Applicable
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.