Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Oct 14, 2021
Date Accepted: May 27, 2022
Using Social Media to Predict Food Deserts in the United States: Infodemiology Study of Tweets
ABSTRACT
Background:
The issue of food insecurity is becoming increasingly important to public health practitioners because of the adverse health outcomes and underlying racial disparities that are associated with insufficient access to healthy foods. Prior research has used data sources such as surveys, geographic information systems, and food store assessments to identify regions classified as food deserts, but the individuals in these regions may also unknowingly provide their own accounts of food consumption and food insecurity via social media. Social media data have proved useful in answering questions related to public health, so they may prove to be a rich data source for identifying food deserts in the United States.
Objective:
The aim of this study was to develop, from geotagged Twitter data, a predictive model for the identification of food deserts in the United States, using the linguistic constructs found in food-related tweets.
Methods:
Twitter’s streaming application programming interface was used to collect a random 1% sample of public, geolocated tweets across 25 major cities, from March 2020 to December 2020. A total of 60,174 geolocated, food-related tweets were collected across the 25 cities. Each geolocated tweet was mapped to its respective census tract using point-to-polygon mapping, which allowed us to develop census-tract level features derived from the linguistic constructs found in food-related tweets, such as tweet sentiment and average nutritional value of foods mentioned in tweets. These features were then used to examine the associations between food desert status and the food-ingestion language and sentiment of tweets in a census tract, and to determine whether food-related tweets can be used to infer census tract-level food desert status.
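The point-to-polygon mapping and tract-level aggregation described above can be illustrated with a minimal sketch. It is not the study's code: the tract polygons, tweet records, sentiment scores, and the function names `point_in_polygon`, `assign_tract`, and `tract_features` are all hypothetical, and a standard ray-casting test stands in for whatever geospatial tooling was actually used.

```python
from collections import defaultdict


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_tract(lon, lat, tracts):
    """Return the ID of the first tract containing the point, else None."""
    for tract_id, polygon in tracts.items():
        if point_in_polygon(lon, lat, polygon):
            return tract_id
    return None


def tract_features(tweets, tracts):
    """Aggregate tweet-level sentiment into per-tract features."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tweet in tweets:
        tract_id = assign_tract(tweet["lon"], tweet["lat"], tracts)
        if tract_id is None:
            continue  # tweet falls outside every known tract
        sums[tract_id] += tweet["sentiment"]
        counts[tract_id] += 1
    return {
        tract_id: {
            "n_tweets": counts[tract_id],
            "mean_sentiment": sums[tract_id] / counts[tract_id],
        }
        for tract_id in counts
    }


# Hypothetical toy data: two adjacent square "tracts" and four tweets.
TRACTS = {
    "tract_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "tract_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
TWEETS = [
    {"lon": 0.5, "lat": 0.5, "sentiment": 0.5},
    {"lon": 0.25, "lat": 0.75, "sentiment": -0.5},
    {"lon": 1.5, "lat": 0.5, "sentiment": 1.0},
    {"lon": 5.0, "lat": 5.0, "sentiment": 0.0},  # outside both tracts
]

features = tract_features(TWEETS, TRACTS)
```

In practice, census tract boundaries would come from official shapefiles and a spatial index would replace the linear scan over tracts; the sketch only shows how a geotagged point becomes a tract-level feature row.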
Results:
We found associations between a census tract being classified as a food desert and an increase in the number of tweets in a census tract that mentioned unhealthy foods (P=.03), including foods high in cholesterol (P=.02) or lower in key nutrients, such as potassium (P=.01). We also found an association between a census tract being classified as a food desert and an increase in the proportion of tweets that mentioned healthy foods (P=.03) and fast-food restaurants (P=.01) with positive sentiment. In addition, including food-ingestion language derived from tweets in classification models that predict food desert status improved model performance compared with baseline models that included only socioeconomic characteristics.
Conclusions:
Social media data have been increasingly used to answer questions related to health and well-being. Using Twitter data, we found that food-related tweets can be used to develop models that predict census tract food desert status with high accuracy and that improve on baseline models. Food-ingestion language found in tweets, such as census tract-level measures of food sentiment and healthiness, is associated with census tract-level food desert status.
Citation
Per the author's request the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.