Identifying Marijuana Use Behaviors among Homeless Youth: A Machine Learning Approach
ABSTRACT
Background:
Youth experiencing homelessness (YEH) are disproportionately affected by substance use problems compared with other youth. One study found that 69% of YEH meet the criteria for dependence on at least one substance, compared with 1.8% of all US adolescents. In addition, YEH face major structural and social inequalities that further undermine their ability to get the care they need.
Objective:
The goal of this study is to develop a machine learning-based framework that utilizes homeless youth’s social media content (posts and interactions) to predict their substance use behaviors (i.e., the probability of using certain substances). With this framework, social workers and care providers can identify and reach out to YEH who are at a higher risk of substance use.
Methods:
We recruited 133 homeless youth at a nonprofit organization in a city in the western United States. After obtaining their consent, we collected two types of data: (1) participants’ social media conversations from the year before they were recruited and (2) participants’ responses to a survey on their demographic information, health conditions, sexual behaviors, and substance use behaviors. Building on the social sharing of emotions theory and social support theory, we identified important features that could potentially predict substance use. We then used natural language processing techniques to extract these features from social media conversations and reactions, and built a series of machine learning models to predict participants’ marijuana use.
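The pipeline described above (extract text features from each participant's social media history, then fit a model that outputs a probability of marijuana use) can be illustrated with a minimal sketch. The abstract does not specify the features or models used, so everything here is an assumption: the keyword lexicons `SUBSTANCE_TERMS` and `EMOTION_TERMS`, the three toy features, the helper names, and the choice of a plain logistic regression fitted by gradient descent are hypothetical and are not the authors' method. The example histories are invented, not study data.

```python
import math
import re

# HYPOTHETICAL lexicons for illustration only; the study's actual features are not given here.
SUBSTANCE_TERMS = {"weed", "blunt", "high", "smoke"}
EMOTION_TERMS = {"sad", "angry", "stressed", "lonely", "happy"}


def extract_features(messages):
    """Map one participant's messages to a small numeric feature vector (hypothetical features)."""
    tokens = [t for m in messages for t in re.findall(r"[a-z']+", m.lower())]
    n = max(len(tokens), 1)  # guard against empty histories
    return [
        sum(t in SUBSTANCE_TERMS for t in tokens) / n,  # share of substance-related words
        sum(t in EMOTION_TERMS for t in tokens) / n,    # share of emotion words
        math.log1p(len(messages)),                      # overall activity level
    ]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Predicted probability of use; the last weight is the bias term."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of the log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Invented toy histories (not study data); label 1 = reports marijuana use.
histories = [
    ["smoking a blunt later", "so high rn"],
    ["got a new job today", "happy to see family"],
    ["need weed bad", "stressed and lonely"],
    ["going to the library", "study group at 5"],
]
labels = [1, 0, 1, 0]

X = [extract_features(h) for h in histories]
w = train_logistic(X, labels)
for h, x in zip(histories, X):
    print(f"{predict_proba(w, x):.2f}  {h[0]!r}")
```

Logistic regression is chosen here only because it directly outputs a probability, which matches the framework's stated goal of estimating the probability of using a substance; any of the "series of machine learning models" mentioned above could take its place.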
Results:
We evaluated our models based on their predictive performance as well as their conformity to measures of fairness. Using social media data only, without predictive features from the survey information, which may introduce gender and racial biases, our machine learning models achieved an AUC of 0.74 and an accuracy of 0.77. We also evaluated the false positive rate for each gender and age group.
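The evaluation measures named above (AUC, accuracy, and false positive rate per subgroup) can be computed as in the following minimal sketch in plain Python. AUC uses the pairwise (Mann-Whitney) definition: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The decision threshold of 0.5, the group labels, and the scores are assumptions invented for illustration; the abstract does not state the threshold or how gender and age groups were coded.

```python
from collections import defaultdict


def auc(y_true, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label (threshold is assumed)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def fpr_by_group(y_true, scores, groups, threshold=0.5):
    """False positive rate FP / (FP + TN), computed separately for each group label."""
    fp = defaultdict(int)
    neg = defaultdict(int)
    for y, s, g in zip(y_true, scores, groups):
        if y == 0:
            neg[g] += 1
            if s >= threshold:
                fp[g] += 1
    return {g: fp[g] / neg[g] for g in neg}


# Invented toy data (not study results); groups are hypothetical labels.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.7, 0.1]
groups = ["F", "M", "F", "M", "M", "F"]

print(auc(y_true, scores))                   # 8 of 9 positive/negative pairs ranked correctly
print(accuracy(y_true, scores))              # 4 of 6 correct at threshold 0.5
print(fpr_by_group(y_true, scores, groups))  # per-group false positive rates
```

Comparing false positive rates across groups is one common fairness check: a model with similar overall accuracy can still flag non-users in one group far more often than in another.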
Conclusions:
We showed that textual interactions between YEH and their friends on social media can serve as a powerful resource for predicting their substance use. The framework we developed allows care providers to allocate resources efficiently to the YEH in greatest need with minimal overhead. It can be extended to analyze and predict other health-related behaviors and conditions observed in this vulnerable community.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.