Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jul 4, 2020
Date Accepted: Jan 31, 2021

The final, peer-reviewed published version of this preprint can be found here:

Leveraging Social Media Activity and Machine Learning for HIV and Substance Abuse Risk Assessment: Development and Validation Study

Ovalle A, Goldstein O, Kachuee M, Wu E, Holloway IW, Sarrafzadeh M

J Med Internet Res 2021;23(4):e22042

DOI: 10.2196/22042

PMID: 33900200

PMCID: 8111510

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Leveraging Social Media Activity and Machine Learning for HIV and Substance Abuse Risk Assessment

  • Anaelia Ovalle
  • Orpaz Goldstein
  • Mohammad Kachuee
  • Elizabeth Wu
  • Ian W Holloway
  • Majid Sarrafzadeh

ABSTRACT

Background:

Online social media networks provide an abundance of diverse information that can be leveraged for data-driven applications across various social and physical sciences. One opportunity to utilize such data exists in the public health domain, where data collection is often constrained by organizational funding and limited user adoption. Furthermore, the efficacy of health interventions is often assessed using self-reported data, which is not always reliable. Health-promotion strategies for communities facing multiple vulnerabilities, such as men who have sex with men, can benefit from an automated system that not only determines health behavior risk but also suggests appropriate intervention targets.

Objective:

This study aimed to determine the value of leveraging social media interactions to identify health risk behavior among men who have sex with men.

Methods:

The Gay Social Networking Analysis Program (GSNAP) was created as a preliminary framework for intelligent online health-promotion intervention. The program consisted of a data collection system that automatically gathered social media data, health questionnaires, and clinical results for sexually transmitted diseases and drug tests across 51 participants over a 3-month period. Machine learning techniques were utilized to assess the relationship between social media messages and participants' offline sexual health and substance use biological outcomes. The F1 score, the harmonic mean of precision and recall, was used to evaluate each algorithm. Natural language processing techniques were employed to create health behavior risk scores from participant messages.
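As an illustration of the evaluation metric named in the Methods, the following minimal sketch computes an F1 score for a binary outcome from true and predicted labels. It is not the authors' code: the function name `f1_score`, the `positive` parameter, and the example labels are hypothetical and chosen only to show how precision and recall combine.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall).

    Hypothetical helper for illustration; labels equal to `positive`
    are treated as the positive class (e.g., a positive test result).
    """
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # With no true positives, precision or recall is zero (or undefined),
    # so the score is reported as 0.0 by convention.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return 2 * precision * recall / (precision + recall)


# Made-up labels for six participants (1 = positive outcome, 0 = negative).
example_true = [1, 1, 1, 0, 0, 1]
example_pred = [1, 0, 1, 1, 0, 1]
example_f1 = f1_score(example_true, example_pred)
```

In this made-up example there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 score is 0.75.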

Results:

Across several machine learning algorithms, offline HIV status, amphetamine use, and methamphetamine use could be identified using only social media data, with the best model providing F1 scores of 82.6%, 85.9%, and 85.3%, respectively. Additionally, constructed risk scores were found to be reasonably comparable to risk scores adapted from the Centers for Disease Control and Prevention.

Conclusions:

To our knowledge, our study is the first implementation and empirical evaluation of a social media-based public health intervention framework for men who have sex with men. We found that social media data is correlated with offline sexual health and substance use, as verified through biological testing. The proof of concept and initial results validate that public health interventions can use social media-based systems to successfully determine offline health risk behaviors. The findings demonstrate the promise of deploying a social media-based just-in-time adaptive intervention to target substance use and HIV risk behavior.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.