Accepted for/Published in: Online Journal of Public Health Informatics

Date Submitted: Oct 25, 2024
Open Peer Review Period: Dec 18, 2024 - Feb 12, 2025
Date Accepted: Jun 20, 2025

The final, peer-reviewed published version of this preprint can be found here:

Identifying Substance Use and High-Risk Sexual Behavior Among Sexual and Gender Minority Youth by Using Mobile Phone Data: Development and Validation Study

Beikzadeh M, Holloway IW, Kärkkäinen K, Hong C, Cascalheira C, Wu ES, Boka C, Avandaño A, Yonko EA, Sarrafzadeh M

Online J Public Health Inform 2025;17:e68013

DOI: 10.2196/68013

PMID: 40795338

PMCID: 12360732

Identifying Substance Use and High-Risk Sexual Behavior Among Sexual and Gender Minority Young People Using Mobile Phone Data: Development and Validation Study

  • Mehrab Beikzadeh
  • Ian W. Holloway
  • Kimmo Kärkkäinen
  • Chenglin Hong
  • Cory Cascalheira
  • Elizabeth S.C. Wu
  • Callisto Boka
  • Alexandra Avandaño
  • Elizabeth Ann Yonko
  • Majid Sarrafzadeh

ABSTRACT

Background:

Sexual and gender minority (SGM) individuals are at higher risk for substance use and sexually transmitted infections (STIs) than their non-SGM peers. Passively collecting mobile phone usage data may open new opportunities for personalizing interventions, as behavioral risks could be identified without user input.

Objective:

Our objectives were to determine (1) whether passively sensed mobile phone data can be used to identify substance use and sexual risk behaviors for STI and HIV transmission among young SGM who have sex with men, (2) which outcomes can be predicted with a high level of accuracy, and (3) which passive data sources are most predictive of these outcomes.

Methods:

We developed a mobile phone app to collect participants’ messaging, location, and app use data and trained machine learning models to predict risk behaviors for STI and HIV transmission. We used Scikit-learn to train logistic regression and gradient boosting classification models with a simple linear model specification to predict participants’ substance use and sexual behaviors (i.e., condomless anal sex, number of sexual partners, and methamphetamine use), which were validated using self-report questionnaires. F1 scores were used to quantify the prediction accuracy of the models using different data sources (and combinations of these sources). Differences in text, location, app use, and Linguistic Inquiry and Word Count (LIWC) domains by outcome were investigated using independent t-tests, with associations considered significant at p<0.05.
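The modeling setup described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the feature matrices and the outcome are synthetic placeholders, the feature counts per source are arbitrary, and the use of 5-fold stratified cross-validation and feature scaling are assumptions, since the abstract does not state how the models were evaluated or preprocessed.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_participants = 82  # sample size reported in the abstract

# Synthetic placeholder features (hypothetical); real features would be
# derived from the app's messaging, location, and app use data.
sources = {
    "text": rng.normal(size=(n_participants, 5)),
    "location": rng.normal(size=(n_participants, 3)),
    "app_use": rng.normal(size=(n_participants, 4)),
}

# Synthetic binary outcome, loosely tied to the first text feature.
# In the study, labels came from self-report questionnaires.
noise = 0.5 * rng.normal(size=n_participants)
y = (sources["text"][:, 0] + noise > 0).astype(int)

# Factories so that every evaluation starts from a fresh, unfitted model.
models = {
    "logistic_regression": lambda: make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "gradient_boosting": lambda: GradientBoostingClassifier(random_state=0),
}

# Assumed evaluation scheme: 5-fold stratified cross-validation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for k in range(1, len(sources) + 1):
    for combo in combinations(sources, k):
        # Concatenate the feature blocks of the selected data sources.
        X = np.hstack([sources[name] for name in combo])
        for model_name, make_model in models.items():
            y_pred = cross_val_predict(make_model(), X, y, cv=cv)
            results[(combo, model_name)] = f1_score(y, y_pred)

# Print every source combination and model, best F1 first.
for (combo, model_name), f1 in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{'+'.join(combo):25s} {model_name:20s} F1={f1:.2f}")
```

Scoring every combination of sources with the same folds makes the F1 values comparable across data sources, which is the kind of comparison the abstract reports.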

Results:

Among participants (n=82) who identified as SGM, were sexually active, and reported recent substance use, our model was highly predictive of methamphetamine use and having 6+ sexual partners (F1 scores as high as 0.83 and 0.69, respectively). The model was less predictive of condomless anal sex (highest F1 score 0.38). Overall, text-based features were found to be most predictive, but app use and location data improved predictive accuracy, particularly for detecting 6+ sexual partners. Methamphetamine use was significantly associated with dating app use (p=0.01) and use of sex-related words (p=0.002). Having 6 or more sex partners was associated with dating app use (p=0.02), use of sex-related words (p=0.001), and traveling farther from home on average (p=0.03), compared to participants with fewer sex partners. Methamphetamine users were more likely to use social (p=0.002) and affect words (p=0.003) and less likely to use drive-related words (p=0.02). People having 6 or more partners were more likely to use social, affect, and cognitive process-related words (p=0.003 and 0.004 respectively).
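The group comparisons behind the p values above rely on independent t-tests. A minimal sketch using SciPy's `ttest_ind` is shown here; the two lists are hypothetical per-participant values (for example, the rate of a word category in messages), not study data, and the equal-variance form of the test is an assumption because the abstract does not specify the variant used.

```python
from scipy import stats

# Hypothetical per-participant feature values for two outcome groups.
group_with_outcome = [6.0, 7.0, 8.0, 9.0, 10.0]
group_without_outcome = [1.0, 2.0, 3.0, 4.0, 5.0]

# Two-sided independent-samples t-test (equal variances assumed by default).
t_stat, p_value = stats.ttest_ind(group_with_outcome, group_without_outcome)

alpha = 0.05  # significance threshold stated in the Methods
significant = p_value < alpha
print(f"t={t_stat:.2f}, p={p_value:.4f}, significant={significant}")
```

With many features tested per outcome, some form of multiple-comparison handling may be worth considering; the abstract does not say whether one was applied.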

Conclusions:

Our results show that passively collected mobile phone data may be useful in detecting sexual risk behaviors. Expanding data collection may improve the results further, as certain behaviors, such as injection drug use, were quite rare in the study sample. These models may be used to personalize STI and HIV prevention as well as substance use harm reduction interventions.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.