Accepted for/Published in: JMIR Human Factors

Date Submitted: Jun 6, 2025
Open Peer Review Period: Dec 4, 2025 - Jan 29, 2026
Date Accepted: Dec 14, 2025

The final, peer-reviewed published version of this preprint can be found here:

Machine Learning for the Analysis of Healthy Lifestyle Data: Scoping Review and Guidelines

Estrella T, Capdevila L, Alfonso C, Losilla JM

Machine Learning for the Analysis of Healthy Lifestyle Data: Scoping Review and Guidelines

JMIR Hum Factors 2026;13:e78648

DOI: 10.2196/78648

PMID: 41773677

PMCID: 12954701

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Machine learning for the analysis of healthy lifestyle data: a scoping review and guidelines

  • Tony Estrella
  • Lluis Capdevila
  • Carla Alfonso
  • Josep-Maria Losilla

ABSTRACT

Background:

Advances in data science and technology have transformed lifestyle studies by enabling the integration of multimodal information and the generation of large volumes of data. Despite the growing interest in machine learning (ML) in health behavior research, significant methodological gaps remain.

Objective:

This study aims to systematically review the applications of supervised ML algorithms in analyzing healthy lifestyle (HL) data, with a specific focus on the methodological approach employed. The specific objectives are to explore the types and sources of data used in health outcomes, examine the ML processes employed, including explainable artificial intelligence (XAI) methods, and review the software tools utilized. Additionally, this review aims to provide practical guidelines to enhance the quality and transparency of future ML research in health.

Methods:

Following the PRISMA-ScR recommendations, the search was conducted across PubMed, PsycINFO, and Web of Science, resulting in 48 studies that met the inclusion criteria.

Results:

Most studies (37, 77%) integrated multidomain data from physical activity, diet, sleep, and stress. Data sources were split between self-acquired (25, 52.08%) and health repositories (23, 47.92%). Single-item measurements were common, particularly for physical activity, diet, and sleep. Despite a multimodel approach in 28 studies, random forest was the most frequently used algorithm. Only 10 studies (20.83%) employed XAI methods, with 9 using SHapley Additive exPlanation (SHAP) values and 1 using Local Interpretable Model-agnostic Explanations (LIME). R was the most widely used software, with variations in the libraries employed.
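The reported shares can be rechecked from the counts. The short Python sketch below does so, assuming every percentage is computed over the 48 included studies; the dictionary labels and the `share` helper are illustrative names, not taken from the review.

```python
# Number of included studies, as stated in the Methods paragraph.
TOTAL_STUDIES = 48

# Counts as reported in the Results paragraph (labels are illustrative).
reported_counts = {
    "multidomain data": 37,
    "self-acquired data": 25,
    "health repository data": 23,
    "used XAI methods": 10,
}


def share(count: int, total: int = TOTAL_STUDIES) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


for label, count in reported_counts.items():
    print(f"{label}: {count}/{TOTAL_STUDIES} = {share(count):.2f}%")
```

The two data-source counts sum to 48 and the SHAP and LIME counts sum to 10, so the reported categories are exhaustive within their groups.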

Conclusions:

This review highlights methodological gaps in the application of supervised ML to HL data. The ML workflow should span from data acquisition to explainability, with iterative steps to improve the process. Multidomain approaches in data acquisition enhance understanding of health issues related to lifestyle but are constrained by low data representativeness due to methodological limitations in acquisition. While random forest was prevalent, a multimodel approach is recommended for comprehensive comparison. Lifestyle components consistently ranked among the top features in studies that incorporated XAI. Integrating XAI methods into the ML pipeline can support personalized interventions, provided the data is accurately collected. The R metapackage tidymodels facilitates process evaluation through unified syntax, improving replicability. Methodological and reporting guidelines are provided to enhance transparency and replicability in multidisciplinary ML research.
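As a rough illustration of the recommended workflow (compare several models under the same resampling scheme, then explain the selected one), the following Python sketch uses only the standard library. Everything in it is an assumption made for illustration: the data are synthetic, the two models are toy stand-ins rather than random forest, and permutation importance is used as a simpler model-agnostic substitute for SHAP or LIME. It is not the procedure of any reviewed study.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def make_row():
    """Hypothetical person: [activity, sleep, noise] and a binary outcome."""
    activity, sleep, noise = rng.random(), rng.random(), rng.random()
    # Assumption: only activity and sleep drive the outcome; noise is irrelevant.
    return [activity, sleep, noise], int(activity + sleep > 1.0)


data = [make_row() for _ in range(300)]
FEATURES = ["activity", "sleep", "noise"]


def fit_majority(train):
    """Baseline model: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = int(2 * sum(labels) >= len(labels))
    return lambda x: majority


def fit_centroid(train):
    """Nearest-centroid model: predict the class whose feature mean is closest."""
    centroids = {}
    for cls in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


MODELS = {"majority_baseline": fit_majority, "nearest_centroid": fit_centroid}


def accuracy(predict, rows):
    return mean(float(predict(x) == y) for x, y in rows)


def cross_validate(fit, rows, k=5):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit(train), held_out))
    return mean(scores)


def permutation_importance(predict, rows, feature, shuffle_rng):
    """Accuracy lost when one feature column is shuffled across rows."""
    column = [x[feature] for x, _ in rows]
    shuffle_rng.shuffle(column)
    permuted = [
        (x[:feature] + [v] + x[feature + 1:], y)
        for (x, y), v in zip(rows, column)
    ]
    return accuracy(predict, rows) - accuracy(predict, permuted)


# Step 1: compare every candidate model under the same resampling scheme,
# using only the training portion so the test rows stay untouched.
train, test = data[:200], data[200:]
cv_scores = {name: cross_validate(fit, train) for name, fit in MODELS.items()}
best_name = max(cv_scores, key=cv_scores.get)

# Step 2: refit the selected model and explain it on the held-out rows.
best_model = MODELS[best_name](train)
importances = {
    name: permutation_importance(best_model, test, i, rng)
    for i, name in enumerate(FEATURES)
}

print(cv_scores)
print(best_name, importances)
```

Under these assumptions the two informative features should lose far more accuracy when shuffled than the noise feature does. The structure stays the same if the toy models are replaced by real learners and the importance step by an XAI method such as SHAP.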



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.