Accepted for/Published in: JMIR mHealth and uHealth

Date Submitted: Apr 16, 2024
Date Accepted: Aug 27, 2024
Date Submitted to PubMed: Apr 16, 2024

The final, peer-reviewed published version of this preprint can be found here:

Data Preprocessing Techniques for AI and Machine Learning Readiness: Scoping Review of Wearable Sensor Data in Cancer Care

Ortiz BL, Gupta V, Kumar R, Jalin A, Cao X, Ziegenbein C, Singhal A, Choi SW, Tewari M

JMIR Mhealth Uhealth 2024;12:e59587

DOI: 10.2196/59587

PMID: 38626290

PMCID: 11470224

Data Preprocessing Techniques for Artificial Intelligence (AI)/Machine Learning (ML)-Readiness: Systematic Review of Wearable Sensor Data in Cancer Care

  • Bengie L. Ortiz
  • Vibhuti Gupta
  • Rajnish Kumar
  • Aditya Jalin
  • Xiao Cao
  • Charles Ziegenbein
  • Ashutosh Singhal
  • Sung Won Choi
  • Muneesh Tewari

ABSTRACT

Background:

Wearable sensors are increasingly being explored in healthcare, including in cancer care, for their potential in continuously monitoring patients. Despite their growing adoption, significant challenges remain in the quality and consistency of data collected from wearable sensors. In particular, preprocessing pipelines to clean and standardize raw data have not been fully optimized.

Objective:

The aim of this study was to conduct a systematic review of preprocessing techniques employed on wearable sensor data to ensure their readiness for artificial intelligence/machine learning (“AI/ML-ready”) applications. Specifically, we sought to understand the landscape of current approaches applied in cleaning, normalizing, and transforming raw datasets into usable formats for subsequent AI/ML analysis.

Methods:

We systematically searched IEEE Xplore, PubMed, Embase (including Embase, Embase Classic, MEDLINE, PubMed-not-MEDLINE), and Scopus to identify potentially relevant studies for this review. The eligibility criteria were: (1) mHealth and wearable sensor studies in cancer; (2) written and published in English; (3) published between January 2018 and December 2023; (4) full text available, not abstracts only; and (5) original studies published in peer-reviewed journals or conference proceedings. The Covidence app was used to support the screening stage. Statistical learning and image processing techniques were considered out of scope.

Results:

In the initial phase, 2,147 papers published between January 2018 and December 2023 were identified. After a thorough evaluation of these papers against our predefined eligibility criteria, a total of 20 papers were included. Three categories of preprocessing techniques were identified: (1) data transformation, (2) data scaling, and (3) data cleaning.
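To make the three categories concrete, the following is a minimal Python sketch of one possible technique per category applied to a short stream of heart-rate readings. It is an illustration only, not a method taken from the reviewed studies: the function names, the plausibility bounds (30 to 220 beats per minute), the forward-fill strategy, the window length, and the choice of z-score scaling are all assumptions made for this example.

```python
from statistics import mean, pstdev


def clean(samples, low=30.0, high=220.0):
    """Data cleaning (illustrative): treat missing (None) or out-of-range
    readings as invalid and replace them with the last valid reading.
    The bounds are assumed values, not taken from the review."""
    cleaned = []
    last_valid = None
    for x in samples:
        if x is None or not (low <= x <= high):
            x = last_valid          # forward-fill from the last valid value
        else:
            last_valid = x
        cleaned.append(x)
    # Drop leading gaps that had no earlier valid reading to fill from.
    return [x for x in cleaned if x is not None]


def transform(samples, window=3):
    """Data transformation (illustrative): split the signal into
    non-overlapping windows and summarize each window by its mean.
    A trailing partial window is discarded."""
    return [
        mean(samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]


def scale(samples):
    """Data scaling (illustrative): z-score standardization using the
    population standard deviation."""
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        return [0.0] * len(samples)  # constant signal: avoid dividing by zero
    return [(x - mu) / sigma for x in samples]


if __name__ == "__main__":
    # Hypothetical raw readings with one gap and one implausible spike.
    raw = [72.0, None, 75.0, 400.0, 78.0, 81.0]
    features = scale(transform(clean(raw)))
    print(features)
```

The order used here (cleaning, then transformation, then scaling) is one reasonable arrangement; the appropriate order and techniques depend on the sensor, the sampling rate, and the downstream model.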

Conclusions:

Although wearable sensors are gaining traction in cancer care, applying standard AI/ML techniques remains challenging because the raw data captured are often of low quality and appropriate preprocessing pipelines to improve data quality are not applied. At present, AI/ML methodologies remain individually tailored to specific studies or types of data, which limits the generalizability of research findings. This work proposes a general framework for these multiple types of databases. Our findings suggest a pressing need to develop and adopt uniform data quality and preprocessing workflows for wearable sensor data that can support the breadth of cancer research and its diverse patient populations.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.