Accepted for/Published in: JMIR mHealth and uHealth
Date Submitted: Apr 16, 2024
Date Accepted: Aug 27, 2024
Date Submitted to PubMed: Apr 16, 2024
Data Preprocessing Techniques for Artificial Intelligence (AI)/Machine Learning (ML)-Readiness: Systematic Review of Wearable Sensor Data in Cancer Care
ABSTRACT
Background:
Wearable sensors are increasingly being explored in healthcare, including in cancer care, for their potential to monitor patients continuously. Despite their growing adoption, significant challenges remain in the quality and consistency of data collected from wearable sensors. In particular, preprocessing pipelines to clean and standardize raw data have not been fully optimized.
Objective:
The aim of this study was to systematically review the preprocessing techniques applied to wearable sensor data to make them ready for artificial intelligence/machine learning (“AI/ML-ready”) applications. Specifically, we sought to characterize current approaches to cleaning, normalizing, and transforming raw datasets into usable formats for subsequent AI/ML analysis.
Methods:
We systematically searched IEEE Xplore, PubMed, Embase (including Embase, Embase Classic, MEDLINE, PubMed-not-MEDLINE), and Scopus to identify potentially relevant studies for this review. The eligibility criteria were: (1) mHealth and wearable sensor studies in cancer; (2) written and published in English; (3) published between January 2018 and December 2023; (4) full text available (not abstract only); and (5) original studies published in peer-reviewed journals or conference proceedings. Covidence was used to manage the screening stage. Statistical learning and image processing techniques were considered out of scope.
Results:
The initial search identified 2,147 papers published between January 2018 and December 2023. Applying our predefined eligibility criteria to these papers yielded a total of 20 papers. Three categories of preprocessing techniques were identified: (1) data transformation, (2) data scaling, and (3) data cleaning.
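To make the three categories concrete, the following is a minimal illustrative sketch, not a method taken from any of the reviewed studies. It assumes a hypothetical 1 Hz heart-rate stream with missing readings (`None`) and uses assumed plausibility bounds of 30 to 220 bpm. Cleaning discards implausible values and fills gaps by linear interpolation, scaling applies z-score standardization, and transformation summarizes non-overlapping windows as (mean, standard deviation) features. The function names, bounds, window size, and the clean → scale → transform ordering are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

# Hypothetical plausibility bounds for heart rate in bpm (an assumption, not from the review).
HR_MIN, HR_MAX = 30.0, 220.0


def clean(samples: List[Optional[float]]) -> List[float]:
    """Data cleaning: treat missing/implausible readings as gaps, then fill them."""
    vals = [s if s is not None and HR_MIN <= s <= HR_MAX else None for s in samples]
    known = [i for i, v in enumerate(vals) if v is not None]
    if not known:
        raise ValueError("no valid samples to fill gaps from")
    out: List[float] = []
    for i, v in enumerate(vals):
        if v is not None:
            out.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:  # leading gap: copy the first valid value
            out.append(float(vals[right]))
        elif right is None:  # trailing gap: copy the last valid value
            out.append(float(vals[left]))
        else:  # interior gap: linear interpolation between neighbours
            frac = (i - left) / (right - left)
            out.append(vals[left] + frac * (vals[right] - vals[left]))
    return out


def zscore(x: List[float]) -> List[float]:
    """Data scaling: standardize to zero mean and unit (population) variance."""
    mu, sigma = mean(x), pstdev(x)
    if sigma == 0:
        return [0.0 for _ in x]
    return [(v - mu) / sigma for v in x]


def window_features(x: List[float], size: int) -> List[Tuple[float, float]]:
    """Data transformation: (mean, std) of each non-overlapping window; drops a partial tail."""
    return [
        (mean(x[i:i + size]), pstdev(x[i:i + size]))
        for i in range(0, len(x) - size + 1, size)
    ]


if __name__ == "__main__":
    raw = [72, None, 76, 250, 80, 78, None, 74]  # hypothetical readings; 250 is implausible
    cleaned = clean(raw)
    print(cleaned)  # [72.0, 74.0, 76.0, 78.0, 80.0, 78.0, 76.0, 74.0]
    print(window_features(zscore(cleaned), size=4))
```

In practice the order of these steps, the gap-filling strategy (interpolation versus imputation or segment removal), and whether scaling uses per-patient or cohort statistics are design choices that vary from study to study, which is part of the inconsistency this review highlights.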
Conclusions:
While wearable sensors are gaining traction in cancer care, challenges remain in applying standard AI/ML techniques because of the low quality of the raw data captured and the lack of appropriate preprocessing pipelines to improve data quality. Currently, AI/ML methodologies remain tailored to individual studies or data types, which limits the generalizability of research findings. This work proposes a general framework for these multiple types of datasets. Our findings suggest a pressing need to develop and adopt uniform data quality and preprocessing workflows for wearable sensor data that can support the breadth of cancer research and its diverse patient populations.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.