Accepted for/Published in: JMIR Research Protocols
Date Submitted: Feb 28, 2025
Date Accepted: Sep 10, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluating Diversity in Open Photoplethysmography (PPG) Datasets: Protocol for a Systematic Review
ABSTRACT
Background:
Photoplethysmography (PPG) is an optical method for measuring blood volume changes in the microcirculation through non-invasive photodetection. It has become a widespread and essential clinical tool, used in pulse oximeters and wearable devices. However, technical aspects of PPG make it susceptible to intrinsic bias, with the potential to adversely affect particular patient and consumer populations. Developments in PPG technology, increasingly driven by existing datasets rather than de novo experimentation, could help monitor an array of physiological variables, but some populations may be under-represented in those datasets. We describe a protocol for a systematic review to assess the biases within open-access PPG datasets.
Objective:
The aim of this review is to critically appraise diversity within PPG datasets. The review will examine the demographic characteristics present in open-access PPG datasets to determine what data are collected in studies and used in developing and training new medical technology such as wearables and pulse oximeters. By evaluating the structural components of PPG datasets, we can identify current gaps and areas for improvement to reduce systemic bias in PPG-based device development.
Methods:
This review will be reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We will include primary studies from 2000 onwards that mention PPG and specifically reference openly accessible datasets. The datasets must contain PPG waveform data collected from humans, together with physiological parameters such as heart rate (HR), blood pressure (BP), or respiratory rate (RR). Searches will be conducted in literature databases and data repositories. Studies will be evaluated in accordance with the Standing Together Initiative recommendations, which call for healthcare technologies to be supported by representative data.
Results:
All included studies and datasets will be described by their dataset characteristics and study outcomes using summary statistics and statistical tests. We will analyse the dataset diversity and the structural basis of PPG datasets, and critically evaluate the different variables included in the datasets. We predict that there will be heterogeneity in the data reporting quality and in the terminology used to signify specific variables. Statistical tests suited to comparing nominal variables will be used to evaluate the frequencies of dataset characteristics.
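As an illustration of how frequencies of a nominal dataset characteristic might be compared, the sketch below computes a Pearson chi-square statistic for a contingency table. The protocol does not name a specific test, so the choice of chi-square is an assumption, and the function name, the grouping (repository-hosted versus publication-linked datasets), and the counts are all hypothetical and not taken from the review.

```python
def chi_square_statistic(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts (illustrative only, not review data):
# rows    = repository-hosted datasets, publication-linked datasets
# columns = skin tone reported, skin tone not reported
example_table = [[12, 8], [5, 15]]
statistic, dof = chi_square_statistic(example_table)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
```

The statistic would then be compared against a chi-square distribution with the given degrees of freedom. With small expected counts, an exact test may be more appropriate.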
Conclusions:
This review will provide insight into potential gaps in existing open-access PPG datasets and the limitations of studies performed using these datasets. It will inform the future collection of data and the design of medical devices, including wearables, to avoid perpetuating biases and to allow for application in diverse clinical settings.
Clinical Trial: PROSPERO [CRD42024564759]
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.