
Previously submitted to: JMIR AI (no longer under consideration since Apr 27, 2026)

Date Submitted: Apr 21, 2026

Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Normalization Is a Model-Level Design Choice in Outpatient Type 2 Diabetes AI: A Leakage-Safe Comparative Study on Public Datasets

  • Igor Korsakov

ABSTRACT

Background:

Feature normalization is frequently underreported in clinical machine learning studies, despite its strong influence on model behavior, calibration, and interpretability. In outpatient type 2 diabetes (T2D) decision-support settings, unclear preprocessing choices can reduce reproducibility and weaken translational reliability.

Objective:

This study aimed to evaluate normalization as an explicit model-selection factor in outpatient T2D prediction workflows and to quantify how different normalization strategies affect performance across classifier families under a leakage-safe evaluation design.

Methods:

We conducted a comparative benchmarking study on two public diabetes datasets from Hugging Face (Dataset A: GB2024/diabetes; Dataset B: khoaguin/pima-indians-diabetes-database-partitions). To keep benchmarking tractable and reproducible across all experiments, datasets larger than 20,000 rows were capped using stratified random sampling (random_state=42). We compared 4 classifier families (Logistic Regression, SVC-RBF, KNN, Random Forest) across 6 normalization strategies (none, standard, min-max, robust, quantile-normal, Yeo-Johnson). All preprocessing (imputation, encoding, normalization) was fit on training folds only. Evaluation used stratified 5-fold cross-validation and a held-out test set, with macro-F1 as the primary metric and AUC and accuracy as secondary metrics. A staged proxy-leakage sensitivity analysis was also performed.
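The leakage-safe design described above can be sketched in scikit-learn, which is assumed here (the `random_state=42` convention suggests it, but the abstract does not name the library). Every stateful step sits inside a `Pipeline`, so `cross_validate` refits imputation and normalization on each training fold and never lets them see the validation fold. The helpers `cap_rows`, `benchmark`, and `normalization_spread` are hypothetical names; synthetic `make_classification` data stands in for the two Hugging Face datasets, categorical encoding is omitted because the synthetic features are all numeric, and the held-out test step is left out for brevity.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (MinMaxScaler, PowerTransformer, QuantileTransformer,
                                   RobustScaler, StandardScaler)
from sklearn.svm import SVC

SEED = 42
MAX_ROWS = 20_000
METRICS = ["f1_macro", "roc_auc", "accuracy"]

# The six normalization strategies named in the abstract ("none" skips the step).
NORMALIZERS = {
    "none": "passthrough",
    "standard": StandardScaler(),
    "min_max": MinMaxScaler(),
    "robust": RobustScaler(),
    "quantile_normal": QuantileTransformer(n_quantiles=100, output_distribution="normal",
                                           random_state=SEED),
    "yeo_johnson": PowerTransformer(method="yeo-johnson"),
}
# The four classifier families, with default hyperparameters (an assumption).
MODELS = {
    "logreg": LogisticRegression(max_iter=1000),
    "svc_rbf": SVC(kernel="rbf"),
    "knn": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=SEED),
}


def cap_rows(X, y, max_rows=MAX_ROWS, seed=SEED):
    """Stratified down-sampling to at most max_rows rows; no-op for smaller datasets."""
    if len(y) <= max_rows:
        return X, y
    X_cap, _, y_cap, _ = train_test_split(X, y, train_size=max_rows,
                                          stratify=y, random_state=seed)
    return X_cap, y_cap


def benchmark(X, y):
    """Mean stratified 5-fold CV scores for every (model, normalizer) pair."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
    results = {}
    for model_name, model in MODELS.items():
        for norm_name, norm in NORMALIZERS.items():
            # All stateful preprocessing is inside the pipeline, so it is refit
            # on each training fold and never sees the validation fold.
            pipe = Pipeline([
                ("impute", SimpleImputer(strategy="median")),
                ("normalize", norm if isinstance(norm, str) else clone(norm)),
                ("model", clone(model)),
            ])
            scores = cross_validate(pipe, X, y, cv=cv, scoring=METRICS)
            results[(model_name, norm_name)] = {
                m: scores[f"test_{m}"].mean() for m in METRICS
            }
    return results


def normalization_spread(results, metric="f1_macro"):
    """Best-minus-worst score across normalizers, per model."""
    by_model = {}
    for (model_name, _), scores in results.items():
        by_model.setdefault(model_name, []).append(scores[metric])
    return {m: max(v) - min(v) for m, v in by_model.items()}


# Synthetic stand-in data with a few missing values to exercise imputation.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=SEED)
X[::17, 0] = np.nan
X, y = cap_rows(X, y)
results = benchmark(X, y)
spread = normalization_spread(results)
for model_name, gap in sorted(spread.items(), key=lambda kv: -kv[1]):
    print(f"{model_name:14s} macro-F1 spread across normalizers: {gap:.3f}")
```

Reporting the per-model spread next to the best score is one way to surface the normalization-dependent variation the Results describe. A tree ensemble would be expected to show little spread here, because all six transforms are monotone per feature and tree splits depend only on feature orderings.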

Results:

Normalization effects were model dependent. KNN and SVC-RBF were more sensitive to the choice of normalization, whereas Random Forest was comparatively stable. In Dataset B, the best test macro-F1 values approached 0.9916, but sensitivity analyses showed that such near-ceiling performance can be partially inflated by leakage-adjacent proxy features. Across datasets, reporting only the best final metrics masked important normalization-dependent spread in performance.
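The abstract does not describe how the staged proxy-leakage analysis was run. One plausible shape, sketched here as an assumption, is to re-score the same pipeline while cumulatively dropping groups of suspected proxy columns, so that a sharp fall in macro-F1 after a stage flags those columns as leakage-adjacent. The helper `staged_proxy_sensitivity`, the column names (`proxy_strong`, `proxy_weak`, `outcome`), the stage definitions, and the synthetic data are hypothetical stand-ins, not the study's features.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

SEED = 42


def staged_proxy_sensitivity(df, target, stages, model_factory):
    """Re-score after cumulatively dropping each stage of suspected proxy columns.

    Returns a list of (stage_name, n_features, mean_macro_f1) tuples.
    """
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
    y = df[target].to_numpy()
    dropped, report = [], []
    # Stage 0 keeps every feature; each later stage removes additional suspected proxies.
    for stage_name, cols in [("all_features", [])] + stages:
        dropped = dropped + list(cols)
        X = df.drop(columns=[target] + dropped)
        # Imputation lives inside the pipeline, so it is fit on training folds only.
        pipe = Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("model", model_factory()),
        ])
        f1 = cross_val_score(pipe, X, y, cv=cv, scoring="f1_macro").mean()
        report.append((stage_name, X.shape[1], f1))
    return report


# Synthetic demo: six ordinary features plus two hypothetical label proxies.
X_raw, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                               flip_y=0.1, random_state=SEED)
df = pd.DataFrame(X_raw, columns=[f"f{i}" for i in range(6)])
df["outcome"] = y
rng = np.random.default_rng(SEED)
df["proxy_strong"] = y + rng.normal(0.0, 0.05, size=len(y))  # near-copy of the label
df["proxy_weak"] = y + rng.normal(0.0, 0.8, size=len(y))     # noisy echo of the label

stages = [("drop_strong_proxy", ["proxy_strong"]), ("drop_weak_proxy", ["proxy_weak"])]
report = staged_proxy_sensitivity(
    df, "outcome", stages,
    model_factory=lambda: RandomForestClassifier(n_estimators=100, random_state=SEED),
)
for stage_name, n_features, f1 in report:
    print(f"{stage_name:18s} features={n_features} macro-F1={f1:.3f}")
```

In this toy setup the full-feature score sits near the ceiling only because `proxy_strong` almost restates the label; the gap between stages, rather than the best score alone, is what carries the leakage signal.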

Conclusions:

In outpatient T2D clinical AI, normalization should be treated as a high-impact methodological decision rather than a default preprocessing step. A transparent two-layer preprocessing strategy (clinically meaningful feature encoding plus statistical normalization), leakage-safe validation, and proxy-leakage sensitivity checks can improve reproducibility and support safer translation into treatment-support workflows. Clinical Trial: Not applicable.
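The two-layer preprocessing strategy recommended in the Conclusions can be sketched as a scikit-learn pipeline in which a stateless clinical encoding runs first and a fitted statistical normalization runs second. The specific encoding (WHO adult BMI categories), the column names `glucose`, `bmi`, and `age`, and the helper `add_bmi_category` are illustrative assumptions, not the study's feature set. Because layer 1 uses fixed clinical cut-offs it learns nothing from the data; only layer 2 is fit, and wrapping both in one `Pipeline` keeps that fit inside the training folds during cross-validation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# WHO adult BMI cut-offs in kg/m^2: <18.5, 18.5-24.9, 25-29.9, >=30.
BMI_EDGES = [18.5, 25.0, 30.0]
CONTINUOUS = ["glucose", "bmi", "age"]  # hypothetical column names


def add_bmi_category(df):
    """Layer 1: clinically meaningful, stateless encoding (ordinal 0-3)."""
    out = df.copy()
    out["bmi_category"] = np.digitize(out["bmi"].to_numpy(), BMI_EDGES)
    return out


two_layer = Pipeline([
    ("clinical_encoding", FunctionTransformer(add_bmi_category)),
    # Layer 2: statistical normalization, fitted only on whatever data .fit() receives.
    ("normalization", ColumnTransformer([
        ("scale", StandardScaler(), CONTINUOUS),
        ("ordinal", "passthrough", ["bmi_category"]),
    ])),
    ("model", LogisticRegression(max_iter=1000)),
])

# Synthetic demo data; values are illustrative, not drawn from the study datasets.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "glucose": rng.normal(120, 30, n),
    "bmi": rng.normal(29, 6, n),
    "age": rng.integers(30, 80, n),
})
y = (df["glucose"] + 2 * df["bmi"] + rng.normal(0, 20, n) > 180).astype(int)
two_layer.fit(df, y)
print(two_layer[:-1].transform(df.head()))
```

Keeping the ordinal category unscaled preserves its clinical reading, while the continuous columns receive whichever normalizer the model family favors; swapping `StandardScaler` for any of the other five strategies changes only layer 2.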


Citation

Please cite as:

Korsakov I

Normalization Is a Model-Level Design Choice in Outpatient Type 2 Diabetes AI: A Leakage-Safe Comparative Study on Public Datasets

JMIR Preprints. 21/04/2026:98989

DOI: 10.2196/preprints.98989

URL: https://preprints.jmir.org/preprint/98989


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.