JMIR Preprints #105364: Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

Damian Machlanski;
Junyu Yan;
Kurt Butler;
Panagiotis Dimitrakopoulos;
Ewen M Harrison;
Bruce Guthrie;
Sotirios A Tsaftaris

ABSTRACT

Background:

Predictive modelling is important for health data analysis and data-driven clinical decision-making. However, predictive studies are challenging to design optimally by hand when tens or even hundreds of features require selection, transformation, or interaction modelling. While complex machine learning models offer high performance, their "black-box" nature limits the clinical trust, transparency, and interpretability required for decision-making.

Objective:

The objective of our study was to develop an explainable artificial intelligence (AI)-based framework that provides data-driven, feature-related recommendations which, once incorporated, improve the predictive performance of existing interpretable statistical models on high-dimensional data.

Methods:

We developed and evaluated an Exploratory AI Recommender that provides data-driven recommendations to improve the predictive performance of existing interpretable statistical models. The framework uses flexible AI modelling to capture complex data patterns and explainable AI techniques to translate those patterns into three recommendation types: feature exclusion, non-linear terms, and feature interactions. We evaluated the framework by comparing the predictive performance of a baseline Cox Proportional Hazards (CPH) model (i.e., with no interactions or non-linear terms) against an augmented CPH model incorporating the recommendations suggested by our method.
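To make the three recommendation types concrete, the following is a minimal sketch of how they might be applied to a feature table before fitting the augmented model. It is not the authors' implementation: the function name `apply_recommendations`, the feature names, and the choice of a squared term to represent a "non-linear term" are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, float]  # one patient: feature name -> value


def apply_recommendations(
    rows: List[Row],
    exclude: Iterable[str] = (),
    nonlinear: Iterable[str] = (),
    interactions: Iterable[Tuple[str, str]] = (),
) -> List[Row]:
    """Return augmented copies of the rows; the input rows are not modified.

    Hypothetical helper. A squared term stands in for a "non-linear term";
    the source does not state which functional form was used.
    """
    exclude = set(exclude)
    nonlinear = list(nonlinear)
    interactions = list(interactions)
    augmented = []
    for row in rows:
        # 1. Feature exclusion: drop the features flagged as uninformative.
        new_row = {k: v for k, v in row.items() if k not in exclude}
        # 2. Non-linear terms: add a squared copy of each flagged feature.
        for name in nonlinear:
            new_row[f"{name}^2"] = row[name] ** 2
        # 3. Feature interactions: add the product of each flagged pair.
        for a, b in interactions:
            new_row[f"{a}*{b}"] = row[a] * row[b]
        augmented.append(new_row)
    return augmented


# Illustrative data only; these feature names do not come from the study.
rows = [
    {"age": 70.0, "bmi": 24.0, "id_code": 1.0},
    {"age": 55.0, "bmi": 31.0, "id_code": 2.0},
]
augmented = apply_recommendations(
    rows,
    exclude=["id_code"],
    nonlinear=["age"],
    interactions=[("age", "bmi")],
)
```

The augmented rows could then be passed to any CPH implementation in place of the baseline feature set, which keeps the final model a standard, interpretable regression.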

Results:

The primary analysis predicted the time to the first occurrence of a fall or related injury in 245,614 patients (mean age 67 ± 12 years). Our method recommended excluding 23 features, adding non-linear terms for two features, and adding 221 feature interactions. The C-index improved from 0.805 (95% CI 0.798-0.812) to 0.815 (95% CI 0.809-0.822), and calibration also improved (intercept: -0.006 to 0.003; slope: 1.063 to 0.950). All recommendations were supported by existing literature. The method also proved effective on two additional public datasets, demonstrating wider applicability.
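For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It assumes Harrell's definition, which the abstract does not confirm; the function name `concordance_index` and the toy data are illustrative, and tied event times are simply skipped for brevity.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable pairs in which the higher-risk subject fails first.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter follow-up time than subject j. Tied risk scores count
    as half-concordant. Simplified O(n^2) sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four subjects, the third is censored (event = 0).
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.4, 0.1]
c_index = concordance_index(times, events, risk)  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported change from 0.805 to 0.815 reflects a modest gain in how well predicted risk orders patients by time to event.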

Conclusions:

The proposed Exploratory AI Recommender demonstrates the potential of explainable AI and data-driven study design to improve both the development process and the performance of high-dimensional transparent predictive models.

Citation

Please cite as:

Machlanski D, Yan J, Butler K, Dimitrakopoulos P, Harrison EM, Guthrie B, Tsaftaris SA

Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

JMIR Preprints. 23/06/2026:105364

DOI: 10.2196/preprints.105364

URL: https://preprints.jmir.org/preprint/105364

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: Journal of Medical Internet Research

Date Submitted: Jun 23, 2026

Open Peer Review Period: Jun 24, 2026 - Aug 19, 2026

(currently open for review)
