JMIR Preprints #80735: Paired-Sample and Pathway-Anchored MLOps Framework for Robust Transcriptomic Machine Learning in Small Cohorts

Paired-Sample and Pathway-Anchored MLOps Framework for Robust Transcriptomic Machine Learning in Small Cohorts

Yves Andre Lussier;
Mahdieh Shabanian;
Nima Pouladi;
Liam Wilson;
Mattia A. Prosperi

ABSTRACT

Background:

Ninety percent of the 65,000 human diseases are infrequent, collectively affecting ~400 million people, and their low prevalence substantially limits cohort accrual. This in turn constrains the development of robust transcriptome-based machine learning (ML) classifiers. Standard data-driven classifiers typically require cohorts of over 100 subjects per group to achieve clinical accuracy while managing high-dimensional input (~25,000 transcripts). These requirements are infeasible for micro-cohorts of ~20 individuals, where overfitting becomes pervasive.

Objective:

To overcome these constraints, we developed a classification method that integrates three enabling strategies: (i) paired-sample transcriptome dynamics, (ii) N-of-1 pathway-based analytics, and (iii) reproducible machine learning operations (MLOps) for continuous model refinement.

Methods:

Unlike ML approaches that rely on a single transcriptome per subject, within-subject paired-sample designs (such as pre- versus post-treatment or diseased versus adjacent-normal tissue) effectively control intra-individual variability under isogenic conditions and shared within-subject environmental exposures (e.g., smoking history or other medications), and they improve signal-to-noise ratios. When pre-processed as single-subject studies (N-of-1), such designs can achieve statistical power comparable to that obtained in animal models. Pathway-level N-of-1 analytics further reduces each sample's high-dimensional profile to ~4,000 biologically interpretable features, each annotated with an effect size, dispersion, and significance. Complementary MLOps practices (automated versioning, continuous monitoring, and adaptive hyperparameter tuning) improve model reproducibility and generalization.
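The paired-sample, pathway-level reduction described above can be illustrated with a minimal sketch. This is not the authors' N-of-1-pathways implementation: the function name `paired_pathway_features`, the dictionary inputs, and the choice of a mean log-ratio with a t-like statistic are all assumptions made for illustration, and the gene and pathway names are toy placeholders.

```python
from math import sqrt
from statistics import mean, stdev


def paired_pathway_features(baseline, perturbed, gene_sets):
    """Summarize one subject's paired samples at the pathway level.

    baseline, perturbed: dict mapping gene -> log2 expression for the SAME
        subject (e.g., adjacent-normal vs. diseased tissue).
    gene_sets: dict mapping pathway name -> list of member genes.

    Returns dict mapping pathway -> (effect_size, dispersion, t_like_stat).
    The three summaries are hypothetical stand-ins for the effect size,
    dispersion, and significance annotations mentioned in the abstract.
    """
    features = {}
    for pathway, genes in gene_sets.items():
        # Within-subject differences: each gene is compared with itself.
        deltas = [perturbed[g] - baseline[g]
                  for g in genes if g in baseline and g in perturbed]
        if len(deltas) < 2:
            continue  # too few measured genes to estimate dispersion
        effect = mean(deltas)
        spread = stdev(deltas)
        t_like = effect / (spread / sqrt(len(deltas))) if spread > 0 else 0.0
        features[pathway] = (effect, spread, t_like)
    return features


# Toy usage with made-up values (not data from the study).
toy_baseline = {"A": 1.0, "B": 2.0, "C": 3.0}
toy_perturbed = {"A": 2.0, "B": 4.0, "C": 3.0}
toy_gene_sets = {"P1": ["A", "B"], "P2": ["C"]}
toy_features = paired_pathway_features(toy_baseline, toy_perturbed, toy_gene_sets)
```

Each subject then contributes one feature vector with one entry per pathway rather than one per transcript, which is the dimensionality reduction the paragraph refers to.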

Results:

In two case studies, human rhinovirus (HRV) infection versus matched healthy controls (n=16 training; n=3 test) and breast cancer (BC) tissues harboring TP53 or PIK3CA mutations versus adjacent normal tissue (n=27 training; n=9 test), this approach achieved 90% precision and recall on an unseen BC test set and 92% precision with 90% recall in HRV fivefold cross-validation. Incorporating paired-sample dynamics boosted precision by up to 12% and recall by 13% in BC, and by 5% each in HRV. MLOps workflows yielded an additional ~14.5% accuracy improvement compared to traditional pipelines. Moreover, our method identified 42 critical gene sets (pathways) for rhinovirus response and 21 for breast cancer mutation status, with retroactive ablation of top features reducing accuracy by ~25%.
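For readers less familiar with the metrics reported above, the following sketch shows how precision, recall, and fivefold splits are conventionally computed. The helper names `precision_recall` and `kfold_indices` are hypothetical, the labels are toy values rather than study data, and the contiguous, unstratified fold assignment is a simplifying assumption, not the authors' documented procedure.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def kfold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) index lists for k contiguous folds.

    The first (n_samples % k) folds receive one extra sample, so every
    sample is held out exactly once.
    """
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = [j for j in range(n_samples)
                     if j < start or j >= start + size]
        yield train_idx, test_idx
        start += size


# Toy labels (made up): 2 true positives, 1 false positive, 1 false negative.
toy_precision, toy_recall = precision_recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])

# Index splits for a cohort of 16 subjects; fold sizes are 4, 3, 3, 3, 3.
toy_folds = list(kfold_indices(16, k=5))
```

With cohorts this small each held-out fold contains only three or four subjects, so per-fold estimates are coarse and are usually pooled or averaged across folds.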

Conclusions:

These proof-of-concept results suggest that integrating intra-subject dynamics, pathway-level feature reduction grounded in prior biological knowledge (e.g., N-of-1-pathways analytics), and reproducible MLOps workflows can overcome cohort-size limitations in infrequent diseases, offering a scalable, interpretable solution for high-dimensional transcriptomic classification. Future work will extend these advances across various therapeutic and small-cohort designs.

Clinical Trial: not applicable

Citation

Please cite as:

Lussier YA, Shabanian M, Pouladi N, Wilson L, Prosperi MA

Paired-Sample and Pathway-Anchored MLOps Framework for Robust Transcriptomic Machine Learning in Small Cohorts: Model Classification Study

JMIR Bioinform Biotech 2025;6:e80735

DOI: 10.2196/80735

PMID: 41342203

PMCID: 12507327

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Bioinformatics and Biotechnology

Date Submitted: Jul 17, 2025

Open Peer Review Period: Jul 17, 2025 - Sep 11, 2025

Date Accepted: Sep 13, 2025
