
Accepted for/Published in: JMIR AI

Date Submitted: Sep 9, 2025
Date Accepted: Feb 28, 2026

The final, peer-reviewed published version of this preprint can be found here:


Bigan E, Dufour S

Performance of Large Language Models vs Conventional Machine Learning for Predicting Clinical Outcomes With Limited Data: Comparative Study

JMIR AI 2026;5:e83853

DOI: 10.2196/83853

PMID: 41921208

Performance of Large Language Models versus conventional Machine Learning for predicting clinical outcomes with limited data: A comparative study

  • Erwan Bigan
  • Stéphane Dufour

ABSTRACT

Background:

Machine Learning can be used to predict clinical outcomes. Training predictive models typically requires data from hundreds or thousands of patients. Lowering this requirement to a few tens of patients would enable new applications in clinical trials (e.g., optimizing the design of a Phase III trial by training a predictive model on Phase II data and applying it to synthetic Phase III patients) or in clinical decision support systems (for rare diseases or narrow indications). Large Language Models (LLMs) have recently been shown to outperform conventional Machine Learning algorithms for predictions on tabular data when the training data set is small.

Objective:

The objective of this study is to confirm the advantage of LLMs over conventional Machine Learning (ML) in predicting clinical outcomes, by applying state-of-the-art models to recently published clinical datasets.

Methods:

Proprietary (from OpenAI) and open-source (from the Meta Llama family) LLMs were compared with conventional Machine Learning classification algorithms to predict clinical outcomes using two recently published clinical datasets spanning distinct conditions (sepsis and gastric cancer). Datasets were chosen so that their publication date was after the LLM knowledge cutoff date, ensuring that the models were never exposed to these data during pretraining. Datasets were subsampled to vary the training set size.
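The subsampling protocol described above can be sketched as follows. This is an illustrative sketch only: the nearest-centroid baseline stands in for the study's actual models (LLMs and conventional ML classifiers), and the caller supplies the data; no real dataset is assumed.

```python
# Hypothetical sketch of the evaluation protocol: the training set is
# repeatedly subsampled at increasing sizes, and a simple baseline
# classifier is refit and scored on a held-out test set each time.
# A dependency-free nearest-centroid classifier is used as a stand-in.
import random

def fit_nearest_centroid(X, y):
    """Return per-class feature centroids (mean feature vector per label)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

def evaluate(X_train, y_train, X_test, y_test, train_sizes, seed=0):
    """Map each training set size to the test accuracy of a model fit on
    a random subsample of that size."""
    rng = random.Random(seed)
    results = {}
    for n in train_sizes:
        idx = rng.sample(range(len(X_train)), n)
        model = fit_nearest_centroid([X_train[i] for i in idx],
                                     [y_train[i] for i in idx])
        correct = sum(predict(model, x) == y
                      for x, y in zip(X_test, y_test))
        results[n] = correct / len(X_test)
    return results
```

Plotting `results` across sizes such as 10, 25, 50, and 100 patients would reproduce the kind of learning curve the comparison rests on.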

Results:

Both proprietary and open-source LLMs consistently outperform conventional ML for training set sizes below 100 patients, as measured by the area under the receiver operating characteristic curve (AUROC) and the F1 score. Contextual information is found to be key to this advantage.
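For illustration, the two reported metrics can be computed in a few lines of pure Python: AUROC via the rank-based (Mann-Whitney) identity, and the F1 score at a fixed decision threshold. These minimal versions are sketches, not the study's actual evaluation code.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive is scored above a
    random negative; ties count as 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall on binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Note that AUROC is threshold-free (it ranks predicted probabilities), whereas F1 depends on the chosen classification threshold, which is why studies often report both.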

Conclusions:

Although these preliminary results leave room for further optimization, they already show the potential of LLM-based Machine Learning to enable new clinical use cases when data are available for only a few tens of patients.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.