
Accepted for/Published in: JMIR AI

Date Submitted: Sep 9, 2025
Date Accepted: Feb 28, 2026

The final, peer-reviewed published version of this preprint can be found here:


Bigan E, Dufour S

Performance of Large Language Models vs Conventional Machine Learning for Predicting Clinical Outcomes With Limited Data: Comparative Study

JMIR AI 2026;5:e83853

DOI: 10.2196/83853

PMID: 41921208

Performance of Large Language Models versus conventional Machine Learning for predicting clinical outcomes with limited data: A comparative study

  • Erwan Bigan
  • Stéphane Dufour

ABSTRACT

Background:

Machine Learning can be used to predict clinical outcomes. Training predictive models typically requires data from hundreds or thousands of patients. Lowering this requirement to a few tens of patients would enable new applications in clinical trials (e.g., optimizing the design of a Phase III trial by training a predictive model on Phase II data and applying it to synthetic Phase III patients) or in clinical decision support systems (for rare diseases or narrow indications). Large Language Models (LLMs) have recently been shown to outperform conventional Machine Learning algorithms for predictions on tabular data when the training data set is small.

Objective:

The objective of this study is to confirm the advantage of LLMs over conventional Machine Learning (ML) in predicting clinical outcomes, by applying state-of-the-art models to recently published clinical datasets.

Methods:

Proprietary (from OpenAI) and open-source (from the Meta Llama family) LLMs were compared with conventional Machine Learning classification algorithms to predict clinical outcomes using two recently published clinical datasets spanning distinct conditions (sepsis and gastric cancer). Datasets were chosen so that their publication date was after the LLM knowledge cutoff date, ensuring that the models were never exposed to these data during pretraining. Datasets were subsampled to vary the training set size.
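The subsampling protocol described above can be sketched as follows. This is an illustrative sketch only: the nearest-centroid baseline stands in for the study's actual models (LLMs and conventional ML classifiers), and the caller supplies the data; no real dataset is assumed.

```python
# Hypothetical sketch of the evaluation protocol: the training set is
# repeatedly subsampled at increasing sizes, and a simple baseline
# classifier is refit and scored on a held-out test set each time.
# A dependency-free nearest-centroid classifier is used as a stand-in.
import random

def fit_nearest_centroid(X, y):
    """Return per-class feature centroids (mean feature vector per label)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

def evaluate(X_train, y_train, X_test, y_test, train_sizes, seed=0):
    """Map each training set size to the test accuracy of a model fit on
    a random subsample of that size."""
    rng = random.Random(seed)
    results = {}
    for n in train_sizes:
        idx = rng.sample(range(len(X_train)), n)
        model = fit_nearest_centroid([X_train[i] for i in idx],
                                     [y_train[i] for i in idx])
        correct = sum(predict(model, x) == y
                      for x, y in zip(X_test, y_test))
        results[n] = correct / len(X_test)
    return results
```

Plotting `results` across sizes such as 10, 25, 50, and 100 patients would reproduce the kind of learning curve the comparison rests on.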

Results:

Both proprietary and open-source LLMs consistently outperform conventional ML for training set sizes below 100 patients, as measured by the area under the receiver operating characteristic curve (AUROC) and the F1 score. Contextual information is found to be key to this advantage.
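For illustration, the two reported metrics can be computed in a few lines of pure Python: AUROC via the rank-based (Mann-Whitney) identity, and the F1 score at a fixed decision threshold. These minimal versions are sketches, not the study's actual evaluation code.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive is scored above a
    random negative; ties count as 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall on binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Note that AUROC is threshold-free (it ranks predicted probabilities), whereas F1 depends on the chosen classification threshold, which is why studies often report both.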

Conclusions:

Although these preliminary results leave room for further optimization, they already show the potential of LLM-based Machine Learning to enable new clinical use cases when data are available for only a few tens of patients.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.