Performance of Large Language Models versus conventional Machine Learning for predicting clinical outcomes with limited data: A comparative study
ABSTRACT
Background:
Machine Learning can be used to predict clinical outcomes. Training predictive models typically requires data from hundreds or thousands of patients. Lowering this requirement to a few tens of patients would enable new applications in clinical trials (e.g., optimizing the design of a Phase III trial by training a predictive model on Phase II data and applying it to synthetic Phase III patients) or in clinical decision support systems (for rare diseases or narrow indications). Large Language Models (LLMs) have recently been shown to outperform conventional Machine Learning algorithms for predictions on tabular data when the training dataset is small.
Objective:
The objective of this study is to confirm the advantage of LLMs over conventional Machine Learning (ML) for predicting clinical outcomes, by applying state-of-the-art models to recently published clinical datasets.
Methods:
Proprietary LLMs (from OpenAI) and open-source LLMs (from the Meta Llama family) were compared with conventional Machine Learning classification algorithms for predicting clinical outcomes, using two recently published clinical datasets spanning distinct conditions (sepsis, gastric cancer). Datasets were chosen so that their publication date was later than the LLM knowledge cutoff date, ensuring that the models were never exposed to this data during pre-training. Datasets were subsampled to vary the training set size.
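The evaluation protocol described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it uses a synthetic stand-in dataset and a logistic regression baseline as the conventional ML arm, subsampled at a few training set sizes; in the study, the LLM arm would replace the fitted classifier with a prompt-based predictor.

```python
# Hedged sketch of the comparison protocol (assumptions: synthetic data,
# logistic regression as the conventional ML baseline).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a clinical tabular dataset with a binary outcome.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Vary the training set size down to a few tens of patients, as in the study.
for n_train in (20, 50, 100):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train_full[:n_train], y_train_full[:n_train])
    proba = clf.predict_proba(X_test)[:, 1]
    print(n_train,
          round(roc_auc_score(y_test, proba), 3),       # ROC AUC
          round(f1_score(y_test, clf.predict(X_test)), 3))  # F1 score
```

Repeating the subsampling over several random draws, as is standard practice, would reduce the variance of the small-sample estimates.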
Results:
Proprietary as well as open-source LLMs consistently outperform conventional ML for training set sizes below 100 patients, using the area under the Receiver Operating Characteristic curve (ROC AUC) or the F1 score as metrics. Contextual information is found to be key to this advantage.
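The role of contextual information can be illustrated with a minimal sketch of how a tabular patient record might be serialized into a natural-language prompt, so that the LLM can exploit column names and units that a context-free numeric encoding discards. The function, field names, and prompt wording below are illustrative assumptions, not the authors' prompt.

```python
# Hedged sketch (assumption, not the study's actual prompt): serializing a
# tabular record into text preserves contextual information (feature names,
# units) that conventional ML encodings typically drop.
def to_prompt(record: dict, outcome: str) -> str:
    # Join "name: value" pairs so the LLM sees each feature in context.
    features = "; ".join(f"{k}: {v}" for k, v in record.items())
    return (f"Patient record: {features}. "
            f"Will this patient develop {outcome}? Answer yes or no.")

# Hypothetical patient record for illustration only.
patient = {"age": 67, "lactate (mmol/L)": 3.1, "SOFA score": 6}
print(to_prompt(patient, "sepsis"))
```

Stripping the feature names from such a prompt would be one way to probe how much of the LLM's advantage comes from this context.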
Conclusions:
These preliminary results leave room for further optimization, yet they already show the potential of LLM-based Machine Learning to enable new clinical use cases when data is available for only a few tens of patients.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.