JMIR Preprints #86700: Leveraging Large Language Models to Integrate Clinical Knowledge and Machine Learning Predictions for Lymph Node Metastasis Prediction: A Knowledge-Augmented Framework



Hongying Yu;
Bing Liu;
Xian Zeng;
Mecheng Ren;
Zheng Cao;
Xiaofeng Zhu;
Xudong Lu;
Jun Xu;
Nan Wu;
Danqing Hu

ABSTRACT

Background:

Lymph node metastasis (LNM) is a critical clinical indicator for determining the initial treatment strategy for patients with lung cancer. However, accurately diagnosing LNM preoperatively remains a significant challenge. Data-driven predictive modeling has become the mainstream approach to this problem, yet it often overlooks existing clinical knowledge.

Objective:

Large language models (LLMs) have demonstrated the potential to predict clinical risks in a zero-shot manner, drawing on the extensive clinical knowledge acquired from large-scale corpora. This study aims to investigate how integrating LLM-derived knowledge with data-driven patterns can improve the accuracy of LNM prediction.

Methods:

We propose a novel ensemble framework that combines the strengths of LLMs and machine learning (ML) models for LNM prediction in lung cancer. Specifically, three ML models were trained on clinical data, and their predicted probabilities, together with the original clinical features, were incorporated into prompts for LLMs. Three LLMs (GPT-4o, GPT-o4-mini, and DeepSeek-R1) were each used to predict LNM risk independently five times, and four ensemble strategies were applied to aggregate these predictions into a final outcome.
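The abstract does not give the prompt template or name the four ensemble strategies, so the following is only a minimal sketch of the general pattern, under stated assumptions. A hypothetical `build_prompt` renders clinical features and ML-predicted probabilities into a single prompt, and a hypothetical `aggregate` pools the repeated risk estimates from every LLM and combines them with one of three generic rules (mean, median, or the fraction of runs at or above an assumed 0.5 threshold). The template wording, function names, and threshold are illustrative assumptions, not the authors' method.

```python
from statistics import mean, median


def build_prompt(features: dict[str, object], ml_probs: dict[str, float]) -> str:
    """Render clinical features and ML-predicted probabilities into one prompt.

    Hypothetical template: the wording is an illustrative assumption,
    not the prompt used in the study.
    """
    feature_lines = "\n".join(f"- {name}: {value}" for name, value in features.items())
    prob_lines = "\n".join(f"- {model}: {p:.3f}" for model, p in ml_probs.items())
    return (
        "Patient clinical features:\n"
        f"{feature_lines}\n"
        "Probabilities of lymph node metastasis predicted by machine learning models:\n"
        f"{prob_lines}\n"
        "Using your clinical knowledge and these predictions, estimate the probability "
        "of lymph node metastasis as a number between 0 and 1."
    )


def aggregate(runs: dict[str, list[float]], strategy: str = "mean",
              threshold: float = 0.5) -> float:
    """Combine repeated LLM risk estimates (e.g., five runs per LLM) into one score.

    The three strategies here are generic illustrations, not the study's four.
    """
    # Pool every run from every LLM into one flat list.
    pooled = [score for scores in runs.values() for score in scores]
    if not pooled:
        raise ValueError("no predictions to aggregate")
    if strategy == "mean":
        return mean(pooled)
    if strategy == "median":
        return median(pooled)
    if strategy == "vote":
        # Fraction of runs whose estimate meets the assumed positive threshold.
        return sum(score >= threshold for score in pooled) / len(pooled)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Pooling runs across LLMs weights each model by its number of runs; averaging within each LLM first and then across LLMs would be an equally plausible design choice when run counts differ.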

Results:

The proposed approach was evaluated on clinical data from 767 patients with lung cancer at Peking University Cancer Hospital. The ensemble framework significantly outperformed standalone ML models, achieving an area under the curve (AUC) of 0.778 and an average precision (AP) of 0.418. Furthermore, within the ensemble framework, reasoning-oriented LLMs outperformed base chat LLMs.
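Both reported metrics can be computed directly from predicted scores and true labels. The minimal sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half), and AP as the sum of precision values weighted by the increase in recall at each distinct score threshold, which mirrors the non-interpolated definitions behind scikit-learn's `roc_auc_score` and `average_precision_score`. AP is informative alongside AUC under class imbalance, because a classifier that ranks cases at random has an AP close to the prevalence of positive cases. The function names `roc_auc` and `average_precision` are illustrative.

```python
def roc_auc(y_true: list[int], y_score: list[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: list[int], y_score: list[float]) -> float:
    """AP = sum over thresholds of (R_n - R_{n-1}) * P_n, without interpolation."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AP needs at least one positive case")
    # Sort cases from highest to lowest score.
    pairs = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every case tied at this score before updating the curve.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For instance, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, and `average_precision` on the same inputs returns about 0.833.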

Conclusions:

This study presents a concise and effective strategy for integrating the clinical knowledge embedded in LLMs with the latent data–outcome relationships captured by ML models, offering a promising direction for improving LNM prediction in lung cancer.

Citation

Please cite as:

Yu H, Liu B, Zeng X, Ren M, Cao Z, Zhu X, Lu X, Xu J, Wu N, Hu D

Leveraging Large Language Models to Integrate Clinical Knowledge and Machine Learning Predictions for Lymph Node Metastasis Prediction: Development of a Knowledge-Augmented Framework

JMIR Med Inform 2026;14:e86700

DOI: 10.2196/86700

PMID: 42330511

PMCID: 13286326


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.


Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 29, 2025

Date Accepted: Jun 3, 2026
