JMIR Preprints #78332: Enabling Just-in-Time Clinical Oncology Analysis with Large Language Models: Feasibility and Validation Study Using Unstructured Synthetic Data

Enabling Just-in-Time Clinical Oncology Analysis with Large Language Models: Feasibility and Validation Study Using Unstructured Synthetic Data

Peter May;
Julian Greß;
Christoph Seidel;
Sebastian Sommer;
Markus Schuler;
Sina Nokodian;
Florian Schröder;
Johannes Jung

ABSTRACT

Background:

Traditional cancer registries, limited by labor-intensive manual data abstraction and rigid, predefined schemas, often hinder timely and comprehensive oncology research. While Large Language Models (LLMs) have shown promise in automating data extraction, their potential to perform direct, just-in-time (JIT) analysis on unstructured clinical narratives – potentially bypassing intermediate structured databases for many analytical tasks – remains largely unexplored.

Objective:

This study aimed to evaluate whether a state-of-the-art LLM (Gemini 2.5 Pro) can enable a JIT clinical oncology analysis paradigm by: 1) performing high-fidelity multiparameter data extraction, 2) answering complex clinical queries directly from raw text, 3) automating multi-step survival analyses including executable code generation, and 4) generating novel, clinically plausible hypotheses from free-text documentation.

Methods:

A synthetic dataset of 240 unstructured medical reports from stage IV non-small cell lung cancer (NSCLC) patients, embedding 14 predefined clinical variables, was used. Gemini 2.5 Pro was assessed on the four core JIT capabilities. Performance was measured by: extraction accuracy (compared to human annotation on n=40 reports and across the full n=240 dataset), numerical deviation for direct question answering (5 questions, posed over n=40 to n=240 reports), log-rank concordance for LLM-generated vs. ground-truth Kaplan-Meier survival analyses (overall survival [OS] and progression-free survival [PFS] from n=80 and n=160 reports), and clinical plausibility of LLM-generated hypotheses from the full dataset (n=240 reports).
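The first two metrics above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so this assumes that extraction accuracy is the per-variable share of reports with an exact match to the reference annotation, and that numerical deviation is the absolute relative difference in percent. The function names, variable names, and sample values are hypothetical and not taken from the study.

```python
from typing import Mapping, Sequence


def extraction_accuracy(
    predicted: Sequence[Mapping[str, str]],
    reference: Sequence[Mapping[str, str]],
) -> dict[str, float]:
    """Per-variable share of reports whose extracted value matches the reference exactly."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and of equal length")
    variables = reference[0].keys()
    return {
        var: sum(p.get(var) == r[var] for p, r in zip(predicted, reference)) / len(reference)
        for var in variables
    }


def relative_deviation(llm_answer: float, true_value: float) -> float:
    """Absolute relative deviation of a numeric answer from the true value, in percent."""
    if true_value == 0:
        raise ValueError("relative deviation is undefined for a true value of zero")
    return abs(llm_answer - true_value) / abs(true_value) * 100.0


# Hypothetical two-report example (values are illustrative only).
reference = [
    {"stage": "IV", "driver_mutation": "EGFR"},
    {"stage": "IV", "driver_mutation": "wild type"},
]
predicted = [
    {"stage": "IV", "driver_mutation": "EGFR"},
    {"stage": "IV", "driver_mutation": "EGFR"},
]
print(extraction_accuracy(predicted, reference))
print(relative_deviation(llm_answer=99, true_value=100))
```

Exact string matching is the strictest possible scoring rule; a real evaluation would likely need to normalize synonyms and date formats before comparing.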

Results:

For multiparameter extraction from n=40 reports, the LLM achieved >99% average accuracy, comparable to a human annotator (Friedman test, p=0.139), but in far less time (LLM: 3.7 minutes vs. Human: 133.8 minutes). Across the full 240-report dataset, LLM multiparameter extraction maintained >98% accuracy for most variables. The LLM answered multi-conditional clinical queries directly from raw text with a relative deviation typically below 1% and rarely exceeding 1.5%, even when given up to 240 reports. Crucially, it autonomously performed end-to-end survival analysis, generating R code that produced Kaplan-Meier curves statistically indistinguishable from ground truth for OS (log-rank p=0.99) and PFS (log-rank p=0.89). Subgroup PFS analysis (driver mutation vs. wild type, n=160) was also accurately replicated (log-rank p < 0.0001), with comparable median PFS (e.g., Driver: LLM 26.0 vs. Ground Truth 28.0 months). Furthermore, the LLM generated clinically plausible hypotheses regarding biomarker–outcome associations and toxicities without specific prompting.
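For readers unfamiliar with the survival statistics reported above, the following is a minimal pure-Python sketch of a Kaplan-Meier estimate, a median survival time, and a two-group log-rank test. It is not the R code generated in the study, which the abstract does not reproduce. All function names and the toy data are hypothetical, and the p-value assumes the usual chi-square approximation with one degree of freedom.

```python
import math
from typing import Optional, Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    curve = []
    survival = 1.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        observed = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve: Sequence[tuple[float, float]]) -> Optional[float]:
    """First time at which the survival estimate is at or below 0.5, if reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def logrank_p(times_a, events_a, times_b, events_b) -> float:
    """Two-sided log-rank p-value for two groups (chi-square, 1 degree of freedom)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    diff = 0.0      # observed minus expected events in group A
    variance = 0.0  # hypergeometric variance summed over event times
    for t in sorted({x for x, e in zip(all_times, all_events) if e}):
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        diff += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = diff * diff / variance
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function for 1 df


# Toy data in months (illustrative only, not from the study).
short = [1, 2, 3, 4, 5]
long_ = [10, 11, 12, 13, 14]
all_observed = [1, 1, 1, 1, 1]
print(median_survival(kaplan_meier(short, all_observed)))
print(logrank_p(short, all_observed, long_, all_observed))
```

In this framing, comparing an LLM-derived curve against a ground-truth curve means passing the two sets of times and event indicators as the two groups; a large p-value indicates no detectable difference between them.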

Conclusions:

LLMs can enable a paradigm shift towards dynamic, just-in-time clinical analysis and knowledge discovery directly from narrative data, offering a powerful alternative or complement to traditional registry architectures for many research and analytical needs. This suggests a future of AI-assisted, “living” oncology ecosystems capable of supporting timely, scalable, and hypothesis-driven research. Rigorous validation on real-world, multi-institutional datasets, with careful attention to ethics and data privacy, is essential before clinical implementation.

Citation

Please cite as:

May P, Greß J, Seidel C, Sommer S, Schuler M, Nokodian S, Schröder F, Jung J

Enabling Just-in-Time Clinical Oncology Analysis With Large Language Models: Feasibility and Validation Study Using Unstructured Synthetic Data

JMIR Med Inform 2025;13:e78332

DOI: 10.2196/78332

PMID: 41328496

PMCID: 12670046

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 12, 2025

Open Peer Review Period: Jun 12, 2025 - Aug 7, 2025

Date Accepted: Oct 23, 2025
