Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 12, 2025
Open Peer Review Period: Jun 12, 2025 - Aug 7, 2025
Date Accepted: Oct 23, 2025

The final, peer-reviewed published version of this preprint can be found here:

May P, Greß J, Seidel C, Sommer S, Schuler M, Nokodian S, Schröder F, Jung J

Enabling Just-in-Time Clinical Oncology Analysis With Large Language Models: Feasibility and Validation Study Using Unstructured Synthetic Data

JMIR Med Inform 2025;13:e78332

DOI: 10.2196/78332

PMID: 41328496

PMCID: 12670046

Enabling Just-in-Time Clinical Oncology Analysis with Large Language Models: Feasibility and Validation Study Using Unstructured Synthetic Data

  • Peter May
  • Julian Greß
  • Christoph Seidel
  • Sebastian Sommer
  • Markus Schuler
  • Sina Nokodian
  • Florian Schröder
  • Johannes Jung

ABSTRACT

Background:

Traditional cancer registries, limited by labor-intensive manual data abstraction and rigid, predefined schemas, often hinder timely and comprehensive oncology research. While Large Language Models (LLMs) have shown promise in automating data extraction, their potential to perform direct, just-in-time (JIT) analysis on unstructured clinical narratives – potentially bypassing intermediate structured databases for many analytical tasks – remains largely unexplored.

Objective:

This study aimed to evaluate whether a state-of-the-art LLM (Gemini 2.5 Pro) can enable a JIT clinical oncology analysis paradigm by: 1) performing high-fidelity multiparameter data extraction, 2) answering complex clinical queries directly from raw text, 3) automating multi-step survival analyses including executable code generation, and 4) generating novel, clinically plausible hypotheses from free-text documentation.

Methods:

A synthetic dataset of 240 unstructured medical reports for patients with stage IV non-small cell lung cancer (NSCLC), embedding 14 predefined clinical variables, was used. Gemini 2.5 Pro was assessed on the four core JIT capabilities. Performance was measured by: extraction accuracy (compared with human annotation on n=40 reports and evaluated across the full n=240 dataset), numerical deviation for direct question answering (5 questions, n=40 to 240 reports), log-rank concordance between LLM-generated and ground-truth Kaplan-Meier analyses of overall survival (OS) and progression-free survival (PFS) (n=80 and n=160 reports), and clinical plausibility of LLM-generated hypotheses from the full dataset (n=240 reports).
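The two simplest metrics named above, extraction accuracy and relative numerical deviation, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the exact-match scoring rule, and the variable names in the toy data (`ecog`, `pdl1`) are assumptions made for illustration only.

```python
from typing import Dict, Hashable

# One record per report: variable name -> extracted value.
Record = Dict[str, object]


def extraction_accuracy(predicted: Dict[Hashable, Record],
                        reference: Dict[Hashable, Record]) -> float:
    """Share of (report, variable) cells where prediction equals reference.

    Assumption: exact-match scoring; the study may have scored differently.
    """
    total = 0
    correct = 0
    for report_id, ref_record in reference.items():
        pred_record = predicted.get(report_id, {})
        for variable, ref_value in ref_record.items():
            total += 1
            if pred_record.get(variable) == ref_value:
                correct += 1
    if total == 0:
        raise ValueError("reference contains no values to score")
    return correct / total


def relative_deviation(answer: float, ground_truth: float) -> float:
    """Absolute difference between answer and truth, as a fraction of truth."""
    if ground_truth == 0:
        raise ValueError("relative deviation is undefined for a zero ground truth")
    return abs(answer - ground_truth) / abs(ground_truth)


# Hypothetical toy data, not taken from the study.
reference = {
    "report_1": {"ecog": 1, "pdl1": "high"},
    "report_2": {"ecog": 0, "pdl1": "low"},
}
predicted = {
    "report_1": {"ecog": 1, "pdl1": "high"},
    "report_2": {"ecog": 0, "pdl1": "high"},  # one mismatch
}

accuracy = extraction_accuracy(predicted, reference)
deviation = relative_deviation(99.0, 100.0)
```

With this toy input, three of the four cells match, and an answer of 99 against a ground truth of 100 deviates by 1%.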

Results:

For multiparameter extraction from n=40 reports, the LLM achieved >99% average accuracy, comparable to a human annotator (Friedman test, p=0.139), while working about 36 times faster (LLM: 3.7 minutes vs. human: 133.8 minutes). Across the full 240-report dataset, LLM multiparameter extraction maintained >98% accuracy for most variables. The LLM answered multi-conditional clinical queries directly from raw text with a relative deviation typically below 1% and rarely exceeding 1.5%, even with up to 240 reports. Crucially, it autonomously performed end-to-end survival analysis, generating executable R code from text that produced Kaplan-Meier curves statistically indistinguishable from ground truth for OS (log-rank p=0.99) and PFS (log-rank p=0.89). Subgroup PFS analysis (driver mutation vs. wild type, n=160) was also accurately replicated (log-rank p < 0.0001), with comparable median PFS (e.g., Driver: LLM 26.0 vs. Ground Truth 28.0 months). Furthermore, the LLM generated clinically plausible hypotheses regarding biomarker–outcome associations and toxicities without specific prompting.
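To make the survival comparison concrete, the following is a rough Python sketch of a Kaplan-Meier estimator and a two-group log-rank test. The study describes LLM-generated R code, which is not shown here; this pure-Python stand-in, its function names, and the toy data are assumptions for illustration. A log-rank p-value near 1 means the two curves cannot be told apart, which is how the concordance between LLM-derived and ground-truth curves is read above.

```python
import math
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        observed = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival is at or below 50%, if any."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]
                 ) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data (months, event flag), not taken from the study.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
same_stat, same_p = logrank_test(times, events, times, events)
diff_stat, diff_p = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                                 [10, 11, 12, 13, 14], [1] * 5)
```

Comparing a group with itself gives a p-value of 1, while the two clearly separated toy groups give a small p-value. In practice an established library, such as R's `survival` package, would be the safer choice for real analyses.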

Conclusions:

LLMs can enable a paradigm shift towards dynamic, just-in-time clinical analysis and knowledge discovery directly from narrative data, offering a powerful alternative or complement to traditional registry architectures for many research and analytical needs. This suggests a future of AI-assisted, “living” oncology ecosystems capable of supporting timely, scalable, and hypothesis-driven research. Rigorous validation on real-world, multi-institutional datasets, with careful attention to ethics and data privacy, is essential before clinical implementation.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.