
Currently submitted to: JMIR Medical Informatics

Date Submitted: Dec 19, 2025
Open Peer Review Period: Jan 4, 2026 - Mar 1, 2026

NOTE: This is an unreviewed Preprint

Warning: This is an unreviewed preprint. Readers are cautioned that the document has not been peer-reviewed by expert/patient reviewers or an academic editor, may contain misleading claims, and is likely to undergo changes before final publication, if accepted; it may also have been rejected or withdrawn (in which case a note "no longer under consideration" will appear above).


Citation: Please cite this preprint only for review purposes or for grant applications and CVs (if you are the author).

Final version: If our system detects a final peer-reviewed "version of record" (VoR) published in any journal, a link to that VoR will appear below. Readers are then encouraged to cite the VoR instead of this preprint.



Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Optimizing Clinical Temporal Relation Extraction with Large Language Models: Comparative Analysis

  • Jianping He; 
  • Laila Rasmy; 
  • Haifang Li; 
  • Jianfu Li; 
  • Zenan Sun; 
  • Evan Yu; 
  • Degui Zhi; 
  • Cui Tao

ABSTRACT

Background:

Clinical Temporal Relation Extraction (CTRE) is essential for reconstructing patient timelines from unstructured Electronic Health Records (EHRs). However, the linguistic complexity of clinical notes and the high cost of expert annotation impede the development of large-scale training corpora. While Large Language Models (LLMs) have transformed general Natural Language Processing, their application to CTRE remains underexplored.

Objective:

This study aims to determine the optimal adaptation strategy for CTRE by conducting a comprehensive benchmarking of LLM architectures and fine-tuning methodologies in both data-rich and limited-data regimes.

Methods:

We evaluated four LLMs representing two distinct architectures: Transformer Encoders (GatorTron-Base, GatorTron-Large) and Transformer Decoders (LLaMA 3.1-8B, MeLLaMA-13B). We compared four adaptation strategies: (1) Standard Fine-Tuning, (2) Hard-Prompting, (3) Soft-Prompting, and (4) Low-Rank Adaptation (LoRA). Experiments were conducted on the 2012 i2b2 CTRE benchmark in both full-supervision and 1-shot scenarios.
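The low-rank adaptation strategy (4) can be illustrated with a minimal sketch of the LoRA update applied to a frozen weight matrix. The dimensions, rank, and scaling below are illustrative values, not the configuration used in the paper:

```python
import numpy as np

# Minimal sketch of a LoRA-style low-rank update (illustrative values,
# not the configuration evaluated in the paper).
d, r, alpha = 8, 2, 4                    # hidden size, LoRA rank, scaling factor

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pretrained weight (e.g. a query projection)
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

# Effective weight: frozen matrix plus a scaled low-rank correction.
# Only A and B (2*d*r parameters) are trained, instead of all d*d entries of W.
W_eff = W + (alpha / r) * (B @ A)

x = rng.standard_normal(d)
# With B initialized to zero, the adapted layer starts identical to the base layer.
assert np.allclose(W_eff @ x, W @ x)
```

Because B starts at zero, the adapted model initially reproduces the base model exactly; fine-tuning then moves only the small A and B matrices, which is why LoRA is parameter-efficient.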

Results:

We achieved results that exceed the current state-of-the-art (SOTA) on the 2012 i2b2 dataset. Comparative analysis reveals that hard-prompting consistently yields superior efficacy compared to standard fine-tuning. Regarding Parameter-Efficient Fine-Tuning (PEFT) strategies, Low-Rank Adaptation (LoRA) targeting query and value layers emerged as the optimal configuration. Conversely, soft-prompting demonstrated suboptimal performance, likely due to constraints on representational capacity. Architecturally, we observed a performance dichotomy based on data availability: Encoder-based models (GatorTron) exhibited superior stability and accuracy in few-shot scenarios, whereas Decoder-based models (LLaMA 3.1, MeLLaMA) demonstrated dominant performance in data-rich regimes.
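To make the hard-prompting strategy concrete, the sketch below shows a hypothetical prompt template for clinical temporal relation classification. The wording and the label set (BEFORE/AFTER/OVERLAP, the coarse relation types of the 2012 i2b2 task) are illustrative assumptions, not the exact template evaluated in the paper:

```python
# Hypothetical hard-prompt template for clinical temporal relation
# extraction; wording and label set are illustrative, not the paper's.
LABELS = ["BEFORE", "AFTER", "OVERLAP"]

TEMPLATE = (
    "Sentence: {sentence}\n"
    "What is the temporal relation between [{event_a}] and [{event_b}]?\n"
    "Answer with one of: {labels}.\n"
    "Answer:"
)

def build_prompt(sentence: str, event_a: str, event_b: str) -> str:
    """Fill the fixed natural-language template with one instance."""
    return TEMPLATE.format(
        sentence=sentence,
        event_a=event_a,
        event_b=event_b,
        labels=", ".join(LABELS),
    )

prompt = build_prompt(
    "The patient was admitted after the fall at home.",
    "admitted",
    "fall",
)
```

In a hard-prompting setup, each candidate event pair is rendered through a fixed template like this and the model's completion is mapped back to a relation label; the template text itself is not trained, in contrast to soft-prompting, where continuous prompt embeddings are learned.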

Conclusions:

This study provides a rigorous roadmap for adapting LLMs to clinical extraction tasks. Based on our empirical findings, we recommend hard-prompting to maximize predictive accuracy and identify specific LoRA configurations (targeting query and value layers) as the preferred approach when computational efficiency is paramount. Furthermore, our findings suggest that while generative Decoders excel with abundant data, domain-specific Encoders remain the robust choice for few-shot clinical applications.


 Citation

Please cite as:

He J, Rasmy L, Li H, Li J, Sun Z, Yu E, Zhi D, Tao C

Optimizing Clinical Temporal Relation Extraction with Large Language Models: Comparative Analysis

JMIR Preprints. 19/12/2025:89957

DOI: 10.2196/preprints.89957

URL: https://preprints.jmir.org/preprint/89957




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.