Accepted for/Published in: JMIR Bioinformatics and Biotechnology
Date Submitted: Oct 22, 2024
Date Accepted: Jul 7, 2025
Systemic Anticancer Therapy Timelines Extraction from Electronic Medical Records Text: Algorithm Development and Validation
ABSTRACT
Background:
The systemic treatment of cancer typically requires multiple anticancer agents given in combination, sequentially, or both. Clinical narrative texts often contain extensive descriptions of the temporal sequencing of systemic anticancer therapy (SACT), which makes the construction of SACT timelines an important task that may be amenable to automated extraction.
Objective:
We aimed to explore automatic methods for extracting patient-level SACT timelines from clinical narratives in electronic medical records (EMRs).
Methods:
We used two datasets from two institutions: (1) the THYME dataset, with 199 patients with colorectal cancer, and (2) the 2024 ChemoTimelines shared task dataset, with 149 patients with ovarian cancer, breast cancer, and melanoma. We explored finetuning smaller language models trained to attend to events and time expressions, as well as few-shot prompting of large language models (LLMs). Evaluation followed the 2024 ChemoTimelines shared task configuration: Subtask1 involves constructing SACT timelines from manually annotated SACT event and time expression mentions, provided as input in addition to the patient’s notes, whereas Subtask2 requires extracting SACT timelines directly from the patient’s notes.
Results:
Our task-specific finetuned EntityBERT model achieved a 93% F1 score, outperforming the best result in Subtask1 of the 2024 ChemoTimelines shared task (90%). It ranked second in Subtask2. LLM (LLaMA2, Mixtral) performance lagged behind that of the task-specific finetuned model on both the THYME and shared task datasets. On the shared task datasets, the best LLM performance was 77% F1, 16 percentage points lower than the task-specific finetuned system.
Conclusions:
In this paper, we explored approaches for patient-level timeline extraction through the SACT timeline extraction task. Our results and analysis add to the knowledge of extracting treatment timelines from EMR clinical narratives using Natural Language Processing methods.
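For readers unfamiliar with how an F1 score applies to timelines, the following is a minimal sketch rather than the shared task's official scorer. It assumes a patient timeline can be represented as a set of (agent, temporal relation, date) triples and that a prediction counts only on an exact match. The triple format, the relation labels, and the toy data are illustrative assumptions and do not come from this abstract.

```python
from typing import Set, Tuple

# Assumed representation: (agent, temporal relation, normalized date).
# The abstract does not specify the timeline format; this is illustrative.
Triple = Tuple[str, str, str]


def f1_score(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Exact-match F1 between a predicted and a gold timeline."""
    if not predicted and not gold:
        return 1.0  # both timelines empty: treated as perfect agreement
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up toy timelines for a single hypothetical patient.
gold = {
    ("carboplatin", "begins-on", "2013-01-10"),
    ("carboplatin", "ends-on", "2013-04-02"),
    ("paclitaxel", "contains-1", "2013-02-14"),
}
predicted = {
    ("carboplatin", "begins-on", "2013-01-10"),
    ("paclitaxel", "contains-1", "2013-02-14"),
    ("paclitaxel", "ends-on", "2013-05-01"),  # not in gold: a false positive
}

score = f1_score(predicted, gold)
print(f"F1 = {score:.3f}")
```

In this toy case two of the three predicted triples match the gold timeline, so precision and recall are both 2/3. An actual evaluation may relax the matching (for example, on date granularity) and may average scores across patients, which this sketch does not attempt.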
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.