Accepted for/Published in: JMIR Bioinformatics and Biotechnology

Date Submitted: Oct 22, 2024
Date Accepted: Jul 7, 2025

The final, peer-reviewed published version of this preprint can be found here:

Systemic Anticancer Therapy Timelines Extraction From Electronic Medical Records Text: Algorithm Development and Validation

Yao J, Goldner E, Hochheiser H, Finan S, Levander J, Harris D, Groen PCd, Buchbinder E, Bitterman D, Warner J, Savova G

JMIR Bioinform Biotech 2025;6:e67801

DOI: 10.2196/67801

PMID: 41342192

PMCID: 12408058

Systemic Anticancer Therapy Timelines Extraction from Electronic Medical Records Text: Algorithm Development and Validation

  • Jiarui Yao
  • Eli Goldner
  • Harry Hochheiser
  • Sean Finan
  • John Levander
  • David Harris
  • Piet C. de Groen
  • Elizabeth Buchbinder
  • Danielle Bitterman
  • Jeremy Warner
  • Guergana Savova

ABSTRACT

Background:

The systemic treatment of cancer typically requires the use of multiple anticancer agents in combination, sequentially, or both. Clinical narrative texts often contain extensive descriptions of the temporal sequencing of systemic anticancer therapy (SACT), which makes the automated extraction of SACT timelines an important and potentially tractable task.

Objective:

We aimed to explore automatic methods for extracting patient-level SACT timelines from clinical narratives in electronic medical records (EMRs).

Methods:

We used two datasets from two institutions: (1) the THYME dataset, with 199 patients with colorectal cancer, and (2) the 2024 ChemoTimelines shared task dataset, with 149 patients with ovarian cancer, breast cancer, and melanoma. We explored finetuning smaller language models trained to attend to events and time expressions, as well as few-shot prompting of large language models (LLMs). Evaluation followed the 2024 ChemoTimelines shared task configuration: Subtask1 involves constructing SACT timelines from manually annotated SACT event and time expression mentions provided as input in addition to the patient’s notes, and Subtask2 requires extracting SACT timelines directly from the patient’s notes.

Results:

Our task-specific finetuned EntityBERT model achieved an F1 score of 93%, outperforming the best result in Subtask1 of the 2024 ChemoTimelines shared task (90%), and ranked second in Subtask2. LLM (LLaMA2, Mixtral) performance lagged behind that of the task-specific finetuned model on both the THYME and shared task datasets. On the shared task datasets, the best LLM performance was 77% F1, 16 percentage points lower than the task-specific finetuned system.
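
For readers unfamiliar with the metric, the F1 scores reported above are the harmonic mean of precision and recall. The following is a minimal sketch of a set-based F1 computation, assuming a timeline is represented as a set of (therapy, temporal relation, date) tuples that must match exactly. The tuple layout, the relation labels, the example values, and the function name `f1_score` are all hypothetical and chosen for illustration; the abstract does not specify the scoring procedure, and the shared task's official scorer may differ.

```python
from typing import Set, Tuple

# Hypothetical timeline entry: (therapy mention, temporal relation, normalized date).
# This layout is an assumption for illustration, not taken from the abstract.
TimelineEntry = Tuple[str, str, str]


def f1_score(predicted: Set[TimelineEntry], gold: Set[TimelineEntry]) -> float:
    """Set-based F1: harmonic mean of precision and recall over exact matches."""
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up example data; relation labels are placeholders.
gold = {
    ("carboplatin", "begins-on", "2013-02-04"),
    ("carboplatin", "ends-on", "2013-05-20"),
    ("paclitaxel", "contains", "2013-03-11"),
}
predicted = {
    ("carboplatin", "begins-on", "2013-02-04"),
    ("paclitaxel", "contains", "2013-03-11"),
    ("paclitaxel", "ends-on", "2013-06-01"),
}

# Two of three predictions are correct and two of three gold entries are found,
# so precision, recall, and F1 all equal 2/3 here.
score = f1_score(predicted, gold)
```

Under exact-match scoring of this kind, a prediction with the right therapy but a wrong date counts as both a false positive and a missed gold entry, which is one reason end-to-end extraction is typically harder than building timelines from gold mentions.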

Conclusions:

In this paper, we explored approaches for patient-level timeline extraction through the SACT timeline extraction task. Our results and analysis add to the knowledge of extracting treatment timelines from EMR clinical narratives using Natural Language Processing methods.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.