Accepted for/Published in: JMIR AI

Date Submitted: Jul 26, 2023
Open Peer Review Period: Jul 25, 2023 - Sep 19, 2023
Date Accepted: Dec 16, 2023

The final, peer-reviewed published version of this preprint can be found here:

Xie F, Chang J, Luong T, Wu B, Lustigova E, Shrader E, Chen W

Identifying Symptoms Prior to Pancreatic Ductal Adenocarcinoma Diagnosis in Real-World Care Settings: Natural Language Processing Approach

JMIR AI 2024;3:e51240

DOI: 10.2196/51240

PMID: 38875566

PMCID: 11041417

Identifying Symptoms Prior to Pancreatic Ductal Adenocarcinoma Diagnosis in Real-World Care Setting: A Natural Language Processing Approach

  • Fagen Xie
  • Jenny Chang
  • Tiffany Luong
  • Bechien Wu
  • Eva Lustigova
  • Eva Shrader
  • Wansu Chen

ABSTRACT

Background:

Pancreatic cancer is the third leading cause of cancer deaths in the US. Pancreatic ductal adenocarcinoma (PDAC) is the most common form of pancreatic cancer, accounting for up to 90% of all cases. Patient-reported symptoms are often the trigger for a cancer diagnosis; therefore, understanding PDAC-associated symptoms and the timing of their onset could facilitate early detection of PDAC.

Objective:

This paper aims to develop a natural language processing (NLP) algorithm to capture symptoms associated with PDAC from clinical notes within a large integrated health care system.

Methods:

We used unstructured data from the 2 years prior to PDAC diagnosis, for patients diagnosed between 2010 and 2019 and for their matched patients without PDAC, to identify 17 PDAC-related symptoms. Related terms and phrases were first compiled from publicly available resources, then recursively reviewed and enriched with input from clinicians and chart review. A computerized NLP algorithm was iteratively developed and refined through multiple rounds of chart review followed by adjudication. Finally, the developed algorithm was applied to the validation dataset to assess its performance and to the study implementation notes.
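As an illustration of how compiled terms and phrases can be applied to clinical notes, the following is a minimal sketch that assumes a lexicon-based matcher with sentence-level negation handling. The abstract does not describe the algorithm at this level of detail, so the term lists, the negation cues, and the function name `find_symptoms` are all hypothetical and are not the study's.

```python
import re

# Hypothetical lexicon: symptom label -> illustrative terms (NOT the study's lists).
SYMPTOM_TERMS = {
    "jaundice": ["jaundice", "icterus", "yellowing of the skin"],
    "weight loss": ["weight loss", "lost weight", "losing weight"],
    "back pain": ["back pain", "backache"],
}

# Hypothetical negation cues, checked only in the text preceding a match.
NEGATION_CUES = re.compile(
    r"\b(no|not|denies|denied|without|negative for)\b", re.IGNORECASE
)


def find_symptoms(note: str) -> set[str]:
    """Return the set of symptom labels affirmed in a clinical note."""
    found = set()
    # Split into rough sentences so a negation cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]+", note):
        for label, terms in SYMPTOM_TERMS.items():
            for term in terms:
                pattern = r"\b" + re.escape(term) + r"\b"
                match = re.search(pattern, sentence, re.IGNORECASE)
                if match is None:
                    continue
                # Skip the mention if a negation cue precedes it in the sentence.
                if NEGATION_CUES.search(sentence[: match.start()]):
                    continue
                found.add(label)
    return found


note = "Patient reports weight loss and back pain. Denies jaundice."
affirmed = find_symptoms(note)
```

In this sketch, the negated mention of jaundice is skipped while weight loss and back pain are kept. Iterative rounds of chart review, as described above, would be the natural place to extend such term lists and negation cues.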

Results:

A total of 408,147 and 709,789 notes were retrieved for 2,611 patients with PDAC and their 10,085 matched patients without PDAC, respectively. In descending order, the symptom distribution in the study implementation notes ranged from 4.98% (abdominal/epigastric pain) to 0.05% (upper extremity deep vein thrombosis [DVT] symptom) for the PDAC group, and from 1.75% (back pain) to 0.01% (pale stool) for the non-PDAC group. Validation of the NLP algorithm against adjudicated chart review results for 1,000 notes showed that precision ranged from 98.9% (jaundice) to 84.0% (upper extremity DVT symptom), recall from 98.1% (weight loss) to 82.8% (epigastric bloating), and F1 score from 0.97 (jaundice) to 0.86 (depression).
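For readers less familiar with the validation metrics reported above, the sketch below shows how precision, recall, and F1 score are conventionally derived from note-level counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical and do not come from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: notes where both the algorithm and chart review found the symptom
    fp: notes flagged by the algorithm but not confirmed by chart review
    fn: notes confirmed by chart review but missed by the algorithm
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one symptom, for illustration only.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Because F1 is a harmonic mean, it falls between precision and recall and sits closer to the lower of the two.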

Conclusions:

The developed and validated NLP algorithm could be used for the early detection of PDAC.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.