Accepted for/Published in: JMIR AI
Date Submitted: Jul 26, 2023
Open Peer Review Period: Jul 25, 2023 - Sep 19, 2023
Date Accepted: Dec 16, 2023
Identifying Symptoms Prior to Pancreatic Ductal Adenocarcinoma Diagnosis in Real-World Care Setting: A Natural Language Processing Approach
ABSTRACT
Background:
Pancreatic cancer is the third leading cause of cancer deaths in the US. Pancreatic ductal adenocarcinoma (PDAC) is the most common form of pancreatic cancer, accounting for up to 90% of all cases. Patient-reported symptoms are often the trigger for a cancer diagnosis; understanding PDAC-associated symptoms and the timing of their onset could therefore facilitate early detection of PDAC.
Objective:
This paper aims to develop a natural language processing (NLP) algorithm to capture symptoms associated with PDAC from clinical notes within a large integrated healthcare system.
Methods:
We used unstructured data recorded within two years prior to diagnosis for patients diagnosed with PDAC between 2010 and 2019, and for their matched patients without PDAC, to identify seventeen PDAC-related symptoms. Related terms and phrases were first compiled from publicly available resources, then repeatedly reviewed and enriched with input from clinicians and chart review. A computerized NLP algorithm was iteratively developed and fine-tuned via multiple rounds of chart review followed by adjudication. Finally, the developed algorithm was applied to the validation dataset to assess performance, and to the study implementation notes.
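To make the term-list idea concrete, the following is a minimal sketch of lexicon-based symptom capture with a crude negation check. It is not the study's algorithm: the abstract does not describe the implementation, so the lexicon `SYMPTOM_TERMS`, the negation cues in `NEGATION`, and the function `find_symptoms` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon (assumption): the study's actual term lists
# are not given in the abstract.
SYMPTOM_TERMS = {
    "jaundice": ["jaundice", "icterus", "yellowing of the skin"],
    "weight loss": ["weight loss", "lost weight", "losing weight"],
    "back pain": ["back pain", "backache"],
}

# Simplified negation cues (assumption), checked only in the text that
# precedes the matched term within the same sentence.
NEGATION = re.compile(
    r"\b(no|not|denies|denied|without|negative for)\b", re.IGNORECASE
)


def find_symptoms(note: str) -> set:
    """Return the symptoms mentioned affirmatively in a clinical note."""
    found = set()
    # Split into rough sentences so a negation does not leak across them.
    for sentence in re.split(r"[.;\n]+", note):
        for symptom, terms in SYMPTOM_TERMS.items():
            for term in terms:
                pattern = r"\b" + re.escape(term) + r"\b"
                match = re.search(pattern, sentence, re.IGNORECASE)
                if match and not NEGATION.search(sentence[: match.start()]):
                    found.add(symptom)
    return found


if __name__ == "__main__":
    note = "Patient reports weight loss and back pain. Denies jaundice."
    print(sorted(find_symptoms(note)))
```

A real system would need far richer handling (abbreviations, misspellings, family history, hypothetical statements), which is presumably what the iterative chart review and adjudication rounds refine.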
Results:
A total of 408,147 and 709,789 notes were retrieved for 2,611 patients with PDAC and their 10,085 matched patients without PDAC, respectively. The symptom distribution in the study implementation notes ranged from 4.98% (abdominal/epigastric pain) to 0.05% (upper extremity deep vein thrombosis [DVT] symptom) in the PDAC group, and from 1.75% (back pain) to 0.01% (pale stool) in the non-PDAC group. Validation of the NLP algorithm against adjudicated chart review results of 1,000 notes showed that precision ranged from 98.9% (jaundice) to 84.0% (upper extremity DVT symptom), recall from 98.1% (weight loss) to 82.8% (epigastric bloating), and F1 score from 0.97 (jaundice) to 0.86 (depression).
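For readers less familiar with the validation metrics, the sketch below shows how per-symptom precision, recall, and F1 score follow from note-level agreement between an algorithm and adjudicated chart review. The function names and the example labels are hypothetical and are not study data.

```python
def confusion_counts(predicted, reference):
    """Count true positives, false positives, and false negatives.

    `predicted` and `reference` are equal-length sequences of booleans,
    one entry per note: whether the algorithm / chart review flagged
    the symptom in that note.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1); each is 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative labels only (not from the study).
    algorithm = [True, True, False, True, False]
    chart_review = [True, False, False, True, True]
    tp, fp, fn = confusion_counts(algorithm, chart_review)
    print(precision_recall_f1(tp, fp, fn))
```

Precision is the share of algorithm-flagged notes that chart review confirms, recall is the share of chart-review-positive notes the algorithm finds, and F1 is their harmonic mean.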
Conclusions:
The developed and validated NLP algorithm could be utilized for the early detection of PDAC.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.