Accepted for/Published in: JMIR AI
Date Submitted: Nov 23, 2022
Open Peer Review Period: Nov 23, 2022 - Jan 18, 2023
Date Accepted: Mar 31, 2023
Automated Extraction and Longitudinal Analysis of Ground Glass Opacity Features in Lung Cancer Patients Powered by Deep Learning-based Natural Language Processing
ABSTRACT
Background:
Ground-glass opacities (GGOs) appearing in computed tomography (CT) scans may indicate potential lung malignancy. Proper management of GGOs based on their features can prevent lung cancer (LCA) development. Electronic health records (EHRs) are rich sources of information on GGO nodules and their granular features, but most of the valuable information is embedded in unstructured clinical notes.
Objective:
To develop, test, and validate a deep learning-based natural language processing (NLP) tool that automatically extracts GGO features to inform the longitudinal trajectory of GGO status from large-scale radiology notes.
Methods:
We developed a deep learning-based NLP pipeline, using a bidirectional long short-term memory network with a conditional random field layer (BiLSTM-CRF), to retrospectively extract GGOs and their granular features from the radiology notes of 13,216 lung cancer patients. We evaluated the pipeline with quality assessments and characterized the cohort by analyzing the distribution of nodule features longitudinally to assess changes in size and solidity over time.
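The abstract does not specify how the quality assessments were scored. As an illustration only, the sketch below shows one common way to compute precision, recall, and F1 for extracted features, assuming exact-match comparison against manually annotated notes. The `Entity` tuple layout, the function name, and the sample annotations are hypothetical and not taken from the study.

```python
from typing import Set, Tuple

# Hypothetical representation of one extracted feature:
# (note_id, feature_type, extracted_value). Not the authors' schema.
Entity = Tuple[str, str, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between gold and predicted entities."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up annotations for two notes (illustrative only).
gold = {
    ("n1", "size", "8 mm"),
    ("n1", "solidity", "part-solid"),
    ("n2", "size", "5 mm"),
    ("n2", "status", "stable"),
}
predicted = {
    ("n1", "size", "8 mm"),
    ("n1", "solidity", "part-solid"),
    ("n2", "size", "5 mm"),
    ("n2", "status", "increased"),  # one mismatch against the gold label
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In practice such scores would be reported separately per feature type (for example size, solidity, status), which is consistent with the per-feature ranges given in the Results.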
Results:
Our NLP pipeline, built upon the GGO ontology we developed, achieved 95-100% precision, 89-100% recall, and 92-100% F1 scores on different GGO features. We deployed this GGO NLP model to extract and structure comprehensive characteristics of GGOs from 29,496 radiology notes of 4,521 lung cancer patients. Longitudinal analysis revealed that size increased in 17.5% of patients, decreased in 15.1%, and remained unchanged in 67.4% in their last note compared to the first note. Among 1,127 patients who had longitudinal radiology notes of GGO status, 815 patients (72.3%) were reported to have stable status and 259 patients (23%) had increased/progressed status in the subsequent notes.
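The first-versus-last-note comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `SizeRecord` structure, the `classify_size_change` function, the `tolerance_mm` parameter, and the sample records are all hypothetical, and the study may have handled multiple nodules or measurement variability differently.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, NamedTuple


class SizeRecord(NamedTuple):
    """Hypothetical structured output: one GGO size per patient per note."""
    patient_id: str
    note_date: date
    size_mm: float


def classify_size_change(records: List[SizeRecord], tolerance_mm: float = 0.0) -> Dict[str, str]:
    """Label each patient by comparing the last note's size with the first note's."""
    by_patient: Dict[str, List[SizeRecord]] = {}
    for record in records:
        by_patient.setdefault(record.patient_id, []).append(record)

    changes: Dict[str, str] = {}
    for patient_id, patient_records in by_patient.items():
        if len(patient_records) < 2:
            continue  # a single note gives no longitudinal comparison
        ordered = sorted(patient_records, key=lambda r: r.note_date)
        delta = ordered[-1].size_mm - ordered[0].size_mm
        if delta > tolerance_mm:
            changes[patient_id] = "increased"
        elif delta < -tolerance_mm:
            changes[patient_id] = "decreased"
        else:
            changes[patient_id] = "unchanged"
    return changes


# Made-up records (illustrative only).
records = [
    SizeRecord("p1", date(2019, 1, 10), 6.0),
    SizeRecord("p1", date(2020, 3, 2), 9.0),
    SizeRecord("p2", date(2018, 6, 5), 10.0),
    SizeRecord("p2", date(2019, 6, 7), 7.0),
    SizeRecord("p3", date(2020, 2, 1), 5.0),
    SizeRecord("p3", date(2021, 2, 3), 5.0),
    SizeRecord("p4", date(2021, 5, 9), 4.0),  # only one note, so excluded
]

changes = classify_size_change(records)
counts = Counter(changes.values())
for label in ("increased", "decreased", "unchanged"):
    print(f"{label}: {100 * counts[label] / len(changes):.1f}%")
```

A nonzero `tolerance_mm` would treat small differences as unchanged, which may matter if reported sizes vary slightly between readers.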
Conclusions:
Our deep learning-based NLP pipeline can automatically extract granular GGO features at scale from EHRs when such information is documented in radiology notes and inform the natural history of GGO, which opens the way for a new paradigm in lung cancer prevention and early detection.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.