Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Feb 15, 2021
Date Accepted: Jun 7, 2021

The final, peer-reviewed published version of this preprint can be found here:

Hu D, Li S, Wang Y, Zhang H, Wu N, Lu X

Automatic Extraction of Lung Cancer Staging Information From Computed Tomography Reports: Deep Learning Approach

JMIR Med Inform 2021;9(7):e27955

DOI: 10.2196/27955

PMID: 34287213

PMCID: 8339987

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Automatic Extraction of Lung Cancer Staging Information from Chinese Computed Tomography Reports: Deep Learning Approach

  • Danqing Hu
  • Shaolei Li
  • Yuhong Wang
  • Huanyao Zhang
  • Nan Wu
  • Xudong Lu

ABSTRACT

Background:

Lung cancer is the leading cause of cancer death worldwide. Clinical staging of lung cancer plays a crucial role in treatment decision making and prognosis evaluation. However, in clinical practice, about one-half of the clinical stages assigned to lung cancer patients are inconsistent with their pathological stages. Chest computed tomography (CT) is one of the most important diagnostic modalities for staging, and CT reports contain a wealth of information about cancer staging, but the free-text nature of the reports obstructs their computerized utilization.

Objective:

In this paper, we aim to automatically extract the staging-related information from CT reports to support accurate clinical staging.

Methods:

In this study, we developed an information extraction (IE) system to extract staging-related information from CT reports. The system consisted of three parts: named entity recognition (NER), relation classification (RC), and question reasoning (QR). We first summarized 22 questions about lung cancer staging based on the TNM staging guideline. Then, two state-of-the-art NER algorithms were implemented to recognize the entities of interest. Next, we presented a novel RC method that uses relation constraints to classify the relations between entities. Finally, a rule-based QR module was established to answer all questions by reasoning over the results of NER and RC.
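To make the three-stage design concrete, the following is a minimal Python sketch of how NER and RC outputs might feed a rule-based QR step. It is an illustration only, not the authors' implementation: the entity labels (`Mass`, `Size`, `Location`), the relation labels, the `ALLOWED_PAIRS` table, and the example question are all hypothetical, and the sketch assumes that a relation constraint simply restricts which pairs of entity types may be related.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: int
    label: str  # hypothetical NER label, e.g. "Mass", "Size", "Location"
    text: str   # surface text taken from the report


@dataclass(frozen=True)
class Relation:
    head: int   # id of the head entity
    tail: int   # id of the tail entity
    label: str  # hypothetical relation label, e.g. "has_size"


# Assumed form of a relation constraint: only these (head label, tail label)
# pairs are allowed to hold a relation at all.
ALLOWED_PAIRS = {
    ("Mass", "Size"): "has_size",
    ("Mass", "Location"): "located_in",
}


def candidate_pairs(entities):
    """Keep only entity pairs whose labels satisfy a relation constraint."""
    return [
        (h, t)
        for h in entities
        for t in entities
        if h.id != t.id and (h.label, t.label) in ALLOWED_PAIRS
    ]


def answer_size_question(entities, relations, threshold_cm):
    """Toy QR rule: True if any Mass is linked to a Size above the threshold."""
    by_id = {e.id: e for e in entities}
    for r in relations:
        if r.label == "has_size" and by_id[r.head].label == "Mass":
            size_cm = float(by_id[r.tail].text.replace("cm", "").strip())
            if size_cm > threshold_cm:
                return True
    return False


# Hand-written stand-ins for NER and RC output on one report sentence.
entities = [
    Entity(0, "Mass", "mass"),
    Entity(1, "Size", "3.5 cm"),
    Entity(2, "Location", "right upper lobe"),
]
relations = [Relation(0, 1, "has_size"), Relation(0, 2, "located_in")]

pairs = candidate_pairs(entities)
answer = answer_size_question(entities, relations, threshold_cm=3.0)
```

In this sketch the constraint table prunes the candidate pairs before any classifier would see them, and the QR rule reads only structured output, so each question can be expressed as a small, auditable function.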

Results:

We evaluated the IE system on a clinical dataset of 392 chest CT reports collected from the Department of Thoracic Surgery II of Peking University Cancer Hospital. Bi-LSTM-CRF outperformed ID-CNN-CRF on the NER task, with macro F1 scores of 77.27% and 89.96% under the exact and inexact matching schemes, respectively. For the RC task, the proposed method, i.e., Attention-Bi-LSTM with relation constraints, achieved the best performance, with a micro F1 score of 96.53% and a macro F1 score of 98.27%, compared with CNN-MF and Attention-Bi-LSTM. Moreover, the rule-based QR module correctly answered the staging questions by reasoning over the extracted NER and RC results, achieving a macro F1 score of 93.56% and a micro F1 score of 94.73% across all 22 questions.
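The metrics reported above can be illustrated with a short Python sketch. It shows the standard definitions of micro and macro F1 from per-class counts, plus one possible reading of exact versus inexact span matching. The overlap-based definition of inexact matching is an assumption made here for illustration (the abstract does not define the scheme), and the function names and the counts in the example are hypothetical, not values from the study.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(counts):
    """counts maps each class label to a (tp, fp, fn) tuple.

    Macro F1 averages the per-class F1 scores, so every class weighs the same.
    Micro F1 pools the counts first, so frequent classes dominate.
    """
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro


def spans_match(pred, gold, exact=True):
    """Spans are (start, end, label) tuples with an exclusive end offset.

    Exact: same label and identical boundaries.
    Inexact (assumed definition): same label and any character overlap.
    """
    if pred[2] != gold[2]:
        return False
    if exact:
        return pred[:2] == gold[:2]
    return pred[0] < gold[1] and gold[0] < pred[1]


# Made-up counts for two classes, one frequent and one rare.
micro, macro = micro_macro_f1({"A": (8, 2, 2), "B": (1, 1, 3)})
```

With these made-up counts the rare class drags the macro score below the micro score, which is why the two averages are usually reported side by side.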

Conclusions:

We conclude that the developed IE system can effectively and accurately extract lung cancer staging information from CT reports. The experimental results suggest that the extracted information has great potential for further use in stage verification and prediction to facilitate accurate clinical staging.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.