JMIR Preprints #27955: Automatic Extraction of Lung Cancer Staging Information from Chinese Computed Tomography Reports: Deep Learning Approach

Automatic Extraction of Lung Cancer Staging Information from Chinese Computed Tomography Reports: Deep Learning Approach

Danqing Hu;
Shaolei Li;
Yuhong Wang;
Huanyao Zhang;
Nan Wu;
Xudong Lu

ABSTRACT

Background:

Lung cancer is the leading cause of cancer death worldwide. Clinical staging of lung cancer plays a crucial role in treatment decision making and prognosis evaluation. However, in clinical practice, the clinical stages of about half of lung cancer patients are inconsistent with their pathological stages. Chest computed tomography (CT) is one of the most important diagnostic modalities for staging, and CT reports contain a wealth of staging information, but the free-text nature of the reports obstructs their computerized use.

Objective:

We aim to automatically extract staging-related information from CT reports to support accurate clinical staging.

Methods:

We developed an information extraction (IE) system to extract staging-related information from CT reports. The system consists of three parts: named entity recognition (NER), relation classification (RC), and question reasoning (QR). We first summarized 22 questions about lung cancer staging based on the TNM staging guideline. We then implemented two state-of-the-art NER algorithms to recognize the entities of interest. Next, we proposed a novel RC method that uses relation constraints to classify the relations between entities. Finally, we built a rule-based QR module that answers all questions by reasoning over the NER and RC results.
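The three-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the entity labels (`Mass`, `Size`, `LymphNode`), the `ALLOWED_PAIRS` table, the nearest-head heuristic standing in for a trained RC model, and the 3 cm question are all hypothetical, and "relation constraints" is read here simply as a whitelist of entity-type pairs, which may differ from the formulation in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    text: str
    type: str    # hypothetical labels: "Mass", "Size", "LymphNode"
    start: int   # character offset in the report

# Hypothetical constraint table: (head type, tail type) pairs that may be related.
ALLOWED_PAIRS = {("Mass", "Size"), ("LymphNode", "Size")}

def candidate_pairs(entities):
    """Keep only entity pairs whose types are permitted by the constraint table."""
    return [(h, t) for h in entities for t in entities
            if h is not t and (h.type, t.type) in ALLOWED_PAIRS]

def classify_relations(pairs):
    """Stand-in for a trained RC model: attach each tail to its nearest head."""
    best = {}
    for h, t in pairs:
        if t not in best or abs(h.start - t.start) < abs(best[t].start - t.start):
            best[t] = h
    return [(h, t) for t, h in best.items()]

def size_cm(entity):
    """Largest dimension of a size string such as '3.5x2.1cm' (assumed format)."""
    dims = entity.text.lower().replace("cm", "").split("x")
    return max(float(d) for d in dims)

def answer_mass_over_3cm(relations):
    """Hypothetical staging question: is any mass larger than 3 cm?"""
    return any(h.type == "Mass" and size_cm(t) > 3.0 for h, t in relations)

# Entities as an NER step might return them (hand-written here, not model output).
entities = [
    Entity("mass", "Mass", 10),
    Entity("3.5x2.1cm", "Size", 20),
    Entity("lymph node", "LymphNode", 40),
    Entity("0.8cm", "Size", 52),
]
relations = classify_relations(candidate_pairs(entities))
answer = answer_mass_over_3cm(relations)
```

Filtering candidate pairs by type before classification shrinks the space the RC model has to decide over, which is one plausible reason a constraint-aware method could outperform an unconstrained one.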

Results:

We evaluated the IE system on a clinical dataset of 392 chest CT reports collected from the Department of Thoracic Surgery II of Peking University Cancer Hospital. Bi-LSTM-CRF outperformed ID-CNN-CRF on the NER task, with macro F1 scores of 77.27% and 89.96% under the exact and inexact matching schemes, respectively. For the RC task, the proposed method, Attention-Bi-LSTM with relation constraints, achieved the best performance compared with CNN-MF and Attention-Bi-LSTM, with a micro F1 score of 96.53% and a macro F1 score of 98.27%. Moreover, the rule-based QR module answered the staging questions by reasoning over the extracted NER and RC results, achieving a macro F1 score of 93.56% and a micro F1 score of 94.73% across all 22 questions.
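For readers less familiar with the metrics reported above, the following sketch shows how macro and micro F1 differ and how exact and inexact entity matching could be scored. The definitions are the standard ones; treating "inexact" as "same type with overlapping spans" is an assumption, since the abstract does not define the scheme, and the function names are illustrative only.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_micro_f1(counts):
    """counts maps each label to (tp, fp, fn); returns (macro_f1, micro_f1)."""
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts first, so frequent labels dominate the score.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1(tp, fp, fn)

def spans_match(gold, pred, exact=True):
    """gold and pred are (start, end, type) with half-open character spans."""
    g_start, g_end, g_type = gold
    p_start, p_end, p_type = pred
    if g_type != p_type:
        return False
    if exact:
        return (g_start, g_end) == (p_start, p_end)
    # Assumed inexact rule: the two spans share at least one character.
    return p_start < g_end and g_start < p_end
```

Because macro F1 weights rare labels as heavily as common ones, the two averages can diverge when performance varies across labels, which is why both are reported.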

Conclusions:

The developed IE system can effectively and accurately extract lung cancer staging information from CT reports. The experimental results suggest that the extracted information has great potential for use in stage verification and prediction to facilitate accurate clinical staging.

Citation

Please cite as:

Hu D, Li S, Wang Y, Zhang H, Wu N, Lu X

Automatic Extraction of Lung Cancer Staging Information From Computed Tomography Reports: Deep Learning Approach

JMIR Med Inform 2021;9(7):e27955

DOI: 10.2196/27955

PMID: 34287213

PMCID: 8339987

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Feb 15, 2021

Date Accepted: Jun 7, 2021