Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: May 2, 2020
Date Accepted: Sep 24, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Clinical Context-Aware Automated Summarization Using a Deep Neural Network
ABSTRACT
Background:
Automatic Text Summarization (ATS) enables users to retrieve meaningful evidence from the big data of biomedical repositories to make complex clinical decisions. Deep neural and recurrent networks outperform traditional machine learning techniques in natural language processing and computer vision; however, they have yet to be explored in the ATS domain, particularly for medical text summarization.
Objective:
Traditional approaches to ATS for biomedical text suffer from fundamental issues, such as an inability to capture clinical context, quality of evidence, and purpose-driven selection of text for the summary. Our aim is to circumvent these limitations through precise, succinct, and coherent information extraction from credible published biomedical resources, and to construct a simplified summary containing the most informative content, providing a review specific to the clinical need.
Methods:
In our proposed approach, we introduce a novel framework, Biomed-Summarizer, that provides quality-aware, PICO-based (patient/problem, intervention, comparison, and outcome), intelligent, and context-enabled summarization of biomedical text. Biomed-Summarizer integrates a prognosis quality recognition (PQR) model with a clinical context-aware (CCA) model to locate text sequences in the body of a biomedical article for use in the final summary. First, we develop a deep neural network (NN) binary classifier for quality recognition that retains scientifically sound studies and filters out the others. Second, we develop a Bi-LSTM (bidirectional long short-term memory) recurrent neural network, the CCA classifier, trained on semantically enriched features generated with a word embedding tokenizer, to identify meaningful sentences representing PICO text sequences. Third, we calculate the similarity between the query and the PICO text sequences using Jaccard similarity with semantic enrichments (JS2E), where the semantic enrichments are obtained from medical ontologies. Last, we generate a representative summary from the high-scoring PICO sequences, aggregated with study type, publication venue credibility, and freshness scores.
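The JS2E step can be illustrated with a minimal sketch. It assumes simple regex tokenization and uses a small hypothetical dictionary (`CONCEPT_MAP`) as a stand-in for the medical ontologies: terms that the ontology treats as the same concept are mapped to one shared identifier before the ordinary Jaccard ratio is computed. The function names and the mapping are illustrative only; the authors' actual ontology lookup, preprocessing, and any weighting are not described in this abstract.

```python
import re

# HYPOTHETICAL stand-in for a medical ontology lookup: maps surface terms to a
# shared concept ID so that synonyms and inflections count as the same token.
CONCEPT_MAP = {
    "aneurysm": "CONCEPT_ANEURYSM",
    "aneurysms": "CONCEPT_ANEURYSM",
    "hemorrhage": "CONCEPT_HEMORRHAGE",
    "haemorrhage": "CONCEPT_HEMORRHAGE",
    "bleeding": "CONCEPT_HEMORRHAGE",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def enrich(tokens: set[str]) -> set[str]:
    """Replace every token known to the concept map with its concept ID."""
    return {CONCEPT_MAP.get(token, token) for token in tokens}


def jaccard(a: set[str], b: set[str]) -> float:
    """Plain Jaccard similarity |A & B| / |A | B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def js2e(query: str, sentence: str) -> float:
    """Jaccard similarity computed after ontology-based semantic enrichment."""
    return jaccard(enrich(tokenize(query)), enrich(tokenize(sentence)))


if __name__ == "__main__":
    query = "coiling of aneurysm"
    sentence = "Aneurysms treated with endovascular coiling"
    print(jaccard(tokenize(query), tokenize(sentence)))  # baseline: 1/7, about 0.143
    print(js2e(query, sentence))                         # enriched: 2/6, about 0.333
```

In this toy pair, plain Jaccard matches only "coiling", whereas enrichment also aligns "aneurysm" with "aneurysms", which is the kind of gain the semantic enrichment is meant to capture.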
Results:
Evaluation of the PQR model on a large dataset of biomedical literature on intracranial aneurysm shows an accuracy of 95.41% (2562/2686) in recognizing quality articles. The CCA multi-class classifier outperforms traditional machine learning algorithms, including Support Vector Machine, Gradient Boosted Tree, Linear Regression, K-Nearest Neighbor, and Naïve Bayes, achieving 93% (16127/17341) accuracy in classifying five categories, APIRO (aim, population, intervention, results, and outcome). After semantic enrichment, the semantic similarity algorithm achieves a significant Pearson correlation coefficient (PCC) of 0.61 (0-1 scale) on the well-known BIOSSES dataset (100 sentence pairs), an improvement of 8.9% over baseline Jaccard similarity. Finally, we found a strong positive correlation among the evaluations performed by three domain experts across different metrics, suggesting that the automated summarization is satisfactory.
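The PCC reported above measures how linearly a system's similarity scores track human-annotated gold scores. Because it is invariant to positive linear rescaling, system scores in [0, 1] can be compared with gold scores on a different range. A minimal sketch of the computation follows, using toy numbers rather than BIOSSES data; it is not the authors' evaluation code, and the function name `pearson` is illustrative.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance numerator and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("PCC is undefined when one list is constant")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Toy scores for illustration only; these are NOT BIOSSES data.
    gold = [1.0, 2.0, 3.0]      # human-annotated similarity
    system = [1.0, 3.0, 2.0]    # e.g., outputs of a similarity function
    print(pearson(gold, system))  # 0.5
```

In practice, a library routine such as `scipy.stats.pearsonr` gives the same coefficient along with a p-value, which is what a claim of a "significant" PCC would rest on.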
Conclusions:
By employing the proposed method, Biomed-Summarizer achieves high accuracy in ATS, enabling seamless curation of research evidence from the biomedical literature for use in clinical decisions.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.