Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Oct 30, 2021
Date Accepted: May 27, 2022

The final, peer-reviewed published version of this preprint can be found here:

Linguistic Pattern–Infused Dual-Channel Bidirectional Long Short-term Memory With Attention for Dengue Case Summary Generation From the Program for Monitoring Emerging Diseases–Mail Database: Algorithm Development Study

Chang YC, Chiu YW, Chuang TW

JMIR Public Health Surveill 2022;8(7):e34583

DOI: 10.2196/34583

PMID: 35830225

PMCID: 9491834

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Summary Generation of Dengue Outbreaks from ProMED-mail Database using a Linguistic Pattern-infused Dual-channel BiLSTM

  • Yung-Chun Chang; 
  • Yu-Wen Chiu; 
  • Ting-Wu Chuang

ABSTRACT

Background:

Globalization and environmental changes have increased the emergence and re-emergence of infectious diseases worldwide. Collaboration among regional infectious disease surveillance systems is critical but difficult to achieve because health information sharing systems differ in transparency from country to country. ProMED-mail is the most comprehensive expert-curated platform providing rich outbreak information about humans, animals, and plants from different countries. However, because its reports consist of unstructured text, they are difficult to analyze for further applications. We therefore set out to summarize ProMED-mail alert articles automatically. In this research, we propose a text summarization method that uses natural language processing to extract important sentences automatically from ProMED-mail alert articles and generate summaries of dengue outbreaks in Southeast Asia. Our method can be used to capture crucial information quickly and support decision-making in epidemic surveillance.

Objective:

To generate automatic summaries of unstructured text content from reports.

Methods:

Our materials come from the ProMED-mail website and span the period from 1994 to 2019. The collected data were annotated by professionals to establish a unique Taiwan dengue corpus, with almost perfect interannotator agreement (Cohen’s kappa of 90%). To generate a ProMED-mail summary, we developed a dual-channel bidirectional long short-term memory network with an attention mechanism that infuses latent syntactic features to identify crucial sentences in the alert articles.
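As an illustration of the agreement statistic mentioned above, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators who label each sentence as important (1) or not important (0). The function name and the toy labels are assumptions made for this example; they are not taken from the study's corpus or code.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: share of items that received the same label twice.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the annotators' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    p_chance = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if p_chance == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical toy labels (1 = important sentence, 0 = not important).
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_b = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(round(cohen_kappa(annotator_a, annotator_b), 3))
```

In this toy case the annotators agree on 9 of 10 sentences, yet kappa is lower than 0.9 because part of that agreement is expected by chance.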

Results:

Our method is superior to many well-known machine learning and neural network approaches in identifying important sentences, achieving a macro-averaged F1-score of 93%. Moreover, the method can successfully extract key information about dengue fever outbreaks from ProMED-mail and help researchers and public health practitioners capture important summaries quickly. In addition to evaluating the model, we recruited five professional experts and five students from related fields to rate the generated summaries in a satisfaction survey. The results showed that 83.6% of the summaries received high satisfaction ratings.
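For readers unfamiliar with the metric reported above, the following is a minimal sketch of a macro-averaged F1-score for sentence classification, assuming binary labels (1 = important sentence, 0 = other). The function name and the toy data are assumptions for this example and do not reproduce the study's evaluation.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        per_class.append(f1)
    # Each class counts equally, regardless of how many sentences it has.
    return sum(per_class) / len(per_class)


# Hypothetical gold labels and predictions for eight sentences.
gold = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(macro_f1(gold, pred), 3))
```

Because every class is weighted equally, the macro average does not let the more frequent "other" class hide weak performance on the important-sentence class.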

Conclusions:

The proposed approach successfully fuses latent syntactic features into a deep neural network to analyze syntactic, semantic, and content information in the text. It then exploits the derived information to identify the crucial sentences in ProMED-mail. The experimental results show that the proposed method is effective and outperforms the compared methods. In addition, our method demonstrated potential for summary generation from ProMED-mail. When a new alert article arrives, public health decision makers can quickly identify the outbreak information in a lengthy article and respond immediately with disease control and prevention measures.

Clinical Trial:

NA


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.