Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 10, 2019
Open Peer Review Period: Apr 10, 2019 - Jun 5, 2019
Date Accepted: Dec 16, 2019

The final, peer-reviewed published version of this preprint can be found here:

Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study

Lanera C, Berchialla P, Baldi I, Lorenzoni G, Tramontan L, Scamarcia A, Cantarutti L, Giaquinto C, Gregori D

JMIR Med Inform 2020;8(5):e14330

DOI: 10.2196/14330

PMID: 32369038

PMCID: 7238079

Use of Machine Learning techniques for case-detection of Varicella Zoster using routinely collected textual ambulatory records

  • Corrado Lanera
  • Paola Berchialla
  • Ileana Baldi
  • Giulia Lorenzoni
  • Lara Tramontan
  • Antonio Scamarcia
  • Luigi Cantarutti
  • Carlo Giaquinto
  • Dario Gregori

ABSTRACT

Background:

The detection of infectious diseases through the analysis of free text in electronic health reports (EHRs) can provide prompt and accurate background information for the implementation of preventative measures, such as advertising vaccination campaigns and monitoring their effectiveness.

Objective:

The purpose of this paper is to compare machine learning techniques applied to EHR analysis for disease detection.

Methods:

The PEDIANET database [1] was used as a data source for a real-world scenario on the identification of cases of varicella. The models' training and test sets were drawn from two different Italian regions and comprised 7,631 patients with 1,230,355 records and 2,347 patients with 569,926 records, respectively; a gold standard of varicella diagnosis was available for all of these patients. Elastic-net regularized generalized linear model (GLMNet), maximum entropy (MAXENT), and LogitBoost (Boosting) algorithms were implemented in a supervised environment and 5-fold cross-validated. The document-term matrix generated from the training set involved a dictionary of 1,871,532 tokens. The analysis was conducted on a subset of 29,096 tokens, corresponding to a matrix with a sparsity ratio of no more than 99%.
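The reduction from the full dictionary to a smaller token subset can be illustrated with a minimal sketch. It assumes whitespace tokenization and assumes that a term's sparsity is the fraction of documents in which it does not occur, with terms above the threshold dropped; the abstract does not state the exact tokenizer or pruning rule, and the function names and toy notes below are hypothetical.

```python
from collections import Counter


def build_dtm(documents):
    """Build a toy document-term matrix.

    Returns (vocabulary, rows), where rows[i] is a Counter mapping each
    token of document i to its count. Whitespace tokenization is an
    assumption made for illustration only.
    """
    rows = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*rows)) if rows else []
    return vocabulary, rows


def term_sparsity(term, rows):
    """Fraction of documents in which `term` does not occur."""
    absent = sum(1 for row in rows if term not in row)
    return absent / len(rows)


def prune_sparse_terms(vocabulary, rows, max_sparsity=0.99):
    """Keep only terms whose sparsity does not exceed `max_sparsity`."""
    return [t for t in vocabulary if term_sparsity(t, rows) <= max_sparsity]


# Hypothetical free-text notes, not taken from PEDIANET.
notes = [
    "fever varicella rash",
    "routine checkup fever",
    "varicella vaccine",
    "routine growth checkup",
]

vocabulary, rows = build_dtm(notes)
# A 0.5 threshold is used so that pruning is visible on four documents;
# the abstract reports a 99% threshold on the real data.
kept = prune_sparse_terms(vocabulary, rows, max_sparsity=0.5)
print(len(vocabulary), kept)
```

With a 99% threshold under this definition, a token survives only if it appears in at least 1% of the documents, which is why rare tokens dominate what gets removed from a very large dictionary.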

Results:

The highest test accuracy was reached by Boosting (96.0%; 95% CI 93.8%, 98.1%). GLMNet delivered superior predictive accuracy compared to MAXENT (86.6% vs 66.0%). MAXENT and GLMNet predictions agreed only weakly with each other (AC1 = 0.60; 95% CI 0.58, 0.62) and with LogitBoost (AC1 = 0.64; 95% CI 0.63, 0.66 and AC1 = 0.53; 95% CI 0.51, 0.55, respectively).
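The AC1 statistic quoted above is Gwet's chance-corrected agreement coefficient. As a rough sketch for two sets of predictions, it can be computed as (pa - pe) / (1 - pe), where pa is the observed agreement and pe = (1 / (q - 1)) Σ πk (1 - πk), with πk the average share of category k across the two sets and q the number of categories. The function name and the toy labels below are hypothetical, and the abstract does not say how the authors obtained their confidence intervals, so none is computed here.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b):
    """Gwet's AC1 agreement coefficient for two raters (or classifiers).

    Categories are inferred from the observed labels, which is a
    simplifying assumption of this sketch.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    q = len(categories)
    if q < 2:
        # Only one label was ever used: the two raters agree everywhere.
        return 1.0

    # Observed agreement: share of items with identical labels.
    pa = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement from the average marginal share of each category.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    pe = 0.0
    for k in categories:
        pi_k = (count_a[k] + count_b[k]) / (2 * n)
        pe += pi_k * (1 - pi_k)
    pe /= q - 1

    return (pa - pe) / (1 - pe)


# Hypothetical case/non-case predictions from two classifiers.
model_a = [1, 1, 1, 0, 0, 0, 1, 0]
model_b = [1, 1, 0, 0, 0, 0, 1, 1]
ac1 = gwet_ac1(model_a, model_b)
print(round(ac1, 2))
```

In this toy example the two models agree on 6 of 8 items (pa = 0.75) and each predicts cases half of the time (pe = 0.5), so AC1 = 0.5.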

Conclusions:

Boosting has demonstrated promising performance in large-scale EHR-based infectious disease identification.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.