JMIR Preprints #14330: Use of Machine Learning techniques for case-detection of Varicella Zoster using routinely collected textual ambulatory records


Use of Machine Learning techniques for case-detection of Varicella Zoster using routinely collected textual ambulatory records

Corrado Lanera;
Paola Berchialla;
Ileana Baldi;
Giulia Lorenzoni;
Lara Tramontan;
Antonio Scamarcia;
Luigi Cantarutti;
Carlo Giaquinto;
Dario Gregori

ABSTRACT

Background:

The detection of infectious diseases through the analysis of free text in electronic health reports (EHRs) can provide prompt and accurate background information for the implementation of preventative measures, such as advertising and monitoring the effectiveness of vaccination campaigns.

Objective:

The purpose of this paper is to compare machine learning techniques applied to EHR analysis for disease detection.

Methods:

The PEDIANET database [1] was used as a data source for a real-world scenario on the identification of cases of varicella. The models' training and test sets were drawn from the datasets of two different Italian regions, comprising 7,631 patients and 1,230,355 records, and 2,347 patients and 569,926 records, respectively, for whom a gold standard of varicella diagnosis was available. GLMNet, Maximum Entropy (MAXENT), and LogitBoost (Boosting) algorithms were implemented in a supervised environment and 5-fold cross-validated. The document-term matrix generated from the training set involves a dictionary of 1,871,532 tokens. The analysis was conducted on a subset of 29,096 tokens, corresponding to a matrix with a sparsity ratio of no more than 99%.
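The token-reduction step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the whitespace tokenizer, the function names, and the toy English notes are assumptions made for illustration, and sparsity is taken here to mean the fraction of documents in which a token does not appear.

```python
from collections import Counter


def build_dtm(documents):
    """Build a toy document-term matrix.

    Returns (vocabulary, rows), where rows[i] maps each token of
    document i to its count. Tokenization is a plain lowercase
    whitespace split (an assumption, not the paper's preprocessing).
    """
    rows = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*rows))
    return vocabulary, rows


def term_sparsity(rows, term):
    """Fraction of documents in which `term` does not occur."""
    absent = sum(1 for row in rows if term not in row)
    return absent / len(rows)


def remove_sparse_terms(vocabulary, rows, max_sparsity):
    """Keep only tokens whose sparsity is at most `max_sparsity`."""
    return [t for t in vocabulary if term_sparsity(rows, t) <= max_sparsity]


# Hypothetical free-text notes, invented for this example.
docs = [
    "fever and vesicular rash",
    "cough and fever",
    "vesicular rash on trunk",
    "routine check",
]
vocab, rows = build_dtm(docs)
kept = remove_sparse_terms(vocab, rows, max_sparsity=0.5)
print(len(vocab), "tokens before filtering;", len(kept), "after:", kept)
```

With a threshold of 0.99, as in the abstract, a token would be kept only if it appears in at least 1% of the documents, which is how a dictionary of millions of tokens can shrink to a few tens of thousands.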

Results:

The highest test accuracy was reached by Boosting (96.0%; 95% CI 93.8%-98.1%). GLMNet delivered superior predictive accuracy compared to MAXENT (86.6% vs 66.0%). MAXENT and GLMNet predictions agree weakly with each other (AC1 = 0.60; 95% CI 0.58-0.62), and each agrees weakly with LogitBoost (AC1 = 0.64; 95% CI 0.63-0.66 and AC1 = 0.53; 95% CI 0.51-0.55, respectively).
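For readers unfamiliar with the AC1 statistic reported above, the following is a minimal sketch of Gwet's AC1 point estimate for two sets of predictions. The function name and the toy labels are assumptions for illustration; the confidence intervals reported in the abstract require a variance estimate that is not shown here.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories=(0, 1)):
    """Gwet's AC1 agreement coefficient for two raters (point estimate).

    `categories` lists every label the raters could assign; it defaults
    to binary case/non-case labels.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    q = len(categories)

    # Observed agreement: share of items given the same label by both.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: based on the average share of items that the
    # two raters assign to each category.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = 0.0
    for k in categories:
        pi_k = (count_a[k] + count_b[k]) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)


# Hypothetical predictions from two classifiers on ten records.
model_1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]
model_2 = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
print(round(gwet_ac1(model_1, model_2), 3))
```

In this toy case the two models agree on 8 of 10 records, the chance-agreement term is 0.42, and AC1 is (0.80 - 0.42) / (1 - 0.42).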

Conclusions:

Boosting has demonstrated promising performance in large-scale EHR-based infectious disease identification.

Citation

Please cite as:

Lanera C, Berchialla P, Baldi I, Lorenzoni G, Tramontan L, Scamarcia A, Cantarutti L, Giaquinto C, Gregori D

Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study

JMIR Med Inform 2020;8(5):e14330

DOI: 10.2196/14330

PMID: 32369038

PMCID: 7238079


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.


Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 10, 2019

Open Peer Review Period: Apr 10, 2019 - Jun 5, 2019

Date Accepted: Dec 16, 2019

