JMIR Preprints #39057: Standard Vocabularies to Improve Machine Learning Model Transferability with Electronic Health Record Data: A Retrospective Cohort Study Using Healthcare-Associated Infection.


Amber Kiser;
Karen Eilbeck;
Jeffrey P Ferraro;
David E Skarda;
Matthew H Samore;
Brian Bucher

ABSTRACT

Background:

With the widespread adoption of electronic health records (EHRs) by U.S. hospitals, there is an opportunity to leverage these data for the development of predictive algorithms to improve clinical care. A key barrier in model development and implementation is external validation of model discrimination, which is rare and often results in worse performance. One reason machine learning models do not generalize externally is data heterogeneity. A potential solution to the significant data heterogeneity between healthcare systems is to use standard vocabularies to map EHR data elements. The advantage of these vocabularies is the hierarchical relationship between elements, which allows specific clinical features to be aggregated into more general grouped concepts.
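As a rough illustration of the hierarchical aggregation described above, the following sketch rolls specific concepts up to a shared ancestor so that two sites recording different specific codes end up with the same grouped feature. The hierarchy, the concept names, and the helpers `ancestor_at` and `group_features` are all hypothetical; they are not taken from any real vocabulary or from the authors' pipeline.

```python
from collections import Counter

# Hypothetical child -> parent links, invented for illustration only.
PARENT = {
    "cefazolin 1 g injection": "cefazolin",
    "cefazolin 2 g injection": "cefazolin",
    "cefazolin": "cephalosporin",
    "ceftriaxone 1 g injection": "ceftriaxone",
    "ceftriaxone": "cephalosporin",
    "cephalosporin": "antibacterial",
}


def ancestor_at(concept, levels_up, parent=PARENT):
    """Walk up at most `levels_up` parent links, stopping at the root."""
    for _ in range(levels_up):
        if concept not in parent:
            break
        concept = parent[concept]
    return concept


def group_features(concepts, levels_up):
    """Count one patient's concepts after rolling each one up the hierarchy."""
    return Counter(ancestor_at(c, levels_up) for c in concepts)


# Two sites record the same drug at different levels of specificity.
site_a_patient = ["cefazolin 1 g injection"]
site_b_patient = ["cefazolin 2 g injection"]

# Ungrouped features differ; features grouped one level up coincide.
print(group_features(site_a_patient, 0) == group_features(site_b_patient, 0))
print(group_features(site_a_patient, 1) == group_features(site_b_patient, 1))
```

In this toy setup a model trained on the ungrouped feature from one site would never see that feature at the other site, whereas the grouped feature is shared by both.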

Objective:

To evaluate grouping EHR data using standard vocabularies to improve the transferability of machine learning models for the detection of postoperative healthcare-associated infections (HAIs) across institutions with different EHR systems.

Methods:

Surgical patients from University of Utah Health and Intermountain Healthcare from July 2014 to August 2017 with complete follow-up data were included. The primary outcome was an HAI within 30 days of the procedure. EHR data from 0 to 30 days after the operation were mapped to standard vocabularies and grouped using the hierarchical relationships of the vocabularies. Model performance was measured using the AUC and F1 score in internal and external validation. To evaluate model transferability, a difference-in-difference (DiD) metric was defined as the difference between the baseline and grouped models in the drop in performance from internal to external validation.
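The DiD definition above can be sketched as a small calculation. The sign convention here (baseline drop minus grouped drop, so that a positive value means the grouped model lost less performance) is an assumption, as are the function names and the example scores, which are made up and are not results from the study.

```python
def performance_drop(internal_score, external_score):
    """Drop in a metric (e.g. AUC or F1) from internal to external validation."""
    return internal_score - external_score


def difference_in_difference(baseline_internal, baseline_external,
                             grouped_internal, grouped_external):
    """Baseline drop minus grouped drop (assumed sign convention).

    A positive value means the grouped model's performance fell less
    than the baseline model's when moved to the external dataset.
    """
    baseline_drop = performance_drop(baseline_internal, baseline_external)
    grouped_drop = performance_drop(grouped_internal, grouped_external)
    return baseline_drop - grouped_drop


# Made-up AUC values for illustration only.
did_auc = difference_in_difference(
    baseline_internal=0.90, baseline_external=0.70,
    grouped_internal=0.88, grouped_external=0.84,
)
print(round(did_auc, 3))
```

With these illustrative numbers the baseline model drops by about 0.20 and the grouped model by about 0.04, so the DiD is about 0.16 in favor of grouping.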

Results:

A total of 5,775 patients from University of Utah and 15,434 patients from Intermountain Healthcare were included. The prevalence of the selected outcomes was 5% for surgical site infection (SSI), 0.8%-1% for pneumonia, 3% for sepsis, and 0.8%-0.9% for urinary tract infection (UTI). For all outcomes, grouping the data using standard vocabularies resulted in a smaller drop in AUC and F1 score in external validation than using baseline features (P<.01). The DiD metrics ranged from 0.005 to 0.248 for AUC and from 0.075 to 0.216 for F1 score.

Conclusions:

We demonstrated that grouping machine learning model features based on standard vocabularies improved model transferability between datasets across two institutions. Improving model transferability using standard vocabularies has the potential to improve the generalization of clinical prediction models across the healthcare system.

Citation

Please cite as:

Kiser A, Eilbeck K, Ferraro JP, Skarda DE, Samore MH, Bucher B

Standard Vocabularies to Improve Machine Learning Model Transferability With Electronic Health Record Data: Retrospective Cohort Study Using Health Care–Associated Infection

JMIR Med Inform 2022;10(8):e39057

DOI: 10.2196/39057

PMID: 36040784

PMCID: 9472055

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 27, 2022

Date Accepted: Aug 15, 2022
