Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 27, 2022
Date Accepted: Aug 15, 2022

The final, peer-reviewed published version of this preprint can be found here:

Standard Vocabularies to Improve Machine Learning Model Transferability With Electronic Health Record Data: Retrospective Cohort Study Using Health Care–Associated Infection

Kiser A, Eilbeck K, Ferraro JP, Skarda DE, Samore MH, Bucher B

JMIR Med Inform 2022;10(8):e39057

DOI: 10.2196/39057

PMID: 36040784

PMCID: 9472055

Standard Vocabularies to Improve Machine Learning Model Transferability with Electronic Health Record Data: A Retrospective Cohort Study Using Healthcare-Associated Infection.

  • Amber Kiser
  • Karen Eilbeck
  • Jeffrey P Ferraro
  • David E Skarda
  • Matthew H Samore
  • Brian Bucher

ABSTRACT

Background:

With the widespread adoption of electronic health records (EHRs) by U.S. hospitals, there is an opportunity to leverage these data to develop predictive algorithms that improve clinical care. A key barrier to model development and implementation is external validation of model discrimination, which is rarely performed and often reveals worse performance. One reason machine learning models do not generalize externally is data heterogeneity. A potential solution to the substantial data heterogeneity between healthcare systems is to map EHR data elements to standard vocabularies. The advantage of these vocabularies is the hierarchical relationship between elements, which allows specific clinical features to be aggregated into more general grouped concepts.
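
As an illustration of how a hierarchy can aggregate specific features into grouped concepts, the following is a minimal sketch only. The `PARENT` mapping, its code names, and the helper functions are hypothetical and are not drawn from any real vocabulary or from the study's implementation.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent). These names are invented
# for illustration and do not come from any real standard vocabulary.
PARENT = {
    "drug_a_500mg_tablet": "drug_a",
    "drug_a_250mg_capsule": "drug_a",
    "drug_b_1g_injection": "drug_b",
    "drug_a": "antibiotic",
    "drug_b": "antibiotic",
}


def ancestors(code):
    """Return the chain [code, parent, grandparent, ...] up to the root."""
    chain = [code]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def group(code, levels_up=1):
    """Map a code to its ancestor `levels_up` steps higher, stopping at the root."""
    chain = ancestors(code)
    return chain[min(levels_up, len(chain) - 1)]


def grouped_features(codes, levels_up=1):
    """Count occurrences of grouped concepts for one patient's list of codes."""
    return Counter(group(code, levels_up) for code in codes)


# Two sites record the same drug with different specific codes.
site_1 = ["drug_a_500mg_tablet"]
site_2 = ["drug_a_250mg_capsule"]

# The raw features differ, but the grouped features coincide.
print(Counter(site_1) == Counter(site_2))
print(grouped_features(site_1) == grouped_features(site_2))
```

In this toy case the two sites share no raw feature, yet both produce the same grouped feature, which is the kind of overlap that grouping is intended to create.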

Objective:

To evaluate whether grouping EHR data using standard vocabularies improves the transferability of machine learning models for the detection of postoperative healthcare-associated infections (HAIs) across institutions with different EHR systems.

Methods:

Surgical patients from University of Utah Health and Intermountain Healthcare from July 2014 to August 2017 with complete follow-up data were included. The primary outcome was an HAI within 30 days of the procedure. EHR data from 0 to 30 days after the operation were mapped to standard vocabularies and grouped using the hierarchical relationships of the vocabularies. Model performance was measured using the AUC and F1 score in internal and external validation. To evaluate model transferability, a difference-in-difference (DiD) metric was defined as the difference between the baseline and grouped models in the performance drop from internal to external validation.
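
The DiD metric described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the sign convention (baseline drop minus grouped drop, so that a positive value favors the grouped model) is assumed rather than taken from the study, and the scores in the example are invented for illustration.

```python
def performance_drop(internal_score, external_score):
    """Drop in a metric (e.g., AUC or F1) from internal to external validation."""
    return internal_score - external_score


def difference_in_difference(baseline_internal, baseline_external,
                             grouped_internal, grouped_external):
    """Baseline drop minus grouped drop.

    Assumed sign convention: a positive value means the grouped model
    lost less performance in external validation than the baseline model.
    """
    baseline_drop = performance_drop(baseline_internal, baseline_external)
    grouped_drop = performance_drop(grouped_internal, grouped_external)
    return baseline_drop - grouped_drop


# Invented scores, not results from the study.
did = difference_in_difference(
    baseline_internal=0.90, baseline_external=0.70,  # baseline drop of about 0.20
    grouped_internal=0.88, grouped_external=0.80,    # grouped drop of about 0.08
)
print(round(did, 3))
```

With these invented scores the baseline model loses about 0.20 and the grouped model about 0.08, so the DiD is about 0.12.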

Results:

A total of 5,775 patients from University of Utah Health and 15,434 patients from Intermountain Healthcare were included. The prevalence of the selected outcomes was 5% for surgical site infection (SSI), 0.8%-1% for pneumonia, 3% for sepsis, and 0.8%-0.9% for urinary tract infection (UTI). For all outcomes, grouping data using standard vocabularies resulted in a smaller drop in AUC and F1 score in external validation than baseline features (P<.01). The DiD metrics ranged from 0.005 to 0.248 for AUC and from 0.075 to 0.216 for F1 score.

Conclusions:

We demonstrated that grouping machine learning model features based on standard vocabularies improved model transferability between datasets from two institutions. Improving model transferability using standard vocabularies has the potential to improve the generalization of clinical prediction models across the healthcare system.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.