JMIR Preprints #85255: Predicting Laboratory Test Utilization in Emergency Departments: A Machine Learning Study Integrating Structured and Unstructured Electronic Health Records

Predicting Laboratory Test Utilization in Emergency Departments: A Machine Learning Study Integrating Structured and Unstructured Electronic Health Records

Xingyu Zhang; Haipeng Ling; Xin Zhang; Anao Zhang

ABSTRACT

Background:

Laboratory testing is a critical component of diagnostic decision-making in emergency departments (EDs), yet overutilization contributes to patient burden and excess healthcare costs. Machine learning (ML) approaches integrating structured and unstructured electronic health record (EHR) data offer potential to improve the prediction of lab test utilization and guide more efficient diagnostic workflows.

Objective:

This study aimed to develop and evaluate machine learning models that use structured and unstructured EHR data to predict laboratory test utilization during ED visits, in support of evidence-informed diagnostic stewardship.

Methods:

We analyzed 15,115 adult visits from the 2021 National Hospital Ambulatory Medical Care Survey–Emergency Department (NHAMCS-ED). Structured predictors included demographic characteristics, vital signs, medical history, insurance, and visit attributes. Unstructured free-text fields (chief complaints and injury descriptions) were processed using a pretrained Bidirectional Encoder Representations from Transformers (BERT) model to generate text embeddings. Four supervised ML models—logistic regression, random forest, gradient boosting, and extreme gradient boosting—were trained under four configurations: structured-only, unstructured-only, combined, and mean probability ensemble. Model performance was assessed using area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score.
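The combined and mean probability ensemble configurations described above can be illustrated with a minimal sketch. It assumes the combined configuration concatenates structured predictors with the text embedding, and that the ensemble averages the predicted probabilities of a structured-only and an unstructured-only model; the abstract does not state these details, so the function names, the 0.5 threshold, and the sample probabilities are hypothetical.

```python
from statistics import fmean


def combine_features(structured, text_embedding):
    """Combined configuration (assumed): concatenate structured
    predictors with the text embedding into one feature vector."""
    return list(structured) + list(text_embedding)


def mean_probability_ensemble(*model_probs):
    """Average per-visit predicted probabilities across models.

    Each argument is a list of probabilities, one per visit,
    produced by a separately trained model.
    """
    lengths = {len(p) for p in model_probs}
    if len(lengths) != 1:
        raise ValueError("all models must score the same visits")
    return [fmean(visit_probs) for visit_probs in zip(*model_probs)]


# Hypothetical predicted probabilities for three ED visits.
p_structured = [0.80, 0.30, 0.55]   # structured-only model
p_text = [0.60, 0.10, 0.65]         # unstructured-only model

ensemble = mean_probability_ensemble(p_structured, p_text)

# Assumed decision threshold of 0.5 for "laboratory test ordered".
labels = [int(p >= 0.5) for p in ensemble]
```

The difference between the two configurations is where the information is merged: the combined configuration merges features before training a single model, whereas the ensemble merges the outputs of models trained separately.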

Results:

Among adult ED visits, 59.3% included laboratory test orders. The combined model integrating structured and unstructured data achieved the highest predictive performance (AUC = 0.83), outperforming structured-only (AUC = 0.78), unstructured-only (AUC = 0.74), and ensemble (AUC = 0.81) configurations. Strong predictors of laboratory testing included older age, ambulance arrival, abnormal vital signs, and chronic conditions such as hypertension, diabetes, chronic kidney disease, and cancer, whereas injury-related visits were associated with lower testing likelihood.
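For readers less familiar with the reported metrics, the following sketch computes AUC, accuracy, precision, recall, and F1 score from scratch on a toy example. The labels, scores, and 0.5 threshold are hypothetical and are not drawn from the study data; the study's own evaluation code is not described in the abstract.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive visit
    receives a higher score than a randomly chosen negative visit
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall, and F1 at a fixed threshold."""
    preds = [int(s >= threshold) for s in scores]
    pairs = list(zip(y_true, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels (1 = laboratory test ordered) and model scores.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2]

auc = roc_auc(y_true, scores)
metrics = threshold_metrics(y_true, scores)
```

AUC depends only on how the model ranks visits, so it is unaffected by the choice of threshold, whereas accuracy, precision, recall, and F1 all change with it.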

Conclusions:

Integrating structured and unstructured EHR data improved prediction of laboratory test utilization in ED settings, with the combined model outperforming the structured-only, unstructured-only, and ensemble configurations. These findings support the feasibility of developing real-time clinical decision support tools that promote more efficient, patient-centered, and evidence-based diagnostic practices.

Citation

Please cite as:

Zhang X, Ling H, Zhang X, Zhang A. Predicting Laboratory Test Ordering in Emergency Departments Using Integrated Structured and Unstructured Electronic Health Records: Machine Learning Study. JMIR Med Inform 2026;14:e85255

DOI: 10.2196/85255
PMID: 42296534
PMCID: 13268631

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 3, 2025

Date Accepted: May 17, 2026
