Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 3, 2025
Date Accepted: May 17, 2026

The final, peer-reviewed published version of this preprint can be found here:

Zhang X, Ling H, Zhang X, Zhang A

Predicting Laboratory Test Ordering in Emergency Departments Using Integrated Structured and Unstructured Electronic Health Records: Machine Learning Study

JMIR Med Inform 2026;14:e85255

DOI: 10.2196/85255

PMID: 42296534

Predicting Laboratory Test Utilization in Emergency Departments: A Machine Learning Study Integrating Structured and Unstructured Electronic Health Records

  • Xingyu Zhang
  • Haipeng Ling
  • Xin Zhang
  • Anao Zhang

ABSTRACT

Background:

Laboratory testing is a critical component of diagnostic decision-making in emergency departments (EDs), yet overutilization adds to patient burden and excess health care costs. Machine learning (ML) approaches that integrate structured and unstructured electronic health record (EHR) data could improve the prediction of laboratory test utilization and guide more efficient diagnostic workflows.

Objective:

This study aimed to develop and evaluate ML models that use structured and unstructured EHR data to predict laboratory test utilization during ED visits, in support of evidence-informed diagnostic stewardship.

Methods:

We analyzed 15,115 adult visits from the 2021 National Hospital Ambulatory Medical Care Survey–Emergency Department (NHAMCS-ED). Structured predictors included demographic characteristics, vital signs, medical history, insurance, and visit attributes. Unstructured free-text fields (chief complaints and injury descriptions) were processed using a pretrained Bidirectional Encoder Representations from Transformers (BERT) model to generate text embeddings. Four supervised ML models—logistic regression, random forest, gradient boosting, and extreme gradient boosting—were trained under four configurations: structured-only, unstructured-only, combined, and mean probability ensemble. Model performance was assessed using area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score.
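The mean probability ensemble and the evaluation metrics named above can be illustrated with a minimal sketch. It assumes the ensemble averages, visit by visit, the predicted probabilities of a structured-only model and an unstructured-only model; the abstract does not state exactly which models are averaged. The function names, the 0.5 decision threshold, and the toy numbers are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence


def mean_probability_ensemble(p_structured: Sequence[float],
                              p_text: Sequence[float]) -> List[float]:
    """Average two models' predicted probabilities for each visit."""
    if len(p_structured) != len(p_text):
        raise ValueError("probability lists must have the same length")
    return [(a + b) / 2.0 for a, b in zip(p_structured, p_text)]


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 at a fixed probability threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (made up): 1 = laboratory test ordered, 0 = not ordered.
y = [1, 1, 1, 0, 0]
p_struct = [0.9, 0.4, 0.8, 0.6, 0.2]   # hypothetical structured-only model
p_txt = [0.7, 0.8, 0.4, 0.2, 0.4]      # hypothetical text-embedding model
p_ens = mean_probability_ensemble(p_struct, p_txt)

for name, probs in [("structured", p_struct), ("text", p_txt), ("ensemble", p_ens)]:
    print(name, round(auc(y, probs), 3), threshold_metrics(y, probs))
```

The pairwise AUC loop is quadratic in the number of visits, which is fine for illustration; on a dataset of the size described here a rank-based implementation from an established library would be the practical choice. The toy numbers say nothing about the relative performance reported in the study.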

Results:

Among adult ED visits, 59.3% included laboratory test orders. The combined model integrating structured and unstructured data achieved the highest predictive performance (AUC = 0.83), outperforming structured-only (AUC = 0.78), unstructured-only (AUC = 0.74), and ensemble (AUC = 0.81) configurations. Strong predictors of laboratory testing included older age, ambulance arrival, abnormal vital signs, and chronic conditions such as hypertension, diabetes, chronic kidney disease, and cancer, whereas injury-related visits were associated with lower testing likelihood.

Conclusions:

Integrating structured and unstructured EHR data improved prediction of laboratory test utilization in ED settings compared with either data source alone (AUC 0.83 vs 0.78 and 0.74). These findings support the feasibility of developing real-time clinical decision support tools that promote more efficient, patient-centered, and evidence-based diagnostic practices.

