Accepted for/Published in: JMIR AI

Date Submitted: Jun 7, 2024
Date Accepted: Feb 23, 2025

The final, peer-reviewed published version of this preprint can be found here:

Limitations of Binary Classification for Long-Horizon Diagnosis Prediction and Advantages of a Discrete-Time Time-to-Event Approach: Empirical Analysis

Loh DR, Hill E, Liu N, Dawson G, Engelhard M

Limitations of Binary Classification for Long-Horizon Diagnosis Prediction and Advantages of a Discrete-Time Time-to-Event Approach: Empirical Analysis

JMIR AI 2025;4:e62985

DOI: 10.2196/62985

PMID: 40605770

PMCID: 12223692

On the Limitations of Binary Classification for Long-Horizon Diagnosis Prediction and Advantages of a Discrete-Time Time-To-Event Approach: Empirical Analysis

  • De Rong Loh
  • Elliot Hill
  • Nan Liu
  • Geraldine Dawson
  • Matthew Engelhard

ABSTRACT

Background:

A major challenge in using electronic health records (EHR) is the inconsistency of patient follow-up, resulting in right-censored outcomes. This becomes particularly problematic in long-horizon event predictions, such as autism and attention-deficit/hyperactivity disorder (ADHD) diagnoses, where a significant number of patients are lost to follow-up before the outcome can be observed. Consequently, fully supervised methods like binary classification (BC), which are trained to predict observed diagnoses, are substantially affected by the probability of sufficient follow-up, leading to biased results.
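The bias described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis from the study: the constant hazard `RATE`, the 10-year `HORIZON`, the uniform censoring distribution, and all function names are hypothetical choices made for illustration. It compares the positive rate obtained when unobserved diagnoses are treated as negatives with a censoring-aware life-table estimate.

```python
import math
import random

RATE = 0.05    # hypothetical yearly diagnosis hazard (assumption)
HORIZON = 10   # hypothetical prediction horizon in years (assumption)

def simulate(n=20000, seed=0):
    """Return (follow_up_time, diagnosis_observed) pairs with uniform censoring."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        event_time = rng.expovariate(RATE)        # time to diagnosis
        censor_time = rng.uniform(0.0, HORIZON)   # end of follow-up
        rows.append((min(event_time, censor_time), event_time <= censor_time))
    return rows

def naive_positive_rate(rows):
    """Positive rate when every unobserved diagnosis is labeled negative."""
    return sum(1 for _, observed in rows if observed) / len(rows)

def life_table_incidence(rows):
    """Cumulative incidence by HORIZON from yearly hazards (actuarial life table)."""
    survival = 1.0
    for k in range(HORIZON):
        in_interval = [(t, obs) for t, obs in rows if k <= t < k + 1]
        at_risk = sum(1 for t, _ in rows if t >= k)
        events = sum(1 for _, obs in in_interval if obs)
        censored = len(in_interval) - events
        # Patients censored within the interval count as at risk for half of it.
        effective = at_risk - censored / 2.0
        if effective > 0:
            survival *= 1.0 - events / effective
    return 1.0 - survival

rows = simulate()
true_incidence = 1.0 - math.exp(-RATE * HORIZON)
print(f"true incidence by horizon: {true_incidence:.3f}")
print(f"naive binary label rate:   {naive_positive_rate(rows):.3f}")
print(f"life-table estimate:       {life_table_incidence(rows):.3f}")
```

In this toy setting the naive label rate falls well below the true incidence because patients with short follow-up rarely have an observed diagnosis, whereas the life-table estimate stays close to it.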

Objective:

This empirical analysis aims to (1) characterize the inherent limitations of BC for long-horizon diagnosis prediction from EHR and (2) quantify the benefits of a specific time-to-event (TTE) approach, the discrete-time neural network (DTNN).

Methods:

We analyzed records from the Duke University Health System EHR, extracting features such as ICD-10 diagnosis codes, medications, laboratory tests, and procedures. We compared a DTNN with three BC approaches and a deep Cox proportional hazards model across four clinical conditions to examine distributional patterns across various subgroups. Time-varying area under the receiver operating characteristic curve (AUCt) and time-varying average precision (APt) were our primary evaluation metrics.
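For readers unfamiliar with discrete-time TTE models, the following sketch shows the kind of censoring-aware likelihood such models typically optimize. It is an illustration only and is not taken from the authors' implementation: the function names, the list-of-hazards representation, and the convention that a patient censored in interval `j` is credited with surviving the intervals before `j` are all assumptions. In a neural network, the per-interval hazards would be the sigmoid outputs for one patient.

```python
import math

def discrete_time_nll(hazards, interval, observed):
    """Negative log-likelihood for one patient under a discrete-time model.

    hazards  : conditional event probabilities h_0..h_{K-1}, each in (0, 1)
    interval : 0-based index of the interval where the event or censoring occurred
    observed : True if the diagnosis was observed in that interval, False if censored
    """
    # The patient is known to be event-free in every interval before `interval`.
    log_lik = sum(math.log(1.0 - h) for h in hazards[:interval])
    if observed:
        # Observed event: add the probability of the event in this interval.
        log_lik += math.log(hazards[interval])
    # Censored (assumed convention): no further term, so follow-up that was
    # never observed is not counted as evidence of a negative outcome.
    return -log_lik

def cumulative_incidence(hazards):
    """Probability of diagnosis by the end of the last interval."""
    survival = 1.0
    for h in hazards:
        survival *= 1.0 - h
    return 1.0 - survival

example_hazards = [0.1, 0.2, 0.3]
print(discrete_time_nll(example_hazards, interval=1, observed=True))
print(discrete_time_nll(example_hazards, interval=2, observed=False))
print(cumulative_incidence(example_hazards))
```

The key design point is that a censored patient contributes only the survival terms for the follow-up actually observed, in contrast to a binary label that treats the whole horizon as negative.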

Results:

TTE models consistently had comparable or higher AUCt and APt than BC for all conditions. The probabilities predicted by BC models were positively correlated with censoring times, particularly for autism and ADHD prediction. Filtering strategies based on year of birth or length of follow-up only partially corrected these biases. In subgroup analyses, only DTNN predicted diagnosis probabilities that accurately reflected actual clinical prevalence and temporal trends.

Conclusions:

BC models substantially underpredicted diagnosis likelihood and inappropriately assigned lower probability scores to individuals with earlier censoring. Common filtering strategies did not adequately address this limitation. TTE approaches, particularly DTNN, effectively mitigated bias from the censoring distribution, resulting in superior discrimination and calibration performance and more accurate prediction of clinical prevalence. Machine learning practitioners should recognize the limitations of BC for long-horizon diagnosis prediction and adopt TTE approaches. The DTNN in particular is well-suited to mitigate effects of right-censoring and maximize prediction performance in this setting.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.