Accepted for/Published in: JMIR Formative Research

Date Submitted: Feb 8, 2023
Date Accepted: Sep 27, 2023

The final, peer-reviewed published version of this preprint can be found here:

Determining Distinct Suicide Attempts From Recurrent Electronic Health Record Codes: Classification Study

Bentley KH, Madsen EM, Song E, Zhou Y, Castro V, Lee H, Lee YH, Smoller JW

JMIR Form Res 2024;8:e46364

DOI: 10.2196/46364

PMID: 38190236

PMCID: 10804255

Determining Distinct Suicide Attempts from Recurrent Electronic Health Record Codes: A Classification Study

  • Kate H Bentley
  • Emily M Madsen
  • Eugene Song
  • Yu Zhou
  • Victor Castro
  • Hyunjoon Lee
  • Younga H Lee
  • Jordan W Smoller

ABSTRACT

Background:

Prior suicide attempts are a relatively strong risk factor for future suicide attempts. There is growing interest in using longitudinal electronic health record (EHR) data to derive statistical risk prediction models for future suicide attempt and other suicidal behavior outcomes. However, model performance may be inflated by a largely unrecognized form of “data leakage” during model training: diagnostic codes for suicide attempt outcomes may refer to prior attempts that are also included in the model as predictors.

Objective:

We aimed to develop an automated rule for determining when documented suicide attempt diagnostic codes identify distinct suicide attempt events.

Methods:

From a large healthcare system’s EHR, we randomly sampled the suicide attempt codes for 300 patients with at least one pair of suicide attempt codes documented at least one but no more than 90 days apart. Supervised chart reviewers assigned the clinical setting(s) (i.e., emergency department [ED] versus non-ED), method(s) of suicide attempt, and inter-code interval (number of days). The probability (or positive predictive value, PPV) that the second suicide attempt code in a given pair of codes referred to a distinct suicide attempt event from its preceding suicide attempt code was calculated by clinical setting, method, and inter-code interval.
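The stratified PPV calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `ppv_by_stratum`, the record fields (`second_setting`, `days_apart`, `distinct`), and the toy records are all assumptions made for the example, and the values are invented rather than taken from the study.

```python
from collections import defaultdict


def ppv_by_stratum(pairs, key):
    """Return {stratum: share of code pairs whose second code was a distinct attempt}.

    `pairs` is a list of dicts with a boolean "distinct" label (hypothetical
    field name standing in for the chart reviewers' judgment); `key` maps a
    pair to the stratum it belongs to.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [distinct pairs, all pairs]
    for pair in pairs:
        stratum = key(pair)
        counts[stratum][0] += int(pair["distinct"])
        counts[stratum][1] += 1
    return {s: distinct / total for s, (distinct, total) in counts.items()}


# Toy, made-up code pairs (not study data).
pairs = [
    {"second_setting": "ED", "days_apart": 7, "distinct": True},
    {"second_setting": "ED", "days_apart": 2, "distinct": False},
    {"second_setting": "non-ED", "days_apart": 10, "distinct": False},
    {"second_setting": "ED", "days_apart": 30, "distinct": True},
]

# PPV stratified by the clinical setting of the second code.
by_setting = ppv_by_stratum(pairs, key=lambda p: p["second_setting"])

# PPV stratified by whether the inter-code interval is at least 5 days.
by_interval = ppv_by_stratum(pairs, key=lambda p: p["days_apart"] >= 5)
```

Passing a different `key` function gives the same calculation by method of attempt or by any combination of setting and interval.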

Results:

Of 1,105 code pairs reviewed, 82% were non-independent (i.e., the two codes referred to the same suicide attempt event). When the second code in a pair was documented in a clinical setting other than the ED, it represented a distinct suicide attempt less than 5% of the time. The more time elapsed between codes, the more likely the second code in a pair referred to a distinct suicide attempt event from its preceding code. Code pairs in which the second suicide attempt code was assigned in an ED at least 5 days after its preceding suicide attempt code had a PPV of 0.90.

Conclusions:

EHR-based suicide risk prediction models that include International Classification of Diseases (ICD) codes for prior suicide attempts as predictors may be highly susceptible to bias due to data leakage in model training. We derived a simple rule to distinguish ICD codes that reflect new, independent suicide attempts: suicide attempt codes documented in an ED setting at least 5 days after a preceding suicide attempt code can be confidently treated as new events in EHR-based suicide risk prediction models. This rule has the potential to minimize upward bias in model performance when prior suicide attempts are included as predictors in such models.
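As a rough illustration of how the rule stated above might be applied to one patient's coded history, consider the sketch below. It is not the authors' implementation: the function name `flag_new_attempts`, the `(date, setting)` tuple layout, and the example dates are hypothetical. Treating a patient's first code on record as an event is also an assumption of this sketch, since the rule itself only addresses codes that have a preceding code.

```python
from datetime import date

ED_MIN_GAP_DAYS = 5  # threshold taken from the rule stated in the abstract


def flag_new_attempts(codes):
    """Flag which suicide attempt codes would count as new events.

    `codes` is a list of (code_date, setting) tuples for a single patient,
    where setting is "ED" or "non-ED" (hypothetical layout). Returns the
    codes sorted by date and a parallel list of booleans.
    """
    ordered = sorted(codes, key=lambda c: c[0])
    flags = []
    for i, (code_date, setting) in enumerate(ordered):
        if i == 0:
            # Assumption of this sketch: the first code has no preceding
            # code, so it is treated as an event.
            flags.append(True)
            continue
        gap_days = (code_date - ordered[i - 1][0]).days
        # The rule: ED setting AND at least 5 days after the preceding code.
        flags.append(setting == "ED" and gap_days >= ED_MIN_GAP_DAYS)
    return ordered, flags


# Made-up example history (not study data).
history = [
    (date(2020, 1, 1), "ED"),
    (date(2020, 1, 3), "non-ED"),   # 2 days later, non-ED
    (date(2020, 1, 4), "ED"),       # ED, but only 1 day after preceding code
    (date(2020, 1, 20), "ED"),      # ED, 16 days after preceding code
    (date(2020, 2, 1), "non-ED"),   # 12 days later, but non-ED
]
ordered, flags = flag_new_attempts(history)
# flags == [True, False, False, True, False]
```

The gap is measured from the immediately preceding code rather than from the last code flagged as a new event, which follows the wording of the rule. The study sampled code pairs documented 1 to 90 days apart, so how codes separated by longer intervals should be handled is outside what this sketch can claim.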




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.