Accepted for/Published in: JMIR Formative Research
Date Submitted: Feb 8, 2023
Date Accepted: Sep 27, 2023
Determining Distinct Suicide Attempts from Recurrent Electronic Health Record Codes: A Classification Study
ABSTRACT
Background:
Prior suicide attempts are a relatively strong risk factor for future suicide attempts. There is growing interest in using longitudinal electronic health record (EHR) data to derive statistical risk prediction models for future suicide attempt and other suicidal behavior outcomes. However, model performance may be inflated by a largely unrecognized form of “data leakage” during model training: diagnostic codes for suicide attempt outcomes may refer to prior attempts that are also included in the model as predictors.
Objective:
We aimed to develop an automated rule for determining when documented suicide attempt diagnostic codes identify distinct suicide attempt events.
Methods:
From a large healthcare system’s EHR, we randomly sampled the suicide attempt codes for 300 patients with at least one pair of suicide attempt codes documented at least one but no more than 90 days apart. Supervised chart reviewers assigned the clinical setting(s) (i.e., emergency department [ED] versus non-ED), method(s) of suicide attempt, and inter-code interval (number of days). The probability (or positive predictive value, PPV) that the second suicide attempt code in a given pair of codes referred to a distinct suicide attempt event from its preceding suicide attempt code was calculated by clinical setting, method, and inter-code interval.
Results:
Of 1,105 code pairs reviewed, 82% were non-independent (i.e., the two codes referred to the same suicide attempt event). When the second code in a pair was documented in a clinical setting other than the ED, it represented a distinct suicide attempt less than 5% of the time. The more time elapsed between codes, the more likely the second code in a pair referred to a distinct suicide attempt event from its preceding code. Code pairs in which the second suicide attempt code was assigned in an ED at least 5 days after its preceding suicide attempt code had a PPV of 0.90.
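The stratified PPV described above is, for each stratum, the number of reviewed code pairs judged distinct divided by the total number of pairs in that stratum. The following is a minimal sketch of that arithmetic and not the authors' analysis code. The record fields (`second_in_ed`, `gap_days`, `distinct`) and the function name `ppv_by_stratum` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def ppv_by_stratum(pairs, key):
    """Return {stratum: PPV}, where PPV is the share of code pairs in the
    stratum whose second code was judged a distinct suicide attempt.

    `pairs` is an iterable of dicts with a boolean "distinct" field
    (the chart-review label); `key` maps a pair to its stratum.
    Field names are assumptions made for this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [distinct, total]
    for pair in pairs:
        stratum = key(pair)
        counts[stratum][1] += 1
        if pair["distinct"]:
            counts[stratum][0] += 1
    return {s: distinct / total for s, (distinct, total) in counts.items()}


def setting_and_gap(pair, min_gap_days=5):
    """Stratify by setting of the second code and by whether the
    inter-code interval reaches the threshold (in days)."""
    setting = "ED" if pair["second_in_ed"] else "non-ED"
    gap = "gap>=%d" % min_gap_days if pair["gap_days"] >= min_gap_days else "gap<%d" % min_gap_days
    return (setting, gap)
```

Calling `ppv_by_stratum(reviewed_pairs, setting_and_gap)` on a list of chart-reviewed pairs would give one PPV per combination of setting and interval band. Other stratifications, such as by method, only require a different `key` function.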
Conclusions:
EHR-based suicide risk prediction models that include ICD codes for prior suicide attempts as a predictor may be highly susceptible to bias due to data leakage in model training. We derived a simple rule to distinguish ICD codes that reflect new, independent suicide attempts: suicide attempt codes documented in an ED setting at least 5 days after a preceding suicide attempt code can be confidently treated as new events in EHR-based suicide risk prediction models. This rule has the potential to minimize upward bias of model performance when prior suicide attempts are included as predictors in EHR-based suicide risk prediction models.
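As an illustration of how the rule stated above might be applied to one patient's code history, the sketch below flags a suicide attempt code as a new event when it was documented in an ED at least 5 days after the immediately preceding suicide attempt code. This is a sketch under stated assumptions, not the authors' implementation. The `AttemptCode` type, its fields, and the function name are hypothetical, and treating a patient's first code as a new event is an assumption of this example that the abstract does not address.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

MIN_GAP_DAYS = 5  # threshold stated in the abstract


@dataclass(frozen=True)
class AttemptCode:
    """One documented suicide attempt code (hypothetical structure)."""
    code_date: date
    in_ed: bool  # True if documented in an emergency department


def flag_new_events(codes: List[AttemptCode]) -> List[Tuple[AttemptCode, bool]]:
    """Pair each code with a flag saying whether the rule treats it as a
    new, distinct suicide attempt event. Codes are processed in date order."""
    ordered = sorted(codes, key=lambda c: c.code_date)
    flagged = []
    previous = None
    for code in ordered:
        if previous is None:
            # Assumption of this sketch: the first code has no preceding
            # code, so it is treated as an event in its own right.
            is_new = True
        else:
            gap_days = (code.code_date - previous.code_date).days
            is_new = code.in_ed and gap_days >= MIN_GAP_DAYS
        flagged.append((code, is_new))
        previous = code
    return flagged
```

Codes that are not flagged would be treated as referring back to an earlier attempt, so they would not be counted as new outcome events when the same history also supplies the prior-attempt predictor.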
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.