JMIR Preprints #89071: Detecting Uncoded Self-Harm in Veterans’ Electronic Health Records Using Positive and Unlabeled Learning: Retrospective Observational Study

Detecting Uncoded Self-Harm in Veterans’ Electronic Health Records Using Positive and Unlabeled Learning: Retrospective Observational Study

Praveen Kumar;
Alexandria D. Viszolay;
Rajesh Upadhayaya;
Fariha Moomtaheen;
Donald R. Greer;
Cristian G. Bologa;
Kristan A. Schneider;
Sharon E. Davis;
Michael E. Matheny;
David van der Goes;
Gerrardo Villarreal;
Yiliang Zhu;
Mauricio Tohen;
Scott A. Malec;
Jeremy J. Yang;
Elliot M. Fielstein;
Christophe Gerard Lambert

ABSTRACT

Background:

Suicide and self-harm remain major public health concerns in the United States. Early identification is critical for effective intervention, yet underdiagnosis and undercoding are common across mental health conditions, and only positive cases are typically labeled in healthcare data. As a result, reliable negative examples are missing. Positive and Unlabeled (PU) learning is well suited to such data, enabling estimation of phenotype prevalence and identification of undiagnosed individuals at elevated risk for self-harm as well as other mental illnesses.

Objective:

To identify U.S. Veterans whose self-harm events were not explicitly captured through diagnostic codes in electronic health records (EHRs), and to estimate the prevalence of Veterans with any history of self-harm ("ever self-harm"), using a novel PU learning algorithm applicable to undetected mental health diagnoses.

Methods:

We analyzed Veterans Health Administration EHRs for 1,329,120 Veterans with at least 2 years of observation. We applied our PULSNAR (Positive Unlabeled Learning Selected Not At Random) algorithm to estimate the proportion of individuals with uncoded self-harm. Four experts (raters) independently reviewed charts of 97 uncoded Veterans, each selected from 1% intervals of calibrated PULSNAR probabilities from 0.01 to 0.97. Agreement was assessed among raters, PULSNAR classifications, and consensus review decisions. Post-hoc calibration was used to refine prevalence estimates.
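The chart-review sampling described above can be illustrated with a minimal Python sketch. It assumes that each 1% interval is identified by the floor of the probability expressed as a percentage (bins 1 through 97), and that one patient is drawn uniformly at random within each bin; the function name `sample_one_per_bin`, its parameters, and the toy probabilities are hypothetical and are not taken from the study.

```python
import math
import random


def sample_one_per_bin(probs, first_bin=1, last_bin=97, seed=0):
    """Pick one index at random from each 1% probability bin.

    Assumption: bin b holds probabilities p with floor(p * 100) == b,
    so bins 1..97 cover roughly 0.01 up to (but excluding) 0.98.
    """
    rng = random.Random(seed)
    bins = {b: [] for b in range(first_bin, last_bin + 1)}
    for idx, p in enumerate(probs):
        b = math.floor(p * 100)
        if b in bins:  # probabilities outside the bin range are skipped
            bins[b].append(idx)
    # Empty bins are omitted, so fewer than 97 picks are possible.
    return {b: rng.choice(members) for b, members in bins.items() if members}


# Toy probabilities (hypothetical), standing in for calibrated model output.
toy_probs = [0.015, 0.016, 0.505, 0.975, 0.995, 0.005]
picked = sample_one_per_bin(toy_probs)
```

With real model output, every bin from 1 to 97 would need at least one uncoded patient for the sample to reach 97 charts.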

Results:

Only 1.85% of Veterans had diagnostic codes indicating self-harm events, while 10.46% had either coded or uncoded self-harm by PULSNAR estimation, which, after post-hoc calibration based on chart review, was adjusted to 7.91%. Of the 97 chart-reviewed patients, 39 had documented but uncoded self-harm. PULSNAR estimates were post-hoc calibrated such that their sum over the 97 cases equaled 39. When applied to the 1.3M Veterans, PULSNAR suggests that coded self-harm represents only 23.4% of all documented (coded + notes) self-harm.
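The arithmetic behind these figures can be checked with a short Python sketch. The abstract does not say which calibration function was used, so the constant logit shift below is only an assumed stand-in that enforces the stated constraint (calibrated probabilities over the 97 reviewed cases summing to 39); the function names and the toy probabilities (bin midpoints) are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shift_to_match_total(probs, target_total, tol=1e-9):
    """Bisection search for a constant logit shift so that the shifted
    probabilities sum to target_total (assumed calibration form)."""
    lo, hi = -30.0, 30.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        total = sum(sigmoid(logit(p) + mid) for p in probs)
        if total < target_total:
            lo = mid  # shift is too small, move up
        else:
            hi = mid  # shift is too large, move down
    shift = (lo + hi) / 2.0
    return shift, [sigmoid(logit(p) + shift) for p in probs]


# Toy stand-in for the 97 reviewed cases: midpoints of the 1% bins.
toy_probs = [(k + 0.5) / 100 for k in range(1, 98)]
shift, calibrated = shift_to_match_total(toy_probs, target_total=39)

# Coded share of all documented self-harm, from the reported percentages.
coded_share = 1.85 / 7.91
print(f"coded share: {coded_share:.1%}")
```

The ratio of the reported 1.85% coded prevalence to the 7.91% calibrated prevalence reproduces the 23.4% figure quoted above.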

Conclusions:

PU learning under the selected not at random assumption can effectively identify uncoded self-harm, offering a scalable alternative to time-consuming chart reviews for finding undiagnosed mental illness. This approach can improve mental health prevalence estimation and support screening, early diagnosis, intervention, and research to improve outcomes.

Citation

Please cite as:

Kumar P, Viszolay AD, Upadhayaya R, Moomtaheen F, Greer DR, Bologa CG, Schneider KA, Davis SE, Matheny ME, van der Goes D, Villarreal G, Zhu Y, Tohen M, Malec SA, Yang JJ, Fielstein EM, Lambert CG

Detecting Uncoded Self-Harm in Veterans’ Electronic Health Records Using Positive and Unlabeled Learning: Retrospective Cohort Study

J Med Internet Res 2026;28:e89071

DOI: 10.2196/89071

PMID: 42241701

PMCID: 13235979

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 5, 2025

Date Accepted: Apr 19, 2026

Date Submitted to PubMed: Apr 19, 2026
