Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 5, 2025
Date Accepted: Apr 19, 2026
Date Submitted to PubMed: Apr 19, 2026

The final, peer-reviewed published version of this preprint can be found here:

Detecting Uncoded Self-Harm in Veterans’ Electronic Health Records Using Positive and Unlabeled Learning: Retrospective Cohort Study

Kumar P, Viszolay AD, Upadhayaya R, Moomtaheen F, Greer DR, Bologa CG, Schneider KA, Davis SE, Matheny ME, van der Goes D, Villarreal G, Zhu Y, Tohen M, Malec SA, Yang JJ, Fielstein EM, Lambert CG

J Med Internet Res 2026;28:e89071

DOI: 10.2196/89071

PMID: 42241701

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Detecting Undiagnosed Mental Health Conditions Using Positive and Unlabeled Learning: Identifying Uncoded Self-Harm in Veterans’ Electronic Health Records

  • Praveen Kumar
  • Alexandria D. Viszolay
  • Rajesh Upadhayaya
  • Fariha Moomtaheen
  • Donald R. Greer
  • Cristian G. Bologa
  • Kristan A. Schneider
  • Sharon E. Davis
  • Michael E. Matheny
  • David van der Goes
  • Gerrardo Villarreal
  • Yiliang Zhu
  • Mauricio Tohen
  • Scott A. Malec
  • Jeremy J. Yang
  • Elliot M. Fielstein
  • Christophe Gerard Lambert

ABSTRACT

Background:

Suicide and self-harm remain major public health concerns in the United States. Early identification is critical for effective intervention, yet underdiagnosis and undercoding are common across mental health conditions, and only positive cases are typically labeled in healthcare data. As a result, reliable negative examples are missing. Positive and Unlabeled (PU) learning is well suited to such data, enabling estimation of phenotype prevalence and identification of undiagnosed individuals at elevated risk for self-harm as well as other mental illnesses.

Objective:

To identify U.S. Veterans whose self-harm events were not explicitly captured by diagnostic codes in electronic health records (EHRs), and to estimate the prevalence of Veterans who have ever self-harmed, using a novel PU learning algorithm that is also applicable to other undetected mental health diagnoses.

Methods:

We analyzed Veterans Health Administration EHRs for 1,329,120 Veterans with at least 2 years of observation. We applied our PULSNAR (Positive Unlabeled Learning Selected Not At Random) algorithm to estimate the proportion of individuals with uncoded self-harm. Four experts (raters) independently reviewed the charts of 97 uncoded Veterans, one selected from each 1% interval of calibrated PULSNAR probability from 0.01 to 0.97. Agreement was assessed among raters, PULSNAR classifications, and consensus review decisions. Post-hoc calibration was used to refine prevalence estimates.
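The chart-review sampling step described above can be illustrated with a minimal sketch. It assumes half-open 1% bins starting at 0.01 and a uniform random pick within each bin; the abstract does not state the exact bin edges or selection rule. The function `sample_one_per_bin` and the synthetic probabilities are hypothetical stand-ins, not PULSNAR output or the authors' code.

```python
import random

def sample_one_per_bin(probs, lo=0.01, n_bins=97, width=0.01, seed=0):
    """Pick one index per probability bin (assumed bins: [lo + k*width, lo + (k+1)*width))."""
    rng = random.Random(seed)
    bins = {}
    for idx, p in enumerate(probs):
        k = int((p - lo) // width)      # bin number for this probability
        if 0 <= k < n_bins:             # ignore values outside the sampled range
            bins.setdefault(k, []).append(idx)
    # One uniformly random member from each non-empty bin.
    return {k: rng.choice(members) for k, members in sorted(bins.items())}

# Synthetic probabilities standing in for calibrated model output (assumption).
gen = random.Random(42)
synthetic_probs = [gen.random() for _ in range(50_000)]

picked = sample_one_per_bin(synthetic_probs)
print(len(picked), "patients selected, one per 1% bin")
```

Sampling evenly across the probability range, rather than in proportion to how many patients fall in each bin, gives reviewers cases at every level of predicted risk, which is what a later calibration check needs.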

Results:

Only 1.85% of Veterans had diagnostic codes indicating self-harm events, while PULSNAR estimated that 10.46% had either coded or uncoded self-harm; after post-hoc calibration based on chart review, this estimate was adjusted to 7.91%. Of the 97 chart-reviewed patients, 39 had documented but uncoded self-harm. PULSNAR estimates were post-hoc calibrated such that their sum over the 97 cases equaled 39. When applied to the 1.3 million Veterans, PULSNAR suggests that coded self-harm represents only 23.4% of all documented self-harm (coded or recorded in clinical notes).
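The headline ratio can be checked directly from the rounded percentages reported above. The following is only arithmetic on the abstract's figures; the variable names are illustrative, and results inherit the rounding of the published percentages.

```python
# Figures as reported in the abstract (rounded percentages).
coded_rate = 0.0185          # Veterans with self-harm diagnostic codes
uncalibrated_rate = 0.1046   # coded + uncoded, initial PULSNAR estimate
calibrated_rate = 0.0791     # coded + uncoded, after post-hoc calibration

# Share of all documented self-harm that is captured by codes.
coded_share = coded_rate / calibrated_rate

# Estimated uncoded self-harm as a fraction of the whole cohort.
uncoded_rate = calibrated_rate - coded_rate

# Fraction of chart-reviewed patients with documented but uncoded self-harm.
chart_review_fraction = 39 / 97

print(f"coded share of documented self-harm: {coded_share:.1%}")
print(f"uncoded self-harm in cohort: {uncoded_rate:.2%}")
print(f"chart-review positives: {chart_review_fraction:.1%}")
```

The first value reproduces the 23.4% quoted in the Results, and the chart-review fraction works out to roughly 40% of the 97 reviewed patients.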

Conclusions:

PU learning under the selected not at random assumption can effectively identify uncoded self-harm, offering a scalable alternative to time-consuming chart reviews for finding undetected mental illness diagnoses. This approach can improve mental health prevalence estimation and support screening, early diagnosis, intervention, and research to improve outcomes.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.