Accepted for/Published in: JMIR Pediatrics and Parenting
Date Submitted: Sep 15, 2021
Date Accepted: Jan 25, 2022
Locating Youth Exposed to Parental Justice-Involvement in the Electronic Health Record: Development of a Natural Language Processing Model
ABSTRACT
Background:
Parental justice-involvement (e.g., prison, jail, parole, probation) is an unfortunately common and disruptive household adversity for many US youths, disproportionately affecting Black, brown, and rural families. Data on this adversity have not been captured routinely in pediatric health care settings, and when they are captured, they are neither discrete nor readily analyzable for research purposes.
Objective:
In this study, we outline our process for training a state-of-the-art natural language processing model on unstructured clinician notes from one large pediatric health system to identify patients who have been exposed to parental justice-involvement.
Methods:
Using the electronic health record database of a large Midwestern pediatric hospital-based institution from 2011 to 2019, we located clinician notes (of any type and written by any type of provider) that were likely to contain evidence of family justice involvement via a justice-keyword search (e.g., prison, jail). To train and validate the model, we used a labeled dataset of 7,500 clinician notes, each labeled for whether the patient was ever exposed to parental justice-involvement. We calculated the precision and recall of the model and compared those rates to those of the keyword search.
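The keyword baseline and the two evaluation metrics named above can be illustrated with a minimal sketch. This is not the authors' pipeline: the keyword list contains only the two examples given in the abstract, the function names `keyword_flag` and `precision_recall` are hypothetical, and the notes are invented toy sentences rather than clinical data.

```python
import re

# Assumption: only "prison" and "jail" are named in the abstract; the
# study's full keyword list is not given here.
JUSTICE_KEYWORDS = ["prison", "jail"]
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in JUSTICE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def keyword_flag(note_text: str) -> bool:
    """Return True if the note contains any justice keyword as a whole word."""
    return _PATTERN.search(note_text) is not None


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged and truly exposed
    fp = sum(1 for p, a in pairs if p and not a)    # flagged but not exposed
    fn = sum(1 for p, a in pairs if not p and a)    # exposed but not flagged
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented toy notes with hand-assigned labels (True = parental
    # justice-involvement is documented).
    toy_notes = [
        ("Mother reports father is currently in prison.", True),
        ("Patient toured the county jail on a school field trip.", False),
        ("Dad was released from jail last month.", True),
        ("No household concerns reported.", False),
        ("Parent is on probation.", True),  # missed by this keyword list
    ]
    predicted = [keyword_flag(text) for text, _ in toy_notes]
    actual = [label for _, label in toy_notes]
    print(precision_recall(predicted, actual))
```

The toy notes show why a keyword match alone can be imprecise: a note may mention a jail without describing a justice-involved parent, which is the kind of false positive a trained classifier is meant to reduce.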
Results:
The machine learning model increased the precision (positive predictive value) of locating children affected by parental justice-involvement in the electronic health record from 61% with a simple keyword search to 92%.
Conclusions:
The use of machine learning may be a feasible approach to addressing the gaps in our understanding of the health and health services of underrepresented youth who encounter childhood adversities not routinely captured, particularly for children of justice-involved parents.
Clinical Trial:
The project described in this study was supported by Award Number UL1TR002733 from the National Center For Advancing Translational Sciences (co-PIs: Dr. Samantha Boch, PhD, RN; Dr. Deena Chisolm, PhD). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Center For Advancing Translational Sciences or the National Institutes of Health. The search engine described in this study is supported through a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-2017C1-6413) under the name of “Unlocking Clinical Text in EMR by Query Refinement Using Both Knowledge Bases and Word Embedding” (PI Dr. Simon Lin, MD, MBA). All statements in this report, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute (PCORI), its Board of Governors, or Methodology Committee.
Citation
Per the author's request the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.