Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Aug 12, 2020
Date Accepted: Jan 31, 2021
Natural Language Processing of Clinical Notes to Identify Mental Illness and Substance Use among People Living with HIV
ABSTRACT
Background:
Mental illness and substance use are prevalent among people living with HIV (PLWH) and often lead to poor health outcomes. Electronic medical record (EMR) data are increasingly being utilized for HIV-related clinical research, but mental illness and substance use are often under-documented in structured EMR fields. Natural language processing (NLP) of unstructured text of clinical notes in the EMR may more accurately identify mental illness and substance use among PLWH than structured EMR fields alone.
Objective:
To utilize NLP of clinical notes to detect mental illness and substance use among PLWH and to determine how often these factors are documented in structured EMR fields.
Methods:
We collected structured EMR data (diagnosis codes, social history, Problem List) as well as the unstructured text of clinical HIV care notes for adult PLWH. We developed NLP algorithms to identify words and phrases associated with mental illness and substance use in the clinical notes. The algorithms were validated based on chart review. We compared the numbers of patients with mental illness or substance use identified by structured EMR fields with those identified by the NLP algorithms.
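The phrase-matching step described above can be sketched as follows. This is a minimal illustration, not the study's algorithm: the term lists and the function names (`compile_terms`, `flag_note`, `flag_patient`) are hypothetical, and the sketch does not handle negation ("denies cocaine use"), misspellings, or family history, which a validated clinical NLP algorithm would need to address.

```python
import re

# Hypothetical term lists for illustration only; the study's actual lexicon is not given here.
MENTAL_ILLNESS_TERMS = ["depression", "anxiety", "bipolar disorder", "schizophrenia", "ptsd"]
SUBSTANCE_USE_TERMS = ["cocaine", "heroin", "alcohol abuse", "methamphetamine", "opioid use"]


def compile_terms(terms):
    """Build one case-insensitive pattern matching any term as a whole word or phrase."""
    # Longer terms first so multi-word phrases are preferred over shorter alternatives.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


MENTAL_ILLNESS_RE = compile_terms(MENTAL_ILLNESS_TERMS)
SUBSTANCE_USE_RE = compile_terms(SUBSTANCE_USE_TERMS)


def flag_note(text):
    """Return which categories have at least one matching term in a single note."""
    return {
        "mental_illness": bool(MENTAL_ILLNESS_RE.search(text)),
        "substance_use": bool(SUBSTANCE_USE_RE.search(text)),
    }


def flag_patient(notes):
    """Flag a patient for a category if any of their notes matches it."""
    flags = {"mental_illness": False, "substance_use": False}
    for note in notes:
        for category, hit in flag_note(note).items():
            flags[category] = flags[category] or hit
    return flags
```

Aggregating at the patient level with "any note matches" mirrors the patient-level counts reported in the Results, but the actual aggregation rule used in the study is an assumption here.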
Results:
The NLP algorithm for detecting mental illness had a positive predictive value (PPV) of 98% and a negative predictive value (NPV) of 98%. The NLP algorithm for detecting substance use had a PPV of 92% and an NPV of 98%. The NLP algorithm for mental illness identified 54.0% (420/778) of patients as having documentation of mental illness in the text of clinical notes. Among the patients with mental illness detected by NLP, 58.6% (246/420) had documentation of mental illness in at least one structured EMR field. Sixty-three patients had documentation of mental illness in structured EMR fields that was not detected by NLP of clinical notes. The NLP algorithm for substance use detected substance use in the text of clinical notes in 18.1% (141/778) of patients. Among patients with substance use detected by NLP, 73.8% (104/141) had documentation of substance use in at least one structured EMR field. Seventy-seven patients had documentation of substance use in structured EMR fields that was not detected by NLP of clinical notes.
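The counts above imply a few totals that are not stated directly. The sketch below derives them, assuming each patient falls into exactly one of three groups (NLP only, both NLP and structured fields, structured fields only). It also gives the standard definitions of PPV and NPV. The function names are illustrative, and the chart-review counts behind the reported PPV and NPV are not given in the abstract, so none are shown.

```python
def ppv(true_pos, false_pos):
    """Positive predictive value: share of algorithm-positive patients confirmed positive."""
    return true_pos / (true_pos + false_pos)


def npv(true_neg, false_neg):
    """Negative predictive value: share of algorithm-negative patients confirmed negative."""
    return true_neg / (true_neg + false_neg)


def overlap_summary(nlp_total, nlp_and_structured, structured_only):
    """Derive group sizes from the three counts reported for one condition."""
    return {
        "nlp_only": nlp_total - nlp_and_structured,
        "structured_total": nlp_and_structured + structured_only,
        "any_source": nlp_total + structured_only,
        "pct_nlp_also_structured": round(100 * nlp_and_structured / nlp_total, 1),
    }


# Counts taken from the Results text.
mental_illness = overlap_summary(nlp_total=420, nlp_and_structured=246, structured_only=63)
substance_use = overlap_summary(nlp_total=141, nlp_and_structured=104, structured_only=77)

print(mental_illness)
# {'nlp_only': 174, 'structured_total': 309, 'any_source': 483, 'pct_nlp_also_structured': 58.6}
print(substance_use)
# {'nlp_only': 37, 'structured_total': 181, 'any_source': 218, 'pct_nlp_also_structured': 73.8}
```

Under these assumptions, 174 patients had mental illness documented only in note text and 37 had substance use documented only in note text. These are the patients that an analysis restricted to structured fields would miss.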
Conclusions:
Among patients in an urban HIV care clinic, NLP of clinical notes identified high rates of mental illness and substance use that were often not documented in structured EMR fields. This finding has important implications for clinical and epidemiologic research among PLWH.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.