Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 12, 2020
Date Accepted: Jan 31, 2021

The final, peer-reviewed published version of this preprint can be found here:

Ridgway J, Uvin A, Schmitt J, Oliwa T, Almirol E, Devlin S, Schneider J

Natural Language Processing of Clinical Notes to Identify Mental Illness and Substance Use Among People Living with HIV: Retrospective Cohort Study

JMIR Med Inform 2021;9(3):e23456

DOI: 10.2196/23456

PMID: 33688848

PMCID: 7991991

Natural Language Processing of Clinical Notes to Identify Mental Illness and Substance Use among People Living with HIV

  • Jessica Ridgway
  • Arno Uvin
  • Jessica Schmitt
  • Tomasz Oliwa
  • Ellen Almirol
  • Samantha Devlin
  • John Schneider

ABSTRACT

Background:

Mental illness and substance use are prevalent among people living with HIV (PLWH) and often lead to poor health outcomes. Electronic medical record (EMR) data are increasingly being utilized for HIV-related clinical research, but mental illness and substance use are often under-documented in structured EMR fields. Natural language processing (NLP) of unstructured text of clinical notes in the EMR may more accurately identify mental illness and substance use among PLWH than structured EMR fields alone.

Objective:

To utilize NLP of clinical notes to detect mental illness and substance use among PLWH and to determine how often these factors are documented in structured EMR fields.

Methods:

We collected both structured EMR data (diagnosis codes, social history, Problem List) and the unstructured text of clinical HIV care notes for adult PLWH. We developed NLP algorithms to identify words and phrases associated with mental illness and substance use in the clinical notes. The algorithms were validated based on chart review. We compared the numbers of patients with mental illness or substance use identified by structured EMR fields with those identified by the NLP algorithms.
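
As a rough illustration of the kind of term matching and source comparison described above, the following Python sketch flags notes that contain listed words or phrases and then splits patients by where the documentation was found. It is not the authors' algorithm: the term lists, the function names `categories_in_note` and `compare_sources`, and the input shapes are all assumptions made for this example, and the sketch does not handle negation (for example, "denies cocaine use" would still be flagged).

```python
import re

# Hypothetical term lists for illustration only; NOT the study's lexicon.
TERMS = {
    "mental_illness": ["depression", "anxiety", "bipolar disorder", "schizophrenia"],
    "substance_use": ["cocaine", "heroin", "methamphetamine", "alcohol use disorder"],
}

# One case-insensitive, whole-word pattern per category.
PATTERNS = {
    category: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for category, terms in TERMS.items()
}


def categories_in_note(text):
    """Return the set of categories whose terms appear in one note."""
    return {category for category, pattern in PATTERNS.items() if pattern.search(text)}


def compare_sources(notes_by_patient, structured_patients, category):
    """Split patient IDs by where a category is documented.

    notes_by_patient: dict mapping patient ID -> list of note texts (assumed shape).
    structured_patients: set of patient IDs flagged in structured EMR fields.
    """
    nlp_patients = {
        patient_id
        for patient_id, notes in notes_by_patient.items()
        if any(category in categories_in_note(note) for note in notes)
    }
    return {
        "both": nlp_patients & structured_patients,
        "nlp_only": nlp_patients - structured_patients,
        "structured_only": structured_patients - nlp_patients,
    }
```

A real pipeline would also need negation and context handling plus validation against chart review, as the Methods describe; the sketch only shows the set comparison between the two data sources.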

Results:

The NLP algorithm for detecting mental illness had a positive predictive value (PPV) of 98% and a negative predictive value (NPV) of 98%. The NLP algorithm for detecting substance use had a PPV of 92% and an NPV of 98%. The NLP algorithm for mental illness identified 54.0% (420/778) of patients as having documentation of mental illness in the text of clinical notes. Among the patients with mental illness detected by NLP, 58.6% (246/420) had documentation of mental illness in at least one structured EMR field. Sixty-three patients had documentation of mental illness in structured EMR fields that was not detected by NLP of clinical notes. The NLP algorithm for substance use detected substance use in the text of clinical notes in 18.1% (141/778) of patients. Among patients with substance use detected by NLP, 73.8% (104/141) had documentation of substance use in at least one structured EMR field. Seventy-seven patients had documentation of substance use in structured EMR fields that was not detected by NLP of clinical notes.
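
The percentages above can be checked directly from the reported counts. The Python sketch below recomputes them and derives two further quantities: patients detected by NLP only, and patients documented in either source. The helper names (`pct`, `ppv`, `npv`, `summarize`) are assumptions for this example. The `ppv` and `npv` functions give only the standard definitions; the abstract does not report the underlying validation counts, so none are supplied here.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def ppv(true_pos, false_pos):
    """Positive predictive value: share of positive calls that are correct."""
    return true_pos / (true_pos + false_pos)


def npv(true_neg, false_neg):
    """Negative predictive value: share of negative calls that are correct."""
    return true_neg / (true_neg + false_neg)


# Counts as reported in the Results paragraph.
COHORT = 778
REPORTED = {
    "mental illness": {"nlp": 420, "nlp_and_structured": 246, "structured_only": 63},
    "substance use": {"nlp": 141, "nlp_and_structured": 104, "structured_only": 77},
}


def summarize(counts, cohort=COHORT):
    """Recompute the reported percentages and derive NLP-only and any-source counts."""
    return {
        "pct_detected_by_nlp": pct(counts["nlp"], cohort),
        "pct_of_nlp_also_structured": pct(counts["nlp_and_structured"], counts["nlp"]),
        # Detected in notes but absent from every structured field.
        "nlp_only": counts["nlp"] - counts["nlp_and_structured"],
        # Documented in notes, structured fields, or both.
        "any_source": counts["nlp"] + counts["structured_only"],
    }


for condition, counts in REPORTED.items():
    print(condition, summarize(counts))
```

The `nlp_only` figure is the quantity behind the conclusion that follows: it counts patients whose condition appears in note text but in no structured field.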

Conclusions:

Among patients in an urban HIV care clinic, NLP of clinical notes identified high rates of mental illness and substance use that were often not documented in structured EMR fields. This finding has important implications for clinical and epidemiologic research among PLWH.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.