Accepted for/Published in: JMIR AI

Date Submitted: Jun 11, 2024
Date Accepted: Mar 17, 2025

The final, peer-reviewed published version of this preprint can be found here:

Natural Language Processing for Identification of Hospitalized People Who Use Drugs: Cohort Study

Sato T, Grussing ED, Patel R, Ridgway J, Suzuki J, Sweigart B, Miller R, Wurcel AG

Natural Language Processing for Identification of Hospitalized People Who Use Drugs: Cohort Study

JMIR AI 2025;4:e63147

DOI: 10.2196/63147

PMID: 40680182

PMCID: 12294639

Natural Language Processing For Identification of Hospitalized People Who Use Drugs

  • Taisuke Sato
  • Emily D Grussing
  • Ruchi Patel
  • Jessica Ridgway
  • Joji Suzuki
  • Benjamin Sweigart
  • Robert Miller
  • Alysse G Wurcel

ABSTRACT

Background:

People who use drugs (PWUD) are at heightened risk for severe injection-related infections. Current clinical practice and research rely mostly on biomarkers, medication records, ICD codes, and patient self-screening forms to identify PWUD. Even in combination, these tools often fail to identify hospitalized patients with substance use disorder (SUD), missing crucial intervention opportunities for serious injection-related infections (SIRI).

Objective:

This study explores using Natural Language Processing (NLP) to enhance the equitable and comprehensive identification of PWUD in electronic medical records (EMR).

Methods:

We retrospectively compiled a cohort of hospitalizations involving PWUD at Tufts Medical Center (2020-2022). Criteria for entering the cohort included ICD-10 codes for SUD, positive drug toxicology, SUD treatment prescriptions, and specific NLP keywords. We conducted human review of clinical notes in electronic health records (EHR) to calculate the positive and negative predictive values for two subcohorts: admissions associated only with a diagnosis code of substance use disorder (D-only) and admissions associated only with NLP identification of drug use (N-only). We also conducted a regression analysis to evaluate the association of race, ethnicity, and Social Vulnerability Index (SVI) with highly documented drug use versus drug use documented only by NLP.
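The subcohort assignment and predictive value calculations described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' pipeline: the `Admission` field names, the `subcohort` helper, and the counts used in the usage note are hypothetical and are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Admission:
    """One hospitalization with the four cohort entry flags (hypothetical names)."""
    has_sud_icd10: bool        # ICD-10 code for SUD present
    positive_toxicology: bool  # positive drug toxicology result
    sud_treatment_rx: bool     # SUD treatment prescription
    nlp_keyword_hit: bool      # NLP keyword found in clinical notes


def subcohort(adm: Admission) -> str:
    """Label an admission by how it entered the cohort."""
    flags = (
        adm.has_sud_icd10,
        adm.positive_toxicology,
        adm.sud_treatment_rx,
        adm.nlp_keyword_hit,
    )
    n_met = sum(flags)
    if n_met == 0:
        return "not in cohort"
    if n_met == 1 and adm.has_sud_icd10:
        return "D-only"  # diagnosis code is the only criterion met
    if n_met == 1 and adm.nlp_keyword_hit:
        return "N-only"  # NLP keyword is the only criterion met
    return "other"       # any other single criterion or combination


def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("no flagged admissions were reviewed")
    return true_pos / flagged


def npv(true_neg: int, false_neg: int) -> float:
    """Negative predictive value: TN / (TN + FN)."""
    unflagged = true_neg + false_neg
    if unflagged == 0:
        raise ValueError("no unflagged admissions were reviewed")
    return true_neg / unflagged
```

As a usage example with made-up review counts, `ppv(3, 1)` gives 0.75: of four flagged admissions reviewed by a human, three were confirmed as drug use. An admission with only `nlp_keyword_hit=True` is labeled `"N-only"`.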

Results:

The study identified 4548 hospitalizations, with broad heterogeneity in how people entered the cohort and subcohorts. Of these, 288 hospitalizations entered the cohort through NLP presence alone. NLP demonstrated a 54% positive predictive value (PPV), outperforming biomarkers, medication records, and ICD codes in identifying hospitalizations of PWUD. Additionally, NLP significantly enhanced these methods when integrated into the identification algorithm. The study also found that patients from racially and ethnically minoritized communities and patients of lower socioeconomic status were significantly more likely to have SUD not documented in EMRs.

Conclusions:

NLP proved effective in identifying hospitalizations of PWUD, surpassing traditional methods. While further refinement is needed, NLP shows promise for reducing healthcare disparities, particularly in infectious disease care for patients with SUD, marking a crucial step toward more equitable healthcare.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.