Currently submitted to: Journal of Medical Internet Research

Date Submitted: Mar 31, 2026
Open Peer Review Period: Apr 1, 2026 - May 27, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Sensitive Topic Detection and Longitudinal Prevalence Tracking in Psychiatric Discharge Summaries: A Retrospective NLP Study

  • Akash Raghavendra; 
  • Leslie Miller; 
  • Aya Zirikly

ABSTRACT

Background:

Recording sensitive topics such as psychiatric diagnoses, substance abuse, and self-harm in electronic health records carries a considerable risk of stigmatization and harm, especially now that federal laws such as the 21st Century Cures Act require that patients have access to their own clinical notes. Discharge summaries are of particular concern because they consolidate the entire hospital stay and serve administrative, legal, and patient-facing purposes at once, yet they remain understudied in research on sensitive topics. A major methodological gap remains in understanding how the documentation of specific sensitive topics changes over time, because current NLP techniques have concentrated on single-topic detection tasks with little attention to prevalence measurement or longitudinal documentation patterns.

Objective:

To evaluate an NLP-based framework for automatically detecting a predefined set of sensitive topics and measuring change in their prevalence over time, using the MIMIC-IV database.

Methods:

Discharge summaries from the MIMIC-IV database were filtered using ICD codes to identify psychiatrically relevant admissions, resulting in 2670 notes from 2108 distinct patients. A dual NLP detection framework identified both explicit keyword-based mentions and implicit semantic mentions of each sensitive topic category. Normalized mention counts were used to compute weighted prevalence, which was then examined at the note and patient levels. Fisher’s exact test, McNemar’s exact test, and Benjamini-Hochberg false discovery rate (FDR) correction were used to evaluate temporal change.
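
As a rough illustration of the testing step described above, the following sketch computes a two-sided Fisher's exact p-value for a 2x2 table and then applies a Benjamini-Hochberg adjustment across several categories. It is not the authors' implementation: the function names, the category labels, and all counts are hypothetical placeholders chosen only to show the mechanics, and the abstract does not state how the contingency tables were constructed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts (NOT study data): notes with / without a mention
# in an early period versus a late period, for three made-up categories.
tables = {
    "category_a": (3, 1, 1, 3),
    "category_b": (10, 40, 12, 38),
    "category_c": (5, 45, 15, 35),
}
raw = {name: fisher_exact_two_sided(*t) for name, t in tables.items()}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for name in tables:
    print(f"{name}: raw p = {raw[name]:.4f}, BH-adjusted p = {adj[name]:.4f}")
```

An adjusted p-value is never smaller than its raw counterpart, so a comparison that looks significant on its own can lose significance once multiple categories are tested together.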

Results:

Weighted prevalence varied across sensitive categories, ranging from 2.68% to 18.93%. Patient-level prevalence was consistently higher than note-level prevalence across all categories. No statistically significant temporal change was observed after FDR correction, suggesting stable documentation over the study period. Trajectory analysis showed that improving, worsening, and absent trajectories remained relatively steady regardless of how prevalent a category was overall, whereas persistent and mixed documentation patterns varied the most across categories.
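
To make the distinction between the two levels of analysis concrete, here is a minimal sketch using simple binary prevalence. It assumes a patient counts as positive if any of their notes contains a mention; the abstract does not give the authors' exact definition, and their weighted prevalence based on normalized mention counts is not reproduced here. The function name and the toy records are hypothetical.

```python
from collections import defaultdict


def prevalence(notes):
    """Compute (note_level, patient_level) prevalence.

    notes: list of (patient_id, has_mention) pairs, one pair per note.
    """
    # Note level: share of notes that contain at least one mention.
    note_level = sum(1 for _, flag in notes if flag) / len(notes)

    # Patient level (assumed rule): positive if ANY note is positive.
    by_patient = defaultdict(bool)
    for patient_id, flag in notes:
        by_patient[patient_id] = by_patient[patient_id] or flag
    patient_level = sum(by_patient.values()) / len(by_patient)
    return note_level, patient_level


# Toy records (NOT study data): three patients, six notes.
toy_notes = [
    ("p1", True), ("p1", False), ("p1", False),
    ("p2", False),
    ("p3", True), ("p3", False),
]
note_level, patient_level = prevalence(toy_notes)
print(f"note-level: {note_level:.3f}, patient-level: {patient_level:.3f}")
```

Under this any-note rule, a patient with several notes needs only one positive note to count, which is one plausible reason patient-level figures can exceed note-level figures; the relationship is not guaranteed for every dataset.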

Conclusions:

This study demonstrates the feasibility of a scalable NLP framework for multi-topic sensitive topic detection and prevalence tracking in clinical discharge summaries. The findings highlight the value of combining note-level and patient-level analyses and provide a foundation for future work examining the impact of policy changes on sensitive topic documentation practices.


 Citation

Please cite as:

Raghavendra A, Miller L, Zirikly A

Sensitive Topic Detection and Longitudinal Prevalence Tracking in Psychiatric Discharge Summaries: A Retrospective NLP Study

JMIR Preprints. 31/03/2026:96759

DOI: 10.2196/preprints.96759

URL: https://preprints.jmir.org/preprint/96759


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.