Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 4, 2022
Date Accepted: May 10, 2022

The final, peer-reviewed published version of this preprint can be found here:

Lee J, Yang S, Holland-Hall C, Sezgin E, Gill M, Linwood S, Huang Y, Hoffman J

Prevalence of Sensitive Terms in Clinical Notes Using Natural Language Processing Techniques: Observational Study

JMIR Med Inform 2022;10(6):e38482

DOI: 10.2196/38482

PMID: 35687381

PMCID: 9233261

Prevalence of Sensitive Terms in Clinical Notes: observational study using natural language processing techniques

  • Jennifer Lee
  • Samuel Yang
  • Cynthia Holland-Hall
  • Emre Sezgin
  • Manjot Gill
  • Simon Linwood
  • Yungui Huang
  • Jeffrey Hoffman

ABSTRACT

Background:

With increased sharing of electronic health information as required by the 21st Century Cures Act, there is increased risk of breaching patient or parent/guardian confidentiality. The prevalence of sensitive terms in clinical notes is not known.

Objective:

This study aims to define sensitive terms, that is, terms that signal documentation of potentially private content, and to determine the prevalence and characteristics of provider notes that contain them.

Methods:

Using keyword expansion, we defined a list of 781 sensitive terms. We searched all provider history and physical, progress, consult, and discharge summary notes written between January 1, 2019, and December 31, 2019, for patients aged 0-21 years for direct string matches of sensitive terms. We calculated the prevalence of notes with sensitive terms and characterized the associated clinical encounters and patients.
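
The search and prevalence calculation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term list, category names, sample notes, and function names below are hypothetical, and the sketch assumes that "direct string match" means a case-insensitive substring test.

```python
from collections import defaultdict

# Hypothetical stand-ins; the study's 781-term list is not reproduced here.
SENSITIVE_TERMS = {
    "mental health": ["depression", "suicidal ideation"],
    "substance abuse": ["marijuana"],
}


def matched_categories(note_text, terms_by_category=SENSITIVE_TERMS):
    """Return the set of categories with at least one term found in the note.

    Assumption: matching is a case-insensitive substring test.
    """
    text = note_text.lower()
    return {
        category
        for category, terms in terms_by_category.items()
        if any(term.lower() in text for term in terms)
    }


def prevalence_by_note_type(notes):
    """Map each note type to the fraction of its notes with >= 1 sensitive term.

    `notes` is an iterable of (note_type, note_text) pairs.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for note_type, text in notes:
        totals[note_type] += 1
        if matched_categories(text):
            flagged[note_type] += 1
    return {note_type: flagged[note_type] / totals[note_type] for note_type in totals}


# Invented example notes, for illustration only.
sample_notes = [
    ("progress", "Patient reports Depression symptoms for two weeks."),
    ("progress", "Follow-up for ankle sprain; healing well."),
    ("consult", "Denies marijuana use."),
]

print(prevalence_by_note_type(sample_notes))
# prints {'progress': 0.5, 'consult': 1.0}
```

Note that the third sample note is flagged even though the term is negated ("Denies marijuana use"). A plain string match counts any mention of a term, which is consistent with measuring how often sensitive terms appear rather than how often the underlying condition is present.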

Results:

Sensitive terms were present in notes from every clinical context and across all pediatric ages. Terms in the mental health category were the most used overall (19.5%), but terms related to substance abuse and reproductive health were most common in patients aged 0-3 years. History and physical notes (57.1%) and ambulatory progress notes (47.1%) were most likely to include sensitive terms. The highest prevalence of notes with sensitive terms was found in pain management (85.4%) and child abuse (85.2%) clinics.

Conclusions:

Notes containing sensitive terms are not limited to adolescent patients, specific note types, or certain specialties. Recognition of sensitive term(s) across all ages and clinical settings complicates efforts to protect patient and caregiver privacy in the era of information blocking regulations.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.