Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 10, 2018
Date Accepted: Mar 7, 2019
Date Submitted to PubMed: Mar 13, 2019

The final, peer-reviewed published version of this preprint can be found here:

Extraction of Geriatric Syndromes From Electronic Health Record Clinical Notes: Assessment of Statistical Natural Language Processing Methods

Chen T, Dredze M, Weiner JP, Hernandez L, Kimura J, Kharrazi H

JMIR Med Inform 2019;7(1):e13039

DOI: 10.2196/13039

PMID: 30862607

PMCID: 6454337

Statistical Natural Language Processing Methods for the Extraction of Geriatric Syndromes from Electronic Health Record Clinical Notes

  • Tao Chen
  • Mark Dredze
  • Jonathan P Weiner
  • Leilani Hernandez
  • Joe Kimura
  • Hadi Kharrazi

ABSTRACT

Background:

Geriatric syndromes in older adults are associated with adverse outcomes. However, despite being reported in clinical notes, these syndromes are often poorly captured by diagnostic codes in the structured fields of electronic health records (EHRs) or administrative records.

Objective:

We aimed to automatically determine whether a patient has any geriatric syndromes by mining the free text of the associated EHR clinical notes, and to assess which statistical natural language processing (NLP) techniques are most effective for this task.

Methods:

We applied conditional random fields (CRFs), a widely used machine learning algorithm for sequence labeling, to identify each of 10 geriatric syndrome constructs in a clinical note. We assessed three sets of CRF features: a base set, enhanced token features, and contextual features. We trained the CRF on 3901 manually annotated notes from 85 patients, tuned it on a validation set of 50 patients, and evaluated it on 50 held-out test patients. The notes came from US Medicare patients (over 65 years of age) enrolled in a Medicare Advantage health maintenance organization (HMO) and cared for by a large group practice in Massachusetts.
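To make the three feature groups more concrete, the Python sketch below builds one feature dictionary per token for a linear-chain CRF. The abstract does not list the actual features, so every feature here (lowercased word, capitalization and digit flags, 3-character affixes, a ±2-token context window) and every function name is a hypothetical illustration, not the study's feature set. Lists of such per-token dictionaries are the input format commonly used by CRF toolkits such as sklearn-crfsuite.

```python
# Hypothetical token-feature extractor for a linear-chain CRF tagger.
# The grouping mirrors the base / enhanced-token / contextual split named
# in the abstract, but the individual features are illustrative assumptions.
from typing import Dict, List

Features = Dict[str, object]


def base_features(tok: str) -> Features:
    # Base set (assumed): a bias term plus the lowercased word itself.
    return {"bias": 1.0, "word.lower": tok.lower()}


def enhanced_token_features(tok: str) -> Features:
    # Enhanced token set (assumed): word shape and affixes, which can
    # still fire for words never seen during training.
    return {
        "word.isupper": tok.isupper(),
        "word.istitle": tok.istitle(),
        "word.isdigit": tok.isdigit(),
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
    }


def contextual_features(tokens: List[str], i: int, window: int = 2) -> Features:
    # Contextual set (assumed): neighboring words within +/- window tokens;
    # positions past the sentence boundary get a padding flag instead.
    feats: Features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"{offset:+d}:word.lower"] = tokens[j].lower()
        else:
            feats[f"{offset:+d}:pad"] = True
    return feats


def token_to_features(tokens: List[str], i: int) -> Features:
    # Merge the three groups into a single dictionary for token i.
    feats = base_features(tokens[i])
    feats.update(enhanced_token_features(tokens[i]))
    feats.update(contextual_features(tokens, i))
    return feats


def sentence_to_features(tokens: List[str]) -> List[Features]:
    # One feature dictionary per token, in sentence order.
    return [token_to_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Toy sentence (made up); a tagger would label tokens with BIO tags
    # such as B-VISION / I-VISION, which are also hypothetical here.
    feats = sentence_to_features("Patient reports blurry vision".split())
    print(feats[3]["word.lower"], feats[3]["-1:word.lower"], feats[3]["+1:pad"])
```

Shape and affix features are one common way to soften out-of-vocabulary errors, because an unseen word can still share a suffix or capitalization pattern with words seen in training; contextual features address the complementary problem of missing context.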

Results:

A final feature set was selected through comprehensive feature ablation experiments. The final CRF model performed well at patient-level determination (macro-F1=0.834, micro-F1=0.851); however, performance varied by construct. For example, in phrase-partial evaluation, the CRF worked well on constructs such as absence of fecal control (F1=0.857) and vision impairment (F1=0.798) but poorly on malnutrition (F1=0.155), weight loss (F1=0.394), and severe urinary control issues (F1=0.532). Errors were primarily due to previously unseen words (out-of-vocabulary) and a lack of context.
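Macro-F1 averages the per-construct F1 scores with equal weight, whereas micro-F1 pools true positives, false positives, and false negatives across all constructs before computing a single F1, so frequent constructs dominate it. The Python sketch below computes both at the patient level under a stated assumption: a patient counts as positive for a construct if any of their notes contains a predicted (or gold-annotated) phrase for it. That aggregation rule, the function names, and the toy data are hypothetical and are not the study's evaluation code.

```python
# Hypothetical patient-level evaluation: per-construct counts, then macro- and micro-F1.
from typing import Dict, Set, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def patient_level_counts(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> Dict[str, Counts]:
    # gold/pred map patient ID -> set of constructs flagged for that patient.
    # Assumption: a patient is flagged if any of their notes has a phrase for the construct.
    constructs = set().union(*gold.values(), *pred.values())
    patients = gold.keys() | pred.keys()
    counts: Dict[str, Counts] = {}
    for c in sorted(constructs):
        tp = fp = fn = 0
        for pid in patients:
            g = c in gold.get(pid, set())
            p = c in pred.get(pid, set())
            tp += int(g and p)
            fp += int(p and not g)
            fn += int(g and not p)
        counts[c] = (tp, fp, fn)
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); returns 0.0 when there is nothing to score.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Counts]) -> float:
    # Unweighted mean of per-construct F1: rare constructs weigh as much as common ones.
    scores = [f1(*c) for c in counts.values()]
    return sum(scores) / len(scores) if scores else 0.0


def micro_f1(counts: Dict[str, Counts]) -> float:
    # Pool counts across constructs first, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


if __name__ == "__main__":
    # Toy data (made up): constructs flagged per patient.
    gold = {"p1": {"vision"}, "p2": {"vision", "fecal"}}
    pred = {"p1": {"vision"}, "p2": {"fecal"}, "p3": {"vision"}}
    counts = patient_level_counts(gold, pred)
    print(counts)
    print(round(macro_f1(counts), 3), round(micro_f1(counts), 3))
```

On the toy data, the rarer construct is detected perfectly (F1=1.0) while the more frequent one scores F1=0.5, so macro-F1 (0.75) exceeds micro-F1 (about 0.667); which average comes out higher depends on how performance is spread across common and rare constructs.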

Conclusions:

This study shows that statistical NLP can be used to identify geriatric syndromes from EHR-extracted clinical notes. This creates new opportunities to identify patients with geriatric syndromes and study their health outcomes.

