JMIR Preprints #25457: Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries using Natural Language Processing

Marta Fernandes;
Haoqi Sun;
Aayushee Jain;
Haitham S. Alabsi;
Laura N. Brenner;
Elissa Ye;
Wendong Ge;
Sarah I. Collens;
Michael Leone;
Sudeshna Das;
Gregory K. Robbins;
Shibani S. Mukerji;
M. Brandon Westover

ABSTRACT

Background:

Medical notes are a rich source of patient data; however, the unstructured nature of the text has largely precluded the use of these data in large retrospective analyses. Transforming clinical text into structured data can enable large-scale research studies with electronic health record (EHR) data. Natural language processing (NLP) can be used for text information retrieval, reducing the need for labor-intensive chart review. Here we present an application of NLP to the large-scale analysis of medical records at two large hospitals for patients hospitalized with COVID-19.

Objective:

The goal of this study was to develop an NLP pipeline to classify the discharge disposition (home, inpatient rehabilitation, skilled inpatient nursing facility [SNIF], and death) of patients hospitalized with COVID-19 based on hospital discharge summary notes.

Methods:

Text mining and feature engineering were applied to unstructured text from hospital discharge summaries. The study included patients with COVID-19 discharged from 2 hospitals in the Boston, Massachusetts area (Massachusetts General Hospital and Brigham and Women’s Hospital) between March 10, 2020, and June 30, 2020. The data were split into a training set (70%) and a hold-out test set (30%). Discharge summaries were represented as bags-of-words consisting of single words (1-grams), 2-grams, and 3-grams. The number of features was reduced during training in two steps: first by excluding n-grams that occurred in fewer than 10% of discharge summaries, and then by applying least absolute shrinkage and selection operator (LASSO) regularization while training a multiclass logistic regression model. Model performance was evaluated on the hold-out test set.
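The n-gram representation and the 10% document-frequency filter described in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`ngrams`, `build_vocabulary`, `vectorize`), the whitespace tokenization, and the toy notes are all hypothetical, and the demo uses a 50% threshold only because the toy corpus has just four notes.

```python
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all 1- to n_max-grams of a token list as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(docs, min_doc_frac=0.10, n_max=3):
    """Keep n-grams that occur in at least min_doc_frac of the documents."""
    doc_freq = Counter()
    for doc in docs:
        # A set counts each n-gram at most once per document.
        doc_freq.update(set(ngrams(doc.lower().split(), n_max)))
    threshold = min_doc_frac * len(docs)
    return sorted(g for g, df in doc_freq.items() if df >= threshold)


def vectorize(doc, vocabulary, n_max=3):
    """Bag-of-words count vector for one document over a fixed vocabulary."""
    counts = Counter(ngrams(doc.lower().split(), n_max))
    return [counts[g] for g in vocabulary]


# Toy notes invented for illustration; they are not study data.
toy_notes = [
    "discharged home with home health",
    "discharged to rehabilitation after intubation",
    "discharged home with home care",
    "patient expired",
]
vocab = build_vocabulary(toy_notes, min_doc_frac=0.5)
features = [vectorize(note, vocab) for note in toy_notes]
```

In a sketch like this, the vocabulary would be built from the training notes only and then reused to vectorize the hold-out notes, so that the test set does not influence feature selection. The resulting count vectors would be the input to an L1-regularized (LASSO) multiclass logistic regression, which is not shown here.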

Results:

The study cohort comprised 1737 adult patients (median [SD] age, 61 [18] years; 55% men; 45% White and 16% Black; 14% non-survivors; 61% discharged home). The model selected 179 features from a vocabulary of 1056 engineered features, consisting of combinations of unigrams, bigrams, and trigrams. The features contributing most to the classification for each outcome were: ‘appointments specialty’, ‘home health’, and ‘home care’ (home); ‘intubate’ and ‘ARDS’ (inpatient rehabilitation); ‘service’ (SNIF); and ‘brief assessment’ and ‘covid’ (death). For prediction of discharge disposition in the hold-out test set, the model achieved a microaverage area under the receiver operating characteristic curve of 0.98 (95% CI 0.97-0.98) and an average precision of 0.81 (95% CI 0.75-0.84).
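For readers unfamiliar with microaveraging, the following sketch shows one common way to compute a microaverage area under the receiver operating characteristic curve for a multiclass model: every (patient, class) pair is treated as one binary decision, and a single AUROC is computed over the pooled pairs. The function names, the class coding (0 = home, 1 = inpatient rehabilitation, 2 = SNIF, 3 = death), and the toy probabilities are assumptions for illustration. The abstract does not describe the authors' exact computation or how the confidence intervals were obtained.

```python
def auroc(labels, scores):
    """Binary AUROC from the rank-sum (Mann-Whitney U) statistic.

    Tied scores receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def micro_average_auroc(y_true, y_prob, n_classes):
    """Pool all (sample, class) pairs into one binary problem, then score it."""
    flat_labels = [int(y == c) for y in y_true for c in range(n_classes)]
    flat_scores = [p for row in y_prob for p in row]
    return auroc(flat_labels, flat_scores)


# Toy predictions invented for illustration; they are not study data.
y_true = [0, 1, 2, 3]
y_prob = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
micro_auc = micro_average_auroc(y_true, y_prob, n_classes=4)
```

Because microaveraging pools all pairs, frequent classes such as discharge home carry more weight in the result than rare ones.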

Conclusions:

A supervised learning-based NLP approach can classify the discharge disposition of patients hospitalized with COVID-19. This approach has the potential to accelerate and increase the scale of research on patients’ discharge disposition that is possible with EHR data.

Clinical Trial: Not a clinical trial.

Citation

Please cite as:

Fernandes M, Sun H, Jain A, Alabsi HS, Brenner LN, Ye E, Ge W, Collens SI, Leone M, Das S, Robbins GK, Mukerji SS, Westover MB

Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing

JMIR Med Inform 2021;9(2):e25457

DOI: 10.2196/25457

PMID: 33449908

PMCID: 7879729

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 2, 2020

Date Accepted: Dec 12, 2020

Date Submitted to PubMed: Jan 15, 2021
