Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 21, 2021
Open Peer Review Period: Jun 21, 2021 - Jul 2, 2021
Date Accepted: Sep 30, 2021
Date Submitted to PubMed: Nov 29, 2021
Assessing the Value of Unsupervised Clustering in Detecting Key Classes of Diagnostic and Medication Codes to Improve the Prediction of Persistent High Healthcare Utilizers
ABSTRACT
Background:
A high proportion of healthcare services are persistently utilized by a small subpopulation of patients. To improve clinical outcomes while reducing cost and utilization, population health management programs often provide targeted interventions to patients who may become persistent high users/utilizers (PHUs). Enhanced prediction and management of PHUs can improve healthcare system efficiencies and improve the overall quality of patient care.
Objective:
To detect key classes of diseases and medications among the study population and to assess the predictive value of these classes in identifying PHUs.
Methods:
This study is a retrospective analysis of insurance claims data of patients from the Johns Hopkins Health Care system. We defined a PHU as a patient incurring healthcare costs in the top 20% of all patients’ costs for four consecutive 6-month periods. We used 2013 claims data to predict PHU status in 2014-2015. We applied Latent Class Analysis (LCA), an unsupervised clustering approach, to identify patient subgroups with similar diagnostic and medication patterns, with the aim of differentiating variations in healthcare utilization across PHUs. Logistic regression models were then built to predict PHUs in the full population and in select subpopulations. Predictors included LCA membership probabilities, demographic covariates, and health utilization covariates. The predictive power of the regression models was assessed and compared using standard metrics.
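The PHU definition above can be illustrated with a minimal sketch. It assumes the top-20% cutoff is computed separately within each 6-month period and that a patient must meet the cutoff in every period; the abstract does not spell out these details, and the function names, patient IDs, and cost figures below are hypothetical.

```python
from typing import Dict, List


def top_share_cutoff(costs: List[float], share: float = 0.20) -> float:
    """Lowest cost that still ranks within the top `share` of patients.

    Assumption: the cutoff is taken per period over all patients.
    """
    ranked = sorted(costs, reverse=True)
    k = max(1, int(len(ranked) * share))  # number of patients in the top share
    return ranked[k - 1]


def flag_phus(costs_by_patient: Dict[str, List[float]],
              share: float = 0.20) -> Dict[str, bool]:
    """Flag patients whose cost meets the cutoff in every period supplied.

    Each patient maps to a list of costs, one entry per consecutive
    6-month period (four entries for the definition in the text).
    Ties at the cutoff are counted as high cost in this sketch.
    """
    n_periods = len(next(iter(costs_by_patient.values())))
    cutoffs = [
        top_share_cutoff([c[p] for c in costs_by_patient.values()], share)
        for p in range(n_periods)
    ]
    return {
        pid: all(c[p] >= cutoffs[p] for p in range(n_periods))
        for pid, c in costs_by_patient.items()
    }


# Hypothetical toy data: five patients, four 6-month periods.
toy_costs = {
    "A": [9000, 8500, 9700, 9100],  # high in every period
    "B": [9500, 400, 300, 200],     # high in the first period only
    "C": [100, 120, 90, 110],
    "D": [300, 5000, 4000, 4500],   # high in the last three periods only
    "E": [50, 60, 40, 70],
}

# share=0.4 is used only so that two of the five toy patients qualify
# per period; the study's definition uses the top 20%.
flags = flag_phus(toy_costs, share=0.4)
print(flags)
```

In this toy example only patient A is flagged, because B and D each miss the cutoff in at least one period. That is the sense in which the definition captures persistent rather than episodic high utilization.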
Results:
We identified 164,221 patients with continuous enrollment between 2013 and 2015. The mean study population age was 19.7 years, 55.9% were female, 3.3% had ≥1 hospitalization, and 19.1% had 10+ outpatient visits in 2013. A total of 8359 (5.1%) patients were identified as PHUs in both 2014 and 2015. The LCA performed optimally when assigning patients to four probabilistic disease/medication classes. Given the feedback provided by clinical experts, we further divided the population into four diagnostic groups for sensitivity analysis: Acute Upper Respiratory Infection (URI) (n=53,232; 4.6% PHUs), Mental Health (n=34,456; 12.8% PHUs), Otitis Media (n=24,992; 4.5% PHUs), and Musculoskeletal (n=24,799; 15.5% PHUs). For the regression models predicting PHUs in the full population, the F1-score was lower for a parsimonious model that included LCA categories (F1=38.62%) than for a complex risk stratification model with a full set of predictors (F1=48.20%). However, the LCA-enabled simple models were comparable to the complex model when predicting PHUs in the Mental Health and Musculoskeletal subpopulations (F1-scores of 48.69% and 48.15%, respectively). F1-scores were lower than that of the complex model when the LCA-enabled models were limited to the Otitis Media and Acute URI subpopulations (45.77% and 43.05%, respectively).
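For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. It is often preferred over plain accuracy when the positive class is rare, as PHUs are here (5.1% of patients). The sketch below shows the standard calculation; the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall).

    tp: PHUs correctly predicted as PHUs
    fp: non-PHUs incorrectly predicted as PHUs
    fn: PHUs the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example_f1 = f1_score(tp=40, fp=50, fn=40)
print(f"F1 = {example_f1:.2%}")  # F1 = 47.06%
```

True negatives do not enter the formula, so a model cannot reach a high F1-score simply by labelling every patient as a non-PHU.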
Conclusions:
Our study illustrates the value of LCA in identifying subgroups of patients with similar patterns of diagnoses and medications. Our results show that LCA-derived classes can simplify predictive models of PHUs without compromising predictive accuracy. Future studies should investigate the value of LCA-derived classes for predicting PHUs in other healthcare settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.