
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Aug 13, 2021
Date Accepted: Sep 18, 2021
Date Submitted to PubMed: Nov 29, 2021

The final, peer-reviewed published version of this preprint can be found here:

Developing the Total Health Profile, a Generalizable Unified Set of Multimorbidity Risk Scores Derived From Machine Learning for Broad Patient Populations: Retrospective Cohort Study

Mahajan A, Deonarine A, Bernal A, Lyons G, Patel C, Norgeot B


J Med Internet Res 2021;23(11):e32900

DOI: 10.2196/32900

PMID: 34842542

PMCID: 8665380

Total Health Profile: A Generalizable Unified Set of Multimorbidity Risk Scores Derived From Machine Learning For Broad Patient Populations

  • Abhishaike Mahajan; 
  • Andrew Deonarine; 
  • Axel Bernal; 
  • Genevieve Lyons; 
  • Chirag Patel; 
  • Beau Norgeot

ABSTRACT

Background:

Multimorbidity clinical risk scores allow clinicians to quickly assess their patients' health for decision making, often to decide whether to refer patients to care management programs. However, these scores have several limitations: (1) they are generally restricted to a single data group (e.g., diagnoses or labs) and may miss vital information, (2) they are usually restricted to specific demographic groups (e.g., age groups), and (3) they do not formally provide granularity in the form of more nuanced multimorbidity risk scores to direct clinician attention.

Objective:

This study aimed to use diagnosis, lab, prescription, procedure, and demographic data from electronic health records (EHRs) to develop a physiologically diverse and generalizable set of multimorbidity risk scores.

Methods:

Using EHR data from a nationwide cohort of patients, we developed the Total Health Profile (THP), a set of six integrated risk scores reflecting five distinct organ systems and overall health. As our risk endpoint, we selected the occurrence of an inpatient hospital visit attributable to a specific organ system over a two-year follow-up window. Using a physician-curated set of features, we trained six machine learning models on 794,294 patients to predict the calibrated probability of this endpoint, producing risk scores for Heart, Lung, Neuro, Kidney, and Digestive function and a sixth score for combined risk. We evaluated the scores on a held-out test cohort of 198,574 patients.
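The endpoint above (an inpatient hospital visit attributable to a given organ system within a two-year follow-up window) can be expressed as a simple labeling function. The sketch below is illustrative only: the `InpatientVisit` record and `endpoint_label` function are hypothetical names, two years is approximated as 730 days, and treating the combined endpoint as "any attributed visit" is an assumption, since the abstract does not specify how visits are attributed to organ systems or how the combined endpoint is defined.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


# Hypothetical record: one inpatient hospital visit already attributed to an
# organ system (the attribution rules themselves are not modeled here).
@dataclass(frozen=True)
class InpatientVisit:
    admit_date: date
    organ_system: str  # e.g. "Heart", "Lung", "Neuro", "Kidney", "Digestive"


# Assumption: "two years" approximated as 730 days.
FOLLOW_UP = timedelta(days=730)


def endpoint_label(
    index_date: date,
    visits: Iterable[InpatientVisit],
    organ_system: Optional[str] = None,
    follow_up: timedelta = FOLLOW_UP,
) -> int:
    """Return 1 if any visit strictly after index_date and no later than
    index_date + follow_up is attributed to organ_system, else 0.

    organ_system=None counts a visit attributed to any system, an assumed
    stand-in for the combined (overall) endpoint.
    """
    window_end = index_date + follow_up
    for visit in visits:
        in_window = index_date < visit.admit_date <= window_end
        matches = organ_system is None or visit.organ_system == organ_system
        if in_window and matches:
            return 1
    return 0


if __name__ == "__main__":
    visits = [
        InpatientVisit(date(2019, 3, 1), "Heart"),
        InpatientVisit(date(2022, 6, 1), "Kidney"),  # falls outside the window
    ]
    index = date(2018, 1, 1)
    for system in ["Heart", "Kidney", None]:
        print(system, endpoint_label(index, visits, system))
```

One label per organ system (plus the combined label) would give six binary targets, matching the six models described. For reference, the reported cohorts (794,294 training and 198,574 test patients, 992,868 in total) correspond to roughly an 80/20 split.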

Results:

Study patients closely matched national census averages, with a median age of 41 years, a median income of $66,829, and racial averages by zip code of 73.8% White, 5.9% Asian, and 11.9% African American. All models were well calibrated and demonstrated strong performance, with AUROCs of 0.83 for the Total Health Score (THS), 0.89 for Heart, 0.86 for Lung, 0.85 for Neuro, 0.90 for Kidney, and 0.83 for Digestive. Performance of this scoring system was consistent across sexes, diverse patient ages, and zip code income levels. Each model learned to generate predictions by focusing on appropriate, clinically relevant patient features, such as heart-related hospitalizations and a chronic hypertension diagnosis for the Heart model. The THS outperformed the commonly used Charlson and Elixhauser comorbidity indexes (CCI and ECI) overall (AUROCs of 0.823, 0.735, and 0.649 for THS, CCI, and ECI, respectively), as well as within every age, sex, and income bracket. Performance improvements were most pronounced for middle-aged and lower-income subgroups. Ablation tests using only diagnosis, prescription, social determinants of health, and lab feature groups, while retaining procedure-related features, showed that the combination of all feature groups had the best predictive performance, though it was only marginally better than the diagnosis-only model on at-risk groups.
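The results report discrimination as AUROC, both overall and within age, sex, and income subgroups, and state that the models are well calibrated. A minimal pure-Python sketch of how such metrics can be computed on a held-out cohort follows. The function names (`auroc`, `auroc_by_group`, `calibration_table`) and the equal-width binning scheme are illustrative assumptions, not the authors' evaluation code. AUROC is computed here through its standard equivalence to the normalized Mann-Whitney U statistic.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic: the probability that a randomly
    chosen positive is scored above a randomly chosen negative (ties count 0.5)."""
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Group tied scores and give each the average of their 1-based ranks.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def auroc_by_group(labels, scores, groups) -> Dict[Hashable, float]:
    """AUROC within each subgroup (e.g. age band, sex, income bracket).
    Subgroups containing only one outcome class are skipped."""
    buckets: Dict[Hashable, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {
        g: auroc(ys, ss)
        for g, (ys, ss) in buckets.items()
        if 0 < sum(ys) < len(ys)
    }


def calibration_table(labels, scores, n_bins: int = 10):
    """Rows of (mean predicted risk, observed event rate, count) for equal-width
    probability bins; in a well-calibrated model the first two roughly agree."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        bins[idx].append((s, y))
    rows = []
    for b in bins:
        if b:
            mean_pred = sum(s for s, _ in b) / len(b)
            observed = sum(y for _, y in b) / len(b)
            rows.append((mean_pred, observed, len(b)))
    return rows
```

Comparing THS against CCI and ECI, as in the results, would amount to calling `auroc` (and `auroc_by_group`) once per score on the same test labels. The rank-based formulation runs in O(n log n), which matters at the scale of a 198,574-patient test cohort, whereas a naive all-pairs comparison would be quadratic.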

Conclusions:

Massive retrospective EHR datasets have made it possible to use machine learning to build practical multimorbidity risk scores that are highly predictive, personalizable, intuitive to explain, and generalize across diverse patient populations.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.