
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 15, 2021
Date Accepted: Nov 14, 2021

The final, peer-reviewed published version of this preprint can be found here:

Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study

Kumar S, Tan Li Shi AN, Mariappan R, Rajagopal A, Rajan V

JMIR Med Inform 2022;10(1):e28842

DOI: 10.2196/28842

PMID: 35049514

PMCID: 8814927

Patient Representation Learning from Heterogeneous Data Sources and Knowledge Graphs using Deep Collective Matrix Factorization: Evaluation Study

  • Sajit Kumar; 
  • Alicia Nanelia Tan Li Shi; 
  • Ragunathan Mariappan; 
  • Adithya Rajagopal; 
  • Vaibhav Rajan

ABSTRACT

Background:

Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network-based autoencoders to learn patient representations, primarily from clinical notes in Electronic Medical Records (EMR). Knowledge Graphs (KG), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature. They provide information complementary to EMR data and have been found to yield valuable predictive signals.
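Both sources can be viewed as matrices that share the clinical-concept dimension, which is what makes them candidates for collective factorization. The sketch below is illustrative only: the patients, concepts, triples, and the helper `build_matrices` are hypothetical, and it builds a binary patient-by-concept matrix and an undirected concept adjacency matrix while ignoring relation types and term weighting, which a real pipeline would likely treat differently.

```python
# Hypothetical toy data: concepts mentioned for each patient, and
# (head, relation, tail) triples such as those mined from biomedical literature.
patients = {
    "p1": {"diabetes", "metformin"},
    "p2": {"hypertension"},
    "p3": {"diabetes", "hypertension"},
}
triples = [
    ("metformin", "treats", "diabetes"),
    ("diabetes", "risk_factor_for", "hypertension"),
]


def build_matrices(patients, triples):
    """Return (concepts, X_emr, A_kg); columns of X_emr and rows/columns of A_kg follow `concepts`."""
    kg_concepts = {c for h, _, t in triples for c in (h, t)}
    concepts = sorted(set().union(*patients.values()) | kg_concepts)
    idx = {c: i for i, c in enumerate(concepts)}
    # Patient x concept matrix: 1 if the concept is recorded for that patient.
    X_emr = [[1 if c in patients[p] else 0 for c in concepts] for p in sorted(patients)]
    # Concept x concept adjacency: undirected, relation types are ignored.
    A_kg = [[0] * len(concepts) for _ in concepts]
    for h, _, t in triples:
        A_kg[idx[h]][idx[t]] = A_kg[idx[t]][idx[h]] = 1
    return concepts, X_emr, A_kg


concepts, X_emr, A_kg = build_matrices(patients, triples)
print(concepts)
print(X_emr)
print(A_kg)
```

Because both matrices are indexed by the same `concepts` list, a factor learned for the concepts can be shared between them.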

Objective:

We evaluate the efficacy of Collective Matrix Factorization (CMF), including both classical variants and a recent neural architecture called Deep CMF (DCMF), in integrating heterogeneous data sources from EMR and KG to obtain patient representations for clinical decision support tasks.
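Classical CMF ties several matrices together by sharing a latent factor for each entity type they have in common. A minimal sketch, assuming squared loss, L2 regularization, plain gradient descent, and hypothetical toy inputs: a patients x concepts matrix `X_emr` is approximated by U V^T and a concepts x concepts matrix `X_kg` by V W^T, so the concept factor V is shared and the rows of U serve as patient representations. The helpers `cmf` and `loss` are illustrative names, not the paper's code; DCMF, as we understand it, replaces such linear factors with neural encodings, which this sketch does not attempt.

```python
import random


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Return the matrix product A @ B for lists of rows."""
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def axpby(A, B, a, b):
    """Return a*A + b*B elementwise."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def residual(X, A, B):
    """Return X - A @ B^T."""
    return axpby(X, matmul(A, transpose(B)), 1.0, -1.0)


def loss(X_emr, X_kg, U, V, W):
    """Half the summed squared reconstruction error over both matrices."""
    E = residual(X_emr, U, V) + residual(X_kg, V, W)
    return 0.5 * sum(e * e for row in E for e in row)


def cmf(X_emr, X_kg, k=2, lr=0.02, reg=0.01, epochs=3000, seed=0):
    """Fit X_emr ~ U V^T and X_kg ~ V W^T by gradient descent; V (concepts) is shared."""
    rng = random.Random(seed)

    def init(rows):
        return [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(rows)]

    U, V, W = init(len(X_emr)), init(len(X_kg)), init(len(X_kg[0]))
    for _ in range(epochs):
        E1, E2 = residual(X_emr, U, V), residual(X_kg, V, W)
        # Gradients of the squared loss plus (reg/2) * squared Frobenius norms.
        gU = axpby(matmul(E1, V), U, -1.0, reg)
        gV = axpby(axpby(matmul(transpose(E1), U), matmul(E2, W), 1.0, 1.0), V, -1.0, reg)
        gW = axpby(matmul(transpose(E2), V), W, -1.0, reg)
        U, V, W = axpby(U, gU, 1.0, -lr), axpby(V, gV, 1.0, -lr), axpby(W, gW, 1.0, -lr)
    return U, V, W  # rows of U are the patient representations


# Hypothetical toy inputs: 3 patients x 3 concepts, and a 3 x 3 concept graph.
X_emr = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
X_kg = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
U0, V0, W0 = cmf(X_emr, X_kg, epochs=0)  # the random initialization only
U, V, W = cmf(X_emr, X_kg)
print(loss(X_emr, X_kg, U0, V0, W0), loss(X_emr, X_kg, U, V, W))
```

The shared V is the design choice that matters: gradients from both reconstruction terms flow into it, so graph structure among concepts can shape the patient factors U indirectly.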

Methods:

Within the context of CMF, we use a recent formulation that obtains graph representations through matrix factorization to infuse auxiliary information during patient representation learning. We also extend the DCMF architecture to create a task-specific, end-to-end model that simultaneously learns effective patient representations and makes predictions. We compare the efficacy of such a model with that of first learning unsupervised representations and then independently learning a predictive model. We evaluate patient representation learning using CMF-based methods and autoencoders on two clinical decision support tasks using a large EMR dataset.
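One known way to obtain graph representations through matrix factorization is NetMF, which shows that DeepWalk-style embeddings implicitly factorize a closed-form matrix. We have not confirmed that this is the exact formulation used in the study; the sketch below assumes a NetMF-style construction with window size `T` and negative-sampling parameter `b`, applied to a small hypothetical concept graph, and `netmf_matrix` is an illustrative helper name. The resulting matrix could stand in for the raw adjacency matrix as the KG input to CMF.

```python
import math


def netmf_matrix(A, T=2, b=1.0):
    """Return log(max(M, 1)) with M = vol(G)/(b*T) * (sum_{r=1..T} P^r) D^{-1}, P = D^{-1} A."""
    n = len(A)
    deg = [sum(row) for row in A]  # assumes every node has at least one edge
    vol = sum(deg)

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

    P = [[A[i][j] / deg[i] for j in range(n)] for i in range(n)]  # random-walk transition matrix
    S = [[0.0] * n for _ in range(n)]
    Pr = P  # holds P^r for the current r
    for r in range(1, T + 1):
        S = [[s + p for s, p in zip(rs, rp)] for rs, rp in zip(S, Pr)]
        if r < T:
            Pr = matmul(Pr, P)
    scale = vol / (b * T)
    # Right-multiplying by D^{-1} divides column j by deg[j]; the log is truncated at 0.
    return [[math.log(max(scale * S[i][j] / deg[j], 1.0)) for j in range(n)] for i in range(n)]


# Hypothetical concept graph with undirected edges 0-1, 0-2, 1-2 and 2-3.
A_kg = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
M = netmf_matrix(A_kg, T=2)
for row in M:
    print([round(x, 3) for x in row])
```

For an undirected graph the matrix is symmetric, and the truncated logarithm keeps every entry non-negative, so it can be factorized like any other input matrix in the CMF collection.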

Results:

Our experiments show that DCMF provides a seamless way to integrate multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. Its performance in single-source settings is comparable to that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous non-neural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance.

Conclusions:

Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources, and to combine information from EMR data and Knowledge Graphs. Further, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.