JMIR Preprints #9455: Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources

Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources

Yingxiang Huang;
Junghye Lee;
Shuang Wang;
Jimeng Sun;
Hongfang Liu;
Xiaoqian Jiang

Background:

Data sharing is a long-standing challenge in biomedical informatics because of privacy concerns. Contextual embedding models have demonstrated a strong capability to represent medical concepts (and their context), and they have shown promise as a way to support deep-learning applications without disclosing the original data. However, contextual embedding models trained at individual hospitals cannot be combined directly because their embedding spaces differ, and naive pooling renders the combined embeddings useless.

Objective:

The aim of this study was to present a novel approach that addresses these issues and promotes sharing representations without sharing data. Without sacrificing privacy, we also aimed to build a global model from representations learned on local private data and to synchronize information from multiple sources.

Methods:

We propose a methodology that harmonizes different local contextual embeddings into a global model. We used Word2Vec to generate contextual embeddings from each source and Procrustes to fuse different vector models into one common space by using a list of corresponding pairs as anchor points. We performed prediction analysis with harmonized embeddings.
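
The alignment step can be illustrated with a minimal sketch of orthogonal Procrustes, assuming NumPy. It is not the authors' implementation: the embeddings are synthetic stand-ins for Word2Vec output, the second site is simulated as an exact rotation of the first, and all names (site_a, site_b, anchor_idx, procrustes_rotation) are hypothetical.

```python
import numpy as np


def procrustes_rotation(source, target):
    """Return the orthogonal matrix R that minimizes ||source @ R - target||_F."""
    # The SVD of the cross-covariance of the anchor vectors gives the optimum.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
dim, n_concepts, n_anchors = 5, 30, 20

# Hypothetical embeddings from site A (rows are medical concepts).
site_a = rng.normal(size=(n_concepts, dim))

# Assumption for illustration: site B learned the same geometry in a
# differently oriented space, simulated here by a random orthogonal map.
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
site_b = site_a @ q

# Concepts present at both sites serve as anchor points.
anchor_idx = np.arange(n_anchors)
rotation = procrustes_rotation(site_b[anchor_idx], site_a[anchor_idx])

# Apply the learned map to every site B vector, anchors and non-anchors alike.
aligned_b = site_b @ rotation
```

In this noise-free toy setting the map estimated from the anchors also aligns the non-anchor concepts. With embeddings trained independently on real data, the fit would only be approximate.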

Results:

We used sequential medical events extracted from the Medical Information Mart for Intensive Care III database to evaluate the proposed methodology in predicting the next likely diagnosis of a new patient using either structured data or unstructured data. Under different experimental scenarios, we confirmed that the global model built from harmonized local models achieves a more accurate prediction than local models and global models built from naive pooling.

Conclusions:

Such aggregation of local models using our unique harmonization can serve as the proxy for a global model, combining information from a wide range of institutions and information sources. It allows information unique to a certain hospital to become available to other sites, increasing the fluidity of information flow in health care.

Citation

Please cite as:

Huang Y, Lee J, Wang S, Sun J, Liu H, Jiang X

Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources

JMIR Med Inform 2018;6(2):e33

DOI: 10.2196/medinform.9455

PMID: 29769172

PMCID: 5981054

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 20, 2017

Open Peer Review Period: Nov 21, 2017 - Jan 4, 2018

Date Accepted: Mar 9, 2018

Per the author's request the PDF is not available.
