Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 17, 2020
Date Accepted: Jan 27, 2021

The final, peer-reviewed published version of this preprint can be found here:

Zong N, Ngo V, Stone DJ, Wen A, Zhao Y, Yu Y, Liu S, Huang M, Wang C, Jiang G

Leveraging Genetic Reports and Electronic Health Records for the Prediction of Primary Cancers: Algorithm Development and Validation Study

JMIR Med Inform 2021;9(5):e23586

DOI: 10.2196/23586

PMID: 34032581

PMCID: 8188315

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Leveraging Genetic Reports and Electronic Health Records for Predicting Primary Cancers Based on FHIR and RDF

  • Nansu Zong
  • Victoria Ngo
  • Daniel J. Stone
  • Andrew Wen
  • Yiqing Zhao
  • Yue Yu
  • Sijia Liu
  • Ming Huang
  • Chen Wang
  • Guoqian Jiang

ABSTRACT

Background:

Precision oncology has the potential to leverage clinical and genomic data to advance disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and the potential prediction of cancers of unknown primary, in order to facilitate optimal treatment decisions.

Objective:

This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict unknown primaries.

Methods:

We extracted genetic data elements from a collection of oncology genetic reports for 1,011 cancer patients, along with the corresponding phenotypic data from the Mayo Clinic electronic health records (EHRs). We modeled both the genetic and EHR data with HL7 Fast Healthcare Interoperability Resources (FHIR). The Semantic Web Resource Description Framework (RDF) was used to generate a network-based data representation (i.e., a patient-phenotypic-genetic network). On the resulting RDF data graph, the graph embedding algorithm Node2vec was applied to generate features, and multiple machine learning and deep learning backbone models were then used for cancer prediction.
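The graph-to-features step can be illustrated with a minimal sketch. It is not the authors' implementation: the triples, predicate names, node identifiers, and walk parameters below are hypothetical, and the abstract does not state which settings were used. The sketch collapses RDF-style triples into an undirected patient-phenotypic-genetic graph and generates second-order biased random walks in the style of Node2vec. In a full pipeline, the walks would then be passed to a skip-gram model to obtain node embeddings, which is omitted here.

```python
import random
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples standing in for RDF statements.
TRIPLES = [
    ("patient:1", "hasCondition", "phenotype:cough"),
    ("patient:1", "hasVariant", "gene:EGFR"),
    ("patient:2", "hasCondition", "phenotype:cough"),
    ("patient:2", "hasVariant", "gene:KRAS"),
    ("patient:3", "hasVariant", "gene:KRAS"),
]


def build_graph(triples):
    """Collapse triples into an undirected adjacency map (predicates are ignored)."""
    adj = defaultdict(set)
    for subj, _pred, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
    # Sorted lists keep neighbor order deterministic across runs.
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def biased_walk(adj, start, length, p, q, rng):
    """One second-order random walk in the style of Node2vec.

    Unnormalized weight of stepping from the current node to nxt, given the
    previous node prev: 1/p if nxt == prev, 1 if nxt is adjacent to prev,
    and 1/q otherwise.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)
            elif nxt in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks


graph = build_graph(TRIPLES)
walks = generate_walks(graph, num_walks=2, length=5)
```

Each walk is a sequence of patient, phenotype, and gene nodes. Treating walks as sentences and nodes as words is what allows an embedding model to place patients with shared phenotypes or variants close together in the feature space.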

Results:

Across six machine learning tasks designed for the experiment, we demonstrated that the proposed method achieved favorable results in classifying primary cancer types and predicting unknown primaries. To demonstrate interpretability, the phenotypic and genetic features that contributed the most to the prediction of each cancer were identified and validated through a literature review.

Conclusions:

Accurate prediction of cancer types can be achieved with existing EHR data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational value of incorporating genetic tests early, at the diagnostic stage, for cancer patients.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.