Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jun 21, 2022
Date Accepted: Oct 25, 2022
Generation of lung cancer health factor distribution using EMR patient graph
ABSTRACT
Background:
Electronic medical records (EMR) of lung cancer patients capture a variety of health factors. Understanding the distribution of these factors will help identify key factors for risk prediction in preventive screening.
Objective:
We aimed to create an integrated biomedical graph for lung cancer from EMR data and the Unified Medical Language System (UMLS) ontology, and to generate a lung cancer health factor distribution from a hospital EMR of about 1 million patients.
Methods:
The data were collected from equal numbers (1397 each) of patients with and without lung cancer. A patient-centered health factor graph was then built from 108K standardized data items. A Neo4j graph database was built to integrate the patient health factor graph with the UMLS ontology. Using the patient graph, the connection delta ratio (CDR) was proposed and calculated for each health factor to measure the relative strength of the factor’s relationship to lung cancer.
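The abstract does not state the CDR formula, so the following Python sketch is only an illustration under an assumed definition: for each factor, the number of lung cancer patients connected to it minus the number of control patients connected to it, divided by the total number of connected patients. The function names, the edge-list input format, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def connection_delta_ratio(edges, cancer_patients):
    """Compute an ASSUMED form of the connection delta ratio per factor.

    edges: iterable of (patient_id, factor) pairs from the patient graph.
    cancer_patients: set of patient ids in the lung cancer group;
        every other patient id is treated as a control.
    Assumed definition (not confirmed by the source):
        CDR = (case_connections - control_connections) / total_connections
    """
    case = Counter()
    ctrl = Counter()
    # De-duplicate so each patient-factor connection is counted once.
    for patient_id, factor in set(edges):
        if patient_id in cancer_patients:
            case[factor] += 1
        else:
            ctrl[factor] += 1

    cdr = {}
    for factor in case.keys() | ctrl.keys():
        total = case[factor] + ctrl[factor]  # always >= 1 here
        cdr[factor] = (case[factor] - ctrl[factor]) / total
    return cdr


def rank_by_cdr(cdr):
    """Sort factors by CDR in descending order (ties broken by name)."""
    return sorted(cdr.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: two lung cancer patients (p1, p2) and two controls (p3, p4).
edges = [
    ("p1", "cough"), ("p2", "cough"), ("p3", "cough"),
    ("p1", "smoking history"), ("p2", "smoking history"),
    ("p3", "hypertension"), ("p4", "hypertension"),
]
cancer_patients = {"p1", "p2"}
ranking = rank_by_cdr(connection_delta_ratio(edges, cancer_patients))
```

Because the two groups are the same size, raw connection counts can be compared directly in this sketch; with unequal groups, the counts would first need to be normalized by group size.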
Results:
The patient graph had 93K relations between the 2794 patient nodes and 650 factor nodes. A lung cancer graph with 187 related biomedical concepts and 188 horizontal biomedical relations was created and linked to the patient graph. Searching the integrated biomedical graph with any number or category of health factors returned graph presentations of the relations among patients and factors. Searching for any patient presented the patient’s health factors from the EMR and the lung cancer knowledge graph from UMLS in the same graph. Sorting the factors by CDR in descending order generated a distribution of health factors for lung cancer. The top 70 CDR-ranked factors in the disease, symptom, medical history, observation, and lab test categories were verified in the literature.
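As a rough illustration of the factor search described above, the sketch below indexes patient-factor relations in memory and returns the subgraph for a set of queried factors. It is not the study's Neo4j implementation; the function names and toy data are hypothetical, and matching only patients connected to all queried factors is an assumption made for the example.

```python
from collections import defaultdict


def build_index(edges):
    """Index (patient_id, factor) edges in both directions."""
    factor_to_patients = defaultdict(set)
    patient_to_factors = defaultdict(set)
    for patient_id, factor in edges:
        factor_to_patients[factor].add(patient_id)
        patient_to_factors[patient_id].add(factor)
    return factor_to_patients, patient_to_factors


def search_by_factors(factor_to_patients, factors):
    """Return sorted (patient, factor) edges for patients that are
    connected to ALL queried factors (an assumed matching rule)."""
    factors = list(factors)
    if not factors:
        return []
    # Intersect the patient sets of every queried factor.
    matched = set.intersection(
        *(set(factor_to_patients.get(f, set())) for f in factors)
    )
    return sorted((p, f) for p in matched for f in factors)


# Toy graph with hypothetical patients and factors.
toy_edges = [
    ("p1", "cough"), ("p2", "cough"), ("p3", "cough"),
    ("p1", "smoking history"), ("p2", "smoking history"),
    ("p3", "hypertension"),
]
f2p, p2f = build_index(toy_edges)
subgraph = search_by_factors(f2p, ["cough", "smoking history"])
```

In a graph database, the same query would typically be expressed as a pattern match over patient and factor nodes; the dictionary index here only mimics that lookup for a small example.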
Conclusions:
By collecting standardized data from thousands of patients with or without lung cancer in the EMR, it was possible to build a hospital-wide, patient-centered health factor graph for graph search and presentation. The patient graph could be integrated with the UMLS knowledge graph for lung cancer, enabling hospitals to bring a continuously updated, international-standard biomedical knowledge graph from UMLS into their clinical care. CDR analysis of the lung cancer patient graph generated a CDR-sorted distribution of health factors, in which the top CDR-ranked health factors were in good agreement with reports in the literature. The resulting distribution of lung cancer health factors might be used to help personalize risk evaluation and preventive screening recommendations.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.