Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 31, 2018
Open Peer Review Period: Jan 2, 2019 - Jan 7, 2019
Date Accepted: Apr 23, 2019
(closed for review but you can still tweet)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Prediction of the 1-Year Risk of Incident Lung Cancer: Prospective Study Using Electronic Health Records from the State of Maine
Background:
Lung cancer is the leading cause of cancer death worldwide. Early detection of individuals at risk of lung cancer is critical to reduce the mortality rate.
Objective:
The aim of this study was to develop and validate a prospective risk prediction model to identify patients at risk of new incident lung cancer within the next 1 year in the general population.
Methods:
Data from individual patient electronic health records (EHRs) were extracted from the Maine Health Information Exchange network. The study population consisted of patients with at least one EHR between April 1, 2016, and March 31, 2018, who had no history of lung cancer. A retrospective cohort (N=873,598) and a prospective cohort (N=836,659) were formed for model construction and validation. An Extreme Gradient Boosting (XGBoost) algorithm was adopted to build the model. It assigned a score to each individual to quantify the probability of a new incident lung cancer diagnosis from October 1, 2016, to September 31, 2017. The model was trained with the clinical profile in the retrospective cohort from the preceding 6 months and validated with the prospective cohort to predict the risk of incident lung cancer from April 1, 2017, to March 31, 2018.
Results:
The model had an area under the curve (AUC) of 0.881 (95% CI 0.873-0.889) in the prospective cohort. Two thresholds of 0.0045 and 0.01 were applied to the predictive scores to stratify the population into low-, medium-, and high-risk categories. The incidence of lung cancer in the high-risk category (579/53,922, 1.07%) was 7.7 times higher than that in the overall cohort (1167/836,659, 0.14%). Age, a history of pulmonary diseases and other chronic diseases, medications for mental disorders, and social disparities were found to be associated with new incident lung cancer.
Conclusions:
We retrospectively developed and prospectively validated an accurate risk prediction model of new incident lung cancer occurring in the next 1 year. Through statistical learning from the statewide EHR data in the preceding 6 months, our model was able to identify statewide high-risk patients, which will benefit the population health through establishment of preventive interventions or more intensive surveillance.
Citation
Per the author's request the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.