JMIR Preprints #29807: Development of patient level cancer prediction models from a nationwide patient cohort: Model development and validation

Development of patient level cancer prediction models from a nationwide patient cohort: Model development and validation

Eunsaem Lee;
Seyoung Jung;
Hyung Ju Hwang;
Jaewoo Jung

ABSTRACT

Background:

Nationwide population-based cohorts provide a new opportunity to build automated patient-level risk prediction models, and claim data are a useful resource to that end. To avoid unnecessary diagnostic interventions after cancer screening tests, patient-level prediction models should be developed.

Objective:

We aimed to use machine learning algorithms and nationwide claim databases to develop cancer prediction models that are explainable and easily applicable in real-world environments.

Methods:

As source data, we used the Korean National Insurance System Database. Every Korean aged ≥40 years undergoes a national health check-up every two years. We gathered all variables from the database, including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression, light gradient boosting, neural networks, and survival analysis, as well as a one-class embedding classifier, a deep learning-based anomaly detection method for effectively analyzing high-dimensional data. Performance was measured with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). We validated our models externally with a health check-up database from a tertiary hospital.
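
The two performance measures named above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names `auroc` and `average_precision` and the four-patient toy data are assumptions made for illustration, and AUPRC is estimated here as average precision, which is one common convention.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); tied scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimated as average precision: the sum over score thresholds
    of (increase in recall) * (precision at that threshold)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample that shares this score before updating.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical toy data (1 = cancer, 0 = no cancer); not from the study.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(f"AUROC = {auroc(labels, scores):.3f}")              # 0.750
print(f"AUPRC = {average_precision(labels, scores):.3f}")  # 0.833
```

AUROC is insensitive to class prevalence, whereas AUPRC depends on how rare the positive class is, which is why the two measures can rank models differently on rare outcomes such as cancer diagnoses.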

Results:

The one-class embedding classifier model achieved the highest AUROC scores, with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749 and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast and cervical cancers, respectively. For AUPRC, the light gradient boosting models had the highest scores, with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357 and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast and cervical cancers, respectively.

Conclusions:

Our results show that applicable cancer prediction models can be readily developed from nationwide claim data using machine learning. The seven models have acceptable performance and explainability and can be distributed easily in real-world environments.

Citation

Please cite as:

Lee E, Jung S, Hwang HJ, Jung J

Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation

JMIR Med Inform 2021;9(8):e29807

DOI: 10.2196/29807

PMID: 34459743

PMCID: 8438609

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 21, 2021

Open Peer Review Period: Apr 21, 2021 - May 11, 2021

Date Accepted: Jul 26, 2021
