Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 31, 2024
Date Accepted: Nov 7, 2024
Combining a risk factor score designed from electronic health records with a digital cytology image scoring system to improve bladder-cancer detection
ABSTRACT
Background:
To reduce bladder cancer mortality, efforts must focus on early detection of the disease so that therapeutic intervention can be more effective. Strong risk factors have been identified (e.g., smoking status, age, and occupational exposure), and some diagnostic tools (e.g., cystoscopy) have been proposed. However, no fully satisfactory solution (non-invasive, inexpensive, and high-performing) suitable for widespread deployment has yet been proposed. Recently developed models based on cytology image classification are promising, but there is still room to improve their performance.
Objective:
Our team aimed to evaluate the benefit of combining a risk factor model, built through large-scale reuse of clinical data, with a digital cytology image-based model for bladder cancer detection.
Methods:
First, we designed a predictive model based on clinical data (i.e., risk factors identified in the literature) extracted from the Clinical Data Warehouse of the Rennes Hospital, using machine learning algorithms (Logistic Regression, Random Forest, and Support Vector Machine). This model provides a score corresponding to the risk of developing bladder cancer given the patient's clinical profile. Second, we investigated three strategies (Logistic Regression, Decision Tree, and a custom approach based on score interpretation) to combine this score with those of an image-based model and produce a robust bladder cancer score.
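As an illustration of this two-step design, the following dependency-free sketch fits a logistic regression risk model on clinical features from one cohort, then uses a second logistic regression to fuse its output with an image-based score on another cohort. It rests on stated assumptions: the data are synthetic, the three risk factors (scaled age, smoking, occupational exposure) and the image score are stand-ins, and the helper names (`fit_logistic`, `predict`, `make_synthetic_cohort`) are hypothetical. It covers only the Logistic Regression fusion strategy, not the Decision Tree or custom strategies, and it is not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # w[0] is the intercept; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit logistic regression by full-batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def make_synthetic_cohort(n, seed):
    """HYPOTHETICAL data: three scaled risk factors, an image score, and a label."""
    rng = random.Random(seed)
    clinical, image, labels = [], [], []
    for _ in range(n):
        age = rng.random()                   # age rescaled to [0, 1]
        smoker = float(rng.random() < 0.4)
        exposed = float(rng.random() < 0.2)  # occupational exposure
        y = int(rng.random() < sigmoid(-3.0 + 2.5 * age + 1.5 * smoker + exposed))
        # Stand-in for an image-based score: noisy and higher on average for cases.
        img = min(1.0, max(0.0, rng.gauss(0.65 if y else 0.35, 0.2)))
        clinical.append([age, smoker, exposed])
        image.append(img)
        labels.append(y)
    return clinical, image, labels


# Step 1: risk factor model trained on a first (clinical-only) cohort.
clin_a, _, y_a = make_synthetic_cohort(400, seed=1)
w_risk = fit_logistic(clin_a, y_a)

# Step 2: on a second cohort, fuse the clinical risk score with the image score.
clin_b, img_b, y_b = make_synthetic_cohort(200, seed=2)
fusion_X = [[predict(w_risk, x), s] for x, s in zip(clin_b, img_b)]
w_fusion = fit_logistic(fusion_X, y_b)
combined = [predict(w_fusion, x) for x in fusion_X]
print("fusion weights (intercept, risk, image):", [round(v, 2) for v in w_fusion])
```

Fitting the fusion layer on the second cohort, rather than on the data used to build the risk model, mirrors the two-dataset design; a real evaluation would also hold out a test split of that cohort before reporting performance.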
Results:
Two datasets were collected. The first, comprising clinical data for 5422 patients extracted from the Clinical Data Warehouse, was used to design the risk factor-based model. The second, used to measure the models' performance, comprised 651 patients from a clinical trial for whom cytology images were collected along with clinico-biological features. On this second dataset, the combination of both models achieved an AUC of 0.81 on the training set and 0.83 on the test set, demonstrating the value of combining risk factor-based and image-based models. We observed that the combination yields a higher associated risk of cancer than VisioCyt for all classes, especially for low-grade bladder cancer.
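An AUC can be read as the probability that a randomly chosen cancer patient receives a higher score than a randomly chosen patient without cancer, computed separately on training and test predictions. A minimal sketch of that rank-based (Mann-Whitney) computation follows; `roc_auc` is a hypothetical helper, the toy labels and scores are invented, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in cohort size, which is fine for illustration; sorting-based implementations give the same value more efficiently.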
Conclusions:
These results demonstrate the value of combining clinical and biological information, especially for improving the detection of low-grade bladder cancer. The automatic extraction of clinical features will need to be improved to make the risk factor-based model more robust. Nevertheless, these results already support the hypothesis that this type of approach will benefit patients.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.