Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Nov 8, 2022
Date Accepted: Nov 22, 2023
Performance test of the well-trained model for meningioma segmentation in healthcare center: secondary analysis based on four retrospective, multicenter datasets.
ABSTRACT
Background:
Convolutional neural networks (CNNs) have produced state-of-the-art results in meningioma segmentation on magnetic resonance imaging (MRI). However, images obtained from different institutions, protocols, or scanners may show significant domain shift, which degrades performance and complicates model deployment in real clinical scenarios.
Objective:
Unsupervised domain adaptation can provide a viable solution, but its real-world performance has not yet been verified or compared with that of a supervised model on large, multicenter data.
Methods:
A total of 2,039 patients from three institutions were retrospectively included in this study. Manual segmentations were obtained by neuroradiologists in a consensus reading and served as the ground truth. Two types of MRI sequences were used: magnetization-prepared rapid gradient-echo (MPR-AGE) and fat-suppressed fast or turbo spin echo (FSE/TSE) images. First, using DeepLabv3+, the gold-standard network for semantic segmentation, we trained a segmentation model on MPR-AGE images (model 1) and tested its performance on FSE/TSE images. Then, using a previously proposed unsupervised adversarial domain adaptation method, we generated model 2 and tested it on FSE/TSE images. Finally, another supervised CNN model (model 3) was trained on FSE/TSE images for comparison with model 2.
Results:
Model 1 showed state-of-the-art performance on MPR-AGE images (Dice ratio=0.912) and remained robust in the external test on MPR-AGE images (Dice ratio=0.879). However, its performance degraded significantly, to Dice ratio=0.238, when tested on FSE/TSE images. With the unsupervised adversarial domain adaptation method, model 2 improved significantly, reaching Dice ratio=0.847 in the validation group and 0.856 in the external test group. However, it did not outperform the supervised model 3, which achieved Dice ratio=0.908 in the validation group and 0.874 in the external test group.
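For readers unfamiliar with the metric, the Dice ratio reported here is conventionally defined as 2|A ∩ B| / (|A| + |B|), where A is the predicted mask and B is the ground-truth mask. The sketch below is only an illustration of that standard definition; the function name, the flat 0/1 mask representation, and the handling of two empty masks are assumptions, and the abstract does not state how the authors computed the score.

```python
def dice_ratio(pred, truth):
    """Dice ratio between two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as tumor in both the prediction and the ground truth.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Sum of the two mask sizes (the denominator of the Dice formula).
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy example with 8 voxels: 3 voxels overlap, each mask has 4 voxels.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 1, 1, 1, 1, 0, 0]
print(dice_ratio(pred, truth))  # 2 * 3 / (4 + 4) = 0.75
```

On this scale, a value of 0.238 means the predicted and reference masks share little volume, whereas values near 0.9 indicate close agreement.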
Conclusions:
Unsupervised domain adaptation can achieve a satisfactory improvement for datasets that lack ground-truth labels. However, the choice between this method and supervised training should balance clinical needs, model performance, and data size.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.