Crowdsourcing Skin Demarcations of Chronic Graft-Versus-Host Disease in Patient Photographs: Training vs Performance Study
ABSTRACT
Background:
Chronic graft-versus-host disease (cGVHD) is a significant cause of long-term morbidity and mortality in patients after allogeneic hematopoietic cell transplantation. Skin is the most commonly affected organ, and visual assessment of cGVHD can have low reliability. Crowdsourcing data from non-expert participants has been used for numerous medical applications, including image labeling and segmentation tasks.
Objective:
To assess the ability of crowds of non-experts to annotate photos of cGVHD-affected skin and to study the effect of training and feedback on their performance.
Methods:
A total of 360 photographs of the skin of 36 patients with cGVHD were taken using the Canfield Vectra H1 3D camera. Ground truth demarcations were provided by a single trained annotator in 3D and reviewed by a board-certified dermatologist. A total of 3000 2D images (projections from various angles) were created for demarcation. Crowdsourcing was conducted through the "DiagnosUs" mobile app, and raters were split into high and low feedback groups.
Results:
Crowds of non-expert raters achieved good overall performance for segmenting cGVHD-affected skin with minimal training, with a median surface area error of less than 12% for all crowds in both the high and low feedback groups. The low feedback crowds performed slightly worse than the high feedback crowd, even when a larger crowd was used. Tracking the 5 most reliable raters for each image recovered performance similar to that of the high feedback crowd. No significant learning was observed during the task as raters saw more photos and feedback.
Conclusions:
Crowds of non-expert raters can be used to annotate cGVHD images with good overall performance. Tracking the 5 most reliable raters provided optimal results, obtaining the best performance with the lowest number of expert demarcations required for adequate training. Future work should explore the performance of crowdsourcing in standard clinical photos and further methods to estimate the reliability of consensus demarcations.
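As an illustration only, the idea of combining the demarcations of the 5 most reliable raters and scoring the result by surface area error might be sketched as follows. The abstract does not specify how consensus, reliability, or surface area error were computed, so everything here is an assumption: the pixel-wise majority vote, the per-rater reliability scores supplied as input, the definition of surface area error as the absolute difference in affected area expressed as a percentage of the image area, and all function and variable names.

```python
from typing import Dict, List

# 2D binary mask: 1 = pixel marked as affected skin, 0 = unaffected.
Mask = List[List[int]]


def majority_vote(masks: List[Mask]) -> Mask:
    """Pixel-wise majority vote (assumed consensus rule).

    A pixel is 1 only if strictly more than half of the masks mark it.
    """
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def top_k_consensus(
    rater_masks: Dict[str, Mask],
    reliability: Dict[str, float],
    k: int = 5,
) -> Mask:
    """Consensus over the k raters with the highest reliability score.

    `reliability` is a hypothetical per-rater score (higher is better),
    for example agreement with expert demarcations on training images.
    """
    ranked = sorted(
        rater_masks, key=lambda rid: reliability.get(rid, 0.0), reverse=True
    )
    return majority_vote([rater_masks[rid] for rid in ranked[:k]])


def surface_area_error(pred: Mask, truth: Mask) -> float:
    """Assumed metric: |affected area difference| as % of image area."""
    total = len(truth) * len(truth[0])
    pred_area = sum(map(sum, pred))
    truth_area = sum(map(sum, truth))
    return 100.0 * abs(pred_area - truth_area) / total


# Toy example with three hypothetical raters on a 2 x 4 image.
truth = [[1, 1, 0, 0], [1, 1, 0, 0]]
rater_masks = {
    "a": [[1, 1, 0, 0], [1, 1, 0, 0]],
    "b": [[1, 1, 0, 0], [1, 0, 0, 0]],
    "c": [[1, 1, 1, 1], [1, 1, 1, 1]],
}
reliability = {"a": 0.9, "b": 0.8, "c": 0.2}

consensus = top_k_consensus(rater_masks, reliability, k=3)
error_pct = surface_area_error(consensus, truth)
```

In this toy case the over-inclusive mask from rater "c" is outvoted, so the consensus matches the ground truth. With real data, the choice of consensus rule and reliability estimate would need to follow the study's actual protocol.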
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.