Accepted for/Published in: JMIR AI
Date Submitted: Mar 11, 2024
Open Peer Review Period: Mar 14, 2024 - May 14, 2024
Date Accepted: Oct 1, 2024
Ensuring Appropriate Representation in AI-Generated Medical Imagery: A Methodological Approach to Address Skin Tone Bias
ABSTRACT
Background:
In medical education, particularly in anatomy and dermatology, generative artificial intelligence (AI) can be used to create customized illustrations. However, darker skin tones are underrepresented in medical textbooks and other sources that serve as training data for AI, which poses a significant challenge to ensuring diverse and inclusive educational materials.
Objective:
This study aims to evaluate the extent of skin tone diversity in AI-generated medical images and to test whether the representation of skin tones can be improved by modifying AI prompts to better reflect the demographic makeup of the US population.
Methods:
Two standard AI models (DALL-E 3 and Midjourney) each generated 100 images of people with psoriasis. Additionally, a custom model was developed that incorporated a prompt injection aimed at “forcing” the AI (DALL-E 3) to reflect the skin tone distribution of the US population according to the 2012 American National Election Survey. This custom model generated another set of 100 images. Three researchers assessed the skin tones in these images using the New Immigrant Survey skin tone scale, with the median of the three ratings representing each image. A chi-square goodness-of-fit analysis compared the skin tone distribution of each set of images to that of the US population.
Results:
The standard AI models (DALL-E 3 and Midjourney) demonstrated a significant difference between the expected skin tones of the US population and the observed tones in the generated images (P=8.62E-11 and P=1.12E-21, respectively). Both standard AI models over-represented lighter skin. Conversely, the custom model with the modified prompt yielded a distribution of skin tones that closely matched the expected demographic representation, showing no significant difference (P=0.0435).
Conclusions:
This study reveals a notable bias in AI-generated medical images, predominantly underrepresenting darker skin tones. This bias can be effectively addressed by modifying AI prompts to incorporate real-life demographic distributions. The findings emphasize the need for conscious efforts in AI development to ensure diverse and representative outputs, particularly in educational and medical contexts. Users of generative AI tools should be aware that these biases exist, and that similar tendencies may also exist in other types of generative AI (e.g. large language models) and in other characteristics (e.g. sex/gender, culture/ethnicity). Injecting demographic data into AI prompts can effectively counteract these biases, ensuring a more accurate representation of the general population.
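One way to picture the idea of injecting demographic data into a prompt is to draw a skin tone descriptor at random, weighted by target population proportions, and append it to the base prompt before each image request. The sketch below is an assumption-laden illustration only: the category labels, the proportions, the prompt wording, and the function names are all hypothetical, and it does not reproduce the authors' custom model or call any image-generation API.

```python
import random

# Hypothetical categories and target proportions (placeholders, not survey data).
TARGET_DISTRIBUTION = {
    "very light": 0.35,
    "light": 0.25,
    "medium": 0.20,
    "dark": 0.12,
    "very dark": 0.08,
}


def sample_skin_tone(rng, distribution=TARGET_DISTRIBUTION):
    """Draw one skin tone label with probability equal to its target proportion."""
    tones = list(distribution)
    weights = list(distribution.values())
    return rng.choices(tones, weights=weights, k=1)[0]


def build_prompt(base_prompt, rng, distribution=TARGET_DISTRIBUTION):
    """Append a sampled skin tone descriptor to the base prompt (hypothetical wording)."""
    tone = sample_skin_tone(rng, distribution)
    return f"{base_prompt} The person depicted has {tone} skin."


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the illustration is reproducible
    for _ in range(3):
        print(build_prompt("Medical illustration of a person with psoriasis.", rng))
```

Over many requests, the share of prompts naming each skin tone should approach the target proportions; whether the generated images follow the prompt would still need to be checked by rating them.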
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.