Race/Ethnicity Stratified Analysis of an Artificial Intelligence Based Tool for Skin Condition Diagnosis by Primary Care Physicians and Nurse Practitioners
ABSTRACT
Background:
Many dermatologic cases are first evaluated by primary care physicians (PCPs) or nurse practitioners (NPs).
Objective:
To evaluate an artificial intelligence (AI)-based tool that assists with interpreting dermatologic conditions.
Methods:
We developed an AI-based tool and conducted a randomized multi-reader, multi-case study (20 PCPs, 20 NPs, 1047 retrospective teledermatology cases) to evaluate its utility. Cases were enriched and comprised 120 skin conditions. Readers were recruited to optimize for geographical diversity; the PCPs practiced across 12 states (2-32 years of experience, mean: 11.3) and the NPs practiced across 9 states (2-34 years of experience, mean: 13.1). To avoid memory effects from incomplete washout, each clinician read each case once, either with or without AI assistance, with the assignment randomized. The primary analyses evaluated the top-1 agreement, defined as the agreement rate of the clinicians’ primary diagnosis with the reference diagnoses provided by a panel of dermatologists (per case: 3 dermatologists from a pool of 12, practicing across 8 states, 5-13 years of experience [mean: 7.2]). We additionally conducted subgroup analyses stratified by cases’ self-reported race/ethnicity, and measured the performance spread, defined as the maximum subgroup performance minus the minimum subgroup performance.
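The two metrics defined above can be illustrated with a minimal Python sketch. It is not the study's analysis code: the record layout (`primary`, `reference`, `subgroup`), the function names, and the toy reads are all hypothetical, and it assumes that a read counts as agreeing when the clinician's primary diagnosis appears among the accepted reference diagnoses for that case.

```python
from collections import defaultdict


def top1_agreement(reads):
    """Fraction of reads whose primary diagnosis is among the reference diagnoses."""
    hits = sum(1 for r in reads if r["primary"] in r["reference"])
    return hits / len(reads)


def performance_spread(reads):
    """Return (spread, per-subgroup rates); spread = max rate minus min rate."""
    by_subgroup = defaultdict(list)
    for r in reads:
        by_subgroup[r["subgroup"]].append(r)
    rates = {g: top1_agreement(rs) for g, rs in by_subgroup.items()}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical toy reads, for illustration only (not study data).
toy_reads = [
    {"subgroup": "A", "primary": "eczema", "reference": {"eczema"}},
    {"subgroup": "A", "primary": "psoriasis", "reference": {"eczema"}},
    {"subgroup": "B", "primary": "acne", "reference": {"acne"}},
    {"subgroup": "B", "primary": "tinea", "reference": {"tinea", "eczema"}},
    {"subgroup": "B", "primary": "acne", "reference": {"rosacea"}},
    {"subgroup": "B", "primary": "eczema", "reference": {"eczema"}},
]

spread, rates = performance_spread(toy_reads)
print(rates)   # per-subgroup top-1 agreement
print(spread)  # maximum minus minimum across subgroups
```

In this toy example, subgroup A agrees on 1 of 2 reads and subgroup B on 3 of 4, so the spread is 0.75 - 0.50 = 0.25. A smaller spread indicates more uniform performance across subgroups.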
Results:
The AI’s standalone top-1 agreement was 63%, and AI assistance was significantly associated with higher agreement with reference diagnoses (Figure). For PCPs, the increase in diagnostic agreement was 10% (p<0.001), from 48% to 58%; for NPs, the increase was 12% (p<0.001), from 46% to 58%. When stratified by cases’ self-reported race/ethnicity (Figure), the AI’s performance was 59-62% for the Asian / Native Hawaiian / Pacific Islander, Other, and Hispanic / Latino subgroups, and 67% for both the Black / African American and White subgroups. For the clinicians, the improvements associated with AI assistance ranged across subgroups from 8-12% for PCPs and 8-15% for NPs. The performance spread across subgroups was 5.3% unassisted vs. 6.6% assisted for PCPs, and 5.2% unassisted vs. 6.0% assisted for NPs. In both unassisted and AI-assisted modalities, and for both PCPs and NPs, the subgroup with the highest performance on average was Black / African American, though the differences from other subgroups were small and had overlapping confidence intervals.
Conclusions:
AI assistance was associated with significantly improved diagnostic agreement with dermatologists. Across race/ethnicity subgroups, for both PCPs and NPs, the effect of AI assistance remained high at 8-15% and the performance spread was similar at 5-7%.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.