Evaluating AI Chatbots in Addressing the Unmet Survivorship Needs of Adolescents and Young Adults with Melanoma
ABSTRACT
Background:
Melanoma, a highly aggressive form of skin cancer, is the second most common cancer among adolescent and young adult (AYA, ages 15-39 years) patients. AYA patients with melanoma may turn to internet sources, especially AI chatbots, to manage uncertainty about prognosis and treatment.
Objective:
To evaluate the quality, empathy, and readability of responses generated by leading AI chatbots when addressing the top unmet needs of AYA melanoma patients receiving treatment.
Methods:
Our research team recently surveyed 152 AYA melanoma patients using the Needs Assessment Service Bridge (NA-SB), a validated instrument that assesses the psychosocial needs of AYA cancer patients. The survey identified the top five needs of AYA patients with advanced melanoma receiving treatment. Each need was reframed as a question with a brief clinical history and entered into each chatbot by five individuals, each of whom cleared their chat history before and after each query. Chatbot responses were evaluated for information quality (Global Quality Score [GQS] and DISCERN), accessibility and readability (GQS, Flesch-Kincaid Grade Level, Flesch Reading Ease), and perceived empathy (Perceived Empathy of Technology Scale [PETS]).
Results:
Across 75 chatbot responses, ChatGPT achieved the highest average quality (mean GQS 4.4, mean DISCERN 3.2) and empathy (PETS-ER 5.4, PETS-UT 6.4), though with greater variability (SD ≈ 1.8). Copilot produced the lowest quality and empathy scores, while Gemini responses were consistently midrange. PETS-UT exceeded PETS-ER across all models, suggesting stronger cognitive empathy than emotional responsiveness. Readability analysis showed that outputs exceeded the average U.S. reading level (mean FKGL 11.8, mean FRE 38.6), limiting accessibility. The most readable responses were those to Question 2, which also scored higher in quality and empathy, whereas Questions 4 and 5 produced the most complex, difficult-to-read responses, corresponding with lower quality and empathy ratings.
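For context, the Flesch-Kincaid Grade Level and Flesch Reading Ease scores cited above are standard formulas based on words per sentence and syllables per word. The minimal sketch below illustrates how they are computed; it uses a naive vowel-group syllable heuristic for illustration only, not the validated tooling presumably used in the study.

```python
import re

def count_syllables(word):
    # Naive heuristic: count contiguous vowel groups.
    # Real readability tools use pronunciation dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    """Return (FKGL, FRE) for a text using the standard formulas."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # mean words per sentence
    spw = syllables / len(words)   # mean syllables per word
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    return fkgl, fre
```

A mean FKGL of 11.8 corresponds roughly to a 12th-grade reading level, well above the sixth-to-eighth-grade level typically recommended for patient-facing health information.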
Conclusions:
AI chatbots can provide moderately accurate and supportive responses to AYA melanoma patient needs, but outputs are inconsistent, written above the recommended reading level for health information, and limited in empathy. Question framing strongly influenced chatbot performance, with more emotionally framed prompts eliciting greater empathy and readability aligning with both quality and empathy. Chatbot use in this population should remain adjunctive, with further research needed to standardize quality, improve readability, and enhance empathetic communication.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.