Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis
ABSTRACT
Background:
Disadvantaged cancer survivors and their caregivers (e.g., individuals with limited health literacy, racial and ethnic minorities facing language barriers) face a disproportionately increased risk of symptom burden from cancer and its treatments. Large language models (LLMs) offer researchers an opportunity to develop educational materials tailored to these populations.
Objective:
The purposes of this study were to: 1) evaluate the overall performance of LLMs in generating tailored educational content for disadvantaged cancer survivors and their caregivers; 2) compare the performances of three Generative Pre-trained Transformer (GPT) models (i.e., GPT-3.5 Turbo, GPT-4, GPT-4 Turbo); and 3) explore different prompts that can help LLMs generate better content.
Methods:
We selected 30 topics from national guidelines on cancer care and education. GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo were used to generate tailored content of up to 250 words at a 6th-grade reading level, with translations into Spanish and Chinese for each topic. Nine oncology experts evaluated the content against predetermined criteria: word limit, reading level, and quality assessment (i.e., clarity, accuracy, relevance, completeness, and comprehensibility). Analysis of variance (ANOVA) or chi-square tests were used to compare differences among the GPT models and prompts.
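Word-limit and reading-level checks like those described above are typically automated with a readability formula such as the Flesch-Kincaid grade level. The sketch below is illustrative only (the study does not specify its scoring tool); the syllable heuristic and threshold values are assumptions, and production tools use dictionary-based syllable counts.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups, with a silent-'e'
    # adjustment; real readability tools use pronunciation dictionaries.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words/sentences) + 11.8 * (syllables/word) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

def meets_criteria(text: str, max_words: int = 250,
                   max_grade: float = 6.0) -> tuple[bool, bool]:
    # Returns (within word limit, at or below target grade level).
    n_words = len(re.findall(r"[A-Za-z']+", text))
    return n_words <= max_words, flesch_kincaid_grade(text) <= max_grade
```

For example, a short plain-language passage such as "Drink water every day. It helps your body work well." scores well below a 6th-grade level under this formula.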
Results:
Overall, LLMs showed excellent performance in tailoring educational content, with 74.2% (n=360) adhering to the specified word limit and achieving an average quality assessment score of 8.933 out of 10. However, LLMs showed moderate performance on reading level, with 41.1% of content failing to meet the 6th-grade target. LLMs demonstrated strong translation capabilities, achieving an accuracy of 88.9% for Spanish and 81.1% for Chinese translations. The more advanced GPT-4 family models showed better overall performance than GPT-3.5 Turbo. Prompting the models to produce content in a bulleted format tended to yield better educational materials than prompting for prose-format content.
Conclusions:
This study highlights the application of LLMs in cancer care and education while acknowledging their potential limitations. The findings can inform the development and implementation of interventions in cancer symptom management and supportive care, thereby advancing health equity.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.