Accepted for/Published in: JMIR Cancer

Date Submitted: Oct 24, 2024
Date Accepted: Feb 28, 2025

The final, peer-reviewed published version of this preprint can be found here:

Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis

Liu D, Hu X, Xiao C, Bai J, Barandouzi Z, Lee S, Webster C, Brock LU, Lee L, Bold D, Lin Y

JMIR Cancer 2025;11:e67914

DOI: 10.2196/67914

PMID: 40192716

PMCID: 11995809

Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis

  • Darren Liu; 
  • Xiao Hu; 
  • Canhua Xiao; 
  • Jinbing Bai; 
  • Zahra Barandouzi; 
  • Stephanie Lee; 
  • Caitlin Webster; 
  • La-Urshalar Brock; 
  • Lindsay Lee; 
  • Delgersuren Bold; 
  • Yufen Lin

ABSTRACT

Background:

Disadvantaged cancer survivors and their caregivers (e.g., individuals with limited health literacy, racial and ethnic minorities facing language barriers) face a disproportionately increased risk of symptom burden from cancer and its treatments. Large language models (LLMs) offer researchers an opportunity to develop educational materials tailored to these populations.

Objective:

The purposes of this study were to: 1) evaluate the overall performance of LLMs in generating tailored educational content for disadvantaged cancer survivors and their caregivers; 2) compare the performance of three Generative Pre-trained Transformer (GPT) models (i.e., GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo); and 3) explore prompts that help LLMs generate better content.

Methods:

We selected 30 topics from national guidelines on cancer care and education. GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo were used to generate tailored content of up to 250 words at a 6th-grade reading level, with translations into Spanish and Chinese for each topic. Nine oncology experts evaluated the content against predetermined criteria: word limit, reading level, and quality (i.e., clarity, accuracy, relevance, completeness, and comprehensibility). ANOVA or chi-square analyses were used to compare differences among the GPT models and prompts.
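The abstract does not reproduce the study's actual prompts, but the constraints it names (word limit, reading level, output format, target language) suggest how such a generation prompt could be assembled. The sketch below is illustrative only; the function name and wording are assumptions, not the authors' instrument:

```python
def build_prompt(topic: str, bulleted: bool = True,
                 word_limit: int = 250, grade_level: int = 6,
                 language: str = "English") -> str:
    """Assemble an LLM instruction for tailored patient-education
    content (illustrative wording, not the study's actual prompt)."""
    fmt = "a bulleted list" if bulleted else "plain paragraphs"
    return (
        f"Write educational content for cancer survivors and their "
        f"caregivers on the topic: {topic}. Use at most {word_limit} "
        f"words, written at a grade-{grade_level} reading level, "
        f"formatted as {fmt}, in {language}."
    )

print(build_prompt("managing fatigue during chemotherapy"))
```

The same template would then be sent to each of the three GPT models, varying only the format flag and target language, so that outputs remain comparable across conditions.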

Results:

Overall, the LLMs performed well in tailoring educational content: 74.2% (n=360) of outputs adhered to the specified word limit, with an average quality assessment score of 8.933 out of 10. Performance on reading level was moderate, however, with 41.1% of content failing to meet the 6th-grade target. The models demonstrated strong translation capabilities, achieving an accuracy of 88.9% for Spanish and 81.1% for Chinese translations. The more advanced GPT-4 family models outperformed GPT-3.5 Turbo overall. Prompting the models to produce bulleted content tended to yield better educational materials than textual-format content.
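The 6th-grade criterion can be checked automatically with a readability formula such as Flesch-Kincaid grade level. The abstract does not name the tool the authors used, so the following is an assumed, minimal implementation with a crude vowel-group syllable heuristic:

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

# Content meeting the study's target would score <= 6.0 here.
print(round(fk_grade("The cat sat on the mat."), 2))
```

Dedicated libraries handle edge cases (silent "e", abbreviations) far better than this heuristic; the point is only that the reading-level criterion is mechanically checkable.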

Conclusions:

This study highlights the application of LLMs in cancer care and education while acknowledging their potential limitations. The findings can inform the development and implementation of interventions in cancer symptom management and supportive care, thereby advancing health equity.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.