
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 30, 2024
Date Accepted: Apr 3, 2025

The final, peer-reviewed published version of this preprint can be found here:

AI in Home Care—Evaluation of Large Language Models for Future Training of Informal Caregivers: Observational Comparative Case Study

Pérez-Esteve C, Guilabert M, Matarredona V, Srulovici E, Tella S, Strametz RS, Mira J


J Med Internet Res 2025;27:e70703

DOI: 10.2196/70703

PMID: 40294407

PMCID: 12070015

Artificial Intelligence in Home Care: Evaluation of Large Language Models for Future Training of Informal Caregivers

  • Clara Pérez-Esteve; 
  • Mercedes Guilabert; 
  • Valerie Matarredona; 
  • Einav Srulovici; 
  • Susanna Tella; 
  • Reinhard Strametz; 
  • Jose Mira

ABSTRACT

Background:

The aging population represents an achievement for society but also poses significant challenges for governments, healthcare systems, and caregivers. Elevated rates of functional limitations among older adults, primarily caused by chronic conditions, necessitate adequate and safe care, including in home settings. Traditionally, informal caregiver training has relied on verbal and written instructions. However, the advent of digital resources has introduced videos and interactive platforms, offering more accessible and effective training. Large language models (LLMs) have emerged as potential tools for personalized information delivery. While LLMs exhibit the capacity to mimic clinical reasoning and support decision-making, their potential to serve as alternatives to evidence-based professional instructions remains unexplored.

Objective:

This study aims to evaluate the appropriateness of home care instructions generated by LLMs (including GPTs) in comparison to a professional Gold Standard.

Methods:

An observational, comparative case study evaluated three LLMs (GPT-3.5, GPT-4o, and Copilot) in ten home care scenarios. A rubric assessed the models against a reference standard (Gold Standard) created by healthcare professionals. Independent reviewers evaluated variables such as specificity, clarity, and self-efficacy. Statistical analyses compared the LLMs' performance to the Gold Standard to ensure consistency and validity.

Results:

The study revealed that while no LLM achieved the precision of the professional Gold Standard, GPT-4o outperformed GPT-3.5 and Copilot in specificity (4.6 vs. 3.7 and 3.6), clarity (4.8 vs. 4.1 and 3.9), and self-efficacy (4.6 vs. 3.8 and 3.4). GPT-4o delivered detailed and comprehensible explanations, with fewer critical errors. However, limitations included a 60% omission rate for relevant details.

Conclusions:

LLMs, particularly the subscription-based GPT-4o, show potential as tools for training informal caregivers by providing tailored guidance and reducing errors. Although not yet surpassing professional instruction quality, these models offer a flexible and accessible alternative that could enhance home safety and care quality. Further research is necessary to address limitations and optimize their performance. Future implementation of LLMs may alleviate healthcare system burdens by reducing common caregiver errors.


Citation

Please cite as:

Pérez-Esteve C, Guilabert M, Matarredona V, Srulovici E, Tella S, Strametz RS, Mira J

AI in Home Care—Evaluation of Large Language Models for Future Training of Informal Caregivers: Observational Comparative Case Study

J Med Internet Res 2025;27:e70703

DOI: 10.2196/70703

PMID: 40294407

PMCID: 12070015


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.