Accepted for/Published in: JMIR Formative Research
Date Submitted: Nov 16, 2023
Open Peer Review Period: Nov 16, 2023 - Nov 30, 2023
Date Accepted: May 9, 2024
ABSTRACT
Background:
One in five adults in the US currently serves as a family caregiver for an individual with a serious illness or disability. Unlike professional caregivers, family caregivers often assume this role without formal preparation or training. There is therefore an urgent need to enhance the capacity of family caregivers to provide quality care. Leveraging technology as an educational tool or an adjunct to care is a promising approach with the potential to enhance the learning and caregiving capabilities of family caregivers. Large language models (LLMs) can potentially serve as a foundation technology for supporting caregivers. LLMs belong to a category called Foundation Models (FMs): large-scale models trained on broad data sets that can be adapted to a range of domain-specific tasks. Despite their potential, FMs have a critical weakness known as “hallucination,” in which the model generates misleading or inaccurate information. Reliability of information is essential when language models are deployed as front-line help for caregivers.
Objective:
This study aimed to (1) develop a reliable Caregiving Language Model (CaLM) by using FMs and a caregiving knowledge base, (2) develop an accessible CaLM using a small FM that requires fewer computing resources, and (3) evaluate its performance against that of a large FM.
Methods:
We developed CaLM using the Retrieval Augmented Generation (RAG) framework combined with FM fine-tuning, which improves the quality of FM answers by grounding the model on a caregiving knowledge base. The key components of CaLM are the caregiving knowledge base, a fine-tuned FM, and a retriever module. We used two small FMs as candidates for the foundation of CaLM (LLaMA-2 and Falcon, each with 7 billion parameters) and a large FM, GPT-3.5 (175 billion parameters), as a benchmark. We developed the caregiving knowledge base by gathering various types of documents from the internet. In this study, we focused on caregivers of individuals with Alzheimer's disease and related dementias (ADRD). We evaluated the models' performance using benchmark metrics commonly used in evaluating language models, as well as their reliability in providing accurate references with their answers.
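The retrieve-then-generate flow described above can be sketched in miniature. This is a hedged illustration only: the stand-in knowledge-base snippets, the bag-of-words retriever, and the `generate` callback are all hypothetical simplifications (the study uses a real caregiving document corpus, a neural retriever, and a fine-tuned FM), but the structure — retrieve passages, ground the prompt on them, and return the passages as references — mirrors the RAG framework the Methods describe.

```python
import math
from collections import Counter

# Toy stand-in for the caregiving knowledge base; in the study this is a
# corpus of caregiving documents gathered from the internet.
KNOWLEDGE_BASE = [
    "Establish a daily routine to reduce agitation in persons with dementia.",
    "Caregivers should take regular breaks to avoid burnout.",
    "Label drawers and doors to help a person with dementia navigate the home.",
]

def _tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, punctuation stripped)."""
    tokens = [w.strip(".,;:!?").lower() for w in text.split()]
    return Counter(t for t in tokens if t)

def _cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(question, k=2):
    """Retriever module: the k knowledge-base passages most similar to the question."""
    q = _tf_vector(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: _cosine(q, _tf_vector(doc)), reverse=True)
    return ranked[:k]

def answer_with_rag(question, generate):
    """Ground the (fine-tuned) FM on retrieved passages; keep them as references."""
    passages = retrieve(question)
    prompt = ("Answer using only these sources:\n"
              + "\n".join(f"- {p}" for p in passages)
              + f"\nQuestion: {question}")
    return {"answer": generate(prompt), "references": passages}
```

Returning `references` alongside the answer is what makes the reliability evaluation possible: the study checks whether a model can supply accurate references with its answers, not just fluent text.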
Results:
The RAG framework improved the performance of all FMs used in this study across all measures. As expected, the large FM performed better than the small FMs across all metrics. Notably, the small fine-tuned FMs with RAG performed significantly better than GPT-3.5 across all metrics. The fine-tuned LLaMA-2 small FM also performed better than GPT-3.5 (even with RAG) in returning references with its answers.
Conclusions:
The study shows that a reliable and accessible CaLM can be developed by using small FMs with a knowledge base specific to the caregiving domain.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.