JMIR Preprints #81936: Two-Wave Comparative Study With Independent Samples on Integrating a Large Language Model Into a Socially Assistive Robot in a Hospital Geriatric Unit: Performance, Engagement, and User Perceptions


Two-Wave Comparative Study With Independent Samples on Integrating a Large Language Model Into a Socially Assistive Robot in a Hospital Geriatric Unit: Performance, Engagement, and User Perceptions

Lauriane Blavette;
Sébastien Dacunha;
Xavier Alameda-Pineda;
Jeanne Cattoni;
Maribel Pino;
Anne-Sophie Rigaud

ABSTRACT

Background:

Addressing the complex medical and psychosocial needs of older adults (OAs) is increasingly difficult in resource-limited care settings. In this context, socially assistive robots (SARs) provide support and practical functions such as orientation and information delivery. Integrating large language models (LLMs) into SAR dialogue systems offers opportunities to improve interaction fluency and adaptability. Yet in real-world use, acceptability also depends on minimizing both technical and conversational errors, ensuring successful user interactions, and adapting to individual user characteristics.

Objective:

This study aimed to evaluate the impact of integrating a large language model into a SAR dialogue system in a hospital geriatric unit by (1) comparing system performance and interaction success across two experimental waves, (2) examining the links between robot errors, interaction success, and multidimensional user engagement, and (3) exploring how user characteristics influence performance and perceptions of acceptability and usability.

Methods:

Over an 8-month period, 28 OAs (≥60 years) attending a geriatric day care hospital (DCH; Paris, France) participated in a single-session evaluation of a SAR. Interactions took place in the DCH and were video-recorded across two waves: wave 1 (basic dialogue system) and wave 2 (LLM-enhanced system). From the recordings, system performance (error types, interaction success) and user engagement (verbal, physical, and emotional dimensions) were coded. Acceptability and usability were measured using the Acceptability E-scale and the System Usability Scale. Sociodemographic data were collected, and quantitative results were supplemented with a thematic analysis of qualitative observations.
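For orientation, the System Usability Scale is conventionally scored on a 0 to 100 scale from ten items rated 1 to 5. The following is a minimal sketch of that standard scoring rule only; it is not taken from the authors' analysis, and the function name `sus_score` and the example responses are hypothetical.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one respondent.

    `responses` holds the ten item ratings in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded: contribution = rating - 1
            total += rating - 1
        else:
            # Even-numbered items are negatively worded: contribution = 5 - rating
            total += 5 - rating

    # The summed contributions (0-40) are rescaled to 0-100.
    return total * 2.5


# Hypothetical respondent who answers "neutral" (3) on every item.
neutral_score = sus_score([3] * 10)
```

Under this rule a respondent who answers 3 on every item scores 50, which gives a sense of the scale on which the SUS means in the Results are expressed.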

Results:

Following LLM integration, error-free interactions increased from 27.8% to 70.2% (P<.001), comprehension failures decreased from 47.2% to 17% (P<.001), and interaction success rose from 25.0% to 74.5% (P<.001). Acceptability (AES: 12.8 vs 20.8, P=.003) and usability (SUS: 40.0 vs 60.4, P=.04) were significantly higher in wave 2. Engagement scores did not differ significantly between waves, though emotional engagement correlated positively with interaction success (r=0.28, P<.01), and age was negatively associated with both physical engagement (r=–0.30, P<.001) and acceptability (r=–0.20, P<.05).

Conclusions:

Behavioral engagement with a SAR in geriatric care is shaped by both system performance and individual user characteristics. Improvements in dialogue quality, particularly through the integration of an LLM, were associated with higher interaction success and enhanced user experience. These findings highlight the importance of combining multimodal behavioral analysis with self-reported measures to inform the iterative, user-centered design of socially responsive robots in clinical contexts.

Clinical Trial:

The study was approved by the French National Ethics Committee (“Comité de Protection des Personnes, CPP Ouest II, Maison de la Recherche Clinique – CHU Angers”; Institutional Review Board [IRB] 2021/20) and complied with the General Data Protection Regulation (GDPR). Data processing was registered with the Data Protection Officer (DPO) under reference number 20210114153645 in the AP-HP registry.

Citation

Please cite as:

Blavette L, Dacunha S, Alameda-Pineda X, Cattoni J, Pino M, Rigaud AS

Integrating a Large Language Model Into a Socially Assistive Robot in a Hospital Geriatric Unit: Two-Wave Comparative Study on Performance, Engagement, and User Perceptions

JMIR Hum Factors 2025;12:e81936

DOI: 10.2196/81936

PMID: 41337745

PMCID: 12712570


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Human Factors

Date Submitted: Aug 13, 2025

Open Peer Review Period: Aug 15, 2025 - Oct 10, 2025

Date Accepted: Oct 29, 2025
