Currently submitted to: JMIR Medical Education
Date Submitted: Mar 25, 2026
Open Peer Review Period: Mar 26, 2026 - May 21, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Large Language Models for the Classification of Verbal Communication in Dementia Care
ABSTRACT
Background:
Effective verbal communication is a core component of nursing care, particularly in dementia care approaches such as Humanitude. However, manual evaluation of communication quality is time-consuming, subjective, and difficult to scale in training settings. Large language models (LLMs) may enable automated and scalable analysis of verbal communication in caregiving.
Objective:
This study evaluated whether LLMs can reliably classify verbal communication in nursing care training sessions and detect differences in communication patterns across caregiver expertise levels.
Methods:
Care sessions involving simulated patients were conducted with 18 participants, including Humanitude instructors, intermediate practitioners, and novice nurses. Audio recordings were transcribed, segmented into utterances, and classified into 6 communication categories: positive/affectionate expression, request/suggestion, gratitude, explanation, question/confirmation, and none. Four human annotators independently labeled the utterances, and the same transcripts were analyzed using GPT, Claude, and Gemini. Agreement was evaluated using pairwise agreement rates and Cohen’s kappa coefficients. Model performance was further assessed against consensus labels derived from multiple annotators, and non-inferiority/equivalence was tested using two one-sided tests (TOST).
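The inter-annotator agreement statistics named above can be sketched in a few lines. This is a minimal illustration of pairwise agreement and Cohen's kappa, not code from the study; the function name and the toy label sequences are invented for the example.

```python
from collections import Counter

def pairwise_agreement(labels_a, labels_b):
    """Fraction of utterances to which both annotators assigned the same label."""
    assert len(labels_a) == len(labels_b)
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = pairwise_agreement(labels_a, labels_b)
    # Expected agreement if the two annotators labeled independently,
    # each drawing from their own empirical label distribution.
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example with two of the study's six categories plus "none".
a = ["positive", "none", "positive", "gratitude"]
b = ["positive", "none", "none", "gratitude"]
print(pairwise_agreement(a, b))  # 0.75
print(cohens_kappa(a, b))
```

In practice a library implementation (e.g., scikit-learn's `cohen_kappa_score`) would typically be used; the hand-rolled version above only makes the chance-correction step explicit.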
Results:
Agreement among the four human annotators was moderate, with pairwise agreement rates ranging from 64.44% to 74.21% and Cohen's kappa values ranging from 0.554 to 0.664. Among the evaluated LLMs, Claude showed the highest agreement with human annotations, followed by Gemini and GPT. Against consensus labels, Claude achieved the highest accuracy (0.836 for ≥2-annotator consensus; 0.902 for ≥3-annotator consensus), followed by Gemini (0.779; 0.837) and GPT (0.672; 0.732). TOST analysis showed that Gemini achieved statistical equivalence with human annotation (p=0.040), while Claude demonstrated non-inferiority and exceeded the human baseline (p=0.001). Across caregiver groups, instructors showed a higher proportion of positive/affectionate expressions, whereas novice caregivers showed a higher proportion of task-oriented and uncategorized utterances. Overall, LLM-based classification reproduced the general communication patterns observed in human annotations.
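The two one-sided tests (TOST) procedure used for the equivalence claims above can be sketched as follows. This is a generic normal-approximation TOST for the difference of two proportions, offered only as an illustration of the logic; the equivalence margin, sample sizes, and function name are assumptions, not values from the study.

```python
from math import erf, sqrt

def tost_two_proportions(p1, n1, p2, n2, margin):
    """TOST equivalence test for two proportions (normal approximation).

    Tests H0: |p1 - p2| >= margin against H1: |p1 - p2| < margin.
    Returns the equivalence p-value: the larger of the two one-sided
    p-values, so a small value supports equivalence within the margin.
    """
    d = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    cdf = lambda z: 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF
    p_lower = 1 - cdf((d + margin) / se)  # one-sided test of d <= -margin
    p_upper = cdf((d - margin) / se)      # one-sided test of d >= +margin
    return max(p_lower, p_upper)

# Illustrative only: identical accuracies of 0.70 on 200 utterances each,
# with a hypothetical equivalence margin of 0.10.
print(tost_two_proportions(0.70, 200, 0.70, 200, margin=0.10))
```

With a wide margin the example yields p < 0.05 (equivalence supported); shrinking the margin to 0.01 with the same data yields a large p-value, showing how the conclusion depends on the pre-specified margin.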
Conclusions:
LLM-based classification demonstrated reliability comparable to human annotation for caregiving communication analysis. Claude showed the strongest overall performance, and Gemini achieved statistical equivalence with human annotation. These findings suggest that LLM-based analysis may provide a scalable and objective approach to assessing communication behaviors in Humanitude training and support communication assessment in nursing and medical education. Clinical Trial: Gunma University Hospital (HS2024-044)
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.