Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Development and Qualitative Evaluation of R-Speak: Acceptability and Usability of a Smartphone App System Using AI to Enhance Communication in People With Expressive Aphasia
ABSTRACT
Background:
Aphasia, an acquired language disorder affecting the ability to understand and produce language, greatly impairs effective communication. Large language models (LLMs) such as GPT-5 can generate human-like, coherent sentences and therefore offer potential to enhance functional communication for individuals with aphasia.
Objective:
To co-produce a system using LLMs to support communication, and to explore its potential utility and acceptability in people with mild-to-moderate aphasia.
Methods:
We followed the Double Diamond approach. Phase 1 (Discover and define): a stroke survivor patient and public involvement (PPI) group (n=5) and the research team used MoSCoW prioritisation to develop and rank ideas and to co-design a software solution (R-SPEAK) to augment verbal communication. Phase 2 (Develop and demonstrate): eight LLMs were evaluated on interpreting aphasic utterances using existing datasets from AphasiaBank, with outputs ratified by team members; the best-performing model was used for prototype development. Prototype testing was undertaken with 4 people with aphasia (PwA) and 1 carer using semi-structured interviews, and a healthcare professional (HCP) focus group (n=6) evaluated the concept and prototype. The topic guide was informed by, and themes from thematic analysis were mapped onto, the Technology Acceptance Model (TAM); participants rated usability with the System Usability Scale (SUS). Phase 3 (Refine and redesign): to increase processing speed, we systematically evaluated 12 lightweight open-weight LLMs (0.5B–3.8B parameters) on interpreting real aphasic speech, using clinician-curated dialogues and an LLM-as-a-judge framework assessing relevance, faithfulness, and completeness.
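The LLM-as-a-judge evaluation can be sketched in outline: a judge model scores each candidate interpretation on the three criteria named above, and scores are aggregated per candidate model. The function name, the 1–5 score scale, and the example scores below are illustrative assumptions for exposition only, not the authors' implementation.

```python
from statistics import mean

# Judging criteria, per the Methods section.
CRITERIA = ("relevance", "faithfulness", "completeness")

def aggregate_judge_scores(judged_items):
    """Aggregate per-dialogue judge scores (hypothetical 1-5 scale)
    into per-criterion means and an overall mean for one model."""
    per_criterion = {
        c: mean(item[c] for item in judged_items) for c in CRITERIA
    }
    overall = mean(per_criterion.values())
    return per_criterion, overall

# Illustrative scores for three clinician-curated dialogues.
scores = [
    {"relevance": 5, "faithfulness": 4, "completeness": 4},
    {"relevance": 4, "faithfulness": 5, "completeness": 3},
    {"relevance": 5, "faithfulness": 5, "completeness": 4},
]
per_criterion, overall = aggregate_judge_scores(scores)
```

In practice the per-item scores would come from prompting the judge LLM with the original utterance and the candidate interpretation; only the aggregation step is shown here.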
Results:
Mixtral (8x7b) was initially the best-performing LLM for aphasic utterances and was used for the prototype. PwA rated R-SPEAK as good on the SUS (mean 75). Themes extracted from the qualitative data mapped onto all three TAM constructs. Attitude towards using: PwA had high hopes, whereas clinicians were more cautious about its benefits. Perceived ease of use: participants found R-SPEAK easy to use, but it may be more challenging for those with other post-stroke impairments or more severe aphasia, and training might be needed. Perceived usefulness: R-SPEAK could be useful in many scenarios and has the potential to improve independence for PwA. Recommendations for development included improved accuracy and speed, and modifications to the interface according to individual needs. Further refinement showed that Qwen (2.5:3b) achieved the strongest overall performance, with high faithfulness and sub-second latency, while models under 1.5B parameters showed pronounced hallucination, indicating a lower bound on model capacity for reliable clinical speech interpretation.
Conclusions:
Our co-designed R-SPEAK prototype was considered acceptable to patients. Next steps involve ongoing refinement and development of a phone-based app for feasibility testing in a larger, broader cohort of people with mild-to-moderate aphasia.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.