JMIR Preprints #79411: Early Detection of Alzheimer’s Disease and Related Dementias from Spontaneous Speech: A Benchmarking Study of Foundation Speech and Language Models

Early Detection of Alzheimer’s Disease and Related Dementias from Spontaneous Speech: A Benchmarking Study of Foundation Speech and Language Models

Jingyu Li;
Lingchao Mao;
Hairong Wang;
Zhendong Wang;
Xi Mao;
Xuelei Sherry Ni

ABSTRACT

Background:

Alzheimer’s disease and related dementias (ADRD) are progressive neurodegenerative conditions for which early detection is critical to timely intervention and care planning. However, current diagnostic methods are often inaccessible, costly, and delayed, especially for underserved populations, so there is a growing need for scalable, non-invasive tools that support timely diagnosis. Spontaneous speech contains acoustic and linguistic markers that can serve as non-invasive biomarkers of cognitive decline. Foundation models, pre-trained on large-scale audio or text data, generate high-dimensional embeddings that encode contextual and acoustic information.

Objective:

This study benchmarks open-source foundation language and speech models on their effectiveness in detecting ADRD from spontaneous speech, as a potential route to early, non-invasive, and scalable ADRD detection.

Methods:

We used the Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA (PREPARE) Challenge dataset, which consists of audio recordings from over 1,600 participants in three categories of cognitive status: healthy control (HC), mild cognitive impairment (MCI), and Alzheimer’s disease (AD). We excluded samples that were non-English, did not contain spontaneous speech, or were of poor quality. The final sample included 703 (59.13%) HC, 81 (6.81%) MCI, and 405 (34.06%) AD cases. We systematically benchmarked a range of open-source foundation speech and language models on classifying cognitive status into these three categories (HC, MCI, or AD).
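The class distribution reported above can be checked with a few lines of arithmetic. The sketch below recomputes the percentages from the stated counts and derives the accuracy of a trivial majority-class predictor as a point of reference. The "balanced" class weights at the end are one common heuristic for imbalanced data and are included purely as an illustration; the abstract does not state whether any reweighting was used.

```python
# Class counts as reported in the Methods paragraph.
counts = {"HC": 703, "MCI": 81, "AD": 405}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the largest class.
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

# "Balanced" weights, n_samples / (n_classes * n_in_class). This is an
# illustrative heuristic (assumption), not a method described in the abstract.
class_weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: n={n}  share={shares[label]:.2%}  weight={class_weights[label]:.2f}")
print(f"total={total}  majority baseline ({majority_label})={majority_baseline:.3f}")
```

With so few MCI cases, a model can reach roughly 0.59 accuracy by always predicting HC, which is why accuracy is usefully read alongside AUC on this dataset.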

Results:

The Whisper-medium model achieved the highest performance among speech models, with an accuracy of 0.731 and an area under the curve (AUC) of 0.802, while BERT with pause annotation performed best among language models, with an accuracy of 0.662 and an AUC of 0.744. Overall, ADRD detection based on audio embeddings generated by state-of-the-art automatic speech recognition (ASR) models outperformed the other approaches, and including non-semantic information such as pause patterns consistently improved the classification performance of text-embedding-based models.
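To illustrate what "pause annotation" of a transcript can look like, here is a minimal sketch that inserts pause markers into a word sequence based on the silent gap before each word. The abstract does not describe the authors' annotation scheme, so everything here is an assumption: the function name `annotate_pauses`, the availability of word-level timestamps, the 0.5 s and 2.0 s thresholds, and the choice of `,` and `...` as markers are all hypothetical.

```python
from typing import List, Optional, Tuple

# (token, start_sec, end_sec); word-level timestamps are assumed to be
# available from some upstream alignment step (hypothetical input format).
Word = Tuple[str, float, float]


def annotate_pauses(
    words: List[Word],
    short: float = 0.5,        # assumed threshold for a short pause, in seconds
    long: float = 2.0,         # assumed threshold for a long pause, in seconds
    short_marker: str = ",",   # hypothetical marker choices
    long_marker: str = "...",
) -> str:
    """Return the transcript with a marker inserted wherever the silent gap
    between consecutive words reaches the short or long threshold."""
    out: List[str] = []
    prev_end: Optional[float] = None
    for token, start, end in words:
        if prev_end is not None:
            gap = start - prev_end
            if gap >= long:
                out.append(long_marker)
            elif gap >= short:
                out.append(short_marker)
        out.append(token)
        prev_end = end
    return " ".join(out)


example = [
    ("the", 0.0, 0.2),
    ("boy", 0.25, 0.6),
    ("is", 1.3, 1.4),        # 0.7 s gap: short pause
    ("um", 3.9, 4.1),        # 2.5 s gap: long pause
    ("reaching", 4.2, 4.8),
]
print(annotate_pauses(example))
```

The annotated string can then be passed to a text model in place of the plain transcript, so that timing information, which is otherwise lost in transcription, remains visible to the text encoder.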

Conclusions:

Our work presents a comprehensive benchmarking framework built on state-of-the-art foundation models and validated on a large, clinically relevant dataset. Acoustic-based approaches, particularly ASR-derived embeddings, show strong potential for developing a more scalable, non-invasive, and cost-effective early detection tool for ADRD.

Citation

Please cite as:

Li J, Mao L, Wang H, Wang Z, Mao X, Ni XS

Early Detection of Alzheimer's Disease and Related Dementias From Spontaneous Speech Using Foundation Speech and Language Models: Comparative Evaluation

JMIR Form Res 2026;10:e79411

DOI: 10.2196/79411

PMID: 42126910

PMCID: 13216759

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Formative Research

Date Submitted: Jun 20, 2025

Date Accepted: Dec 22, 2025
