Accepted for/Published in: JMIR Formative Research

Date Submitted: Jun 20, 2025
Date Accepted: Dec 22, 2025

The final, peer-reviewed published version of this preprint can be found here:

Early Detection of Alzheimer's Disease and Related Dementias From Spontaneous Speech Using Foundation Speech and Language Models: Comparative Evaluation

Li J, Mao L, Wang H, Wang Z, Mao X, Ni XS

Early Detection of Alzheimer's Disease and Related Dementias From Spontaneous Speech Using Foundation Speech and Language Models: Comparative Evaluation

JMIR Form Res 2026;10:e79411

DOI: 10.2196/79411

PMID: 42126910

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Early Detection of Alzheimer’s Disease and Related Dementias from Spontaneous Speech: A Benchmarking Study of Foundation Speech and Language Models

  • Jingyu Li
  • Lingchao Mao
  • Hairong Wang
  • Zhendong Wang
  • Xi Mao
  • Xuelei Sherry Ni

ABSTRACT

Background:

Alzheimer’s disease and related dementias (ADRD) are progressive neurodegenerative conditions for which early detection is critical to timely intervention and care planning. However, current diagnostic methods are often inaccessible, costly, and slow, especially for underserved populations, creating a growing need for scalable, non-invasive tools that support timely diagnosis. Spontaneous speech contains acoustic and linguistic markers that can serve as non-invasive biomarkers of cognitive decline. Foundation models, pre-trained on large-scale audio or text data, generate high-dimensional embeddings that encode rich contextual and acoustic information.

Objective:

This study benchmarks open-source foundation language and speech models to evaluate their effectiveness in detecting ADRD from spontaneous speech as a potential solution for early, non-invasive, and scalable ADRD detection.

Methods:

We used the Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA (PREPARE) Challenge dataset, which consists of audio recordings from over 1,600 participants in three categories of cognitive status: healthy control (HC), mild cognitive impairment (MCI), and Alzheimer’s disease (AD). We excluded samples that were non-English, did not contain spontaneous speech, or were of poor quality. The final sample comprised 1,189 cases: 703 (59.13%) HC, 81 (6.81%) MCI, and 405 (34.06%) AD. We systematically benchmarked a range of open-source foundation speech and language models on classifying cognitive status into these three categories (HC, MCI, or AD).
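The class proportions reported above can be reproduced from the raw counts with a few lines of Python. The sketch also computes the accuracy of a trivial classifier that always predicts the largest class; that baseline is not reported in the abstract and is added here only as a reference point for reading accuracy figures on an imbalanced sample.

```python
# Class counts as reported in the Methods paragraph above.
counts = {"HC": 703, "MCI": 81, "AD": 405}

total = sum(counts.values())

# Percentage share of each class, rounded to two decimals as in the abstract.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

# Assumption for illustration: a "majority-class" baseline that always
# predicts the most frequent label. Its accuracy equals that label's share.
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

print(f"total cases: {total}")
for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"majority-class baseline ({majority_label}): {majority_baseline:.4f}")
```

Because HC makes up about 59% of the sample, an accuracy near 0.59 is achievable without using any speech information, which is why a threshold-independent metric such as AUC is a useful companion to accuracy here.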

Results:

The Whisper-medium model achieved the highest performance among speech models, with an accuracy of 0.731 and an area under the curve (AUC) of 0.802, while BERT with pause annotation performed best among language models, with an accuracy of 0.662 and an AUC of 0.744. Overall, ADRD detection based on audio embeddings generated by state-of-the-art automatic speech recognition (ASR) models outperformed the other approaches, and including non-semantic information such as pause patterns consistently improved the classification performance of models based on text embeddings.
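For readers less familiar with these metrics, the following is a minimal sketch of how accuracy and a three-class AUC could be computed from predicted class probabilities. The abstract does not state how AUC was aggregated across the three classes; the macro-averaged one-vs-rest scheme used here is an assumption, and the labels and probabilities are invented toy values, not study data.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(is_positive, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probs, labels):
    """Assumed aggregation: unweighted mean of one-vs-rest AUCs per class."""
    aucs = []
    for k, label in enumerate(labels):
        is_pos = [t == label for t in y_true]
        aucs.append(binary_auc(is_pos, [row[k] for row in probs]))
    return sum(aucs) / len(aucs)


LABELS = ["HC", "MCI", "AD"]

# Toy example (hypothetical values): true labels and per-class probabilities.
y_true = ["HC", "HC", "MCI", "AD", "AD", "HC"]
probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
    [0.5, 0.1, 0.4],
    [0.6, 0.1, 0.3],
]

# Predicted label = class with the highest probability in each row.
y_pred = [LABELS[max(range(len(LABELS)), key=row.__getitem__)] for row in probs]

acc = accuracy(y_true, y_pred)
auc = macro_ovr_auc(y_true, probs, LABELS)
print(f"accuracy = {acc:.3f}, macro one-vs-rest AUC = {auc:.3f}")
```

In the toy data the fifth case is misclassified at the argmax threshold yet still ranks above every non-AD case on the AD score, which illustrates how AUC can be higher than accuracy, as in the results reported above.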

Conclusions:

Our work presents a comprehensive benchmarking framework built on state-of-the-art foundation models and validated on a large, clinically relevant dataset. Acoustic-based approaches, particularly ASR-derived embeddings, show strong potential for developing a more scalable, non-invasive, and cost-effective early detection tool for ADRD.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.