Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 2, 2025
Date Accepted: Apr 21, 2026

The final, peer-reviewed published version of this preprint can be found here:

Hsu Mh, Hwang SY, Tsai YH, Chang YC

Advancing Alzheimer Disease Prediction With Large Language Model–Based Linguistic Feature Analysis: Development and Validation Study

JMIR Med Inform 2026;14:e86965

DOI: 10.2196/86965

PMID: 42208123

Advancing Alzheimer’s Disease Prediction with LLM-Based Linguistic Feature Analysis

  • Ming-hsia Hsu
  • San-Yih Hwang
  • Yi-Hang Tsai
  • Yun-Chi Chang

ABSTRACT

Background:

Alzheimer's disease (AD) affects 55 million people globally, a number projected to reach 139 million by 2050. Early detection is essential for timely intervention, yet traditional diagnostic methods are costly and invasive, which limits their accessibility. Language impairment is an early AD symptom that can be assessed non-invasively through speech analysis. While large language models (LLMs) show promise for analyzing speech transcripts, existing approaches lack transparent evaluation processes and standardized frameworks for systematic linguistic analysis, limiting their clinical applicability.

Objective:

This study aims to investigate how linguistic features extracted from transcribed speech and analyzed by LLMs influence the accuracy and interpretability of AD prediction.

Methods:

We propose a framework that leverages LLMs to analyze linguistic features extracted from transcribed speech for AD classification. Our approach focuses on four key aspects: readability, fluency, richness of detail, and keyword relevance. To enhance classification accuracy, the framework integrates transcript embeddings with feature explanation embeddings, forming a comprehensive linguistic representation. We conducted extensive ablation studies to evaluate the contributions of individual features and benchmarked our framework against existing LLM-driven methodologies through pairwise explainability evaluations.
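As a rough illustration of the embedding integration described above, the following minimal sketch assumes the combination is a simple concatenation of one transcript vector with one explanation vector per feature. The abstract does not state the fusion operator, so concatenation, the function name `fuse_embeddings`, the `drop` parameter, and the toy 2-dimensional vectors are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Optional

# The four linguistic aspects named in the Methods paragraph.
FEATURES = ["readability", "fluency", "richness_of_detail", "keyword_relevance"]


def fuse_embeddings(
    transcript_vec: List[float],
    explanation_vecs: Dict[str, List[float]],
    drop: Optional[str] = None,
) -> List[float]:
    """Concatenate the transcript embedding with the per-feature
    explanation embeddings, in a fixed feature order.

    `drop` (hypothetical) leaves one feature out, mimicking a single
    ablation run. Concatenation itself is an assumption.
    """
    fused = list(transcript_vec)
    for name in FEATURES:
        if name == drop:
            continue
        fused.extend(explanation_vecs[name])
    return fused


# Toy vectors standing in for real embedding-model output.
transcript = [0.1, 0.2]
explanations = {
    "readability": [1.0, 1.1],
    "fluency": [2.0, 2.1],
    "richness_of_detail": [3.0, 3.1],
    "keyword_relevance": [4.0, 4.1],
}

full = fuse_embeddings(transcript, explanations)
ablated = fuse_embeddings(transcript, explanations, drop="keyword_relevance")
```

In this sketch the fused vector would be the input to a downstream classifier. Dropping a feature shortens the vector, so each ablated variant would need its own classifier; this is a simplification of whatever ablation procedure the study actually used.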

Results:

Our framework achieved a precision of 92.07%, a sensitivity of 91.55%, and a specificity of 97.22% on the ADReSSo 2021 dataset, outperforming existing state-of-the-art LLM-based approaches. Ablation studies identified keyword relevance as the most influential feature, and integrating transcript and feature embeddings significantly improved predictive performance compared with using either in isolation. Furthermore, explainability evaluations by both LLM-based and human expert assessors showed that our method outperformed benchmark approaches.
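For readers less familiar with the three reported metrics, the sketch below shows their standard binary-classification definitions computed from confusion-matrix counts. The counts are made-up toy values, not the study's results, and the abstract does not say whether its figures are per-class or averaged, so this only illustrates the definitions.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard definitions, with AD as the positive class.

    Assumes every denominator is non-zero.
    """
    return {
        "precision": tp / (tp + fp),    # share of predicted AD cases that are truly AD
        "sensitivity": tp / (tp + fn),  # share of true AD cases that are detected
        "specificity": tn / (tn + fp),  # share of controls correctly cleared
    }


# Hypothetical toy counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
```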

Conclusions:

These findings highlight that a structured linguistic feature analysis utilizing LLMs provides a robust and interpretable framework for preliminary AD detection. Our approach offers a scalable and accessible solution that bridges AI-driven text analysis with clinical applications, supporting early detection of cognitive decline through non-invasive assessment methods.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.