Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Feb 24, 2026
Date Accepted: May 10, 2026

The final, peer-reviewed published version of this preprint can be found here:

A Machine Learning Approach to Voice-Based Parkinson Disease Screening Using Multiview Spectrogram and Speech Recognition Features: Diagnostic Study

Zahir A, Yu J, Jun JS, Park K, Kim R, Jeong H

JMIR Med Inform 2026;14:e94063

DOI: 10.2196/94063

PMID: 42274996

A Machine Learning Approach to Voice-Based Parkinson Disease Screening Using Multiview Spectrogram and Speech Recognition Features: Diagnostic Study

  • Arifa Zahir
  • Jaehong Yu
  • Jin-Sun Jun
  • Kiwon Park
  • Ryul Kim
  • Hyundoo Jeong

ABSTRACT

Background:

Parkinson’s disease (PD) frequently manifests early vocal impairment, motivating the development of non-invasive and scalable digital screening tools.

Objective:

This study proposes MSR-PDNet, a multiview spectrogram-based deep learning framework integrating recognition-aware context for PD detection from voice recordings.

Methods:

Voice recordings from 203 participants (121 PD, 82 healthy controls) were collected prospectively. Three spectrogram representations (Mel, short-time Fourier transform [STFT], and constant-Q transform [CQT]) were extracted and processed via parallel convolutional neural network branches. A recognition ratio (RR) feature vector derived from automatic speech recognition transcript agreement was optionally fused with spectrogram embeddings. Models were evaluated using strict subject-wise 5-fold cross-validation.
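
The subject-wise split described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the subject IDs, the three recordings per subject, the random seed, and the helper names `subject_wise_folds` and `split_recordings` are all assumptions made for illustration. Only the cohort sizes (121 PD, 82 controls) and the 5 folds come from the abstract. The point it shows is that folds are assigned per subject, so every recording from one person lands on the same side of a train/test split.

```python
import random
from collections import defaultdict


def subject_wise_folds(subject_labels, n_folds=5, seed=0):
    """Assign each subject to exactly one fold, balancing labels across folds.

    subject_labels: dict mapping subject ID -> class label.
    Returns a dict mapping subject ID -> fold index in range(n_folds).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for subject, label in sorted(subject_labels.items()):
        by_label[label].append(subject)

    fold_of = {}
    offset = 0
    for label in sorted(by_label):
        subjects = by_label[label]
        rng.shuffle(subjects)
        # Round-robin assignment keeps per-fold class counts within one
        # of each other; the offset keeps total fold sizes balanced too.
        for i, subject in enumerate(subjects):
            fold_of[subject] = (i + offset) % n_folds
        offset += len(subjects)
    return fold_of


def split_recordings(recordings, fold_of, test_fold):
    """Split (subject_id, recording_id) pairs so no subject spans both sets."""
    train = [r for r in recordings if fold_of[r[0]] != test_fold]
    test = [r for r in recordings if fold_of[r[0]] == test_fold]
    return train, test


# Hypothetical cohort mirroring the reported sizes: 121 PD, 82 controls.
labels = {f"PD{i:03d}": "PD" for i in range(121)}
labels.update({f"HC{i:03d}": "HC" for i in range(82)})
# Assumed for illustration only: three recordings per subject.
recordings = [(s, k) for s in labels for k in range(3)]

fold_of = subject_wise_folds(labels)
train, test = split_recordings(recordings, fold_of, test_fold=0)
```

Splitting at the recording level instead would let the same voice appear in both training and test data, which is the leakage that a subject-wise protocol is meant to rule out.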

Results:

MSR-PDNet achieved 86.9% mean test accuracy using three-view spectrogram fusion, improving to 97.4% when incorporating RR. RR integration reduced the false negative rate by approximately 84.5%, substantially improving sensitivity in screening-oriented settings.
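
As a reading aid, the sketch below shows how a reduction in false negative rate could be computed, assuming the reported figure is a relative reduction (the abstract does not say whether it is relative or absolute). The confusion counts are hypothetical and are not taken from the study; the function names are likewise illustrative.

```python
def false_negative_rate(fn, tp):
    """FNR = FN / (FN + TP): the share of true PD cases that are missed."""
    return fn / (fn + tp)


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`, relative to `before`."""
    return (before - after) / before


# Hypothetical counts, NOT from the study: 100 PD cases in each setting.
fnr_base = false_negative_rate(fn=20, tp=80)  # 20 of 100 cases missed
fnr_rr = false_negative_rate(fn=4, tp=96)     # 4 of 100 cases missed

reduction = relative_reduction(fnr_base, fnr_rr)
sensitivity_rr = 1.0 - fnr_rr  # sensitivity is the complement of FNR
```

With these made-up counts the relative reduction is about 80%. Under the same reading, the reported 84.5% would mean the RR-augmented model misses roughly one-sixth as many PD cases as the spectrogram-only model.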

Conclusions:

Combining multiview spectrogram learning with recognition-aware context significantly enhances voice-based PD classification under leakage-free evaluation. The proposed framework supports deployment-oriented, non-invasive PD screening systems.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.