Accepted for/Published in: JMIR Biomedical Engineering
Date Submitted: Feb 4, 2019
Date Accepted: May 14, 2020
Robust Feature Engineering for Parkinson’s Disease Diagnosis
ABSTRACT
Background:
Parkinson’s disease is a common neurodegenerative disorder that affects between seven and ten million people worldwide. No objective test for Parkinson’s disease currently exists, and studies suggest misdiagnosis rates of up to 34%. Machine learning offers an opportunity to improve diagnosis; however, the size and nature of existing datasets make it difficult to generalize the performance of machine learning models to real-world applications.
Objective:
This paper aims to consolidate prior work and introduce new techniques in feature engineering and machine learning for Parkinson’s disease diagnosis based on vowel phonation. The additional features and machine learning techniques show substantial performance improvements on the large mPower vocal phonation dataset.
Methods:
We use 1,600 randomly selected /aa/ phonation samples from the entire dataset to derive rules for filtering out faulty samples. Applying these rules, together with a joint age-gender balancing filter, yields a dataset of 511 patients with Parkinson’s disease and 511 controls. We compute features on a 1.5-second audio window beginning at the 1-second mark and use them as input to a support vector machine (SVM). The classifier is evaluated with 10-times repeated 10-fold cross-validation, stratified so that each fold contains balanced numbers of patients and controls.
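The windowing and evaluation protocol described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `analysis_window` and `repeated_stratified_kfold` are hypothetical, audio is assumed to be a sequence of samples at a known sample rate, and folds are stratified by shuffling each class and dealing its indices round-robin into folds, which is one common way to stratify.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def analysis_window(samples: Sequence[float], sample_rate: int,
                    start_s: float = 1.0, duration_s: float = 1.5) -> Sequence[float]:
    """Return the slice of audio covering [start_s, start_s + duration_s) seconds."""
    start = int(round(start_s * sample_rate))
    stop = start + int(round(duration_s * sample_rate))
    if stop > len(samples):
        raise ValueError("recording is too short for the requested window")
    return samples[start:stop]


def repeated_stratified_kfold(labels: Sequence[int], n_splits: int = 10,
                              n_repeats: int = 10, seed: int = 0
                              ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for repeated stratified k-fold CV.

    Within each repeat, the indices of each class are shuffled and dealt
    round-robin into folds, so every fold receives (almost) the same number
    of patients and controls, and every sample is tested exactly once.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_repeats):
        folds: List[List[int]] = [[] for _ in range(n_splits)]
        offset = 0
        for cls in sorted(by_class):
            idx = list(by_class[cls])
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[(offset + j) % n_splits].append(i)
            # Continue dealing where the previous class stopped, keeping fold sizes even.
            offset += len(idx)
        for k in range(n_splits):
            test = sorted(folds[k])
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, test
```

With `labels = [1] * 511 + [0] * 511` (1 = patient, 0 = control), the generator yields 100 train/test splits, each test fold holding 51 or 52 samples of each class; an SVM would then be fitted and scored on features computed from each sample's window. scikit-learn's `RepeatedStratifiedKFold` provides an equivalent splitter if a library dependency is acceptable.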
Results:
We show that features used in prior literature do not perform well when applied to the much larger mPower dataset. Because of the natural variation in speech, separating patients from controls is not as simple as previously believed. Using additional novel features, we improve the separation of patients and controls, with accuracy exceeding 58%; a Bayesian correlated t-test gives an 88.6% probability that this improvement is genuine.
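The 88.6% figure comes from a Bayesian correlated t-test, which accounts for the correlation between cross-validation folds that share training data. A minimal sketch follows, assuming the standard formulation of this test: the posterior of the mean per-fold accuracy difference is a Student t distribution with n − 1 degrees of freedom, location equal to the mean difference, and squared scale (1/n + ρ/(1 − ρ))·s², where s² is the sample variance of the differences and ρ is the test-set fraction (1/10 for ten-fold CV). The function names and the zero-width region of practical equivalence (`rope`) are illustrative assumptions; this is not the authors' code and does not reproduce their result.

```python
import math
from statistics import mean, variance
from typing import Sequence


def student_t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """CDF of the standard Student t distribution.

    Integrates the density from 0 to x with Simpson's rule and adds 0.5,
    using symmetry about zero. Accurate while |x| / steps stays small,
    which covers the moderate t values that arise in practice.
    """
    assert steps % 2 == 0, "Simpson's rule needs an even number of intervals"
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(t: float) -> float:
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return min(1.0, max(0.0, 0.5 + total * h / 3))


def bayesian_correlated_ttest(diffs: Sequence[float], test_fraction: float,
                              rope: float = 0.0) -> float:
    """Posterior probability that the mean accuracy difference exceeds `rope`.

    diffs: per-fold differences (new model minus baseline) from repeated k-fold CV.
    test_fraction: rho = n_test / (n_train + n_test), i.e. 1/k for k-fold CV.
    """
    n = len(diffs)
    mu = mean(diffs)
    s2 = variance(diffs)  # sample variance (n - 1 denominator)
    # Variance inflation rho / (1 - rho) corrects for overlapping training sets.
    scale = math.sqrt((1 / n + test_fraction / (1 - test_fraction)) * s2)
    if scale == 0.0:
        return 1.0 if mu > rope else 0.0
    # P(delta > rope) = 1 - F((rope - mu) / scale) = F((mu - rope) / scale) by symmetry.
    return student_t_cdf((mu - rope) / scale, df=n - 1)
```

For example, `bayesian_correlated_ttest(diffs, test_fraction=0.1)`, with `diffs` holding the 100 per-fold accuracy differences (new feature set minus prior-literature features) from 10-times repeated 10-fold CV, returns the posterior probability that the new features perform better. Numerical integration is used here only to keep the sketch dependency-free; `scipy.stats.t.cdf` would be the usual choice.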
Conclusions:
The results are promising, showing the potential of machine learning for detecting symptoms that are imperceptible to a neurologist.
Clinical Trial: N/A
