Accepted for/Published in: JMIR Biomedical Engineering

Date Submitted: Feb 4, 2019
Date Accepted: May 14, 2020

The final, peer-reviewed published version of this preprint can be found here:

Robust Feature Engineering for Parkinson Disease Diagnosis: New Machine Learning Techniques

Wang M, Ge W, Apthorp D, Suominen H

JMIR Biomed Eng 2020;5(1):e13611

DOI: 10.2196/13611

Robust Feature Engineering for Parkinson’s Disease Diagnosis

  • Max Wang
  • Wenbo Ge
  • Deborah Apthorp
  • Hanna Suominen

ABSTRACT

Background:

Parkinson’s disease is a common neurodegenerative disorder that affects between seven and ten million people worldwide. No objective test for Parkinson’s disease currently exists, and studies suggest misdiagnosis rates of up to 34 percent. Machine learning presents an opportunity to improve diagnosis; however, the size and nature of available datasets make it difficult to generalize the performance of machine learning models to real-world applications.

Objective:

This paper aims to consolidate prior work and to introduce new feature engineering and machine learning techniques for diagnosing Parkinson’s disease from vowel phonation. The additional features and techniques show major performance improvements on the large mPower vocal phonation dataset.

Methods:

We use 1,600 randomly selected /aa/ phonation samples from the entire dataset to derive rules for filtering out faulty samples. Applying these rules, together with a joint age-gender balancing filter, results in a dataset of 511 patients with Parkinson’s disease and 511 controls. For each recording, we calculate features on a 1.5-second window of audio beginning at the 1-second mark and use them as input to a support vector machine. The model is evaluated with ten repetitions of ten-fold cross-validation, stratified so that each fold contains a balanced number of patients and controls.
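The windowing and evaluation scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `extract_window` and `repeated_stratified_folds`, the round-robin fold assignment, and the random seed are all hypothetical, and feature computation and the support vector machine itself are left out.

```python
import random


def extract_window(samples, sample_rate, start_s=1.0, length_s=1.5):
    """Return the samples covering [start_s, start_s + length_s) seconds."""
    start = int(round(start_s * sample_rate))
    stop = start + int(round(length_s * sample_rate))
    if stop > len(samples):
        raise ValueError("recording is too short for the requested window")
    return samples[start:stop]


def repeated_stratified_folds(labels, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) with classes balanced per fold."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_folds)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold gets a near-equal share.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_folds].append(idx)
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)
```

With 511 patients and 511 controls, `repeated_stratified_folds([1] * 511 + [0] * 511)` produces 100 train/test splits, each holding out 51 or 52 recordings per class. A model would be fitted on `train_idx` and scored on `test_idx` for every split.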

Results:

We show that features used in prior literature do not perform well when extrapolated to the much larger mPower dataset. Because of the natural variation in speech, separating patients from controls is not as simple as previously believed. Adding novel features yields a significant improvement in separating patients and controls (with 88.6% certainty, derived from a Bayesian correlated t-test), with accuracy exceeding 58%.
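For readers unfamiliar with the Bayesian correlated t-test, the following sketch shows how a certainty figure of this kind is commonly computed from per-fold accuracy differences between two models. It assumes the widely used formulation in which the posterior of the mean difference is a Student t distribution with n - 1 degrees of freedom and the correlation between folds is approximated by the test-set fraction (1/10 for ten-fold cross-validation). The abstract does not give these details, so the formula, the function names, and the numerical integration are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def student_t_cdf(t, dof, steps=2000):
    """CDF of a standard Student t distribution via Simpson's rule on [0, |t|]."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))

    def pdf(x):
        return math.exp(log_norm - (dof + 1) / 2 * math.log1p(x * x / dof))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def prob_improvement(diffs, n_folds=10):
    """Posterior probability that the mean per-fold accuracy difference is > 0.

    diffs holds one accuracy difference (new model minus baseline) per
    cross-validation fold, across all repetitions. The differences must not
    all be identical, otherwise the variance is zero.
    """
    n = len(diffs)
    rho = 1.0 / n_folds  # assumed correlation heuristic: test-set fraction
    scale = math.sqrt((1.0 / n + rho / (1.0 - rho)) * variance(diffs))
    return student_t_cdf(mean(diffs) / scale, dof=n - 1)
```

Calling `prob_improvement` on the 100 per-fold differences from ten repetitions of ten-fold cross-validation returns a value between 0 and 1. Under the assumptions above, a value such as 0.886 would be read as 88.6% certainty that the new features improve accuracy.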

Conclusions:

The results are promising, showing the potential for machine learning in detecting symptoms imperceptible to a neurologist.

Clinical Trial: N/A

