JMIR Preprints #13611: Robust Feature Engineering for Parkinson Disease Diagnosis: New Machine Learning Techniques

Robust Feature Engineering for Parkinson Disease Diagnosis: New Machine Learning Techniques

Max Wang, Wenbo Ge, Deborah Apthorp, Hanna Suominen

Background:

Parkinson disease (PD) is a common neurodegenerative disorder that affects between 7 and 10 million people worldwide. No objective test for PD currently exists, and studies suggest misdiagnosis rates of up to 34%. Machine learning (ML) presents an opportunity to improve diagnosis; however, the size and nature of existing data sets make it difficult to generalize the performance of ML models to real-world applications.

Objective:

This study aims to consolidate prior work and introduce new feature engineering and ML techniques for PD diagnosis based on vowel phonation. The additional features and ML techniques introduced here show major performance improvements on the large mPower vocal phonation data set.

Methods:

We used 1600 randomly selected /aa/ phonation samples from the entire data set to derive rules for filtering out faulty samples. Applying these rules, together with a joint age-gender balancing filter, yielded a data set of 511 PD patients and 511 controls. We computed features on a 1.5-second audio window beginning at the 1-second mark and used them as input to a support vector machine (SVM). The SVM was evaluated with 10-fold cross-validation (CV), stratified so that each fold contained balanced numbers of patients and controls.
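The windowing and fold-balancing steps described above can be sketched as follows. This is a minimal sketch, assuming NumPy, recordings already loaded as 1-D sample arrays, and labels coded as 1 for PD and 0 for controls. The function names `extract_window` and `stratified_folds` and the 44.1 kHz sample rate in the example are hypothetical; the faulty-sample filtering rules and the age-gender balancing filter are not reproduced because the abstract does not specify them.

```python
import numpy as np


def extract_window(signal, sample_rate, start_s=1.0, duration_s=1.5):
    """Return the segment [start_s, start_s + duration_s) of a 1-D recording.

    Assumption of this sketch: recordings too short to contain the whole
    window are rejected with ValueError rather than padded.
    """
    start = int(round(start_s * sample_rate))
    stop = start + int(round(duration_s * sample_rate))
    if stop > len(signal):
        raise ValueError("recording shorter than the analysis window")
    return np.asarray(signal)[start:stop]


def stratified_folds(labels, n_splits=10, seed=0):
    """Assign each sample a fold index in [0, n_splits) so that every fold
    receives (as nearly as possible) equal shares of each class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Deal this class's shuffled indices round-robin across the folds.
        fold_of[idx] = np.arange(len(idx)) % n_splits
    return fold_of


# Example: a hypothetical 3-second recording at an assumed 44.1 kHz rate,
# and the 511 patient / 511 control cohort size reported above.
window = extract_window(np.zeros(3 * 44100), sample_rate=44100)  # 66150 samples
labels = np.array([1] * 511 + [0] * 511)
folds = stratified_folds(labels, n_splits=10)
```

For each fold `k`, the mask `folds == k` selects the test samples and its complement the training samples; these can be passed to any SVM implementation, for example scikit-learn's `SVC`. In practice, scikit-learn's `StratifiedKFold` provides the same per-class balancing without hand-rolled code.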

Results:

We showed that features used in prior literature do not perform well when extrapolated to the much larger mPower data set. Owing to the natural variation in speech, separating patients from controls is not as simple as previously believed. Adding novel features significantly improved the separation of patients and controls (88.6% certainty, derived from a Bayesian correlated t test), with accuracy exceeding 58%.
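The Bayesian correlated t test can be sketched as follows, assuming that the reported certainty is the posterior probability that the mean per-fold accuracy difference between the new and baseline feature sets is positive; the abstract does not state the exact quantity or whether a region of practical equivalence was used. In the formulation of Corani and Benavoli (2015), the posterior of the mean difference is a Student t distribution with n - 1 degrees of freedom, location equal to the mean per-fold difference, and squared scale (1/n + rho/(1 - rho)) s^2, where s^2 is the sample variance of the differences and rho is the test-set fraction (1/k for k-fold CV). The function name is hypothetical; NumPy and SciPy are assumed.

```python
import numpy as np
from scipy import stats


def bayesian_correlated_ttest(diffs, test_fraction):
    """Posterior probability that the mean CV score difference is > 0.

    diffs: per-fold score differences (model A minus model B), at least two.
    test_fraction: rho, the share of data in each test fold (1/k for k-fold CV).
    """
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.size
    mean = diffs.mean()
    var = diffs.var(ddof=1)
    # Inflate the variance to account for overlapping training sets across folds.
    scale = np.sqrt((1.0 / n + test_fraction / (1.0 - test_fraction)) * var)
    if scale == 0.0:
        # Degenerate case: identical differences in every fold.
        return float(mean > 0)
    # Survival function at 0 gives P(mean difference > 0) under the t posterior.
    return float(stats.t.sf(0.0, df=n - 1, loc=mean, scale=scale))
```

With one run of 10-fold CV, a call would look like `bayesian_correlated_ttest(new_acc - base_acc, test_fraction=1 / 10)`, where `new_acc` and `base_acc` are hypothetical arrays of per-fold accuracies. The variance correction is what distinguishes this test from a naive t test on fold scores, which would overstate certainty because the training sets of different folds overlap.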

Conclusions:

The results are promising, showing the potential of ML for detecting symptoms imperceptible to a neurologist.

Citation

Please cite as:

Wang M, Ge W, Apthorp D, Suominen H. Robust Feature Engineering for Parkinson Disease Diagnosis: New Machine Learning Techniques. JMIR Biomed Eng 2020;5(1):e13611. DOI: 10.2196/13611

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Biomedical Engineering

Date Submitted: Feb 4, 2019

Date Accepted: May 14, 2020

Per the author's request the PDF is not available.