Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 25, 2020
Date Accepted: Apr 25, 2021

The final, peer-reviewed published version of this preprint can be found here:

Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study

Hu HC, Chang SY, Wang CH, Li KJ, Cho HY, Chen YT, Lu CJ, Tsai TP, Lee OKS

Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study

J Med Internet Res 2021;23(6):e25247

DOI: 10.2196/25247

PMID: 34100770

PMCID: 8241431

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Artificial Intelligence Application for Vocal Fold Disease Prediction Through Voice Recognition: Development and Usability Study

  • Hao-Chun Hu; 
  • Shyue-Yih Chang; 
  • Chuen-Heng Wang; 
  • Kai-Jun Li; 
  • Hsiao-Yun Cho; 
  • Yi-Ting Chen; 
  • Chang-Jung Lu; 
  • Tzu-Pei Tsai; 
  • Oscar Kuang-Sheng Lee

ABSTRACT

Background:

Dysphonia affects quality of life by interfering with communication. However, laryngoscopic examination is expensive and not readily accessible in primary care units, and an accurate diagnosis requires an experienced laryngologist.

Objective:

This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence.

Methods:

We collected 29 normal voice samples and 527 samples from individuals with voice disorders, including vocal atrophy (n=210), unilateral vocal paralysis (n=43), organic vocal fold lesions (n=244), and adductor spasmodic dysphonia (n=30). The 556 samples were divided into a training set of 440 samples and a testing set of 116 samples. A convolutional neural network approach was applied to train the model, and its performance was compared with that of human specialists.
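The abstract does not describe how the 440/116 partition was drawn. As a rough illustration only, a stratified split over the five diagnostic classes (class counts taken from the abstract; all function and variable names here are hypothetical, not from the authors' code) could be sketched as:

```python
import random

# Class counts reported in the abstract (total = 556).
CLASS_COUNTS = {
    "normal": 29,
    "vocal_atrophy": 210,
    "unilateral_vocal_paralysis": 43,
    "organic_vocal_fold_lesions": 244,
    "adductor_spasmodic_dysphonia": 30,
}

def stratified_split(class_counts, test_fraction, seed=0):
    """Split sample IDs into train/test sets class by class,
    so each diagnosis keeps roughly the same proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for label, n in class_counts.items():
        ids = [(label, i) for i in range(n)]
        rng.shuffle(ids)
        n_test = round(n * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test

train_set, test_set = stratified_split(CLASS_COUNTS, test_fraction=116 / 556)
print(len(train_set), len(test_set))  # → 440 116
```

Stratification matters here because the classes are highly imbalanced (29 normal samples vs. 244 organic lesion samples), so a naive random split could leave a rare class nearly absent from the test set.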

Results:

The convolutional neural network model achieved a sensitivity of 0.70, a specificity of 0.90, and an overall accuracy of 65.5% in distinguishing among normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. By comparison, the overall accuracy of the two laryngologists was 58.6% and 49.1%, and that of the two general ear, nose, and throat doctors was 38.8% and 34.5%.
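The underlying confusion matrices are not given in the abstract, but the reported metrics follow the standard definitions. As a reference sketch (the example counts below are illustrative, not the study's data):

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

def overall_accuracy(confusion):
    """Multiclass accuracy: correctly classified samples (the diagonal
    of the confusion matrix) divided by all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Illustrative two-class confusion matrix: rows = true class, cols = predicted.
example = [[3, 1],
           [1, 3]]
print(overall_accuracy(example))  # → 0.75
```

For a five-class problem like this one, sensitivity and specificity are typically computed per class (one-vs-rest) and then summarized, while overall accuracy is a single number over all 116 test samples.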

Conclusions:

We developed an artificial intelligence-based screening tool for common vocal fold diseases that showed high specificity after training on our Mandarin pathological voice database. This approach demonstrates the clinical potential of artificial intelligence for general vocal fold disease screening via voice, for example as a quick survey during a general health examination. It can also be applied in telemedicine for areas where primary care units lack laryngoscopic capability.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.