Accepted for/Published in: JMIR mHealth and uHealth
Date Submitted: Oct 23, 2019
Date Accepted: Oct 3, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Real-time phonation monitoring system based on speech envelope
ABSTRACT
Background:
Voice disorders mainly result from chronic overuse or abuse of the voice, particularly among teachers and other occupational voice users. Previous studies have proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience of wearing this device and its lack of real-time processing limit its daily application.
Objective:
This study proposed a wireless microphone (WM)-based real-time monitoring approach, the AutoSpeechDetect (ASD) system, to monitor phonation behavior and dose in occupational voice users, and examined its performance in terms of accuracy.
Methods:
The ASD system used the WM to receive acoustic signals and extract an energy envelope. An adaptive threshold (AT) function was used to detect the presence of speech based on serial frames of the acoustic signal. A genetic algorithm was used to search for suitable parameters for the AT. Five teachers were invited to participate in this study to test the performance of the ASD system in terms of phonation ratio and detection accuracy. Moreover, we investigated whether a noise reduction (NR) algorithm can overcome the influence of environmental noise on the proposed system.
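As an illustration of the general idea of frame-based speech detection with an adaptive threshold, a minimal Python sketch follows. It is not the authors' AT function: the abstract does not specify it, and the genetic algorithm parameter search is not shown. The names frame_energy and detect_speech and the parameters ratio and alpha are hypothetical, and the sketch assumes that the first frame contains only background noise.

```python
def frame_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (a simple energy envelope)."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_speech(energies, ratio=3.0, alpha=0.95):
    """Flag each frame as speech (True) or non-speech (False).

    A frame counts as speech when its energy exceeds `ratio` times a running
    estimate of the noise floor. The noise floor is updated only on frames
    judged to be non-speech, so the threshold adapts to slow changes in
    background level. `ratio` and `alpha` are hypothetical values chosen for
    illustration, not parameters reported in the study.
    """
    if not energies:
        return []
    noise = energies[0]  # assumption: the first frame is background only
    flags = []
    for e in energies:
        is_speech = e > ratio * noise
        flags.append(is_speech)
        if not is_speech:
            # Exponential smoothing of the noise-floor estimate.
            noise = alpha * noise + (1 - alpha) * e
    return flags


# Synthetic example: quiet frames, then loud frames, then quiet frames.
quiet = [0.01, -0.01] * 5   # one 10-sample low-energy frame
loud = [0.5, -0.5] * 5      # one 10-sample high-energy frame
signal = quiet * 4 + loud * 4 + quiet * 4
flags = detect_speech(frame_energy(signal, frame_len=10))
```

In this synthetic signal only the four loud frames are flagged. A real recording would need a frame length and threshold parameters tuned to the microphone and environment, which is the role the parameter search plays in the study.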
Results:
The ASD system exhibited speech detection accuracy ranging from 87.57% to 93.34%. Subsequent analyses revealed a phonation ratio between 30% and 45% during 40 min of teaching a class, with most segments shorter than 10 s. Background noise can significantly reduce the accuracy of the ASD system, whereas the log minimum mean squared error (logMMSE) function can effectively overcome this limitation under noisy conditions.
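The quantities reported above can be computed from per-frame speech flags. The sketch below is a minimal illustration under stated assumptions: the abstract does not define these metrics, so the phonation ratio is taken here as the fraction of frames flagged as speech, and the detection accuracy as the fraction of frames that agree with a reference labeling. All function names and the frame duration are hypothetical.

```python
def phonation_ratio(flags):
    """Fraction of frames flagged as speech (assumed definition)."""
    return sum(flags) / len(flags) if flags else 0.0


def segment_durations(flags, frame_sec):
    """Duration in seconds of each run of consecutive speech frames."""
    durations = []
    run = 0
    for f in flags:
        if f:
            run += 1
        elif run:
            durations.append(run * frame_sec)
            run = 0
    if run:  # close a segment that lasts until the final frame
        durations.append(run * frame_sec)
    return durations


def detection_accuracy(flags, reference):
    """Fraction of frames whose flag matches the reference label (assumed definition)."""
    if len(flags) != len(reference):
        raise ValueError("flags and reference must have the same length")
    if not flags:
        return 0.0
    return sum(f == r for f, r in zip(flags, reference)) / len(flags)


# Toy example with a hypothetical frame duration of 0.5 s.
detected = [False, True, True, False, True, False]
labeled = [False, True, False, False, True, False]
ratio = phonation_ratio(detected)
durations = segment_durations(detected, frame_sec=0.5)
accuracy = detection_accuracy(detected, labeled)
```

Here three of six frames are flagged as speech, forming two segments, and five of six frames match the reference labels.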
Conclusions:
This study proposed and validated a novel ASD system consisting of a WM to sense the acoustic energy and a real-time AT for speech detection. An average detection accuracy of 90.75% was demonstrated, and the analytical results were comparable to those of previous research. Although the ASD system using the WM is more susceptible to background noise, unsupervised NR using the logMMSE function can be applied to overcome this limitation. These results indicate that the proposed ASD system can potentially be applied to ambulatory voice monitoring for occupational voice users.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.