Accepted for/Published in: JMIR Formative Research
Date Submitted: May 29, 2022
Open Peer Review Period: May 27, 2022 - Jun 8, 2022
Date Accepted: Aug 9, 2022
Date Submitted to PubMed: Aug 19, 2022
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Training and Profiling a Pediatric Emotion Recognition Classifier on Mobile Devices for Autism Detection and Treatment
ABSTRACT
Background:
Implementing automated emotion recognition on mobile devices could provide an accessible diagnostic and therapeutic tool for those who struggle to recognize emotion, including children with developmental behavioral conditions such as autism. Although recent advances have been made in building more accurate emotion classifiers, existing models are too computationally expensive to be deployed on smartphones.
Objective:
In this study, we explored the deployment of several state-of-the-art emotion classifiers designed for use on mobile devices. We additionally explored various post-training optimization techniques, evaluating both classification performance and efficiency on a Motorola Moto G6 phone.
Methods:
We collected images from twelve public datasets and used video frames crowdsourced from the GuessWhat app to train our classifiers. All images were annotated for 7 emotions: neutrality, fear, happiness, sadness, surprise, anger, and disgust. We trained two copies of each of five convolutional neural network architectures: MobileNetV3-Small 1.0x, MobileNetV2 1.0x, EfficientNetB0, MobileNetV3-Large 1.0x, and NASNetMobile. The first copy of each architecture was trained on images from the public datasets only; the second copy was additionally trained on the GuessWhat frames. We evaluated each model against the Child Affective Facial Expression (CAFE) set. We then performed weight pruning, weight clustering, and quantization-aware training where possible and profiled the performance of each model on the Moto G6.
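The quantization step described above maps floating-point weights onto a low-precision integer grid so the network runs faster and takes less memory on-device. The sketch below is a minimal, framework-independent illustration of symmetric 8-bit weight quantization (the function names and the example tensor are illustrative, not taken from the study's codebase):

```python
import numpy as np

def quantize_int8(weights):
    """Simulate symmetric post-training 8-bit quantization of a weight tensor."""
    scale = np.abs(weights).max() / 127.0  # map the largest |w| onto the int8 range
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 representation."""
    return q.astype(np.float32) * scale

# Illustrative weight tensor (not from the study)
w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Symmetric rounding keeps the per-weight error within half a quantization step
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-7
```

Quantization-aware training goes one step further than this post-training sketch: the same fake-quantization round-trip is applied during training so the network learns weights that are robust to the rounding error.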
Results:
Our best model, a MobileNetV3-Large network pre-trained on ImageNet and trained on all public datasets plus the GuessWhat images, achieved 65.21% balanced accuracy and a 64.50% F1-score on CAFE, with a 90-millisecond inference latency on a Motorola Moto G6 phone. This balanced accuracy is only 1.89% lower than the current state of the art for CAFE, a model with 13.91x more parameters that was unable to run on the Moto G6 due to its size, even when fully optimized.
Conclusions:
This work demonstrates that with specialized design and optimization techniques, machine learning models can become lightweight enough to run on mobile devices while achieving high performance on difficult image classification tasks. The models developed in this study can be integrated into mobile health therapies to help diagnose autism spectrum disorder (ASD) and to provide targeted therapeutic treatment to children.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.