Efficient GRU-based Facial Expression Recognition with Adaptive Loss Selection
Keywords:
facial expression recognition, computational efficiency, recurrent neural networks, GRU, LSTM, adaptive loss selection, one-vs-all classification, MediaPipe, statistical equivalence testing
Abstract
As real-world deployment of facial expression recognition systems becomes increasingly prevalent, computational efficiency emerges as a critical consideration alongside recognition accuracy. Current research places pronounced emphasis on accuracy maximization through sophisticated convolutional architectures, yet systematic evaluation of efficiency-performance trade-offs remains insufficient for practical deployment scenarios. This paper addresses this gap through a comprehensive analysis of recurrent neural network architectures for facial expression recognition, specifically comparing Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) implementations within a novel one-vs-all classification framework incorporating adaptive loss function selection. A rigorous 2 × 2 × 2 factorial experimental design systematically evaluates architecture (GRU vs LSTM), optimization strategy (Bayesian vs predefined), and loss function complexity (standard vs advanced with auto-selection) across six basic emotions using the CK+ dataset with MediaPipe-based facial landmark features. The investigation reveals that GRU architectures achieve statistical performance equivalence with LSTM while demonstrating a 25% computational efficiency advantage (relative complexity 0.75 vs 1.0). The proposed adaptive loss selection mechanism automatically selects focal loss for severe class imbalance (ratio > 11.5), weighted binary cross-entropy for moderate imbalance (ratio 3.5-11.5), and standard binary cross-entropy otherwise. System performance reaches 92.7% ± 5.0% overall accuracy, with per-emotion F1-scores exhibiting substantial variability, from 0.215 (fear) to 0.967 (surprise). Comprehensive statistical analysis incorporating power analysis and practical equivalence testing demonstrates optimization-strategy equivalence across 25% of evaluated metrics, while architectural comparisons reveal non-equivalence despite similar performance levels.
The study acknowledges significant limitations, including a critically small sample size (n=6 per condition), single-dataset validation, and theoretical rather than empirical validation of the efficiency claims. These findings provide evidence-based guidelines for architecture selection in resource-constrained facial expression recognition applications, with the adaptive loss selection framework representing a notable methodological contribution for addressing class imbalance in emotion recognition systems.
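The two quantitative claims in the abstract, the imbalance-ratio thresholds for loss selection and the 0.75 relative complexity of GRU versus LSTM, can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: only the thresholds (> 11.5, 3.5-11.5) and the 3-vs-4 gate count come from the abstract; the function names, the focal-loss defaults (gamma=2.0, alpha=0.25), and the example dimensions are assumptions.

```python
import numpy as np

def select_loss(n_pos, n_neg):
    """Map the class-imbalance ratio of a one-vs-all split to a loss
    family, using the thresholds reported in the abstract: focal loss
    for ratio > 11.5, weighted BCE for 3.5-11.5, standard BCE below."""
    ratio = max(n_pos, n_neg) / max(min(n_pos, n_neg), 1)
    if ratio > 11.5:
        return "focal"
    if ratio >= 3.5:
        return "weighted_bce"
    return "bce"

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Standard binary focal loss; gamma and alpha are common
    defaults, not values taken from the paper."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    pt = np.where(y_true == 1, p, 1.0 - p)          # prob. assigned to true class
    a = np.where(y_true == 1, alpha, 1.0 - alpha)   # class weighting term
    return float(np.mean(-a * (1.0 - pt) ** gamma * np.log(pt)))

def rnn_param_count(input_dim, hidden_dim, n_gates):
    """Per gated transform: input weights + recurrent weights + bias."""
    return n_gates * (hidden_dim * (input_dim + hidden_dim) + hidden_dim)

# A GRU uses 3 gated transforms where an LSTM uses 4, which yields the
# 3/4 = 0.75 relative complexity quoted in the abstract (the input and
# hidden dimensions here are illustrative, not the paper's).
gru_params = rnn_param_count(128, 64, 3)
lstm_params = rnn_param_count(128, 64, 4)
print(gru_params / lstm_params)  # 0.75
```

Note that the parameter ratio is exactly 3/4 regardless of the chosen dimensions, since both cell types share the same per-gate weight shape.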
Published
2025-11-18
How to Cite
Winarno, S., Alzami, F., Santoso, D. A., Naufal, M., Azies, H. A., Brilianto, R. M., & Muthu, K. A. S. (2025). Efficient GRU-based Facial Expression Recognition with Adaptive Loss Selection. Statistics, Optimization & Information Computing, 14(6), 3468-3499. https://doi.org/10.19139/soic-2310-5070-3043
Section
Research Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).