A Hybrid Approach of Long Short Term Memory and Transformer Models for Speech Emotion Recognition
Keywords:
Speech Recognition, Emotion Recognition, Sentiment Analysis, LSTM Model, Transformer Model
Abstract
Speech emotion recognition (SER) has become a critical component of next-generation human-machine interaction technologies. In this paper, we compare a hybrid LSTM + Transformer model against standalone LSTM and Transformer models. The proposed method consists of four steps. First, data are loaded from three benchmark datasets: the Toronto Emotional Speech Set (TESS), the Berlin Emotional Speech Database (EMO-DB), and the Surrey Audio-Visual Expressed Emotion (SAVEE) database. Second, the raw audio is preprocessed into a meaningful representation using Mel-Frequency Cepstral Coefficients (MFCCs). Third, the model architecture is designed and explained. Finally, the models are evaluated using precision, recall, F1 score, classification reports, and confusion matrices. The results show that the hybrid LSTM + Transformer model performs best on TESS, with an accuracy of 99.64%, compared with 98.21% for the Transformer model and 97.50% for the LSTM model. On EMO-DB, the LSTM model achieved the highest accuracy at 73.83%, followed by the hybrid model at 71.96% and the Transformer model at 70.09%. On SAVEE, the LSTM model again achieved the highest accuracy at 65.62%, followed by the Transformer model at 58.33% and the hybrid model at 56.25%.
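The evaluation step named in the abstract (precision, recall, F1 score, and confusion matrices) can be illustrated with a minimal sketch in plain Python. This is not the author's code: the function names and the toy labels below are assumptions made for illustration only, and the toy predictions are not results from the paper.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def classification_report(matrix, labels):
    """Per-class precision, recall, F1 and support from a confusion matrix."""
    n = len(labels)
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted as class i, but wrong
        fn = sum(matrix[i]) - tp                       # class i samples that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,
        }
    return report


# Hypothetical toy labels, for illustration only (not data from the paper).
labels = ["angry", "happy", "sad"]
y_true = ["angry", "angry", "happy", "happy", "sad", "sad"]
y_pred = ["angry", "happy", "happy", "happy", "sad", "angry"]

matrix = confusion_matrix(y_true, y_pred, labels)
report = classification_report(matrix, labels)

print("accuracy:", round(accuracy(matrix), 4))
for label in labels:
    print(label, {k: round(v, 4) for k, v in report[label].items()})
```

In this sketch, accuracy is the share of samples on the diagonal of the confusion matrix, while precision and recall are computed per emotion class from the corresponding column and row. Per-class figures are useful here because a single accuracy value can hide classes that a model confuses with one another.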
Published
2025-05-01
How to Cite
AbuAin, T. (2025). A Hybrid Approach of Long Short Term Memory and Transformer Models for Speech Emotion Recognition. Statistics, Optimization & Information Computing, 14(1), 340-351. https://doi.org/10.19139/soic-2310-5070-2521
Section
Research Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).