Abnormal Behavior Detection in Surveillance Systems Using a Hybrid EfficientNet-Transformer Model
Keywords:
Anomaly detection, Deep learning, Unsupervised learning, Transformers
Abstract
Anomaly detection in video surveillance is vital for public safety, but it remains challenging because abnormal behaviors are unpredictable and surveillance systems operate at large scale. We propose a hybrid architecture that combines EfficientNetV2S for efficient feature extraction with a transformer encoder that captures long-range dependencies through self-attention. By modeling both local and global patterns in video frames, the model detects abnormal events robustly. Evaluated on the UCSD Ped1, UCSD Ped2, and Avenue datasets, our approach achieved accuracies of 99.51%, 99.80%, and 94.82%, respectively, outperforming existing methods and indicating its suitability for real-time smart surveillance applications.
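The self-attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer sizes, head counts, or training details, so the code below assumes a single attention head with identity query/key/value projections (no learned weights) and uses short hand-written vectors as stand-ins for the per-frame features an EfficientNetV2S backbone would produce. The names `self_attention` and `clip` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frame_features):
    """Single-head scaled dot-product self-attention with identity projections.

    frame_features: list of T per-frame feature vectors (each a list of d
    floats), standing in for pooled CNN backbone outputs (assumption).
    Returns (attended, weights): T output vectors and the T x T attention matrix.
    """
    d = len(frame_features[0])
    scale = math.sqrt(d)
    attended, weights = [], []
    for query in frame_features:
        # Similarity of this frame to every frame in the clip, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in frame_features
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted mix of all frames' features.
        attended.append(
            [sum(w * value[j] for w, value in zip(row, frame_features))
             for j in range(d)]
        )
    return attended, weights


# Hypothetical 4-frame clip with 3-dimensional features; the last frame differs.
clip = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
attended, weights = self_attention(clip)
```

Because every frame attends to every other frame in one step, the distance between two frames in the sequence does not limit how strongly they can interact, which is the sense in which self-attention captures long-range dependencies. A full transformer encoder would add learned projections, multiple heads, positional information, and feed-forward layers on top of this step.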
Published
2025-01-09
How to Cite
Alberry, H. A., Khalifa, M. E., & Taha, A. (2025). Abnormal Behavior Detection in Surveillance Systems Using a Hybrid EfficientNet-Transformer Model. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2259
Section
Research Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).