PSO+K-means Algorithm for Anomaly Detection in Big Data
Abstract
Clustering is widely regarded as an effective approach to anomaly detection. However, k-means and other classic clustering algorithms are sensitive to the choice of initial cluster centers and tend to converge to local optima, which limits the accuracy of anomaly detection. To improve that accuracy, this paper proposes a new weighted clustering method that combines PSO (particle swarm optimization) with the k-means algorithm. The proposed method is tested on the Yahoo! S5 dataset, and its results are compared with those of the k-means algorithm. The experiments show that the proposed method is more robust and more accurate than k-means.
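The abstract does not spell out how PSO and k-means are combined or how the weighting is defined, so the following is only a minimal sketch of one common hybrid, not the authors' method. It assumes that a basic global-best PSO searches for a set of k centers that minimizes the sum of squared errors, that plain k-means then refines those centers, and that a point's anomaly score is its distance to the nearest center. The weighting scheme is omitted, and all function names and parameter values are hypothetical.

```python
import numpy as np

def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def pso_centers(X, k, n_particles=15, n_iter=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k centers with a basic global-best PSO (illustrative settings)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each particle is one candidate set of k centers, started at random data points.
    pos = np.stack([X[rng.choice(n, k, replace=False)] for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([sse(X, p) for p in pos])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Standard velocity update: inertia + pull to personal best + pull to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([sse(X, p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmin()
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest

def kmeans(X, centers, n_iter=50):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, centers):
            break
        centers = new
    # Recompute labels so they match the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

def anomaly_scores(X, centers):
    """Distance to the nearest center; larger means more anomalous (an assumption)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: two tight groups plus one far-away point.
    X = np.vstack([rng.normal(0, 0.3, (50, 2)),
                   rng.normal(5, 0.3, (50, 2)),
                   [[2.5, 12.0]]])
    centers, labels = kmeans(X, pso_centers(X, k=2))
    print("most anomalous index:", anomaly_scores(X, centers).argmax())
```

Because the k-means step starts from the best configuration found by the swarm rather than from a single random draw, it is less exposed to a poor initialization, which is the problem the abstract attributes to classic k-means.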