Enhancing Multivariate Angular Data Clustering with Dirichlet Process Mixtures and von Mises Distributions
Keywords:
Dirichlet process, Mixture distributions, Nonparametric Bayesian model, Clustering, Ward's algorithm, Multivariate angular data modeling
Abstract
Data clustering is an essential technique for organizing unlabeled data, automatically extracting topics, and swiftly retrieving or filtering information. In this study, we address the task of clustering multivariate angular data using nonparametric Bayesian mixture models with von Mises distributions. Our approach operates within a nonparametric Bayesian framework, specifically leveraging the Dirichlet process: unlike finite mixture models, it assumes an unbounded number of clusters a priori and infers the appropriate number automatically from the data. Moreover, the paper introduces a unified approach combining Ward's algorithm, the Dirichlet process, and von Mises mixture distributions (DPM-MvM) to effectively capture both the structure and the variability inherent in the data. We develop a variational inference algorithm for the DPM-MvM that enables automatic determination of the number of clusters. Experimental results demonstrate the efficiency and accuracy of our method for analyzing multivariate angular data compared with state-of-the-art approaches.
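For readers unfamiliar with the ingredients named above, a minimal mathematical sketch of the two building blocks of the DPM-MvM is given below; the factorized (independent-dimension) form of the multivariate component is an illustrative assumption, as the abstract does not state the exact multivariate construction used in the paper. The von Mises density for an angle $\theta$ with mean direction $\mu$ and concentration $\kappa$ is

\[ f(\theta \mid \mu, \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa \cos(\theta - \mu)\}, \]

where $I_0$ denotes the modified Bessel function of the first kind and order zero. The Dirichlet process enters through its stick-breaking representation, which defines the mixture weights as

\[ v_k \sim \mathrm{Beta}(1, \alpha), \qquad \pi_k = v_k \prod_{j=1}^{k-1} (1 - v_j), \qquad k = 1, 2, \ldots, \]

so that a $D$-dimensional angular observation $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_D)$ is modeled by the infinite mixture

\[ p(\boldsymbol{\theta}) = \sum_{k=1}^{\infty} \pi_k \prod_{d=1}^{D} f(\theta_d \mid \mu_{kd}, \kappa_{kd}), \]

with the effective number of components inferred from the data by the variational algorithm.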
Published
2025-01-02
How to Cite
Benlakhdar, S., Nadarajah, S., Rziza, M., & Oulad Haj Thami, R. (2025). Enhancing Multivariate Angular Data Clustering with Dirichlet Process Mixtures and von Mises Distributions. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2146
Section
Research Articles