Fuzzified Clustering and Sample Reduction for Intelligent High Performance Distributed Classification of Heterogeneous Uncertain Big Data

  • Sherouk Samir Moawad Statistics Department, Faculty of Economics and Political Sciences, Cairo University
  • Magued Osman Statistics Department, Faculty of Economics and Political Sciences, Cairo University, Giza, Egypt
  • Ahmed Shawky Moussa Computer Science Department, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Egypt
Keywords: Big Data, Fuzzified Clustering, Classifier Ensemble, Weighted Subsampling, Parallel Classification, Sample Reduction, Veracity.

Abstract

diverse datasets efficiently. This paper introduces a Fuzzified Clustering technique with sample reduction and distributed Parallel Classification (FCPC). Fuzzified clustering is well suited to Big Data because it partitions datasets intelligently while handling uncertainty and overlapping data points. FCPC uses this capability to reduce dataset size while capturing the essential data structure and improving classification performance. Benchmark Big Data sets were used to compare FCPC with traditional classifiers, which require the entire dataset to fit in memory. Four classification techniques were evaluated on three metrics: Accuracy, Area Under the ROC Curve, and F1 Score. The proposed model showed improved predictive power with a sample reduction of approximately 90%, which improves performance and may reduce the computational resources required.
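The general idea of clustering with soft memberships and then keeping only a small weighted subsample can be illustrated with a short sketch. This is not the authors' FCPC implementation, whose details are not given in the abstract. It assumes a plain fuzzy c-means routine, a hard assignment of each point to its highest-membership cluster, and membership-weighted sampling without replacement within each cluster. The function names `fuzzy_c_means` and `reduce_sample`, the fuzzifier `m = 2.0`, and the 10% `keep_fraction` (chosen to mirror the roughly 90% reduction reported above) are all hypothetical choices made for illustration.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, seed=0):
    """Plain fuzzy c-means (illustrative). Returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Centers are means weighted by membership raised to the fuzzifier m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / sw for d in range(dim)]
            )
        # Memberships are proportional to squared distance ** (-1 / (m - 1)).
        for i, p in enumerate(points):
            d2 = [
                max(sum((p[d] - c[d]) ** 2 for d in range(dim)), 1e-12)
                for c in centers
            ]
            inv = [dist ** (-1.0 / (m - 1.0)) for dist in d2]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u


def reduce_sample(points, labels, memberships, keep_fraction=0.1, seed=0):
    """Keep about keep_fraction of each cluster, favouring high memberships.

    Assumption for illustration only: points are grouped by their
    highest-membership cluster, then sampled without replacement with
    probability weights equal to that membership.
    """
    rng = random.Random(seed)
    n_clusters = len(memberships[0])
    groups = {k: [] for k in range(n_clusters)}
    for i, row in enumerate(memberships):
        best = max(range(n_clusters), key=row.__getitem__)
        groups[best].append(i)
    kept = []
    for k, idx in groups.items():
        if not idx:
            continue
        n_keep = max(1, round(keep_fraction * len(idx)))
        # Weighted sampling without replacement: draw key r ** (1 / weight)
        # for each point and keep the points with the largest keys.
        keyed = sorted(
            idx,
            key=lambda i: rng.random() ** (1.0 / memberships[i][k]),
            reverse=True,
        )
        kept.extend(keyed[:n_keep])
    kept.sort()
    return [points[i] for i in kept], [labels[i] for i in kept]
```

In a setup like the one the abstract describes, the reduced sample would then be passed to the classifiers in place of the full dataset; how FCPC distributes that step across workers is not specified here, so it is left out of the sketch.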
Published
2025-01-29
How to Cite
Moawad, S. S., Osman, M., & Moussa, A. S. (2025). Fuzzified Clustering and Sample Reduction for Intelligent High Performance Distributed Classification of Heterogeneous Uncertain Big Data. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2275
Section
Research Articles