A Comparative Study of Multi-Class Classification based on Imbalanced Data

  • Rojan-zaki Abdulkareem Akre University for Applied Science
  • Adnan Mohsin Abdulazeez Duhok Polytechnic University – Kurdistan Region – Iraq
Keywords: Class Imbalance, CycleGAN, EfficientNet-B3, Focal Loss, Medical Image Classification

Abstract

Class imbalance is a significant challenge in building reliable and precise medical diagnostic models, especially in multi-class settings where rare but clinically important cases are under-represented. This work addresses the imbalance problem on three medical datasets: HAM10000, Skin Cancer ISIC, and Non-Alcoholic Fatty Liver Disease (NAFLD). It does so with a deep learning framework that uses Cycle-Consistent Generative Adversarial Networks (CycleGAN) for data balancing, incorporates advanced data augmentation, and trains with Focal Loss. The proposed architecture uses EfficientNet-B3 for image classification and a custom Multi-Layer Perceptron (MLP) for tabular clinical data. CycleGAN is used to synthesize realistic images of minority classes and to perform the equivalent minority-class oversampling for tabular data, producing balanced and semantically varied datasets. To improve generalization, we apply real-time augmentation to images (flipping, rotation, and color jittering) and normalize the tabular features. Unlike previous research, this study presents a unified deep learning pipeline that applies CycleGAN-based oversampling to both image and tabular medical datasets. Combining CycleGAN with Focal Loss and EfficientNet-B3 improves minority-class detection and sets a new benchmark for imbalanced multi-class medical classification. Performance was evaluated with stratified 5-fold cross-validation using macro F1-score, balanced accuracy, and ROC-AUC. The proposed method achieved superior results on all datasets, with a peak accuracy of 99.33%, a macro F1-score of 96.85%, and a ROC-AUC of 0.9852 on HAM10000.
A comparison with previous studies demonstrates the superiority of our pipeline in overall accuracy and minority-class sensitivity.
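For reference, the standard CycleGAN objective introduced by Zhu et al. (2017) combines two adversarial losses with a cycle-consistency term. Here G: X → Y and F: Y → X are the two generators, D_X and D_Y the discriminators, and λ weights the cycle term. The abstract does not state which domains X and Y correspond to in this study, the value of λ, or how the objective was adapted for tabular data, so the block below is the generic formulation rather than the authors' exact loss.

```latex
\begin{align*}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[\log D_Y(y)\right]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right] \\
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\lVert F(G(x)) - x \rVert_1\right]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[\lVert G(F(y)) - y \rVert_1\right] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F) \\
G^{*}, F^{*} &= \arg\min_{G, F}\, \max_{D_X, D_Y}\, \mathcal{L}(G, F, D_X, D_Y)
\end{align*}
```

The cycle term is what distinguishes CycleGAN from a plain GAN: translating a sample to the other domain and back should recover the original, which discourages the generators from producing outputs unrelated to their inputs.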
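Focal Loss (Lin et al., 2017) down-weights well-classified examples so that training concentrates on hard, often minority-class, samples: FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The pure-Python sketch below computes the batch-mean multi-class version from raw logits. The function names, the default γ = 2, and the optional per-class α list are illustrative assumptions, since the abstract does not report the authors' hyperparameters or training framework.

```python
import math
from typing import Optional, Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax (subtract the max before exponentiating)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def focal_loss(
    batch_logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,  # assumed default; the paper's value is not given here
    alpha: Optional[Sequence[float]] = None,  # hypothetical per-class weights
) -> float:
    """Batch-mean multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    total = 0.0
    for logits, y in zip(batch_logits, targets):
        log_p_t = log_softmax(logits)[y]  # log-probability of the true class
        p_t = math.exp(log_p_t)
        alpha_t = 1.0 if alpha is None else alpha[y]  # optional class weighting
        # (1 - p_t)**gamma shrinks the loss of confidently correct predictions
        total += -alpha_t * (1.0 - p_t) ** gamma * log_p_t
    return total / len(targets)
```

With gamma=0 and no alpha the loss reduces to ordinary cross-entropy. At γ = 2, an example predicted with p_t = 0.5 contributes only (1 - 0.5)^2 = 0.25 times its cross-entropy, and more confident predictions are suppressed even further.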
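Plain accuracy can hide poor minority-class performance, which is why the evaluation relies on macro F1-score and balanced accuracy under stratified folds. The dependency-free sketch below, using hypothetical helper names, shows how stratified fold assignment keeps class proportions in every fold and how both metrics weight each class equally regardless of its size. It is not the authors' evaluation code: it does not shuffle, and it omits ROC-AUC.

```python
from collections import defaultdict
from typing import Sequence


def stratified_folds(labels: Sequence[int], k: int = 5) -> list[list[int]]:
    """Split sample indices into k folds with roughly equal class proportions.

    Samples of each class are dealt round-robin across the folds (no shuffling),
    so every fold receives floor(n_c / k) or ceil(n_c / k) samples of class c.
    """
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: list[list[int]] = [[] for _ in range(k)]
    pos = 0  # global counter so leftover samples spread over different folds
    for y in sorted(by_class):
        for idx in by_class[y]:
            folds[pos % k].append(idx)
            pos += 1
    return [sorted(f) for f in folds]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        support = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)
```

On a toy set with eight majority samples and one sample in each of two minority classes, a classifier that always predicts the majority class reaches 80% accuracy but only 1/3 balanced accuracy and about 0.30 macro F1. In practice, scikit-learn's `StratifiedKFold`, `f1_score(average="macro")`, `balanced_accuracy_score`, and `roc_auc_score(multi_class="ovr")` compute these quantities, with shuffling options for the folds.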
Published
2025-11-17
How to Cite
Abdulkareem, R.-Z., & Abdulazeez, A. M. (2025). A Comparative Study of Multi-Class Classification based on Imbalanced Data. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2731
Section
Research Articles