A Comparative Study of Multi-Class Classification based on Imbalanced Data
Keywords:
Class Imbalance, CycleGAN, EfficientNet-B3, Focal Loss, Medical Image Classification
Abstract
Class imbalance is a significant obstacle to building reliable and accurate medical diagnostic models, especially in multi-class classification, where rare yet clinically important cases are under-represented. This work addresses the imbalance problem across three medical datasets, HAM10000, Skin Cancer ISIC, and Non-Alcoholic Fatty Liver Disease (NAFLD), by presenting a deep learning framework that uses Cycle-Consistent GANs (CycleGAN) for data balancing, integrates advanced data augmentation methods, and applies Focal Loss during training. The proposed architecture uses EfficientNet-B3 for image classification and a custom-built Multi-Layer Perceptron (MLP) for tabular clinical data. The CycleGAN model is used to generate realistic images of minority classes and to replicate oversampling in tabular domains, producing balanced and semantically varied datasets. To improve generalization, we apply real-time augmentation: flipping, rotation, and color jittering for images, alongside normalization strategies for tabular features. Unlike previous research, this study presents a unified deep learning pipeline that implements real CycleGAN-based oversampling for both image and tabular medical datasets. Combining CycleGAN with Focal Loss and EfficientNet-B3 improves minority-class detection, setting a new benchmark for imbalanced multi-class medical classification. Performance was evaluated using stratified 5-fold cross-validation with macro F1-score, balanced accuracy, and ROC-AUC. The proposed method attained superior results across all datasets, with a peak accuracy of 99.33%, a macro F1-score of 96.85%, and a ROC-AUC of 0.9852 on the HAM10000 dataset.
Comparative analysis with previous studies shows the superiority of our pipeline in overall accuracy and minority-class sensitivity.
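As an illustration of the Focal Loss named in the abstract, the following is a minimal single-sample sketch in plain Python, not the authors' implementation. It assumes the standard formulation FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the function names, the default gamma = 2.0, and the example logits are assumptions chosen for illustration, since the abstract does not state the settings used.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample in a multi-class problem.

    gamma=2.0 is an assumed default, not a value taken from the paper.
    alpha, if given, is a hypothetical list of per-class weights.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Ordinary cross-entropy is the gamma = 0 special case."""
    return focal_loss(logits, target, gamma=0.0)


# Hypothetical logits for a three-class problem; the true class is 0.
easy = [4.0, 0.0, 0.0]  # confidently correct prediction
hard = [0.0, 4.0, 0.0]  # confidently wrong prediction

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.6f} ratio={fl / ce:.4f}")
```

The ratio column shows the intended effect: the confidently correct sample keeps only a small fraction of its cross-entropy loss, while the misclassified sample keeps most of it, so training gradients are dominated by hard examples, which in an imbalanced dataset are often the minority-class cases.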
Published
2025-11-17
How to Cite
Abdulkareem, R.- zaki, & Abdulazeez, A. M. (2025). A Comparative Study of Multi-Class Classification based on Imbalanced Data. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2731
Section
Research Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).