A Comparative Study of Multi-Class Classification based on Imbalanced Data

Rojan-zaki Abdulkareem; Adnan Mohsin Abdulazeez

doi:10.19139/soic-2310-5070-2731

Rojan-zaki Abdulkareem, Akre University for Applied Science
Adnan Mohsin Abdulazeez, Duhok Polytechnic University – Kurdistan Region – Iraq

Keywords: Class Imbalance, CycleGAN, EfficientNet-B3, Focal Loss, Medical Image Classification

Abstract

Class imbalance is a significant obstacle to building reliable and accurate medical diagnostic models, especially in multi-class classification, where rare yet clinically important cases are underrepresented. This work addresses the imbalance problem on three medical datasets (HAM10000, Skin Cancer ISIC, and Non-Alcoholic Fatty Liver Disease (NAFLD)) with a deep learning framework that uses Cycle-Consistent GANs (CycleGAN) for data balancing, integrates data augmentation, and applies Focal Loss during training. The proposed architecture uses EfficientNet-B3 for image classification and a custom-built Multi-Layer Perceptron (MLP) for tabular clinical data. The CycleGAN model is used to generate realistic minority-class images and to emulate oversampling in the tabular domain, producing balanced and semantically varied datasets. To improve generalization, we apply real-time augmentation: flipping, rotation, and color jittering for images, and normalization for tabular features. Unlike previous research, this study presents a unified deep learning pipeline that implements CycleGAN-based oversampling for both image and tabular medical datasets. Combining CycleGAN with Focal Loss and EfficientNet-B3 improves minority-class detection, setting a new benchmark for imbalanced multi-class medical classification. Performance was evaluated using stratified 5-fold cross-validation with macro F1-score, balanced accuracy, and ROC-AUC. The proposed method attained superior results across all datasets, with a peak accuracy of 99.33%, a macro F1-score of 96.85%, and a ROC-AUC of 0.9852 on the HAM10000 dataset. A comparison with previous studies shows the superiority of our pipeline in overall accuracy and minority-class sensitivity.
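
To make the role of Focal Loss and of the imbalance-aware measures named above more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the standard focal loss form, -alpha * (1 - p_t)^gamma * log(p_t), with gamma = 2.0 and alpha = 1.0 as illustrative defaults (the abstract does not state the values used), and the function names `focal_loss`, `macro_f1`, and `balanced_accuracy` are hypothetical helpers introduced only for this example.

```python
import math
from collections import Counter


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    probs  : predicted class probabilities for the sample
    target : index of the true class
    gamma, alpha : assumed illustrative values, not taken from the paper
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def _counts(y_true, y_pred):
    """Per-class true positives, false positives, and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    return tp, fp, fn


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    tp, fp, fn = _counts(y_true, y_pred)
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over classes present in y_true."""
    tp, _, fn = _counts(y_true, y_pred)
    classes = sorted(set(y_true))
    return sum(tp[c] / (tp[c] + fn[c]) for c in classes) / len(classes)


if __name__ == "__main__":
    # A confident correct prediction is down-weighted far more than a hard one.
    print("easy sample:", focal_loss([0.9, 0.1], target=0))
    print("hard sample:", focal_loss([0.1, 0.9], target=0))

    # Toy imbalanced labels: class 0 is the majority, class 2 is rare.
    y_true = [0, 0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 2]
    print("macro F1:", macro_f1(y_true, y_pred))
    print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

With gamma set to 0 the loss reduces to ordinary cross-entropy, which is one way to see that the focusing term only rescales the contribution of well-classified samples. Macro F1 and balanced accuracy both average over classes rather than over samples, which is why they are less dominated by the majority class than plain accuracy.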

Published

2025-11-17

How to Cite

Abdulkareem, R.-Z., & Abdulazeez, A. M. (2025). A Comparative Study of Multi-Class Classification based on Imbalanced Data. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2731

Issue

Online First

Section

Research Articles

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).