Performance of Machine Learning Algorithms for Credit Risk Prediction with Feature Selection

  • Muhammad M. Seliem Cairo University
  • Muhammad Amin Department of Statistics, University of Sargodha, Sargodha, Pakistan
  • Mona Mahmoud Abo El Nasr Taibah University, Al Madinah, Saudi Arabia
  • Emad Abdelaziz Elnaggar College of Business Administration, Afif - Shaqra University
  • Hany Abdelmonem Mohamed Khalifa College of Science and Humanities, Al-Dawadmi - Shaqra University
  • Mona Ahmed Abdelwahab Arab Sciences, Sadat Academy for Management Sciences, Egypt
Keywords: Credit Risk Assessment, Feature Selection, Classification Algorithms, Kappa Statistic, Accuracy

Abstract

Financial institutions increasingly rely on machine learning (ML) models to assess credit risk and make lending decisions. Accurate prediction hinges on effective feature selection, which can significantly enhance model performance. This paper investigates the efficacy of seven supervised ML algorithms in predicting credit risk: Naive Bayes, Support Vector Machine, Decision Tree, K-Nearest Neighbor, Artificial Neural Network, Random Forest, and Logistic Regression. Using a German credit dataset comprising 1000 observations with 20 explanatory variables, we evaluated model performance using accuracy, kappa statistic, and F1 score. Two data-splitting scenarios (70-30% and 80-20%) were employed to assess robustness. To optimize model performance, we addressed outliers through imputation methods and applied the Boruta algorithm for feature selection, which identified and eliminated six non-contributing features. Our findings consistently demonstrate the superiority of the Random Forest algorithm across both scenarios. In terms of accuracy, Random Forest achieved 77.3% in the 70-30% split and 80% in the 80-20% split, outperforming all other methods. These results underscore the potential of Random Forest as a valuable tool for credit risk assessment in financial institutions.
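
The three evaluation measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration (a 200-case test set, as an 80-20% split of 1000 observations would give); they are not results from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa, F1) from 2x2 confusion-matrix counts.

    tp/fn: positive cases predicted correctly/incorrectly.
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement between predictions and true labels.
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((tn + fp) / n) * ((tn + fn) / n)
    p_e = p_pos + p_neg

    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0

    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*tp / (2*tp + fp + fn).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0

    return accuracy, kappa, f1


# Hypothetical counts for illustration only (not taken from the paper).
acc, kappa, f1 = binary_metrics(tp=120, fp=30, fn=20, tn=30)
print(f"accuracy={acc:.3f} kappa={kappa:.3f} f1={f1:.3f}")
```

Reporting kappa alongside accuracy matters for imbalanced data such as credit records, since a classifier that always predicts the majority class can score high accuracy while its kappa stays at zero.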
Published
2025-04-05
How to Cite
Seliem, M. M., Muhammad Amin, Mona Mahmoud Abo El Nasr, Emad Abdelaziz Elnaggar, Hany Abdelmonem Mohamed Khalifa, & Mona Ahmed Abdelwahab Arab. (2025). Performance of Machine Learning Algorithms for Credit Risk Prediction with Feature Selection. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2392
Section
Research Articles