Performance of Machine Learning Algorithms for Credit Risk Prediction with Feature Selection
Keywords:
Credit Risk Assessment, Feature Selection, Classification Algorithms, Kappa Statistic, Accuracy
Abstract
Financial institutions increasingly rely on machine learning (ML) models to assess credit risk and make lending decisions. Accurate prediction hinges on effective feature selection, which can significantly enhance model performance. This paper investigates the efficacy of seven supervised ML algorithms in predicting credit risk: Naive Bayes, Support Vector Machine, Decision Tree, K-Nearest Neighbor, Artificial Neural Network, Random Forest, and Logistic Regression. Using a German credit dataset comprising 1000 observations with 20 explanatory variables, we evaluated model performance using accuracy, kappa statistic, and F1 score. Two data-splitting scenarios (70-30% and 80-20%) were employed to assess robustness. To optimize model performance, we addressed outliers through imputation methods and applied the Boruta algorithm for feature selection, which identified and eliminated six non-contributing features. Our findings consistently demonstrate the superiority of the Random Forest algorithm across both scenarios. In terms of accuracy, Random Forest achieved 77.3% in the 70-30% split and 80% in the 80-20% split, outperforming all other methods. These results underscore the potential of Random Forest as a valuable tool for credit risk assessment in financial institutions.
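The three evaluation measures named in the abstract (accuracy, kappa statistic, and F1 score) can all be derived from a binary confusion matrix. The sketch below is only an illustration of the standard definitions, not the authors' code: the function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the paper's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa, F1) for a 2x2 confusion matrix.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives.
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement between predictions and true labels.
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_pos + p_neg

    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0

    # F1 is the harmonic mean of precision and recall.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return accuracy, kappa, f1


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the paper).
    acc, kappa, f1 = binary_metrics(tp=40, fp=10, fn=20, tn=130)
    print(f"accuracy={acc:.3f} kappa={kappa:.3f} f1={f1:.3f}")
```

Reporting kappa alongside accuracy is useful on imbalanced data, since a classifier that always predicts the majority class can score high accuracy while its kappa stays near zero.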
Published
2025-04-05
How to Cite
Seliem, M. M., Muhammad Amin, Mona Mahmoud Abo El Nasr, Emad Abdelaziz Elnaggar, Hany Abdelmonem Mohamed Khalifa, & Mona Ahmed Abdelwahab Arab. (2025). Performance of Machine Learning Algorithms for Credit Risk Prediction with Feature Selection. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2392
Section
Research Articles