On a new stacked ensemble framework for imputing missing data in the presence of outliers

  • Mahmoud A. Abdel-Fattah Department of Applied Statistics and Econometrics, Faculty of Graduate Studies for Statistical Research, 5 Ahmed Zewail St., Cairo University, Giza 12613, Egypt. https://orcid.org/0009-0000-6603-6207
  • Mai A. Mohsen Department of Applied Statistics and Econometrics, Faculty of Graduate Studies for Statistical Research, 5 Ahmed Zewail St., Cairo University, Giza 12613, Egypt.
  • Amany M. Mousa Department of Applied Statistics and Econometrics, Faculty of Graduate Studies for Statistical Research, 5 Ahmed Zewail St., Cairo University, Giza 12613, Egypt.
Keywords: Missing value imputation, Ensemble, Stacking, MissForest, IRMI, EM, Ridge

Abstract

Missing value imputation (MVI) is a challenging task that becomes more complicated in the presence of outliers. Although ensemble techniques such as bagging and boosting have been employed for MVI with promising results, stacking has not been investigated in this area, despite its efficiency in prediction tasks. To address this gap, two robust stacking frameworks, RKSF-IM and RESF-IM, are proposed for imputing missing data in the presence of outliers. Both frameworks begin by adding an outlier indicator variable. They then employ two different stacking configurations in which MissForest, IRMI, and EM serve as base learners, and their predicted values are used as inputs to ridge regression, which acts as the meta-learner in the second layer. The proposed frameworks are compared with the mean, median, XGBoost, EM, IRMI, KNN, MissForest, and SVM imputation methods in terms of RMSE, MAE, and Wasserstein distance, using a simulation study and two real data applications. The simulation study considers different scenarios for missing rates and outliers. The study also investigates how adding an outlier indicator affects the performance of the different imputation methods. Under the simulation settings, the proposed stacking configurations outperform the competing methods in most scenarios. In addition, including an outlier indicator variable further improves many existing imputation methods.
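The two-layer design described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' RKSF-IM or RESF-IM implementation, and several pieces are assumptions. MissForest, IRMI, and EM are replaced by simple stand-in imputers (column mean, column median, and iterative least-squares regression; the function names are hypothetical). The outlier indicator uses a median/MAD robust z-score rule, which the abstract does not specify. The ridge meta-learner is fitted per column on observed cells that are temporarily hidden, which is one common way to obtain known targets for stacking in imputation. The helpers `rmse`, `mae`, and `wasserstein_1d` compute the three evaluation metrics on imputed versus true values.

```python
import numpy as np


def outlier_indicator(X, thresh=3.5):
    """Row flag (1.0/0.0): any observed robust z-score above `thresh` (hypothetical rule)."""
    med = np.nanmedian(X, axis=0)
    mad = np.nanmedian(np.abs(X - med), axis=0)
    mad[mad == 0] = 1.0  # guard against constant columns
    z = 0.6745 * np.abs(X - med) / mad
    return (np.nan_to_num(z, nan=0.0) > thresh).any(axis=1).astype(float)


def _fill_columns(X, stat):
    """Replace NaNs in each column by a NaN-ignoring column statistic."""
    out = X.copy()
    r, c = np.where(np.isnan(out))
    out[r, c] = stat(X, axis=0)[c]
    return out


def impute_mean(X):
    return _fill_columns(X, np.nanmean)


def impute_median(X):
    return _fill_columns(X, np.nanmedian)


def impute_regression(X, n_iter=5):
    """Iterative least-squares imputation (a stand-in, NOT MissForest, IRMI, or EM)."""
    mask = np.isnan(X)
    out = impute_mean(X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            A = np.column_stack([np.ones(len(out)), np.delete(out, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[~miss], out[~miss, j], rcond=None)
            out[miss, j] = A[miss] @ beta
    return out


BASE_IMPUTERS = (impute_mean, impute_median, impute_regression)  # level-1 learners


def fit_ridge(P, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    p_mean, y_mean = P.mean(axis=0), y.mean()
    Pc = P - p_mean
    w = np.linalg.solve(Pc.T @ Pc + alpha * np.eye(P.shape[1]), Pc.T @ (y - y_mean))
    return w, y_mean - p_mean @ w


def stacked_impute(X, holdout_frac=0.2, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    X_aug = np.column_stack([X, outlier_indicator(X)])  # step 1: add outlier indicator
    miss = np.isnan(X_aug)
    # Hide some observed cells so the meta-learner has known targets (assumption).
    r, c = np.where(~miss[:, :p])
    pick = rng.random(len(r)) < holdout_frac
    hr, hc = r[pick], c[pick]
    X_train = X_aug.copy()
    X_train[hr, hc] = np.nan
    full_train = [f(X_train) for f in BASE_IMPUTERS]
    full = [f(X_aug) for f in BASE_IMPUTERS]
    out = X_aug.copy()
    for j in range(p):
        m = np.where(miss[:, j])[0]
        if len(m) == 0:
            continue
        P = np.column_stack([F[m, j] for F in full])  # level-1 predictions
        h = hr[hc == j]
        if len(h) < 2:
            out[m, j] = P.mean(axis=1)  # too few targets: plain average
            continue
        w, b = fit_ridge(np.column_stack([F[h, j] for F in full_train]), X_aug[h, j], alpha)
        out[m, j] = P @ w + b  # level 2: ridge meta-learner
    return out[:, :p]  # drop the indicator column


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


def mae(a, b):
    return float(np.mean(np.abs(a - b)))


def wasserstein_1d(a, b):
    """W1 between two equal-size 1-D samples: mean gap between sorted values."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))
```

Because every base imputer shares one signature (an array with NaNs in, a completed array out), real implementations of MissForest, IRMI, or EM could be added to `BASE_IMPUTERS` without changing the stacking logic.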
Published: 2025-10-20
How to Cite
Abdel-Fattah, M., Mohsen, M., & Mousa, A. (2025). On a new stacked ensemble framework for imputing missing data in the presence of outliers. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2894
Section: Research Articles