On a new stacked ensemble framework for imputing missing data in the presence of outliers
Keywords:
Missing value imputation, Ensemble, Stacking, MissForest, IRMI, EM, Ridge
Abstract
Missing value imputation (MVI) is challenging, and it becomes harder when the data also contain outliers. Ensemble techniques such as bagging and boosting have been applied to MVI with promising results, but stacking has not been investigated in this area despite its effectiveness in prediction tasks. To address this gap, two robust stacking frameworks, RKSF-IM and RESF-IM, are proposed for imputing missing data in the presence of outliers. Both frameworks first add an outlier indicator variable to the data. They then apply two different stacking configurations in which MissForest, IRMI, and EM serve as the base learners, and the base learners' predicted values are the inputs to a ridge regression that acts as the meta learner in the second layer. The proposed frameworks are compared with mean, median, XGBoost, EM, IRMI, KNN, MissForest, and SVM imputation in terms of RMSE, MAE, and Wasserstein distance, using a simulation study and two real data applications. The simulation study covers different scenarios for missing rates and outliers, and it also investigates how adding an outlier indicator affects the performance of the different imputation methods. Under the simulation settings, the proposed stacking configurations outperform the competing methods in most scenarios. In addition, many existing imputation methods improve when an outlier indicator variable is included.
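The two-layer idea in the abstract (base imputers whose predictions feed a ridge meta learner, plus an outlier indicator) can be illustrated with a minimal Python sketch. This is not the authors' RKSF-IM or RESF-IM implementation. Everything in it is an assumption made for illustration: the base learners are simple stand-ins (median, a least-squares line, and a nearest-neighbour average) rather than MissForest, IRMI, and EM; there is a single fully observed predictor; the outlier indicator is a median/MAD flag that is passed to the meta learner as an extra input; and the meta learner is trained on out-of-fold base predictions. All function names are hypothetical.

```python
import random
import statistics

def outlier_flags(values, cutoff=3.5):
    """Flag values whose robust z-score (median/MAD based) exceeds the cutoff."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values]) or 1e-12
    return [1.0 if 0.6745 * abs(v - med) / mad > cutoff else 0.0 for v in values]

# Stand-in base learners: each takes training data and returns a predict function.
def fit_median(xs, ys):
    m = statistics.median(ys)
    return lambda x: m

def fit_line(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) or 1e-12
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=3):
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean([p[1] for p in nearest])
    return predict

def ridge_fit(rows, targets, lam=1.0):
    """Closed-form ridge: solve (Z'Z + lam*D) w = Z'y, intercept left unpenalised."""
    Z = [[1.0] + list(r) for r in rows]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j and i > 0 else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(z[i] * t for z, t in zip(Z, targets)) for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
                b[r] -= f * b[c]
    return [b[i] / A[i][i] for i in range(p)]

def stacked_impute(xs, ys, n_folds=5, lam=1.0):
    """Impute None entries of ys from xs with a two-layer stack."""
    flags = outlier_flags(xs)
    obs = [i for i, y in enumerate(ys) if y is not None]
    learners = [fit_median, fit_line, fit_knn]
    meta_rows = {}
    for f in range(n_folds):  # layer 1: out-of-fold base predictions
        train = [i for k, i in enumerate(obs) if k % n_folds != f]
        hold = [i for k, i in enumerate(obs) if k % n_folds == f]
        models = [fit([xs[i] for i in train], [ys[i] for i in train])
                  for fit in learners]
        for i in hold:
            meta_rows[i] = [m(xs[i]) for m in models] + [flags[i]]
    # layer 2: ridge meta learner on base predictions plus the outlier flag
    w = ridge_fit([meta_rows[i] for i in obs], [ys[i] for i in obs], lam)
    # refit base learners on all observed rows, then fill the missing entries
    models = [fit([xs[i] for i in obs], [ys[i] for i in obs]) for fit in learners]
    out = list(ys)
    for i, y in enumerate(ys):
        if y is None:
            z = [1.0] + [m(xs[i]) for m in models] + [flags[i]]
            out[i] = sum(wi * zi for wi, zi in zip(w, z))
    return out

# Synthetic example: linear signal, two contaminated predictor values,
# and every sixth response masked as missing.
random.seed(0)
clean_x = [random.gauss(0.0, 1.0) for _ in range(60)]
truth = [1.0 + 2.0 * x + random.gauss(0.0, 0.1) for x in clean_x]
xs = list(clean_x)
xs[5], xs[17] = 15.0, -12.0
ys = [None if i % 6 == 0 else t for i, t in enumerate(truth)]
completed = stacked_impute(xs, ys)

missing = [i for i, y in enumerate(ys) if y is None]
errors = [completed[i] - truth[i] for i in missing]
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
mae = sum(abs(e) for e in errors) / len(errors)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}")
```

The final lines compute RMSE and MAE on the masked entries only, which mirrors how imputation accuracy is usually scored when the true values are known, as in a simulation. The synthetic data and the contamination pattern are illustrative and are not taken from the paper.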
Published
2025-10-20
How to Cite
Abdel-Fattah, M., Mohsen, M., & Mousa, A. (2025). On a new stacked ensemble framework for imputing missing data in the presence of outliers. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2894
Section
Research Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).