The effect of the missing rate and its mechanism on the performance of the imputation methods on different real data sets

  • Mohammad Mehdi Saber, Higher Education Center of Eghlid
  • Sara Javadi
  • Mehrdad Taghipour
  • Mohamed S. Hamed
  • Abdussalam Aljadani
  • Mahmoud M. Mansour
  • Haitham M. Yousof
Keywords: Imputation Methods, Missing Data, Multiple Imputation, Multiple Imputation by Chained Equations, Incomplete Data, K-Nearest Neighbor Imputation, Random Forest, Single Imputation

Abstract

This paper explores the mechanisms of data missingness and evaluates imputation techniques for handling missing data. Missing data is a common issue in data analysis, and how it is treated is crucial for accurate modeling and analysis. The paper assesses prevalent imputation methods: mean imputation, median imputation, K-Nearest Neighbor (KNN) imputation, Classification and Regression Trees (CART), and Random Forest (RF). These techniques were chosen for their widespread use and their varying levels of complexity and accuracy. Simple methods such as mean and median imputation are computationally efficient but may introduce bias, especially when the missingness is not random. In contrast, more advanced methods such as KNN, CART, and RF handle complex missingness patterns better because they take relationships among variables into account. The paper aims to guide data scientists and analysts in selecting the most appropriate imputation method for their data characteristics and analysis objectives. By understanding the strengths and weaknesses of each technique, practitioners can improve the quality and reliability of their analyses.
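As a rough illustration of the comparison described in the abstract, the sketch below masks values completely at random at a chosen missing rate, fills them by mean, median, and a simple KNN rule, and scores each method by RMSE on the masked cells. It is not the authors' implementation: the synthetic two-column data, the 20% missing rate, the MCAR mechanism, and all helper names (`make_mcar`, `impute_column_stat`, `impute_knn`, `rmse_on_missing`) are assumptions made for this example, and CART and RF are omitted to keep it dependency-free.

```python
import math
import random
import statistics

def make_mcar(rows, rate, rng):
    """Copy rows, setting each cell to None with probability `rate` (MCAR assumption)."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def impute_column_stat(rows, stat):
    """Fill missing cells with a column statistic (e.g. mean or median) of observed values."""
    fills = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        fills.append(stat(observed) if observed else 0.0)
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def impute_knn(rows, k=3):
    """Fill a missing cell with the column mean over the k nearest rows that observe it.

    Distance is the root mean squared difference over columns both rows observe.
    Rows with no usable neighbour fall back to the column mean.
    """
    fallback = impute_column_stat(rows, statistics.mean)
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [val for _, val in candidates[:k]]
            out[i][j] = statistics.mean(nearest) if nearest else fallback[i][j]
    return out

def rmse_on_missing(truth, masked, filled):
    """RMSE between true and imputed values, over the masked cells only."""
    errs = [(filled[i][j] - truth[i][j]) ** 2
            for i, r in enumerate(masked) for j, v in enumerate(r) if v is None]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0

rng = random.Random(0)

# Hypothetical data: the second column depends strongly on the first.
truth = []
for _ in range(200):
    x = rng.gauss(0, 1)
    truth.append([x, 2 * x + rng.gauss(0, 0.1)])

masked = make_mcar(truth, rate=0.2, rng=rng)

imputed = {
    "mean": impute_column_stat(masked, statistics.mean),
    "median": impute_column_stat(masked, statistics.median),
    "knn": impute_knn(masked, k=3),
}
scores = {name: rmse_on_missing(truth, masked, filled)
          for name, filled in imputed.items()}

for name, score in scores.items():
    print(f"{name:>6}: RMSE on masked cells = {score:.3f}")
```

Because the two synthetic columns are strongly related, the KNN rule can borrow information from the observed column, while mean and median imputation ignore it. Repeating the run with other values of `rate`, or with a masking rule that depends on the data, gives a small-scale picture of how missing rate and mechanism affect each method.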
Published
2025-09-02
How to Cite
Saber, M. M., Javadi, S., Taghipour, M., Hamed, M. S., Aljadani, A., Mansour, M. M., & Yousof, H. M. (2025). The effect of the missing rate and its mechanism on the performance of the imputation methods on different real data sets. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2294
Section
Research Articles
