The effect of the missing rate and its mechanism on the performance of the imputation methods on different real data sets
Keywords:
Imputation Methods, Missing Data, Multiple Imputation, Multiple Imputation by Chained Equations, Incomplete Data, K-Nearest Neighbor Imputation, Random Forest, Single Imputation
Abstract
The purpose of this paper is to explore the mechanisms of data missingness and evaluate various imputation techniques used to handle missing data. Missing data is a common issue in data analysis, and its treatment is crucial for accurate modeling and analysis. This paper assesses prevalent imputation methods, including mean imputation, median imputation, K-Nearest Neighbor imputation (KNN), Classification and Regression Trees (CART), and Random Forest (RF). These techniques were chosen for their widespread use and varying levels of complexity and accuracy. Simple methods like mean and median imputation are computationally efficient but may introduce bias, especially when the missingness is not random. In contrast, more advanced methods like KNN, CART, and RF offer better handling of complex missingness patterns by considering relationships among variables. This paper aims to provide guidance for data scientists and analysts in selecting the most appropriate imputation methods based on their data characteristics and analysis objectives. By understanding the strengths and weaknesses of each technique, practitioners can improve the quality and reliability of their analyses.
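The comparison described in the abstract can be illustrated with a small sketch. This is not the authors' code or experimental setup: it assumes purely numeric data, simulates missingness completely at random (MCAR) at a chosen rate, and compares mean, median, and a simplified KNN imputation by RMSE on the masked cells. All function names (`make_mcar`, `impute_column_stat`, `impute_knn`, `rmse`) are hypothetical, the KNN variant is a plain unweighted version written for illustration, and CART and RF are omitted for brevity.

```python
import math
import random
import statistics


def make_mcar(rows, rate, seed=0):
    """Copy `rows`, setting each cell to None with probability `rate` (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def column_fills(rows, stat):
    """Apply `stat` to the observed values of each column.

    Assumption: every column has at least one observed value.
    """
    return [stat([r[j] for r in rows if r[j] is not None])
            for j in range(len(rows[0]))]


def impute_column_stat(rows, stat=statistics.mean):
    """Single imputation: fill missing cells with a column mean or median."""
    fills = column_fills(rows, stat)
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def _distance(a, b):
    """Root mean squared difference over columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common observed column, so not comparable
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_knn(rows, k=3):
    """Fill each missing cell with the mean of that column over the k
    nearest rows that observe it; fall back to the column mean otherwise."""
    fallback = column_fills(rows, statistics.mean)
    out = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [val for dist, val in donors if dist != math.inf][:k]
            new_row[j] = statistics.mean(nearest) if nearest else fallback[j]
        out.append(new_row)
    return out


def rmse(truth, masked, imputed):
    """RMSE between true and imputed values, over the masked cells only."""
    errs = [
        (t - p) ** 2
        for t_row, m_row, p_row in zip(truth, masked, imputed)
        for t, m, p in zip(t_row, m_row, p_row)
        if m is None
    ]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0


if __name__ == "__main__":
    # Synthetic two-column data with a linear relationship (illustrative only).
    rng = random.Random(1)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    truth = [[x, 2 * x + rng.gauss(0, 1)] for x in xs]
    for rate in (0.1, 0.3):
        masked = make_mcar(truth, rate, seed=2)
        print(rate,
              rmse(truth, masked, impute_column_stat(masked, statistics.mean)),
              rmse(truth, masked, impute_column_stat(masked, statistics.median)),
              rmse(truth, masked, impute_knn(masked, k=5)))
```

Because the KNN variant borrows values from rows that are close on the observed columns, it can exploit relationships among variables, which the column mean and median ignore. Evaluating under other mechanisms would require a different masking function, for example one whose missingness probability depends on an observed column.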
Published
2025-09-02
How to Cite
Saber, M. M., Sara Javadi, Mehrdad Taghipour, Mohamed S. Hamed, Abdussalam Aljadani, Mahmoud M. Mansour, & Haitham M. Yousof. (2025). The effect of the missing rate and its mechanism on the performance of the imputation methods on different real data sets. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-2294
Section
Research Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).