Improved Mean Methods of Imputation
Abstract
Replacing missing values of a variable with the mean of the non-missing values is a simple and natural way to impute values, particularly in the case where data are missing completely at random. Following a short review of this method, we consider three possible improvements: one called the shrinkage method, a second called the weighted interval method, and a third called the known variance method. Estimates of the population mean obtained from each of these methods are compared to those from the mean method, both analytically and by means of numerical examples.
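As a rough illustration of the baseline mean method only, the following Python sketch fills each missing entry with the mean of the observed entries. The function name `mean_impute`, the use of `None` or NaN to mark missing values, and the sample data are assumptions made for this example and are not taken from the paper. The three proposed improvements are not sketched, because their formulas are not given in this abstract.

```python
import math


def mean_impute(values):
    """Replace missing entries (None or NaN) with the mean of the observed ones.

    Hypothetical helper for illustration; not the authors' code.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute: no observed values")

    observed_mean = sum(observed) / len(observed)
    # Observed values are kept; missing ones receive the observed mean.
    return [observed_mean if is_missing(v) else v for v in values]


# Made-up sample with two missing responses.
sample = [4.0, None, 6.0, 8.0, None]
completed = mean_impute(sample)
```

Under this scheme the mean of the completed data equals the mean of the observed values, so the resulting estimate of the population mean is simply the respondent mean.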