Variable Selection in Count Data Regression Model based on Firefly Algorithm
Abstract
Variable selection is a helpful procedure for improving computational speed and prediction accuracy by identifying the most important variables related to the response variable. Count data regression modeling has received much attention in several scientific fields, in which the Poisson and negative binomial regression models are the most basic. The firefly algorithm is one of the recently proposed nature-inspired algorithms and can be efficiently employed for variable selection. In this work, the firefly algorithm is proposed to perform variable selection for count data regression models. Extensive simulation studies and two real data applications are conducted to evaluate the performance of the proposed method in terms of prediction accuracy and variable selection criteria, and its performance is compared with that of other methods. The results demonstrate the efficiency of the proposed method, which outperforms other popular methods.