New Algorithms and Software for Significance Controlled Variable Selection
Abstract
Stepwise regression algorithms have been widely used in a variety of applications and remain a fundamental tool for variable selection. Most functions available in statistical software packages deliver models that may contain insignificant predictors because of the optimization criterion used at each step. Here we introduce an R package that provides the user with several measures of the prospective model at each step of the algorithm. These prospective models are checked with multiple testing p-value corrections such as Bonferroni and False Discovery Rate, so the algorithm's final model includes only predictors whose significance is controlled by the choice of correction type and alpha level. Moreover, each forward or backward step can use an entry or drop criterion that combines the p-values of the prospective models. We illustrate the functionality of the package with examples and simulations.