A Modified Inexact SARAH Algorithm with Stabilized Barzilai-Borwein Step-Size in Machine Learning
Abstract
The Inexact SARAH (iSARAH) algorithm, a variant of the SARAH algorithm that does not require computation of the exact gradient, can be applied to general expectation minimization problems rather than only finite-sum problems. The performance of iSARAH depends strongly on the choice of step size, and how to select an appropriate step size remains a problem worth studying. In this paper, we propose to use the stabilized Barzilai-Borwein (SBB) method to compute the step size for iSARAH automatically, which leads to a new algorithm called iSARAH-SBB. By building this adaptive step size into the design of the new algorithm, iSARAH-SBB takes advantage of the strengths of both the iSARAH and SBB methods. We analyse the convergence rate and complexity of the modified algorithm under the usual assumptions. Numerical results on standard data sets demonstrate the feasibility and effectiveness of the proposed algorithm.
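To make the idea concrete, the Python sketch below shows one way such a scheme could be organized: an iSARAH-style outer loop that replaces the exact full gradient with a large-sample estimate, and a Barzilai-Borwein ratio with an added stabilization term in the denominator as the step size. The function grad_i and the parameters n_outer, m, batch_outer, eta0 and eps, as well as the particular stabilized form of the BB ratio, are illustrative assumptions, not the authors' exact pseudocode or constants.

import numpy as np

# Illustrative sketch of an iSARAH-type loop with a stabilized BB step size.
# grad_i(w, i): gradient of the i-th component function at w.
# n: number of component functions (or size of the sampling pool).
def isarah_sbb(grad_i, n, w0, n_outer=20, m=50, batch_outer=256,
               eta0=0.1, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    w_prev = g_prev = None
    eta = eta0
    for s in range(n_outer):
        # Inexact "full" gradient: average over a sampled batch instead of the exact sum.
        idx = rng.choice(n, size=min(batch_outer, n), replace=False)
        v = np.mean([grad_i(w, i) for i in idx], axis=0)
        if w_prev is not None:
            # BB-type ratio with an extra stabilization term eps*||s||^2 in the
            # denominator (assumed form; the paper's exact SBB rule may differ).
            sk, yk = w - w_prev, v - g_prev
            eta = (sk @ sk) / (abs(sk @ yk) + eps * (sk @ sk))
        w_prev, g_prev = w.copy(), v.copy()
        # SARAH inner loop with the recursive gradient estimator.
        w_old, w_new = w.copy(), w - eta * v
        for t in range(m):
            i = rng.integers(n)
            v = v + grad_i(w_new, i) - grad_i(w_old, i)  # same sample i at both points
            w_old, w_new = w_new, w_new - eta * v
        w = w_new  # for simplicity, return the last inner iterate
    return w

With this form the step size is bounded above by 1/eps, which is one reading of "stabilized": the step cannot blow up when the curvature estimate in the denominator is small or noisy. The precise SBB rule, the choice of returned iterate, and the convergence constants should be taken from the full paper.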