A Modified Inexact SARAH Algorithm with Stabilized Barzilai-Borwein Step-Size in Machine Learning
Abstract
The Inexact SARAH (iSARAH) algorithm, a variant of the SARAH algorithm that does not require computation of the exact gradient, can be applied to general expectation minimization problems rather than only finite-sum problems. The performance of iSARAH depends strongly on the choice of step size, and how to select an appropriate step size remains a problem worth studying. In this paper, we propose to use the stabilized Barzilai-Borwein (SBB) method to compute the step size for iSARAH automatically, which leads to a new algorithm called iSARAH-SBB. By building this adaptive step size into the design of the new algorithm, iSARAH-SBB takes advantage of the strengths of both the iSARAH and SBB methods. We analyse the convergence rate and complexity of the modified algorithm under the usual assumptions. Numerical results on standard data sets demonstrate the feasibility and effectiveness of the proposed algorithm.
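To make the idea concrete, the Python sketch below shows one way such a scheme could be organized: an iSARAH-style outer loop that replaces the exact full gradient with a large-sample estimate, and a Barzilai-Borwein ratio with an added stabilization term in the denominator as the step size. The function grad_i and the parameters n_outer, m, batch_outer, eta0 and eps, as well as the particular stabilized form of the BB ratio, are illustrative assumptions, not the authors' exact pseudocode or constants.

import numpy as np

# Illustrative sketch of an iSARAH-type loop with a stabilized BB step size.
# grad_i(w, i): gradient of the i-th component function at w.
# n: number of component functions (or size of the sampling pool).
def isarah_sbb(grad_i, n, w0, n_outer=20, m=50, batch_outer=256,
               eta0=0.1, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    w_prev = g_prev = None
    eta = eta0
    for s in range(n_outer):
        # Inexact "full" gradient: average over a sampled batch instead of the exact sum.
        idx = rng.choice(n, size=min(batch_outer, n), replace=False)
        v = np.mean([grad_i(w, i) for i in idx], axis=0)
        if w_prev is not None:
            # BB-type ratio with an extra stabilization term eps*||s||^2 in the
            # denominator (assumed form; the paper's exact SBB rule may differ).
            sk, yk = w - w_prev, v - g_prev
            eta = (sk @ sk) / (abs(sk @ yk) + eps * (sk @ sk))
        w_prev, g_prev = w.copy(), v.copy()
        # SARAH inner loop with the recursive gradient estimator.
        w_old, w_new = w.copy(), w - eta * v
        for t in range(m):
            i = rng.integers(n)
            v = v + grad_i(w_new, i) - grad_i(w_old, i)  # same sample i at both points
            w_old, w_new = w_new, w_new - eta * v
        w = w_new  # for simplicity, return the last inner iterate
    return w

With this form the step size is bounded above by 1/eps, which is one reading of "stabilized": the step cannot blow up when the curvature estimate in the denominator is small or noisy. The precise SBB rule, the choice of returned iterate, and the convergence constants should be taken from the full paper.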