META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Liu, Zijian; Nguyen, Ta Duy; Nguyen, Thien Hang; Ene, Alina; Nguyen, Huy L.

Abstract:We study the application of variance reduction (VR) techniques to general non-convex stochastic optimization problems. In this setting, the recent work STORM [Cutkosky-Orabona '19] overcomes the drawback of having to compute gradients of "mega-batches" that earlier VR methods rely on. There, STORM utilizes recursive momentum to achieve the VR effect and is then later made fully adaptive in STORM+ [Levy et al., '21], where full-adaptivity removes the requirement for obtaining certain problem-specific parameters such as the smoothness of the objective and bounds on the variance and norm of the stochastic gradients in order to set the step size. However, STORM+ crucially relies on the assumption that the function values are bounded, excluding a large class of useful functions. In this work, we propose META-STORM, a generalized framework of STORM+ that removes this bounded function values assumption while still attaining the optimal convergence rate for non-convex optimization. META-STORM not only maintains full-adaptivity, removing the need to obtain problem specific parameters, but also improves the convergence rate's dependency on the problem parameters. Furthermore, META-STORM can utilize a large range of parameter settings that subsumes previous methods allowing for more flexibility in a wider range of settings. Finally, we demonstrate the effectiveness of META-STORM through experiments across common deep learning tasks. Our algorithm improves upon the previous work STORM+ and is competitive with widely used algorithms after the addition of per-coordinate update and exponential moving average heuristics.

Subjects:	Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2209.14853 [cs.LG]
	(or arXiv:2209.14853v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2209.14853

Computer Science > Machine Learning

Title:META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators