A Clipped Trip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
Authors:
Noah Marshall,
Ke Liang Xiao,
Atish Agarwala,
Elliot Paquette
Abstract:
The success of modern machine learning is due in part to the adaptive optimization methods that have been developed to deal with the difficulties of training large models over complex datasets. One such method is gradient clipping: a practical procedure with limited theoretical underpinnings. In this work, we study clipping in a least squares problem under streaming SGD. We develop a theoretical analysis of the learning dynamics in the limit of large intrinsic dimension, a model- and dataset-dependent notion of dimensionality. In this limit we find a deterministic equation that describes the evolution of the loss. We show that with Gaussian noise clipping cannot improve SGD performance. Yet, in other noisy settings, clipping can provide benefits with tuning of the clipping threshold. In these cases, clipping biases updates in a way beneficial to training which cannot be recovered by SGD under any schedule. We conclude with a discussion about the links between high-dimensional clipping and neural network training.
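For readers unfamiliar with the setting, the following is a minimal sketch of norm-based gradient clipping applied to streaming (one-pass) SGD on a least squares problem. It is an illustration only, not the authors' code: the function names, the isotropic Gaussian data model, the Gaussian label noise, and all default parameter values are assumptions made here for the example.

```python
import numpy as np


def clip(g, c):
    """Rescale g so that its Euclidean norm is at most c (assumed clipping rule)."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)


def streaming_clipped_sgd(d=50, steps=2000, lr=0.2, c=0.1, noise_std=0.1, seed=0):
    """One-pass SGD with clipping on a synthetic least squares problem.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d) / np.sqrt(d)  # ground-truth weights
    w = np.zeros(d)
    for _ in range(steps):
        # Streaming: every step draws a fresh sample that is never reused.
        x = rng.standard_normal(d) / np.sqrt(d)
        y = x @ w_star + noise_std * rng.standard_normal()
        g = (x @ w - y) * x  # gradient of 0.5 * (x @ w - y) ** 2 w.r.t. w
        w -= lr * clip(g, c)
    return w, w_star


w, w_star = streaming_clipped_sgd()
print("initial squared error:", np.sum(w_star ** 2))
print("final squared error:  ", np.sum((w - w_star) ** 2))
```

Replacing the Gaussian label noise with a different noise distribution and varying the threshold `c` is one way to explore the regimes the abstract contrasts.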
Submitted 17 June, 2024;
originally announced June 2024.