Showing 1–22 of 22 results for author: Agarwala, A

Searching in archive cs.
  1. arXiv:2406.11733  [pdf, other]

    stat.ML cs.LG

    A Clipped Trip: the Dynamics of SGD with Gradient Clipping in High-Dimensions

    Authors: Noah Marshall, Ke Liang Xiao, Atish Agarwala, Elliot Paquette

    Abstract: The success of modern machine learning is due in part to the adaptive optimization methods that have been developed to deal with the difficulties of training large models over complex datasets. One such method is gradient clipping: a practical procedure with limited theoretical underpinnings. In this work, we study clipping in a least squares problem under streaming SGD. We develop a theoretical a…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2404.19261  [pdf, other]

    cs.LG math.OC math.ST physics.data-an

    High dimensional analysis reveals conservative sharpening and a stochastic edge of stability

    Authors: Atish Agarwala, Jeffrey Pennington

    Abstract: Recent empirical and theoretical work has shown that the dynamics of the large eigenvalues of the training loss Hessian have some remarkably robust features across models and datasets in the full batch regime. There is often an early period of progressive sharpening where the large eigenvalues increase, followed by stabilization at a predictable value known as the edge of stability. Previous work…

    Submitted 30 April, 2024; originally announced April 2024.

  3. arXiv:2402.05271  [pdf, other]

    stat.ML cs.AI cs.LG

    Feature learning as alignment: a structural property of gradient descent in non-linear neural networks

    Authors: Daniel Beaglehole, Ioannis Mitliagkas, Atish Agarwala

    Abstract: Understanding the mechanisms through which neural networks extract statistics from input-label pairs through feature learning is one of the most important unsolved problems in supervised learning. Prior works demonstrated that the gram matrices of the weights (the neural feature matrices, NFM) and the average gradient outer products (AGOP) become correlated during training, in a statement known as…

    Submitted 24 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  4. arXiv:2401.10809  [pdf, other]

    cs.LG

    Neglected Hessian component explains mysteries in Sharpness regularization

    Authors: Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

    Abstract: Recent work has shown that methods like SAM which either explicitly or implicitly penalize second order information can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We show that these differences can be explained by the structure of the Hessian of the loss. First, we show that a common decomposition…

    Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

  5. arXiv:2312.00209  [pdf, other]

    cs.LG cs.AI math.OC

    On the Interplay Between Stepsize Tuning and Progressive Sharpening

    Authors: Vincent Roulet, Atish Agarwala, Fabian Pedregosa

    Abstract: Recent empirical work has revealed an intriguing property of deep learning models by which the sharpness (largest eigenvalue of the Hessian) increases throughout optimization until it stabilizes around a critical value at which the optimizer operates at the edge of stability, given a fixed stepsize (Cohen et al, 2022). We investigate empirically how the sharpness evolves when using stepsize-tuners…

    Submitted 29 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: Presented at the NeurIPS 2023 OPT Workshop

  6. arXiv:2308.01976  [pdf, other]

    cs.LG cs.AI cs.CL cs.IR

    Domain specificity and data efficiency in typo tolerant spell checkers: the case of search in online marketplaces

    Authors: Dayananda Ubrangala, Juhi Sharma, Ravi Prasad Kondapalli, Kiran R, Amit Agarwala, Laurent Boué

    Abstract: Typographical errors are a major source of frustration for visitors of online marketplaces. Because of the domain-specific nature of these marketplaces and the very short queries users tend to search for, traditional spell checking solutions do not perform well in correcting typos. We present a data augmentation method to address the lack of annotated typo data and train a recurrent neural network…

    Submitted 3 August, 2023; originally announced August 2023.

    Journal ref: Microsoft Journal of Applied Research, Volume 19, 2023

  7. arXiv:2302.08692  [pdf, other]

    cs.LG

    SAM operates far from home: eigenvalue regularization as a dynamical phenomenon

    Authors: Atish Agarwala, Yann N. Dauphin

    Abstract: The Sharpness Aware Minimization (SAM) optimization algorithm has been shown to control large eigenvalues of the loss Hessian and provide generalization benefits in a variety of settings. The original motivation for SAM was a modified loss function which penalized sharp minima; subsequent analyses have also focused on the behavior near minima. However, our work reveals that SAM provides a strong r…

    Submitted 16 February, 2023; originally announced February 2023.

  8. arXiv:2210.04860  [pdf, other]

    cs.LG cs.AI math.OC

    Second-order regression models exhibit progressive sharpening to the edge of stability

    Authors: Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington

    Abstract: Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value which allows convergence (edge of stability). These phenomena are intrinsically non-linear and do not happen for models in the constant N…

    Submitted 10 October, 2022; originally announced October 2022.

  9. arXiv:2207.09432  [pdf, other]

    cs.LG

    Deep equilibrium networks are sensitive to initialization statistics

    Authors: Atish Agarwala, Samuel S. Schoenholz

    Abstract: Deep equilibrium networks (DEQs) are a promising way to construct models which trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized…

    Submitted 19 July, 2022; originally announced July 2022.

  10. arXiv:2205.14929  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Neural Volumetric Object Selection

    Authors: Zhongzheng Ren, Aseem Agarwala, Bryan Russell, Alexander G. Schwing, Oliver Wang

    Abstract: We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF). Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views. To achieve this result, we propose a novel voxel fe…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: CVPR 2022 camera ready

  11. arXiv:2103.15261  [pdf, other]

    cs.LG cs.AI stat.ML

    One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks

    Authors: Atish Agarwala, Abhimanyu Das, Brendan Juba, Rina Panigrahy, Vatsal Sharan, Xin Wang, Qiuyi Zhang

    Abstract: Can deep learning solve multiple tasks simultaneously, even when they are unrelated and very different? We investigate how the representations of the underlying tasks affect the ability of a single neural network to learn them jointly. We present theoretical and empirical findings that a single neural network is capable of simultaneously learning multiple tasks from a combined data set, for a vari…

    Submitted 28 March, 2021; originally announced March 2021.

    Comments: 30 pages, 6 figures

  12. arXiv:2010.07344  [pdf, other]

    cs.LG cs.AI

    Temperature check: theory and practice for training models with softmax-cross-entropy losses

    Authors: Atish Agarwala, Jeffrey Pennington, Yann Dauphin, Sam Schoenholz

    Abstract: The softmax function combined with a cross-entropy loss is a principled approach to modeling probability distributions that has become ubiquitous in deep learning. The softmax function is defined by a lone hyperparameter, the temperature, that is commonly set to one or regarded as a way to tune model confidence after training; however, less is known about how the temperature impacts training dynam…

    Submitted 14 October, 2020; originally announced October 2020.

  13. arXiv:2005.07724  [pdf, other]

    cs.LG stat.ML

    Learning the gravitational force law and other analytic functions

    Authors: Atish Agarwala, Abhimanyu Das, Rina Panigrahy, Qiuyi Zhang

    Abstract: Large neural network models have been successful in learning functions of importance in many branches of science, including physics, chemistry and biology. Recent theoretical work has shown explicit learning bounds for wide networks and kernel methods on some simple classes of functions, but not on more complex functions which arise in practice. We extend these techniques to provide learning bound…

    Submitted 15 May, 2020; originally announced May 2020.

  14. arXiv:2004.02132  [pdf, other]

    cs.CV

    Deep Homography Estimation for Dynamic Scenes

    Authors: Hoang Le, Feng Liu, Shu Zhang, Aseem Agarwala

    Abstract: Homography estimation is an important step in many computer vision problems. Recently, deep neural network methods have shown to be favorable for this problem when compared to traditional methods. However, these new methods do not consider dynamic content in input images. They train neural networks with only image pairs that can be perfectly aligned using homographies. This paper investigates and…

    Submitted 5 April, 2020; originally announced April 2020.

    Comments: CVPR 2020, https://github.com/lcmhoang/hmg-dynamics

  15. arXiv:1811.11283  [pdf, other]

    cs.CV cs.AI

    A Compact Embedding for Facial Expression Similarity

    Authors: Raviteja Vemulapalli, Aseem Agarwala

    Abstract: Most of the existing work on automatic facial expression analysis focuses on discrete emotion recognition, or facial action unit detection. However, facial expressions do not always fall neatly into pre-defined semantic categories. Also, the similarity between expressions measured in the action unit space need not correspond to how humans perceive expression similarity. Different from previous wor…

    Submitted 9 January, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

  16. arXiv:1702.02463  [pdf, other]

    cs.CV cs.GR cs.LG

    Video Frame Synthesis using Deep Voxel Flow

    Authors: Ziwei Liu, Raymond A. Yeh, Xiaoou Tang, Yiming Liu, Aseem Agarwala

    Abstract: We address the problem of synthesizing new video frames in an existing video, either in-between existing frames (interpolation), or subsequent to them (extrapolation). This problem is challenging because video appearance and motion can be highly complex. Traditional optical-flow-based solutions often fail where flow estimation is challenging, while newer neural-network-based methods that hallucina…

    Submitted 5 August, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

    Comments: To appear in ICCV 2017 as an oral paper. More details at the project page: https://liuziwei7.github.io/projects/VoxelFlow.html

  17. arXiv:1611.09961  [pdf, other]

    cs.CV

    Semantic Facial Expression Editing using Autoencoded Flow

    Authors: Raymond Yeh, Ziwei Liu, Dan B Goldman, Aseem Agarwala

    Abstract: High-level manipulation of facial expressions in images --- such as changing a smile to a neutral expression --- is challenging because facial expression changes are highly non-linear, and vary depending on the appearance of the face. We present a fully automatic approach to editing faces that combines the advantages of flow-based face manipulation with the more recent generative capabilities of V…

    Submitted 29 November, 2016; originally announced November 2016.

  18. arXiv:1507.07068  [pdf]

    physics.comp-ph cond-mat.mtrl-sci cs.DC

    Performance metrics in a hybrid MPI-OpenMP based molecular dynamics simulation with short-range interactions

    Authors: Anirban Pal, Abhishek Agarwala, Soumyendu Raha, Baidurya Bhattacharya

    Abstract: We discuss the computational bottlenecks in molecular dynamics (MD) and describe the challenges in parallelizing the computation intensive tasks. We present a hybrid algorithm using MPI (Message Passing Interface) with OpenMP threads for parallelizing a generalized MD computation scheme for systems with short range interatomic interactions. The algorithm is discussed in the context of nanoindentat…

    Submitted 25 July, 2015; originally announced July 2015.

    Journal ref: Journal of Parallel and Distributed Computing, Elsevier, vol. 74, no. 3, pp. 2203-2214, 2014

  19. arXiv:1507.03196  [pdf, other]

    cs.CV

    DeepFont: Identify Your Font from An Image

    Authors: Zhangyang Wang, Jianchao Yang, Hailin Jin, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, Thomas S. Huang

    Abstract: As font is one of the core design concepts, automatic font identification and similar font suggestion from an image or photo has been on the wish list of many designers. We study the Visual Font Recognition (VFR) problem, and advance the state-of-the-art remarkably by developing the DeepFont system. First of all, we build up the first available large-scale VFR dataset, named AdobeVFR, consisting o…

    Submitted 12 July, 2015; originally announced July 2015.

    Comments: To Appear in ACM Multimedia as a full paper

  20. arXiv:1504.00028  [pdf, other]

    cs.CV cs.LG

    Real-World Font Recognition Using Deep Network and Domain Adaptation

    Authors: Zhangyang Wang, Jianchao Yang, Hailin Jin, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, Thomas S. Huang

    Abstract: We address a challenging fine-grain classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic domain gap caused poor generalization to new real data in previous methods (Chen et al. (2014)). In this paper, we refer to Convolutional Neural…

    Submitted 31 March, 2015; originally announced April 2015.

  21. arXiv:1412.5758   

    cs.CV

    Decomposition-Based Domain Adaptation for Real-World Font Recognition

    Authors: Zhangyang Wang, Jianchao Yang, Hailin Jin, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, Thomas S. Huang

    Abstract: We present a domain adaption framework to address a domain mismatch between synthetic training and real-world testing data. We demonstrate our method on a challenging fine-grain classification problem: recognizing a font style from an image of text. In this task, it is very easy to generate lots of rendered font examples but very hard to obtain real-world labeled images. This real-to-synthetic dom…

    Submitted 1 April, 2015; v1 submitted 18 December, 2014; originally announced December 2014.

    Comments: This paper has been withdrawn by the author due to project concerns

  22. Recognizing Image Style

    Authors: Sergey Karayev, Matthew Trentacoste, Helen Han, Aseem Agarwala, Trevor Darrell, Aaron Hertzmann, Holger Winnemoeller

    Abstract: The style of an image plays a significant role in how it is viewed, but style has received little attention in computer vision research. We describe an approach to predicting style of images, and perform a thorough evaluation of different image features for these tasks. We find that features learned in a multi-layer network generally perform best -- even when trained with object class (not style)…

    Submitted 23 July, 2014; v1 submitted 14 November, 2013; originally announced November 2013.

    Journal ref: Proc. British Machine Vision Conference (BMVC) 2014