Showing 1–16 of 16 results for author: Saunshi, N

  1. arXiv:2406.02469  [pdf, other]

    cs.LG cs.CL

    Landscape-Aware Growing: The Power of a Little LAG

    Authors: Stefani Karp, Nikunj Saunshi, Sobhan Miryoosefi, Sashank J. Reddi, Sanjiv Kumar

    Abstract: Recently, there has been increasing interest in efficient pretraining paradigms for training Transformer-based models. Several recent approaches use smaller models to initialize larger models in order to save computation (e.g., stacking and fusion). In this work, we study the fundamental question of how to select the best growing strategy from a given pool of growing strategies. Prior works have e…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2402.05913  [pdf, other]

    cs.CL cs.LG

    Efficient Stagewise Pretraining via Progressive Subnetworks

    Authors: Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

    Abstract: Recent developments in large language models have sparked interest in efficient pretraining methods. A recent effective paradigm is to perform stage-wise training, where the size of the model is gradually increased over the course of training (e.g. gradual stacking (Reddi et al., 2023)). While the resource and wall-time savings are appealing, it has limitations, particularly the inability to evalu…

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2308.01906  [pdf, other]

    cs.CL cs.AI cs.LG

    Reasoning in Large Language Models Through Symbolic Math Word Problems

    Authors: Vedant Gaur, Nikunj Saunshi

    Abstract: Large language models (LLMs) have revolutionized NLP by solving downstream tasks with little to no labeled data. Despite their versatile abilities, the larger question of their ability to reason remains ill-understood. This paper addresses reasoning in math word problems (MWPs) by studying symbolic versions of the numeric problems, since a symbolic expression is a "concise explanation" of the nume…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: Accepted at the Findings of ACL 2023

  4. arXiv:2302.06600  [pdf, other]

    cs.CL cs.LG

    Task-Specific Skill Localization in Fine-tuned Language Models

    Authors: Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora

    Abstract: Pre-trained language models can be fine-tuned to solve diverse NLP tasks, including in few-shot settings. Thus fine-tuning allows the model to quickly pick up task-specific "skills," but there has been limited study of where these newly-learnt skills reside inside the massive model. This paper introduces the term skill localization for this problem and proposes a solution. Given the downstream t…

    Submitted 1 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: Accepted at 40th International Conference on Machine Learning (ICML 2023)

  5. arXiv:2211.02912  [pdf, other]

    stat.ML cs.LG

    New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound

    Authors: Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora

    Abstract: Saliency methods compute heat maps that highlight portions of an input that were most important for the label assigned to it by a deep net. Evaluations of saliency methods convert this heat map into a new masked input by retaining the $k$ highest-ranked pixels of the original input and replacing the rest with "uninformative" pixels, and checking if th…

    Submitted 5 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022 (Oral)

  6. arXiv:2210.01072  [pdf, other]

    cs.LG cs.AI

    Understanding Influence Functions and Datamodels via Harmonic Analysis

    Authors: Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora

    Abstract: Influence functions estimate effect of individual data points on predictions of the model on test data and were adapted to deep learning in Koh and Liang [2017]. They have been used for detecting data poisoning, detecting helpful and harmful examples, influence of groups of datapoints, etc. Recently, Ilyas et al. [2022] introduced a linear regression method they termed datamodels to predict the ef…

    Submitted 3 October, 2022; originally announced October 2022.

  7. arXiv:2202.14037  [pdf, other]

    cs.LG cs.AI

    Understanding Contrastive Learning Requires Incorporating Inductive Biases

    Authors: Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy

    Abstract: Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs. Recent attempts to theoretically explain the success of contrastive learning on downstream classification tasks prove guarantees depending on properties of augmentations and the value of…

    Submitted 28 February, 2022; originally announced February 2022.

  8. arXiv:2111.14212  [pdf, other]

    cs.LG

    On Predicting Generalization using GANs

    Authors: Yi Zhang, Arushi Gupta, Nikunj Saunshi, Sanjeev Arora

    Abstract: Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters. While generalization bounds can give many insights about architecture design, training algorithms, etc., what they do not currently do is yield good predictions for actual test error. A recently introduced Predicting Generalization in Deep Learnin…

    Submitted 17 March, 2022; v1 submitted 28 November, 2021; originally announced November 2021.

  9. arXiv:2106.15615  [pdf, other]

    cs.LG cs.AI

    A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning

    Authors: Nikunj Saunshi, Arushi Gupta, Wei Hu

    Abstract: An effective approach in meta-learning is to utilize multiple "train tasks" to learn a good initialization for model parameters that can help solve unseen "test tasks" with very few samples by fine-tuning from this initialization. Although successful in practice, theoretical understanding of such methods is limited. This work studies an important aspect of these methods: splitting the data from ea…

    Submitted 29 June, 2021; originally announced June 2021.

    Comments: In proceedings of ICML 2021

  10. arXiv:2010.03648  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

    Authors: Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora

    Abstract: Autoregressive language models, pretrained using large text corpora to do well on next word prediction, have been successful at solving many downstream tasks, even with zero-shot usage. However, there is little theoretical understanding of this success. This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions…

    Submitted 14 April, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: This version is the camera-ready version for ICLR 2021. Main changes include a detailed discussion about natural tasks, more detailed proof sketch and updated experimental evaluations

  11. arXiv:2008.01064  [pdf, other]

    cs.LG stat.ML

    Predicting What You Already Know Helps: Provable Self-Supervised Learning

    Authors: Jason D. Lee, Qi Lei, Nikunj Saunshi, Jiacheng Zhuo

    Abstract: Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data to learn useful semantic representations. These pretext tasks are created solely using the input features, such as predicting a missing image patch, recovering the color channels of an image from context, or predicting missing words in text; yet predicting this …

    Submitted 13 November, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: NeurIPS 2021

  12. arXiv:2002.11172  [pdf, other]

    cs.LG math.OC stat.ML

    A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

    Authors: Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora

    Abstract: One popular trend in meta-learning is to learn from many training tasks a common initialization for a gradient-based method that can be used to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for convex models. This work shows that convex-case analysis…

    Submitted 25 February, 2020; originally announced February 2020.

    Comments: 34 pages

  13. arXiv:2002.10544  [pdf, other]

    cs.LG cs.AI stat.ML

    Provable Representation Learning for Imitation Learning via Bi-level Optimization

    Authors: Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

    Abstract: A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts' trajectories are available. We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: 26 pages

  14. arXiv:1902.09229  [pdf, other]

    cs.LG cs.AI stat.ML

    A Theoretical Analysis of Contrastive Unsupervised Representation Learning

    Authors: Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, Nikunj Saunshi

    Abstract: Recent empirical works have successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging availability of pairs of semantically "similar" data points and "negative samples," the learner forces the inner product of representations of sim…

    Submitted 25 February, 2019; originally announced February 2019.

    Comments: 19 pages, 5 figures

  15. arXiv:1805.05388  [pdf, other]

    cs.CL cs.AI

    A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

    Authors: Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora

    Abstract: Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-li…

    Submitted 14 May, 2018; originally announced May 2018.

    Comments: 11 pages, 2 figures, To appear in ACL 2018

  16. arXiv:1704.05579  [pdf, other]

    cs.CL cs.AI cs.LG

    A Large Self-Annotated Corpus for Sarcasm

    Authors: Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli

    Abstract: We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection. The corpus has 1.3 million sarcastic statements -- 10 times more than any previous dataset -- and many times more instances of non-sarcastic statements, allowing for learning in both balanced and unbalanced label regimes. Each statement is further…

    Submitted 22 March, 2018; v1 submitted 18 April, 2017; originally announced April 2017.

    Comments: 6 pages, 4 figures. To appear in LREC 2018