Showing 1–8 of 8 results for author: Sreenivasan, K

Searching in archive cs.
  1. arXiv:2405.20541

    cs.LG cs.CL

    Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

    Authors: Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

    Abstract: In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected…

    Submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2307.05906

    cs.LG

    Mini-Batch Optimization of Contrastive Loss

    Authors: Jaewoong Cho, Kartik Sreenivasan, Keon Lee, Kyunghoo Mun, Soheun Yi, Jeong-Gwan Lee, Anna Lee, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

    Abstract: Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views of the same object) are similar, while embeddings of negative pairs are dissimilar. Practical constraints such as large memory requirements make it challenging t…

    Submitted 12 July, 2023; originally announced July 2023.

  3. arXiv:2307.03381

    cs.LG

    Teaching Arithmetic to Small Transformers

    Authors: Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as add…

    Submitted 7 July, 2023; originally announced July 2023.

  4. arXiv:2305.18869

    cs.LG cs.AI cs.CL

    Dissecting Chain-of-Thought: Compositionality through In-Context Filtering and Learning

    Authors: Yingcong Li, Kartik Sreenivasan, Angeliki Giannou, Dimitris Papailiopoulos, Samet Oymak

    Abstract: Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple to study, yet general family of compositi…

    Submitted 7 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted for NeurIPS 2023. Changes in this version: refined title, restructured content, included new out-of-distribution experiments, and code now available

  5. arXiv:2202.12002

    cs.LG cs.AI cs.CV

    Rare Gems: Finding Lottery Tickets at Initialization

    Authors: Kartik Sreenivasan, Jy-yong Sohn, Liu Yang, Matthew Grinde, Alliot Nagle, Hongyi Wang, Eric Xing, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training "lottery tickets", i.e., special sparse subnetworks found at initialization, that can be trained to high accuracy. However, a subsequent line of work by Frankle e…

    Submitted 2 June, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

  6. arXiv:2110.08996

    cs.LG cs.AI

    Finding Everything within Random Binary Networks

    Authors: Kartik Sreenivasan, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos

    Abstract: A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks. A follow-up line of theoretical work provides justification of these findings by proving that slightly overparameterized neural networks, with commonly used cont…

    Submitted 22 October, 2021; v1 submitted 17 October, 2021; originally announced October 2021.

  7. arXiv:2106.07724

    cs.LG cs.IT stat.ML

    An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks

    Authors: Shashank Rajput, Kartik Sreenivasan, Dimitris Papailiopoulos, Amin Karbasi

    Abstract: It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long standing question by Baum (1988), proving that \emph{deep threshold} networks can memorize $n$ points in $d$ dimensions using $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and…

    Submitted 14 June, 2021; originally announced June 2021.

  8. arXiv:2007.05084

    cs.LG cs.CR cs.DC stat.ML

    Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

    Authors: Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, Dimitris Papailiopoulos

    Abstract: Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, but also methods to defend against them, and it is cur…

    Submitted 9 July, 2020; originally announced July 2020.