Showing 1–12 of 12 results for author: Bhattamishra, S

  1. arXiv:2406.09347  [pdf, other]

    cs.LG stat.ML

    Separations in the Representational Capabilities of Transformers and Recurrent Architectures

    Authors: Satwik Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade

    Abstract: Transformer architectures have been widely adopted in foundation models. Due to their high inference costs, there is renewed interest in exploring the potential of efficient recurrent architectures (RNNs). In this paper, we analyze the differences in the representational capabilities of Transformers and RNNs across several tasks of practical relevance, including index lookup, nearest neighbor, rec…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2310.11634  [pdf, other]

    cs.CL

    MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations

    Authors: Arkil Patel, Satwik Bhattamishra, Siva Reddy, Dzmitry Bahdanau

    Abstract: Humans possess a remarkable ability to assign novel interpretations to linguistic expressions, enabling them to learn new words and understand community-specific connotations. However, Large Language Models (LLMs) have a knowledge cutoff and are costly to finetune repeatedly. Therefore, it is crucial for LLMs to learn novel interpretations in-context. In this paper, we systematically analyse the a…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  3. arXiv:2310.03016  [pdf, other]

    cs.LG cs.CL

    Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions

    Authors: Satwik Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade

    Abstract: In order to understand the in-context learning phenomenon, recent works have adopted a stylized experimental framework and demonstrated that Transformers can learn gradient-based learning algorithms for various classes of real-valued functions. However, the limitations of Transformers in implementing learning algorithms, and their ability to learn other forms of algorithms are not well understood.…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Preprint

  4. arXiv:2307.16795  [pdf, other]

    cs.CL cs.AI cs.LG

    Structural Transfer Learning in NL-to-Bash Semantic Parsers

    Authors: Kyle Duffy, Satwik Bhattamishra, Phil Blunsom

    Abstract: Large-scale pre-training has made progress in many fields of natural language processing, though little is understood about the design of pre-training datasets. We propose a methodology for obtaining a quantitative understanding of structural overlap between machine translation tasks. We apply our methodology to the natural language to Bash semantic parsing task (NLBash) and show that it is largel…

    Submitted 31 July, 2023; originally announced July 2023.

  5. arXiv:2306.11800  [pdf, other]

    cs.LG

    DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization

    Authors: Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov

    Abstract: With the increase in the scale of Deep Learning (DL) training workloads in terms of compute resources and time consumption, the likelihood of encountering in-training failures rises substantially, leading to lost work and resource wastage. Such failures are typically offset by a checkpointing mechanism, which comes at the cost of storage and network bandwidth overhead. State-of-the-art approaches…

    Submitted 2 September, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

  6. arXiv:2211.12316  [pdf, other]

    cs.LG cs.CL

    Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

    Authors: Satwik Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom

    Abstract: Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Bo…

    Submitted 10 July, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: ACL 2023

  7. arXiv:2203.07402  [pdf, other]

    cs.CL

    Revisiting the Compositional Generalization Abilities of Neural Sequence Models

    Authors: Arkil Patel, Satwik Bhattamishra, Phil Blunsom, Navin Goyal

    Abstract: Compositional generalization is a fundamental trait in humans, allowing us to effortlessly combine known phrases to form novel sentences. Recent works have claimed that standard seq-to-seq models severely lack the ability to compositionally generalize. In this paper, we focus on one-shot primitive generalization as introduced by the popular SCAN benchmark. We demonstrate that modifying the trainin…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  8. arXiv:2103.07191  [pdf, other]

    cs.CL

    Are NLP Models really able to Solve Simple Math Word Problems?

    Authors: Arkil Patel, Satwik Bhattamishra, Navin Goyal

    Abstract: The problem of designing NLP solvers for math word problems (MWP) has seen sustained research activity and steady gains in the test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary level MWPs containing one-unknown arithmetic word problems, such problems are often considered "solved" with the bulk of research attention moving to more complex MWPs.…

    Submitted 15 April, 2021; v1 submitted 12 March, 2021; originally announced March 2021.

    Comments: NAACL 2021

  9. arXiv:2011.03965  [pdf, other]

    cs.CL cs.LG

    On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages

    Authors: Satwik Bhattamishra, Kabir Ahuja, Navin Goyal

    Abstract: While recurrent models have been effective in NLP tasks, their performance on context-free languages (CFLs) has been found to be quite weak. Given that CFLs are believed to capture important phenomena such as hierarchical structure in natural languages, this discrepancy in performance calls for an explanation. We study the performance of recurrent models on Dyck-n languages, a particularly importa…

    Submitted 8 November, 2020; originally announced November 2020.

    Comments: COLING 2020

  10. arXiv:2009.11264  [pdf, other]

    cs.CL cs.LG

    On the Ability and Limitations of Transformers to Recognize Formal Languages

    Authors: Satwik Bhattamishra, Kabir Ahuja, Navin Goyal

    Abstract: Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past works suggest that LSTMs generalize very well on regular languages and have close connections with counter languages. In this work, we systematically study the ability of Transformers to model such languages a…

    Submitted 8 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

    Comments: EMNLP 2020

  11. arXiv:2006.09286  [pdf, other]

    cs.LG cs.CL stat.ML

    On the Computational Power of Transformers and its Implications in Sequence Modeling

    Authors: Satwik Bhattamishra, Arkil Patel, Navin Goyal

    Abstract: Transformers are being used extensively across several sequence modeling tasks. Significant research effort has been devoted to experimentally probe the inner workings of Transformers. However, our conceptual and theoretical understanding of their power and inherent limitations is still nascent. In particular, the roles of various components in Transformers such as positional encodings, attention…

    Submitted 10 October, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: CoNLL 2020

  12. arXiv:1912.03457  [pdf, other]

    cs.CL cs.CY

    Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities

    Authors: Pratik Joshi, Christain Barnes, Sebastin Santy, Simran Khanuja, Sanket Shah, Anirudh Srinivasan, Satwik Bhattamishra, Sunayana Sitaram, Monojit Choudhury, Kalika Bali

    Abstract: In this paper, we examine and analyze the challenges associated with developing and introducing language technologies to low-resource language communities. While doing so, we bring to light the successes and failures of past work in this area, challenges being faced in doing so, and what they have achieved. Throughout this paper, we take a problem-facing approach and describe essential factors whi…

    Submitted 7 December, 2019; originally announced December 2019.

    Comments: Accepted at ICON 2019; 9 pages