
Showing 1–21 of 21 results for author: Toshniwal, S

  1. arXiv:2406.14654  [pdf, other]

    cs.CL cs.AI cs.LG

    Major Entity Identification: A Generalizable Alternative to Coreference Resolution

    Authors: Kawshik Manikantan, Shubham Toshniwal, Makarand Tapaswi, Vineet Gandhi

    Abstract: The limited generalization of coreference resolution (CR) models has been a major bottleneck in the task's broad application. Prior work has identified annotation differences, especially for mention detection, as one of the main reasons for the generalization gap and proposed using additional annotated target domain data. Rather than relying on this additional annotation, we propose an alternative…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 16 pages, 6 figures

    ACM Class: I.2.7

  2. arXiv:2406.11704  [pdf, other]

    cs.CL cs.AI cs.LG

    Nemotron-4 340B Technical Report

    Authors: Nvidia, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, et al. (58 additional authors not shown)

    Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2405.21068  [pdf, other]

    cs.CL cs.AI

    Code Pretraining Improves Entity Tracking Abilities of Language Models

    Authors: Najoung Kim, Sebastian Schuster, Shubham Toshniwal

    Abstract: Recent work has provided indirect evidence that pretraining language models on code improves the ability of models to track state changes of discourse entities expressed in natural language. In this work, we systematically test this claim by comparing pairs of language models on their entity tracking performance. Critically, the pairs consist of base models and models trained on top of these base…

    Submitted 31 May, 2024; originally announced May 2024.

  4. arXiv:2402.10176  [pdf, other]

    cs.CL cs.AI cs.LG

    OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

    Authors: Shubham Toshniwal, Ivan Moshkov, Sean Narenthiran, Daria Gitman, Fei Jia, Igor Gitman

    Abstract: Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key reason limitin…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Data and models are available at https://huggingface.co/collections/nvidia/openmath-65c5619de2ba059be0775014

  5. arXiv:2305.00833  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning to Reason and Memorize with Self-Notes

    Authors: Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar

    Abstract: Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes. Unlike recent chain-of-thought or scratchpad approaches, the model can deviate from the input context at any time to explicitly think and write down its thought…

    Submitted 31 October, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

  6. arXiv:2209.10052  [pdf, other]

    cs.CL

    Adapting Pretrained Text-to-Text Models for Long Text Sequences

    Authors: Wenhan Xiong, Anchit Gupta, Shubham Toshniwal, Yashar Mehdad, Wen-tau Yih

    Abstract: We present an empirical study of adapting an existing pretrained text-to-text model for long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline -- model architecture, optimization objective, and pretraining corpus, we propose an effective recipe to build long-context models from existing short-context models. Specifically, we replace the full attention in t…

    Submitted 16 November, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

  7. arXiv:2208.14252  [pdf, other]

    cs.CL

    Efficient and Interpretable Neural Models for Entity Tracking

    Authors: Shubham Toshniwal

    Abstract: What would it take for a natural language model to understand a novel, such as The Lord of the Rings? Among other things, such a model must be able to: (a) identify and record new characters (entities) and their attributes as they are introduced in the text, and (b) identify subsequent references to the characters previously introduced and update their attributes. This problem of entity tracking i…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: PhD Thesis

  8. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  9. arXiv:2109.09667  [pdf, other]

    cs.CL

    On Generalization in Coreference Resolution

    Authors: Shubham Toshniwal, Patrick Xia, Sam Wiseman, Karen Livescu, Kevin Gimpel

    Abstract: While coreference resolution is defined independently of dataset domain, most models for performing coreference resolution do not transfer well to unseen domains. We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models. We then mix three datasets for training; even though their domain, annotation guidelines, and meta…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: CRAC 2021

  10. arXiv:2102.13249  [pdf, other]

    cs.CL cs.AI

    Chess as a Testbed for Language Model State Tracking

    Authors: Shubham Toshniwal, Sam Wiseman, Karen Livescu, Kevin Gimpel

    Abstract: Transformer language models have made tremendous strides in natural language understanding tasks. However, the complexity of natural language makes it challenging to ascertain how accurately these models are tracking the world state underlying the text. Motivated by this issue, we consider the task of language modeling for the game of chess. Unlike natural language, chess notations describe a simp…

    Submitted 13 May, 2022; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: AAAI 2022 extended version with supplementary material

  11. arXiv:2010.02807  [pdf, other]

    cs.CL cs.LG

    Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks

    Authors: Shubham Toshniwal, Sam Wiseman, Allyson Ettinger, Karen Livescu, Kevin Gimpel

    Abstract: Long document coreference resolution remains a challenging task due to the large memory and runtime requirements of current models. Recent work doing incremental coreference resolution using just the global representation of entities shows practical benefits but requires keeping all entities in memory, which can be impractical for long documents. We argue that keeping all entities in memory is unn…

    Submitted 16 November, 2020; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Post EMNLP 2020 camera ready updates

  12. arXiv:2006.03866  [pdf, other]

    cs.CL

    A Cross-Task Analysis of Text Span Representations

    Authors: Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel

    Abstract: Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution. While extensive research has focused on functional architectures for representing words and sentences, there is less work on representing arbitrary spans of text within sentences. In this paper, we conduct a comprehensive empirical evaluat…

    Submitted 6 June, 2020; originally announced June 2020.

    Comments: RepL4NLP 2020

  13. arXiv:2005.02990  [pdf, other]

    cs.CL cs.LG

    PeTra: A Sparsely Supervised Memory Model for People Tracking

    Authors: Shubham Toshniwal, Allyson Ettinger, Kevin Gimpel, Karen Livescu

    Abstract: We propose PeTra, a memory-augmented neural network designed to track entities in its memory slots. PeTra is trained using sparse annotation from the GAP pronoun resolution dataset and outperforms a prior memory model on the task while using a simpler architecture. We empirically compare key modeling choices, finding that we can simplify several aspects of the design of the memory module while ret…

    Submitted 6 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  14. arXiv:1902.08295  [pdf, other]

    cs.LG stat.ML

    Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

    Authors: Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, et al. (66 additional authors not shown)

    Abstract: Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly w…

    Submitted 21 February, 2019; originally announced February 2019.

  15. arXiv:1807.10857  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

    Authors: Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N Sainath, Karen Livescu

    Abstract: Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and requires only a parallel corpus of speech and text for training. However, unlike in conventional approaches that combine separate acoustic and language models, it is…

    Submitted 6 November, 2018; v1 submitted 27 July, 2018; originally announced July 2018.

    Comments: Accepted in SLT 2018

  16. arXiv:1807.06234  [pdf, other]

    cs.CL

    Hierarchical Multitask Learning for CTC-based Speech Recognition

    Authors: Kalpesh Krishna, Shubham Toshniwal, Karen Livescu

    Abstract: Previous work has shown that neural encoder-decoder speech recognition can be improved with hierarchical multitask learning, where auxiliary tasks are added at intermediate layers of a deep encoder. We explore the effect of hierarchical multitask learning in the context of connectionist temporal classification (CTC)-based speech recognition, and investigate several aspects of this approach. Consis…

    Submitted 6 March, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

    Comments: Technical Report

  17. arXiv:1711.01694  [pdf, other]

    eess.AS cs.AI cs.CL

    Multilingual Speech Recognition With A Single End-To-End Model

    Authors: Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

    Abstract: Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we presen…

    Submitted 15 February, 2018; v1 submitted 5 November, 2017; originally announced November 2017.

    Comments: Accepted in ICASSP 2018

  18. arXiv:1704.07287  [pdf, other]

    cs.CL cs.LG cs.SD

    Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

    Authors: Trang Tran, Shubham Toshniwal, Mohit Bansal, Kevin Gimpel, Karen Livescu, Mari Ostendorf

    Abstract: In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic…

    Submitted 15 April, 2018; v1 submitted 24 April, 2017; originally announced April 2017.

    Comments: Accepted in NAACL HLT 2018

  19. arXiv:1704.01631  [pdf, other]

    cs.CL cs.AI

    Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition

    Authors: Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu

    Abstract: End-to-end training of deep learning-based models allows for implicit learning of intermediate representations based on the final task loss. However, the end-to-end approach ignores the useful domain knowledge encoded in explicit intermediate-level supervision. We hypothesize that using intermediate representations as auxiliary supervision at lower levels of deep networks may be a good way of comb…

    Submitted 19 April, 2017; v1 submitted 5 April, 2017; originally announced April 2017.

  20. arXiv:1610.06540  [pdf, other]

    cs.CL cs.AI

    Jointly Learning to Align and Convert Graphemes to Phonemes with Neural Attention Models

    Authors: Shubham Toshniwal, Karen Livescu

    Abstract: We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. We explore different types of attention models, i…

    Submitted 20 October, 2016; originally announced October 2016.

    Comments: Accepted in SLT 2016

  21. arXiv:1402.1759  [pdf]

    cs.IT

    Performance Improvement of OFDM System Using Iterative Signal Clipping With Various Window Techniques for PAPR Reduction

    Authors: Smita Jolania, Sandeep Toshniwal

    Abstract: OFDM signals demonstrate high fluctuations, termed the Peak to Average Power Ratio (PAPR). The problem of OFDM is the frequent occurrence of high peaks in the time domain signal, which in turn reduces the efficiency of the transmit high power amplifier. In this paper we discuss a clipping and filtering technique which is easy to implement and reduces the amount of PAPR by clipping the peak of the maximum…

    Submitted 7 February, 2014; originally announced February 2014.

    Comments: 5 pages, 7 figures, Published with "International Journal of Engineering Trends and Technology (IJETT)"