
Showing 1–28 of 28 results for author: Parikh, A P

  1. arXiv:2305.13194  [pdf, other]

    cs.CL

    SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation

    Authors: Elizabeth Clark, Shruti Rijhwani, Sebastian Gehrmann, Joshua Maynez, Roee Aharoni, Vitaly Nikolaev, Thibault Sellam, Aditya Siddhant, Dipanjan Das, Ankur P. Parikh

    Abstract: Reliable automatic evaluation of summarization systems is challenging due to the multifaceted and subjective nature of the task. This is especially the case for languages other than English, where human evaluations are scarce. In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation. SEAHORSE consists of 96K summaries with human ratings along 6 dimensi…

    Submitted 1 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

  2. arXiv:2303.04562  [pdf, other]

    cs.LG cs.CL q-bio.QM

    Extrapolative Controlled Sequence Generation via Iterative Refinement

    Authors: Vishakh Padmakumar, Richard Yuanzhe Pang, He He, Ankur P. Parikh

    Abstract: We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training. This task is of significant importance in automated design, especially drug discovery, where the goal is to design novel proteins that are better (e.g., more stable) than existing sequences. Thus, by definition, the target sequences and their att…

    Submitted 7 June, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: ICML 2023 - Camera Ready Version

  3. arXiv:2211.08714  [pdf, other]

    cs.CL cs.AI cs.LG

    Reward Gaming in Conditional Text Generation

    Authors: Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He

    Abstract: To align conditional text generation model outputs with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations. Under this framework, we identify three common cases where high rewards are incorrectly assigned to undesirable patterns: noise-induced spurious correlation, naturally occurring sp…

    Submitted 1 June, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: ACL 2023

  4. arXiv:2210.11693  [pdf, other]

    cs.LG

    Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale

    Authors: Ran Tian, Ankur P. Parikh

    Abstract: We present Amos, a stochastic gradient-based optimizer designed for training deep neural networks. It can be viewed as an Adam optimizer with theoretically supported, adaptive learning-rate decay and weight decay. A key insight behind Amos is that it leverages model-specific information to determine the initial learning-rate and decaying schedules. When used for pre-training BERT variants and T5,…

    Submitted 21 November, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

  5. arXiv:2210.06324  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    SQuId: Measuring Speech Naturalness in Many Languages

    Authors: Thibault Sellam, Ankur Bapna, Joshua Camp, Diana Mackinnon, Ankur P. Parikh, Jason Riesa

    Abstract: Much of text-to-speech research relies on human evaluation, which incurs heavy costs and slows down the development process. The problem is particularly acute in heavily multilingual applications, where recruiting and polling judges can take weeks. We introduce SQuId (Speech Quality Identification), a multilingual naturalness prediction model trained on over a million ratings and tested in 65 loca…

    Submitted 1 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted at ICASSP 2023, with additional material in the appendix

  6. arXiv:2205.11588  [pdf, other]

    cs.CL cs.AI

    Simple Recurrence Improves Masked Language Models

    Authors: Tao Lei, Ran Tian, Jasmijn Bastings, Ankur P. Parikh

    Abstract: In this work, we explore whether modeling recurrence into the Transformer architecture can both be beneficial and efficient, by building an extremely simple recurrent module into the Transformer. We compare our model to baselines following the training and evaluation recipe of BERT. Our results confirm that recurrence can indeed improve Transformer models by a consistent margin, without requiring…

    Submitted 23 May, 2022; originally announced May 2022.

  7. arXiv:2110.08467  [pdf, other]

    cs.CL cs.AI

    Improving Compositional Generalization with Self-Training for Data-to-Text Generation

    Authors: Sanket Vaibhav Mehta, Jinfeng Rao, Yi Tay, Mihir Kale, Ankur P. Parikh, Emma Strubell

    Abstract: Data-to-text generation focuses on generating fluent natural language responses from structured meaning representations (MRs). Such representations are compositional and it is costly to collect responses for all possible combinations of atomic meaning schemata, thereby necessitating few-shot generalization to novel MRs. In this work, we systematically study the compositional generalization of the…

    Submitted 11 April, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: Accepted at ACL 2022 main conference

  8. arXiv:2110.06341  [pdf, other]

    cs.CL

    Learning Compact Metrics for MT

    Authors: Amy Pu, Hyung Won Chung, Ankur P. Parikh, Sebastian Gehrmann, Thibault Sellam

    Abstract: Recent developments in machine translation and multilingual text generation have led researchers to adopt trained metrics such as COMET or BLEURT, which treat evaluation as a regression problem and use representations from multilingual pre-trained models such as XLM-RoBERTa or mBERT. Yet studies on related tasks suggest that these models are most efficient when they are large, which is costly and…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted at EMNLP 2021

  9. arXiv:2108.13032  [pdf, other]

    cs.CL cs.LG

    Shatter: An Efficient Transformer Encoder with Single-Headed Self-Attention and Relative Sequence Partitioning

    Authors: Ran Tian, Joshua Maynez, Ankur P. Parikh

    Abstract: The highly popular Transformer architecture, based on self-attention, is the foundation of large pretrained models such as BERT, which have become an enduring paradigm in NLP. While powerful, the computational resources and time required to pretrain such models can be prohibitive. In this work, we present an alternative self-attention architecture, Shatter, that more efficiently encodes sequence in…

    Submitted 30 August, 2021; originally announced August 2021.

  10. arXiv:2103.06799  [pdf, other]

    cs.CL

    Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution

    Authors: Xavier Garcia, Noah Constant, Ankur P. Parikh, Orhan Firat

    Abstract: We propose a straightforward vocabulary adaptation scheme to extend the language capacity of multilingual machine translation models, paving the way towards efficient continual learning for multilingual machine translation. Our approach is suitable for large-scale datasets, applies to distant languages with unseen scripts, incurs only minor degradation on the translation performance for the origin…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: Accepted at NAACL 2021

  11. arXiv:2010.04297  [pdf, other]

    cs.CL

    Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task

    Authors: Thibault Sellam, Amy Pu, Hyung Won Chung, Sebastian Gehrmann, Qijun Tan, Markus Freitag, Dipanjan Das, Ankur P. Parikh

    Abstract: The quality of machine translation systems has dramatically improved over the last decade, and as a result, evaluation has become an increasingly challenging problem. This paper describes our contribution to the WMT 2020 Metrics Shared Task, the main benchmark for automatic evaluation of translation. We make several submissions based on BLEURT, a previously published metric based on transfer learn…

    Submitted 19 October, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

  12. arXiv:2009.11201  [pdf, other]

    cs.CL

    Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages

    Authors: Xavier Garcia, Aditya Siddhant, Orhan Firat, Ankur P. Parikh

    Abstract: Unsupervised translation has reached impressive performance on resource-rich language pairs such as English-French and English-German. However, early studies have shown that in more realistic settings involving low-resource, rare languages, unsupervised translation performs poorly, achieving less than 3.0 BLEU. In this work, we show that multilinguality is critical to making unsupervised systems p…

    Submitted 12 March, 2021; v1 submitted 23 September, 2020; originally announced September 2020.

    Comments: Accepted to NAACL 2021

  13. arXiv:2004.14373  [pdf, other]

    cs.CL cs.LG

    ToTTo: A Controlled Table-To-Text Generation Dataset

    Authors: Ankur P. Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das

    Abstract: We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revis…

    Submitted 6 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted to EMNLP 2020

  14. arXiv:2004.04696  [pdf, other]

    cs.CL

    BLEURT: Learning Robust Metrics for Text Generation

    Authors: Thibault Sellam, Dipanjan Das, Ankur P. Parikh

    Abstract: Text generation has made significant advances in the last few years. Yet, evaluation metrics have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate poorly with human judgments. We propose BLEURT, a learned evaluation metric based on BERT that can model human judgments with a few thousand possibly biased training examples. A key aspect of our approach is a novel pre-tr…

    Submitted 21 May, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

    Comments: Accepted at ACL 2020

  15. arXiv:2002.02955  [pdf, ps, other]

    cs.CL

    A Multilingual View of Unsupervised Machine Translation

    Authors: Xavier Garcia, Pierre Foret, Thibault Sellam, Ankur P. Parikh

    Abstract: We present a probabilistic framework for multilingual neural machine translation that encompasses supervised and unsupervised setups, focusing on unsupervised translation. In addition to studying the vanilla case where there is only monolingual data available, we propose a novel setup where one language in the (source, target) pair is not associated with any parallel data, but there may exist auxi…

    Submitted 16 October, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

    Comments: Accepted at Findings of EMNLP 2020 [Fixed processing error.]

  16. arXiv:1910.12366  [pdf, other]

    cs.CL cs.CR cs.LG

    Thieves on Sesame Street! Model Extraction of BERT-based APIs

    Authors: Kalpesh Krishna, Gaurav Singh Tomar, Ankur P. Parikh, Nicolas Papernot, Mohit Iyyer

    Abstract: We study the problem of model extraction in natural language processing, in which an adversary with only query access to a victim model attempts to reconstruct a local copy of that model. Assuming that both the adversary and victim model fine-tune a large pretrained language model such as BERT (Devlin et al. 2019), we show that the adversary does not need any real training data to successfully mou…

    Submitted 12 October, 2020; v1 submitted 27 October, 2019; originally announced October 2019.

    Comments: ICLR 2020 Camera Ready (19 pages)

  17. arXiv:1910.08684  [pdf, other]

    cs.CL

    Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation

    Authors: Ran Tian, Shashi Narayan, Thibault Sellam, Ankur P. Parikh

    Abstract: We address the issue of hallucination in data-to-text generation, i.e., reducing the generation of text that is unsupported by the source. We conjecture that hallucination can be caused by an encoder-decoder model generating content phrases without attending to the source; so we propose a confidence score to ensure that the model attends to the source whenever necessary, as well as a variational B…

    Submitted 2 November, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

  18. arXiv:1906.05807  [pdf, other]

    cs.CL

    Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

    Authors: Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi

    Abstract: Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query. In this paper, we introduce the query-agnostic indexable representation of document phrases that can drastically speed up open-domain QA and also allows us to reach long-tail targets. In particular, our dense-sparse phrase enc…

    Submitted 14 June, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

    Comments: ACL 2019; Code & demo available at https://nlp.cs.washington.edu/denspi/ ; Added comparison to Weaver (Raison et al., 2018)

  19. arXiv:1904.04428  [pdf, other]

    cs.CL

    Text Generation with Exemplar-based Adaptive Decoding

    Authors: Hao Peng, Ankur P. Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das

    Abstract: We propose a novel conditioned text generation model. It draws inspiration from traditional template-based text generation techniques, where the source provides the content (i.e., what to say), and the template influences how to say it. Building on the successful encoder-decoder paradigm, it first encodes the content representation from the given input text; to produce the output, it retrieves exe…

    Submitted 10 April, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: NAACL 2019

  20. arXiv:1904.02338  [pdf, other]

    cs.LG cs.CL cs.NE stat.ML

    Consistency by Agreement in Zero-shot Neural Machine Translation

    Authors: Maruan Al-Shedivat, Ankur P. Parikh

    Abstract: Generalization and reliability of multilingual translation often highly depend on the amount of available parallel data for each language pair of interest. In this paper, we focus on zero-shot generalization, a challenging setup that tests models on translation directions they have not been optimized for at training time. To solve the problem, we (i) reformulate multilingual translation as probab…

    Submitted 10 April, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

    Comments: NAACL 2019 (14 pages, 5 figures)

  21. arXiv:1808.01687  [pdf, ps, other]

    cs.LG stat.ML

    Hybrid Subspace Learning for High-Dimensional Data

    Authors: Micol Marchetti-Bowick, Benjamin J. Lengerich, Ankur P. Parikh, Eric P. Xing

    Abstract: The high-dimensional data setting, in which p >> n, is a challenging statistical paradigm that appears in many real-world problems. In this setting, learning a compact, low-dimensional representation of the data can substantially help distinguish signal from noise. One way to achieve this goal is to perform subspace learning to estimate a small set of latent features that capture the majority of t…

    Submitted 5 August, 2018; originally announced August 2018.

  22. arXiv:1804.07726  [pdf, other]

    cs.CL

    Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension

    Authors: Minjoon Seo, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi

    Abstract: We formalize a new modular variant of current question answering tasks by enforcing complete independence of the document encoder from the question encoder. This formulation addresses a key challenge in machine comprehension by requiring a standalone representation of the document discourse. It additionally leads to a significant scalability advantage since the encoding of the answer candidate phr…

    Submitted 26 September, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: EMNLP 2018 short; 6 pages

  23. arXiv:1711.00894  [pdf, other]

    cs.CL

    Multi-Mention Learning for Reading Comprehension with Neural Cascades

    Authors: Swabha Swayamdipta, Ankur P. Parikh, Tom Kwiatkowski

    Abstract: Reading comprehension is a challenging task, especially when executed across longer or across multiple evidence documents, where the answer is likely to reoccur. Existing neural architectures typically do not scale to the entire evidence, and hence, resort to selecting a single passage in the document (either via truncation or other means), and carefully searching for the answer within that passag…

    Submitted 30 May, 2018; v1 submitted 2 November, 2017; originally announced November 2017.

    Comments: Proceedings of ICLR 2018

  24. arXiv:1606.01933  [pdf, other]

    cs.CL

    A Decomposable Attention Model for Natural Language Inference

    Authors: Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit

    Abstract: We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable. On the Stanford Natural Language Inference (SNLI) dataset, we obtain state-of-the-art results with almost an order of magnitude fewer parameters than previous work and without relying on…

    Submitted 25 September, 2016; v1 submitted 6 June, 2016; originally announced June 2016.

    Comments: 7 pages, 1 figure, Proceedings of EMNLP 2016

  25. arXiv:1401.3413  [pdf, other]

    cs.LG cs.IR

    Infinite Mixed Membership Matrix Factorization

    Authors: Avneesh Saluja, Mahdi Pakdaman, Dongzhen Piao, Ankur P. Parikh

    Abstract: Rating and recommendation systems have become a popular application area for applying a suite of machine learning techniques. Current approaches rely primarily on probabilistic interpretations and extensions of matrix factorization, which factorizes a user-item ratings matrix into latent user and item vectors. Most of these methods fail to model significant variations in item ratings from otherwis…

    Submitted 14 January, 2014; originally announced January 2014.

    Comments: For ICDM 2013 Workshop Proceedings

  26. arXiv:1312.7077  [pdf, other]

    cs.CL cs.LG stat.ML

    Language Modeling with Power Low Rank Ensembles

    Authors: Ankur P. Parikh, Avneesh Saluja, Chris Dyer, Eric P. Xing

    Abstract: We present power low rank ensembles (PLRE), a flexible framework for n-gram language modeling where ensembles of low rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method can be understood as a generalization of n-gram modeling to non-integer n, and includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special ca…

    Submitted 3 October, 2014; v1 submitted 26 December, 2013; originally announced December 2013.

  27. arXiv:1210.4884  [pdf]

    cs.LG stat.ML

    A Spectral Algorithm for Latent Junction Trees

    Authors: Ankur P. Parikh, Le Song, Mariya Ishteva, Gabi Teodoru, Eric P. Xing

    Abstract: Latent variable models are an elegant framework for capturing rich probabilistic dependencies in many applications. However, current approaches typically parametrize these models using conditional probability tables, and learning relies predominantly on local search heuristics such as Expectation Maximization. Using tensor algebra, we propose an alternative parameterization of latent variable mode…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-675-684

  28. arXiv:1010.1868  [pdf, other]

    stat.ML stat.ME

    Infinite Hierarchical MMSB Model for Nested Communities/Groups in Social Networks

    Authors: Qirong Ho, Ankur P. Parikh, Le Song, Eric P. Xing

    Abstract: Actors in realistic social networks play not one but a number of diverse roles depending on whom they interact with, and a large number of such role-specific interactions collectively determine social communities and their organizations. Methods for analyzing social networks should capture these multi-faceted role-specific interactions, and, more interestingly, discover the latent organization or…

    Submitted 9 October, 2010; originally announced October 2010.