
Showing 1–9 of 9 results for author: Jwalapuram, P

  1. arXiv:2403.15484  [pdf, other]

    cs.CL cs.LG

    RakutenAI-7B: Extending Large Language Models for Japanese

    Authors: Rakuten Group, Aaron Levine, Connie Huang, Chenguang Wang, Eduardo Batista, Ewa Szymanska, Hongyi Ding, Hou Wei Chou, Jean-François Pessiot, Johanes Effendi, Justin Chiu, Kai Torben Ohlhus, Karan Chopra, Keiji Shinzato, Koji Murakami, Lee Xiong, Lei Chen, Maki Kubota, Maksim Tkachenko, Miroku Lee, Naoki Takahashi, Prathyusha Jwalapuram, Ryutaro Tatsushima, Saurabh Jain, Sunil Kumar Yadav, et al. (5 additional authors not shown)

    Abstract: We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.

    Submitted 21 March, 2024; originally announced March 2024.

  2. arXiv:2301.13753  [pdf, ps, other]

    cs.CL

    Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation

    Authors: Xiang Lin, Prathyusha Jwalapuram, Shafiq Joty

    Abstract: State-of-the-art neural text generation models are typically trained to maximize the likelihood of each token in the ground-truth sequence conditioned on the previous target tokens. However, during inference, the model needs to make a prediction conditioned on the tokens generated by itself. This train-test discrepancy is referred to as exposure bias. Scheduled sampling is a curriculum learning st…

    Submitted 31 January, 2023; originally announced January 2023.

  3. arXiv:2110.07198  [pdf, other]

    cs.CL

    Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling

    Authors: Prathyusha Jwalapuram, Shafiq Joty, Xiang Lin

    Abstract: Given the claims of improved text generation quality across various pre-trained neural models, we consider the coherence evaluation of machine generated text to be one of the principal applications of coherence models that needs to be investigated. Prior work in neural coherence modeling has primarily focused on devising new architectures for solving the permuted document task. We instead use a ba…

    Submitted 21 March, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted at ACL 2022

  4. arXiv:2010.07638  [pdf, other]

    cs.CL

    Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses

    Authors: Prathyusha Jwalapuram, Shafiq Joty, Youlin Shen

    Abstract: Popular Neural Machine Translation model training uses strategies like backtranslation to improve BLEU scores, requiring large amounts of additional data and training. We introduce a class of conditional generative-discriminative hybrid losses that we use to fine-tune a trained machine translation model. Through a combination of targeted fine-tuning objectives and intuitive re-use of the training…

    Submitted 15 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  5. arXiv:2004.14626  [pdf, other]

    cs.CL

    Rethinking Coherence Modeling: Synthetic vs. Downstream Tasks

    Authors: Tasnim Mohiuddin, Prathyusha Jwalapuram, Xiang Lin, Shafiq Joty

    Abstract: Although coherence modeling has come a long way in developing novel models, their evaluation on downstream applications for which they are purportedly developed has largely been neglected. With the advancements made by neural approaches in applications such as machine translation (MT), summarization and dialog systems, the need for coherence evaluation of these tasks is now more crucial than ever.…

    Submitted 13 February, 2021; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: Accepted paper at EACL-21

  6. arXiv:2004.14607  [pdf, other]

    cs.CL

    Can Your Context-Aware MT System Pass the DiP Benchmark Tests? : Evaluation Benchmarks for Discourse Phenomena in Machine Translation

    Authors: Prathyusha Jwalapuram, Barbara Rychalska, Shafiq Joty, Dominika Basaj

    Abstract: Despite increasing instances of machine translation (MT) systems including contextual information, the evidence for translation quality improvement is sparse, especially for discourse phenomena. Popular metrics like BLEU are not expressive or sensitive enough to capture quality improvements or drops that are minor in size but significant in perception. We introduce the first of their kind MT bench…

    Submitted 30 April, 2020; originally announced April 2020.

  7. arXiv:1911.09812  [pdf, other]

    cs.CL cs.LG

    Zero-Resource Cross-Lingual Named Entity Recognition

    Authors: M Saiful Bari, Shafiq Joty, Prathyusha Jwalapuram

    Abstract: Recently, neural methods have achieved state-of-the-art (SOTA) results in Named Entity Recognition (NER) tasks for many languages without the need for manually crafted features. However, these models still require manually annotated training data, which is not available for many languages. In this paper, we propose an unsupervised cross-lingual NER model that can transfer NER knowledge from one la…

    Submitted 21 November, 2019; originally announced November 2019.

    Journal ref: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-2020)

  8. arXiv:1909.00131  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating Pronominal Anaphora in Machine Translation: An Evaluation Measure and a Test Suite

    Authors: Prathyusha Jwalapuram, Shafiq Joty, Irina Temnikova, Preslav Nakov

    Abstract: The ongoing neural revolution in machine translation has made it easier to model larger contexts beyond the sentence-level, which can potentially help resolve some discourse-level ambiguities such as pronominal anaphora, thus enabling better translations. Unfortunately, even when the resulting improvements are seen as substantial by humans, they remain virtually unnoticed by traditional automatic…

    Submitted 31 August, 2019; originally announced September 2019.

    Comments: Accepted at EMNLP 2019

  9. arXiv:1905.05682  [pdf, other]

    cs.CL cs.AI

    A Unified Linear-Time Framework for Sentence-Level Discourse Parsing

    Authors: Xiang Lin, Shafiq Joty, Prathyusha Jwalapuram, M Saiful Bari

    Abstract: We propose an efficient neural framework for sentence-level discourse analysis in accordance with Rhetorical Structure Theory (RST). Our framework comprises a discourse segmenter to identify the elementary discourse units (EDU) in a text, and a discourse parser that constructs a discourse tree in a top-down fashion. Both the segmenter and the parser are based on Pointer Networks and operate in lin…

    Submitted 12 June, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: Accepted by ACL 2019