Showing 1–7 of 7 results for author: Ravula, A

  1. arXiv:2203.02656  [pdf, other]

    cs.LG cs.SI

    Deep Partial Multiplex Network Embedding

    Authors: Qifan Wang, Yi Fang, Anirudh Ravula, Ruining He, Bin Shen, Jingang Wang, Xiaojun Quan, Dongfang Liu

    Abstract: Network embedding is an effective technique to learn the low-dimensional representations of nodes in networks. Real-world networks are usually with multiplex or having multi-view representations from different relations. Recently, there has been increasing interest in network embedding on multiplex data. However, most existing multiplex approaches assume that the data is complete in all views. But…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted to WWW 2022 GL workshop

  2. arXiv:2202.00217  [pdf, other]

    cs.CL

    WebFormer: The Web-page Transformer for Structure Information Extraction

    Authors: Qifan Wang, Yi Fang, Anirudh Ravula, Fuli Feng, Xiaojun Quan, Dongfang Liu

    Abstract: Structure information extraction refers to the task of extracting structured text fields from web pages, such as extracting a product offer from a shopping page including product title, description, brand and price. It is an important research topic which has been widely studied in document understanding and web search. Recent natural language models with sequence modeling have demonstrated state-…

    Submitted 31 January, 2022; originally announced February 2022.

    Comments: Accepted to WWW 2022

  3. Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality

    Authors: Adithya V Ganesan, Matthew Matero, Aravind Reddy Ravula, Huy Vu, H. Andrew Schwartz

    Abstract: In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components a…

    Submitted 7 May, 2021; originally announced May 2021.

    Comments: 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

  4. arXiv:2102.13247  [pdf, other]

    cs.CL

    DOCENT: Learning Self-Supervised Entity Representations from Large Document Collections

    Authors: Yury Zemlyanskiy, Sudeep Gandhe, Ruining He, Bhargav Kanagal, Anirudh Ravula, Juraj Gottweis, Fei Sha, Ilya Eckstein

    Abstract: This paper explores learning rich self-supervised entity representations from large amounts of the associated text. Once pre-trained, these models become applicable to multiple entity-centric tasks such as ranked retrieval, knowledge base completion, question answering, and more. Unlike other methods that harvest self-supervision signals based merely on a local context within a sentence, we radica…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: To appear in the proceedings of EACL 2021

  5. arXiv:2012.11747  [pdf, other]

    cs.LG

    RealFormer: Transformer Likes Residual Attention

    Authors: Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie

    Abstract: Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its variants (BERT, ETC, etc.) on a wide spectrum of tasks including Masked Language Modeling, GLUE, SQuAD, Neural Machine Translation, WikiHop, HotpotQA, Natural Qu…

    Submitted 10 September, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: Findings of ACL-IJCNLP 2021

  6. arXiv:2007.14062  [pdf, other]

    cs.LG cs.CL stat.ML

    Big Bird: Transformers for Longer Sequences

    Authors: Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed

    Abstract: Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that…

    Submitted 8 January, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

    Journal ref: Neural Information Processing Systems (NeurIPS) 2020

  7. arXiv:2004.08483  [pdf, other]

    cs.LG stat.ML

    ETC: Encoding Long and Structured Inputs in Transformers

    Authors: Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang

    Abstract: Transformer models have advanced the state of the art in many Natural Language Processing (NLP) tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key challenges of standard Transformer architectures, namely scaling input length and encoding structured inputs. To scale attention to longer inputs, we introduce a novel global-…

    Submitted 27 October, 2020; v1 submitted 17 April, 2020; originally announced April 2020.

    Comments: Accepted at EMNLP 2020