Showing 1–6 of 6 results for author: Aribandi, V

  1. arXiv:2203.00759  [pdf, other]

    cs.CL cs.LG

    HyperPrompt: Prompt-based Task-Conditioning of Transformers

    Authors: Yun He, Huaixiu Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, YaGuang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi

    Abstract: Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to…

    Submitted 14 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: Accepted to ICML 2022

  2. arXiv:2111.10952  [pdf, other]

    cs.CL cs.LG

    ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

    Authors: Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler

    Abstract: Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training. Towards this goal, this paper introduces ExMix (Extreme Mixture): a massive collection of 107 supervised NLP tasks across diverse domains and task-families. Using ExMix, we study the ef…

    Submitted 29 January, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

    Comments: ICLR 2022; see https://youtu.be/FbRcbM4T-50 for a video overview of the paper

  3. How Reliable are Model Diagnostics?

    Authors: Vamsi Aribandi, Yi Tay, Donald Metzler

    Abstract: In the pursuit of a deeper understanding of a model's behaviour, there is recent impetus for developing suites of probes aimed at diagnosing models beyond simple metrics like accuracy or BLEU. This paper takes a step back and asks an important and timely question: how reliable are these diagnostics in providing insight into models and training setups? We critically examine three recent diagnostic…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: ACL 2021 Findings

  4. arXiv:2105.03322  [pdf, other]

    cs.CL cs.LG

    Are Pre-trained Convolutions Better than Pre-trained Transformers?

    Authors: Yi Tay, Mostafa Dehghani, Jai Gupta, Dara Bahri, Vamsi Aribandi, Zhen Qin, Donald Metzler

    Abstract: In the era of pre-trained language models, Transformers are the de facto choice of model architectures. While recent research has shown promise in entirely convolutional, or CNN, architectures, they have not been explored using the pre-train-fine-tune paradigm. In the context of language models, are convolutional models competitive to Transformers when pre-trained? This paper investigates this res…

    Submitted 30 January, 2022; v1 submitted 7 May, 2021; originally announced May 2021.

    Comments: ACL'21 + updated code/ckpt pointers

  5. Characterization of Time-variant and Time-invariant Assessment of Suicidality on Reddit using C-SSRS

    Authors: Manas Gaur, Vamsi Aribandi, Amanuel Alambo, Ugur Kursuncu, Krishnaprasad Thirunarayan, Jonanthan Beich, Jyotishman Pathak, Amit Sheth

    Abstract: Suicide is the 10th leading cause of death in the U.S. (1999-2019). However, predicting when someone will attempt suicide has been nearly impossible. In the modern world, many individuals suffering from mental illness seek emotional support and advice on well-known and easily-accessible social media platforms such as Reddit. While prior artificial intelligence research has demonstrated the ability…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: 24 Pages, 8 Tables, 6 Figures; Accepted by PLoS One; One of the two mentioned Datasets in the manuscript has Closed Access. We will make it public after PLoS One produces the manuscript

    ACM Class: H.4; I.2; J.3; J.4

  6. arXiv:2103.01075  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    OmniNet: Omnidirectional Representations from Transformers

    Authors: Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler

    Abstract: This paper proposes Omnidirectional Representations from Transformers (OmniNet). In OmniNet, instead of maintaining a strictly horizontal receptive field, each token is allowed to attend to all tokens in the entire network. This process can also be interpreted as a form of extreme or intensive attention mechanism that has the receptive field of the entire width and depth of the network. To this en…

    Submitted 1 March, 2021; originally announced March 2021.