Showing 1–13 of 13 results for author: Sethy, A

  1. arXiv:2402.11532  [pdf, other]

    cs.CL

    Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models

    Authors: Shirley Anugrah Hayati, Taehee Jung, Tristan Bodding-Long, Sudipta Kar, Abhinav Sethy, Joo-Kyung Kim, Dongyeop Kang

    Abstract: Fine-tuning large language models (LLMs) with a collection of large and diverse instructions has improved the model's generalization to different tasks, even for unseen tasks. However, most existing instruction datasets include only single instructions, and they struggle to follow complex instructions composed of multiple subtasks. In this work, we propose a novel concept of compositional instruct…

    Submitted 24 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  2. arXiv:2310.20081  [pdf, other]

    cs.CL cs.AI cs.IR

    Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models

    Authors: Chris Richardson, Yao Zhang, Kellen Gillespie, Sudipta Kar, Arshdeep Singh, Zeynab Raeesy, Omar Zia Khan, Abhinav Sethy

    Abstract: Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. To personalize a language model's output, a straightforward approach is to incorporate past user data…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: 4 pages, International Workshop on Personalized Generative AI (@CIKM 2023)

    ACM Class: I.2.7; H.3.3

  3. arXiv:2302.10978  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Learning to Retrieve Engaging Follow-Up Queries

    Authors: Christopher Richardson, Sudipta Kar, Anjishnu Kumar, Anand Ramachandran, Omar Zia Khan, Zeynab Raeesy, Abhinav Sethy

    Abstract: Open domain conversational agents can answer a broad range of targeted queries. However, the sequential nature of interaction with these systems makes knowledge exploration a lengthy task which burdens the user with asking a chain of well phrased questions. In this paper, we present a retrieval based system and associated dataset for predicting the next questions that the user might have. Such a s…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: EACL 2023

  4. arXiv:2010.01949  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Improving Device Directedness Classification of Utterances with Semantic Lexical Features

    Authors: Kellen Gillespie, Ioannis C. Konstantakopoulos, Xingzhi Guo, Vishal Thanvantri Vasudevan, Abhinav Sethy

    Abstract: User interactions with personal assistants like Alexa, Google Home and Siri are typically initiated by a wake term or wakeword. Several personal assistants feature "follow-up" modes that allow users to make additional interactions without the need of a wakeword. For the system to only respond when appropriate, and to ignore speech not intended for it, utterances must be classified as device-direct…

    Submitted 29 September, 2020; originally announced October 2020.

    Comments: Accepted and Published at ICASSP 2020

    Journal ref: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 7859-7863

  5. arXiv:1911.11952  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Label Dependent Deep Variational Paraphrase Generation

    Authors: Siamak Shakeri, Abhinav Sethy

    Abstract: Generating paraphrases that are lexically similar but semantically different is a challenging task. Paraphrases of this form can be used to augment data sets for various NLP tasks such as machine reading comprehension and question answering with non-trivial negative examples. In this article, we propose a deep variational model to generate paraphrases conditioned on a label that specifies whether…

    Submitted 26 November, 2019; originally announced November 2019.

  6. arXiv:1911.11756  [pdf, other]

    cs.LG cs.CL stat.ML

    Semi-Supervised Learning for Text Classification by Layer Partitioning

    Authors: Alexander Hanbo Li, Abhinav Sethy

    Abstract: Most recent neural semi-supervised learning algorithms rely on adding small perturbation to either the input vectors or their representations. These methods have been successful on computer vision tasks as the images form a continuous manifold, but are not appropriate for discrete input such as sentence. To adapt these methods to text input, we propose to decompose a neural network $M$ into two co…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: ASRU 2019

  7. arXiv:1911.11065  [pdf, ps, other]

    cs.IR cs.CL cs.LG

    Knowledge Distillation in Document Retrieval

    Authors: Siamak Shakeri, Abhinav Sethy, Cheng Cheng

    Abstract: Complex deep learning models now achieve state of the art performance for many document retrieval tasks. The best models process the query or claim jointly with the document. However for fast scalable search it is desirable to have document embeddings which are independent of the claim. In this paper we show that knowledge distillation can be used to encourage a model that generates claim independ…

    Submitted 11 November, 2019; originally announced November 2019.

    Comments: Published at Amazon Machine Learning Conference (AMLC) 2019

  8. arXiv:1909.00102  [pdf, other]

    cs.CL cs.CR cs.LG stat.ML

    Knowledge Enhanced Attention for Robust Natural Language Inference

    Authors: Alexander Hanbo Li, Abhinav Sethy

    Abstract: Neural network models have been very successful at achieving high accuracy on natural language inference (NLI) tasks. However, as demonstrated in recent literature, when tested on some simple adversarial examples, most of the models suffer a significant drop in performance. This raises the concern about the robustness of NLI models. In this paper, we propose to make NLI models robust by incorporat…

    Submitted 30 August, 2019; originally announced September 2019.

  9. arXiv:1810.12464  [pdf, other]

    cs.LG stat.ML

    Differentiable Greedy Networks

    Authors: Thomas Powers, Rasool Fakoor, Siamak Shakeri, Abhinav Sethy, Amanjit Kainth, Abdel-rahman Mohamed, Ruhi Sarikaya

    Abstract: Optimal selection of a subset of items from a given set is a hard problem that requires combinatorial optimization. In this paper, we propose a subset selection algorithm that is trainable with gradient-based methods yet achieves near-optimal performance via submodular optimization. We focus on the task of identifying a relevant set of sentences for claim verification in the context of the FEVER t…

    Submitted 29 October, 2018; originally announced October 2018.

    Comments: Work in progress and under review

  10. arXiv:1709.06436  [pdf, other]

    cs.CL

    Language Modeling with Highway LSTM

    Authors: Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy

    Abstract: Language models (LMs) based on Long Short Term Memory (LSTM) have shown good gains in many automatic speech recognition tasks. In this paper, we extend an LSTM by adding highway networks inside an LSTM and use the resulting Highway LSTM (HW-LSTM) model for language modeling. The added highway networks increase the depth in the time dimension. Since a typical LSTM has two internal states, a memory…

    Submitted 19 September, 2017; originally announced September 2017.

    Comments: to appear in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2017)

  11. arXiv:1701.04313  [pdf, other]

    cs.CL cs.IR cs.LG cs.NE

    End-to-End ASR-free Keyword Search from Speech

    Authors: Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, Brian Kingsbury

    Abstract: End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-en…

    Submitted 13 January, 2017; originally announced January 2017.

    Comments: Published in the IEEE 2017 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017), scheduled for 5-9 March 2017 in New Orleans, Louisiana, USA

  12. arXiv:1412.7063  [pdf, other]

    cs.CL cs.LG cs.NE

    Diverse Embedding Neural Network Language Models

    Authors: Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran

    Abstract: We propose Diverse Embedding Neural Network (DENN), a novel architecture for language models (LMs). A DENNLM projects the input word history vector onto multiple diverse low-dimensional sub-spaces instead of a single higher-dimensional sub-space as in conventional feed-forward neural network LMs. We encourage these sub-spaces to be diverse during network training through an augmented loss function…

    Submitted 15 April, 2015; v1 submitted 22 December, 2014; originally announced December 2014.

    Comments: Under review as workshop contribution at ICLR 2015

  13. arXiv:1312.7463  [pdf, ps, other]

    stat.ML cs.CV cs.LG

    Generalized Ambiguity Decomposition for Understanding Ensemble Diversity

    Authors: Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, Shrikanth S. Narayanan

    Abstract: Diversity or complementarity of experts in ensemble pattern recognition and information processing systems is widely-observed by researchers to be crucial for achieving performance improvement upon fusion. Understanding this link between ensemble diversity and fusion performance is thus an important research question. However, prior works have theoretically characterized ensemble diversity and hav…

    Submitted 28 December, 2013; originally announced December 2013.

    Comments: 32 pages, 10 figures

    ACM Class: I.5