Showing 1–18 of 18 results for author: Namazifar, M

Searching in archive cs.
  1. arXiv:2311.17376

    cs.CL

    CESAR: Automatic Induction of Compositional Instructions for Multi-turn Dialogs

    Authors: Taha Aksu, Devamanyu Hazarika, Shikib Mehri, Seokhwan Kim, Dilek Hakkani-Tür, Yang Liu, Mahdi Namazifar

    Abstract: Instruction-based multitasking has played a critical role in the success of large language models (LLMs) in multi-turn dialog applications. While publicly available LLMs have shown promising performance, when exposed to complex instructions with multiple constraints, they lag behind state-of-the-art models like ChatGPT. In this work, we hypothesize that the availability of large-scale complex dem…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  2. arXiv:2311.14543

    cs.CL cs.AI

    Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language

    Authors: Di **, Shikib Mehri, Devamanyu Hazarika, Aishwarya Padmakumar, Sung** Lee, Yang Liu, Mahdi Namazifar

    Abstract: Learning from human feedback is a prominent technique to align the output of large language models (LLMs) with human expectations. Reinforcement learning from human feedback (RLHF) leverages human preference signals in the form of rankings of response pairs to perform this alignment. However, human preferences on LLM outputs can come in much richer forms, including natural language, which ma…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted at the Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023; submitted to AAAI 2024

  3. arXiv:2305.12091

    cs.CL

    "What do others think?": Task-Oriented Conversational Modeling with Subjective Knowledge

    Authors: Chao Zhao, Spandana Gella, Seokhwan Kim, Di **, Devamanyu Hazarika, Alexandros Papangelis, Behnam Hedayatnia, Mahdi Namazifar, Yang Liu, Dilek Hakkani-Tur

    Abstract: Task-oriented Dialogue (TOD) systems aim to assist users in accomplishing specific goals, such as booking a hotel or a restaurant. Traditional TOD systems rely on domain-specific APIs/DBs or external factual knowledge to generate responses, which cannot accommodate subjective user requests (e.g., "Is the WIFI reliable?" or "Does the restaurant have a good atmosphere?"). To add…

    Submitted 2 October, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: SIGDIAL 2023

  4. arXiv:2302.09170

    cs.CL cs.AI

    KILM: Knowledge Injection into Encoder-Decoder Language Models

    Authors: Yan Xu, Mahdi Namazifar, Devamanyu Hazarika, Aishwarya Padmakumar, Yang Liu, Dilek Hakkani-Tür

    Abstract: Large pre-trained language models (PLMs) have been shown to retain implicit knowledge within their parameters. To enhance this implicit knowledge, we propose Knowledge Injection into Language Models (KILM), a novel approach that injects entity-related knowledge into encoder-decoder PLMs via a generative knowledge infilling objective through continued pre-training. This is done without architectur…

    Submitted 17 February, 2023; originally announced February 2023.

  5. arXiv:2302.08626

    cs.NE

    Role of Bias Terms in Dot-Product Attention

    Authors: Mahdi Namazifar, Devamanyu Hazarika, Dilek Hakkani-Tur

    Abstract: Dot-product attention is a core module in the present generation of neural network models, particularly transformers, and is being leveraged across numerous areas such as natural language processing and computer vision. This attention module consists of three linear transformations, namely the query, key, and value transformations, each of which has a bias term. In this work, we study the r…

    Submitted 16 February, 2023; originally announced February 2023.

  6. arXiv:2302.05096

    cs.CL cs.AI

    Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information

    Authors: Yen-Ting Lin, Alexandros Papangelis, Seokhwan Kim, Sung** Lee, Devamanyu Hazarika, Mahdi Namazifar, Di **, Yang Liu, Dilek Hakkani-Tur

    Abstract: This work focuses on in-context data augmentation for intent detection. Having found that augmentation via in-context prompting of large pre-trained language models (PLMs) alone does not improve performance, we introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model. Our method first fine-tunes a PLM o…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Accepted at EACL 2023

  7. arXiv:2210.14469

    cs.CL cs.LG

    Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning

    Authors: Yifan Chen, Devamanyu Hazarika, Mahdi Namazifar, Yang Liu, Di **, Dilek Hakkani-Tur

    Abstract: Prefix-tuning, or more generally continuous prompt tuning, has become an essential paradigm of parameter-efficient transfer learning. Using a large pre-trained language model (PLM), prefix-tuning can obtain strong performance by training only a small portion of parameters. In this paper, we propose to understand and further develop prefix-tuning through the kernel lens. Specifically, we make an an…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: To appear in EMNLP 2022. Code is available at https://github.com/ychen-stat-ml/kernel-adapters

  8. arXiv:2206.07296

    cs.CL

    Enhanced Knowledge Selection for Grounded Dialogues via Document Semantic Graphs

    Authors: Sha Li, Mahdi Namazifar, Di **, Mohit Bansal, Heng Ji, Yang Liu, Dilek Hakkani-Tur

    Abstract: Providing conversation models with background knowledge has been shown to make open-domain dialogues more informative and engaging. Existing models treat knowledge selection as a sentence ranking or classification problem where each sentence is handled individually, ignoring the internal semantic connection among sentences in the background document. In this work, we propose to automatically conve…

    Submitted 30 June, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: NAACL 2022. Please refer to https://www.amazon.science/publications/enhanced-knowledge-selection-for-grounded-dialogues-via-document-semantic-graphs for code and resources

  9. arXiv:2205.03720

    cs.CL cs.LG

    Empowering parameter-efficient transfer learning by recognizing the kernel structure in self-attention

    Authors: Yifan Chen, Devamanyu Hazarika, Mahdi Namazifar, Yang Liu, Di **, Dilek Hakkani-Tur

    Abstract: The massive number of trainable parameters in pre-trained language models (PLMs) makes them hard to deploy to multiple downstream tasks. To address this issue, parameter-efficient transfer learning methods have been proposed to tune only a few parameters during fine-tuning while freezing the rest. This paper looks at existing methods along this line through the kernel lens. Motiv…

    Submitted 26 October, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: Accepted in NAACL 2022. Code is available at https://github.com/ychen-stat-ml/kernel-adapters

  10. arXiv:2106.06411

    cs.CL cs.AI

    Zero-Shot Controlled Generation with Encoder-Decoder Transformers

    Authors: Devamanyu Hazarika, Mahdi Namazifar, Dilek Hakkani-Tür

    Abstract: Controlling neural network-based models for natural language generation (NLG) has broad applications in numerous areas such as machine translation, document summarization, and dialog systems. Approaches that enable such control in a zero-shot manner would be of great importance as, among other reasons, they remove the need for additional annotated data and training. In this work, we propose novel…

    Submitted 6 April, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted at AAAI 2022

  11. arXiv:2103.14580

    cs.CL

    Correcting Automated and Manual Speech Transcription Errors using Warped Language Models

    Authors: Mahdi Namazifar, John Malik, Li Erran Li, Gokhan Tur, Dilek Hakkani Tür

    Abstract: Masked language models have revolutionized natural language processing systems in the past few years. Warped language models, a recently introduced generalization of masked language models, are trained to be more robust to the types of errors that appear in automatic or manual transcriptions of spoken language by exposing the language model to the same types of errors during training. In this…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: Submitted to INTERSPEECH

  12. arXiv:2011.03023

    cs.CL cs.AI

    Language Model is All You Need: Natural Language Understanding as Question Answering

    Authors: Mahdi Namazifar, Alexandros Papangelis, Gokhan Tur, Dilek Hakkani-Tür

    Abstract: Different flavors of transfer learning have had a tremendous impact on advancing research and applications of machine learning. In this work, we study the use of a specific family of transfer learning, where the target domain is mapped to the source domain. Specifically, we map Natural Language Understanding (NLU) problems to Question Answering (QA) problems, and we show that in low-data regimes this…

    Submitted 5 November, 2020; originally announced November 2020.

  13. arXiv:2011.01900

    cs.CL cs.AI

    Warped Language Models for Noise Robust Language Understanding

    Authors: Mahdi Namazifar, Gokhan Tur, Dilek Hakkani Tür

    Abstract: Masked Language Models (MLMs) are self-supervised neural networks trained to fill in the blanks in a given sentence with masked tokens. Despite the tremendous success of MLMs on various text-based tasks, they are not robust for spoken language understanding, especially to spontaneous conversational speech recognition noise. In this work, we introduce Warped Language Models (WLM) in which input sen…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: To appear at IEEE SLT 2021

  14. arXiv:2002.00750

    cs.CL cs.LG cs.SD eess.AS

    Joint Contextual Modeling for ASR Correction and Language Understanding

    Authors: Yue Weng, Sai Sumanth Miryala, Chandra Khatri, Runze Wang, Huaixiu Zheng, Piero Molino, Mahdi Namazifar, Alexandros Papangelis, Hugh Williams, Franziska Bell, Gokhan Tur

    Abstract: The quality of automatic speech recognition (ASR) is critical to dialogue systems, as ASR errors propagate to and directly impact downstream tasks such as language understanding (LU). In this paper, we propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with LU to improve the performance of both tasks simultaneously. To measure the effectiveness of…

    Submitted 28 January, 2020; originally announced February 2020.

    Comments: Accepted at IEEE ICASSP 2020

  15. arXiv:2001.08868

    cs.CL cs.AI

    Exploration Based Language Learning for Text-Based Games

    Authors: Andrea Madotto, Mahdi Namazifar, Joost Huizinga, Piero Molino, Adrien Ecoffet, Huaixiu Zheng, Alexandros Papangelis, Dian Yu, Chandra Khatri, Gokhan Tur

    Abstract: This work presents an exploration and imitation-learning-based agent capable of state-of-the-art performance in playing text-based computer games. Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, a…

    Submitted 7 June, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    Comments: Accepted at IJCAI 2020

  16. arXiv:2001.06463

    cs.HC cs.AI cs.CL

    Plato Dialogue System: A Flexible Conversational AI Research Platform

    Authors: Alexandros Papangelis, Mahdi Namazifar, Chandra Khatri, Yi-Chia Wang, Piero Molino, Gokhan Tur

    Abstract: As the field of Spoken Dialogue Systems and Conversational AI grows, so does the need for tools and environments that abstract away implementation details in order to expedite the development process, lower the barrier to entry to the field, and offer a common test-bed for new ideas. In this paper, we present Plato, a flexible Conversational AI platform written in Python that supports any kind of…

    Submitted 17 January, 2020; originally announced January 2020.

  17. arXiv:1908.02402

    cs.CL cs.AI cs.HC cs.LG

    Flexibly-Structured Model for Task-Oriented Dialogues

    Authors: Lei Shu, Piero Molino, Mahdi Namazifar, Hu Xu, Bing Liu, Huaixiu Zheng, Gokhan Tur

    Abstract: This paper proposes a novel end-to-end architecture for task-oriented dialogue systems. It is based on a simple and practical yet very effective sequence-to-sequence approach, where language understanding and state tracking tasks are modeled jointly with a structured copy-augmented sequential decoder and a multi-label decoder for each slot. The policy engine and language generation tasks are model…

    Submitted 6 August, 2019; originally announced August 2019.

  18. arXiv:1712.02316

    cs.NE cs.AI cs.IR

    Named Entity Sequence Classification

    Authors: Mahdi Namazifar

    Abstract: Named Entity Recognition (NER) aims at locating and classifying named entities in text. In some use cases of NER, including cases where detected named entities are used in creating content recommendations, it is crucial to have a reliable confidence level for the detected named entities. In this work, we study the problem of finding confidence levels for detected named entities. We refer to this pr…

    Submitted 6 December, 2017; originally announced December 2017.