Showing 1–12 of 12 results for author: Ghazarian, S

Searching in archive cs.

  1. arXiv:2306.12794

    cs.CL

    Overview of Robust and Multilingual Automatic Evaluation Metrics for Open-Domain Dialogue Systems at DSTC 11 Track 4

    Authors: Mario Rodríguez-Cantelar, Chen Zhang, Chengguang Tang, Ke Shi, Sarik Ghazarian, João Sedoc, Luis Fernando D'Haro, Alexander Rudnicky

    Abstract: The advent and fast development of neural networks have revolutionized the research on dialogue systems and subsequently have triggered various challenges regarding their automatic evaluation. Automatic evaluation of open-domain dialogue systems as an open challenge has been the center of the attention of many researchers. Despite the consistent efforts to improve automatic metrics' correlations w…

    Submitted 13 September, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  2. arXiv:2305.07797

    cs.CL

    ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems

    Authors: Sarik Ghazarian, Yijia Shao, Rujun Han, Aram Galstyan, Nanyun Peng

    Abstract: Commonsense reasoning is omnipresent in human communications and thus is an important feature for open-domain dialogue systems. However, evaluating commonsense in dialogue systems is still an open challenge. We take the first step by focusing on event commonsense that considers events and their relations, and is crucial in both dialogues and general commonsense reasoning. We propose ACCENT, an eve…

    Submitted 3 November, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  3. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  4. arXiv:2203.13927

    cs.CL

    What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation

    Authors: Sarik Ghazarian, Behnam Hedayatnia, Alexandros Papangelis, Yang Liu, Dilek Hakkani-Tur

    Abstract: Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted at ACL Findings 2022. 11 pages, 8 figures, 5 tables

  5. arXiv:2203.09711

    cs.CL

    DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations

    Authors: Sarik Ghazarian, Nuan Wen, Aram Galstyan, Nanyun Peng

    Abstract: Automatic evaluation metrics are essential for the rapid development of open-domain dialogue systems as they facilitate hyper-parameter tuning and comparison between models. Although recently proposed trainable conversation-level metrics have shown encouraging results, the quality of the metrics is strongly dependent on the quality of training data. Prior works mainly resort to heuristic text-leve…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: Association for Computational Linguistics (ACL 2022)

  6. arXiv:2111.08808

    cs.CL

    User Response and Sentiment Prediction for Automatic Dialogue Evaluation

    Authors: Sarik Ghazarian, Behnam Hedayatnia, Alexandros Papangelis, Yang Liu, Dilek Hakkani-Tur

    Abstract: Automatic evaluation is beneficial for open-domain dialog system development. However, standard word-overlap metrics (BLEU, ROUGE) do not correlate well with human judgements of open-domain dialog systems. In this work we propose to use the sentiment of the next user utterance for turn or dialog level evaluation. Specifically we propose three methods: one that predicts the next sentiment directly,…

    Submitted 16 February, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: Accepted at EMNLP 2021 Evaluations and Assessments of Neural Conversation Systems Workshop. 2 pages, 1 table

  7. arXiv:2104.05801

    cs.CL cs.LG

    Plot-guided Adversarial Example Construction for Evaluating Open-domain Story Generation

    Authors: Sarik Ghazarian, Zixi Liu, Akash SM, Ralph Weischedel, Aram Galstyan, Nanyun Peng

    Abstract: With the recent advances of open-domain story generation, the lack of reliable automatic evaluation metrics becomes an increasingly imperative issue that hinders the fast development of story generation. According to research conducted in this regard, learnable evaluation metrics have promised more accurate assessments by having higher correlations with human judgments. A critical bottleneck of…

    Submitted 25 May, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  8. arXiv:2102.02191

    cs.CL

    DiSCoL: Toward Engaging Dialogue Systems through Conversational Line Guided Response Generation

    Authors: Sarik Ghazarian, Zixi Liu, Tuhin Chakrabarty, Xuezhe Ma, Aram Galstyan, Nanyun Peng

    Abstract: Having engaging and informative conversations with users is the utmost goal for open-domain conversational systems. Recent advances in transformer-based language models and their applications to dialogue systems have succeeded in generating fluent and human-like responses. However, they still lack control over the generation process towards producing contentful responses and achieving engaging conve…

    Submitted 3 February, 2021; originally announced February 2021.

  9. arXiv:2012.06154

    cs.CL cs.AI

    ParsiNLU: A Suite of Language Understanding Challenges for Persian

    Authors: Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian, Mozhdeh Gheini, Arman Kabiri, Rabeeh Karimi Mahabadi, Omid Memarrast, Ahmadreza Mosallanezhad, Erfan Noury, Shahab Raji, Mohammad Sadegh Rasooli, Sepideh Sadeghi, Erfan Sadeqi Azer, Niloofar Safi Samghabadi, Mahsa Shafaei, Saber Sheybani, Ali Tazarv, Yadollah Yaghoobzadeh

    Abstract: Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on the Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this rich language. The availability of high-quality evaluat…

    Submitted 13 July, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

    Comments: To appear in Transactions of the Association for Computational Linguistics (TACL), 2021

  10. arXiv:1911.01456

    cs.CL

    Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

    Authors: Sarik Ghazarian, Ralph Weischedel, Aram Galstyan, Nanyun Peng

    Abstract: User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement by using heuristically constructed features such as the number of turns and the total time of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, pre…

    Submitted 24 January, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

  11. arXiv:1904.10635

    cs.CL

    Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings

    Authors: Sarik Ghazarian, Johnny Tian-Zheng Wei, Aram Galstyan, Nanyun Peng

    Abstract: Despite advances in open-domain dialogue systems, automatic evaluation of such systems is still a challenging problem. Traditional reference-based metrics such as BLEU are ineffective because there could be many valid responses for a given context that share no common words with reference responses. A recent work proposed Referenced metric and Unreferenced metric Blended Evaluation Routine (RUBER)…

    Submitted 24 April, 2019; originally announced April 2019.

    Comments: 8 pages, 2 figures, NAACL 2019 Methods for Optimizing and Evaluating Neural Language Generation (NeuralGen workshop)

  12. arXiv:1804.10188

    cs.LG cs.AI cs.CL cs.IT stat.ML

    Modeling Psychotherapy Dialogues with Kernelized Hashcode Representations: A Nonparametric Information-Theoretic Approach

    Authors: Sahil Garg, Irina Rish, Guillermo Cecchi, Palash Goyal, Sarik Ghazarian, Shuyang Gao, Greg Ver Steeg, Aram Galstyan

    Abstract: We propose a novel dialogue modeling framework, the first-ever nonparametric kernel functions based approach for dialogue modeling, which learns kernelized hashcodes as compressed text representations; unlike traditional deep learning models, it handles relatively small datasets well, while also scaling to large ones. We also derive a novel lower bound on mutual information, used as a model-select…

    Submitted 9 September, 2019; v1 submitted 26 April, 2018; originally announced April 2018.

    Comments: Added a response-generation-based model, along with human evaluation