Showing 1–13 of 13 results for author: Hayati, S A

  1. arXiv:2404.09127  [pdf, other]

    cs.CL

    Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation

    Authors: Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang

    Abstract: Uncertainty estimation is a significant issue for current large language models (LLMs) that are generally poorly calibrated and over-confident, especially with reinforcement learning from human feedback (RLHF). Unlike humans, whose decisions and confidences not only stem from intrinsic beliefs but can also be adjusted through daily observations, existing calibration methods for LLMs focus on estim…

    Submitted 10 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted at ICLR 2024 Workshop on Reliable and Responsible Foundation Models

  2. arXiv:2402.11532  [pdf, other]

    cs.CL

    Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models

    Authors: Shirley Anugrah Hayati, Taehee Jung, Tristan Bodding-Long, Sudipta Kar, Abhinav Sethy, Joo-Kyung Kim, Dongyeop Kang

    Abstract: Fine-tuning large language models (LLMs) with a collection of large and diverse instructions has improved the model's generalization to different tasks, even for unseen tasks. However, most existing instruction datasets include only single instructions, and they struggle to follow complex instructions composed of multiple subtasks. In this work, we propose a novel concept of compositional instruct…

    Submitted 24 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  3. arXiv:2401.14698  [pdf, other]

    cs.CL cs.AI

    Under the Surface: Tracking the Artifactuality of LLM-Generated Data

    Authors: Debarati Das, Karin De Langis, Anna Martin-Boyle, Jaehyung Kim, Minhwa Lee, Zae Myung Kim, Shirley Anugrah Hayati, Risako Owan, Bin Hu, Ritik Parkar, Ryan Koo, Jonginn Park, Aahan Tyagi, Libby Ferland, Sanjali Roy, Vincent Liu, Dongyeop Kang

    Abstract: This work delves into the expanding role of large language models (LLMs) in generating artificial data. LLMs are increasingly employed to create a variety of outputs, including annotations, preferences, instruction prompts, simulated dialogues, and free text. As these forms of LLM-generated data often intersect in their application, they exert mutual influence on each other and raise significant c…

    Submitted 30 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Core Authors: Debarati Das, Karin De Langis, Anna Martin-Boyle, Jaehyung Kim, Minhwa Lee and Zae Myung Kim | Project lead : Debarati Das | PI : Dongyeop Kang

  4. arXiv:2311.09799  [pdf, other]

    cs.CL

    How Far Can We Extract Diverse Perspectives from Large Language Models?

    Authors: Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang

    Abstract: Collecting diverse human opinions is costly and challenging. This leads to a recent trend in collaborative efforts between humans and Large Language Models (LLMs) for generating diverse data, offering potential scalable and efficient solutions. However, the extent of LLMs' capability to generate diverse perspectives on subjective topics remains an unexplored question. In this study, we investigate…

    Submitted 18 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  5. arXiv:2212.08279  [pdf, other]

    cs.LG cs.CL cs.CV

    Werewolf Among Us: A Multimodal Dataset for Modeling Persuasion Behaviors in Social Deduction Games

    Authors: Bolin Lai, Hongxin Zhang, Miao Liu, Aryan Pariani, Fiona Ryan, Wenqi Jia, Shirley Anugrah Hayati, James M. Rehg, Diyi Yang

    Abstract: Persuasion modeling is a key building block for conversational agents. Existing works in this direction are limited to analyzing textual dialogue corpus. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: 17 pages

  6. arXiv:2211.05182  [pdf, ps, other]

    cs.HC cs.AI

    Modeling Motivational Interviewing Strategies On An Online Peer-to-Peer Counseling Platform

    Authors: Raj Sanjay Shah, Faye Holt, Shirley Anugrah Hayati, Aastha Agarwal, Yi-Chia Wang, Robert E. Kraut, Diyi Yang

    Abstract: Millions of people participate in online peer-to-peer support sessions, yet there has been little prior research on systematic psychology-based evaluations of fine-grained peer-counselor behavior in relation to client satisfaction. This paper seeks to bridge this gap by mapping peer-counselor chat-messages to motivational interviewing (MI) techniques. We annotate 14,797 utterances from 734 chat co…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: Accepted at CSCW 2022

  7. arXiv:2210.07469  [pdf, other]

    cs.CL

    StyLEx: Explaining Style Using Human Lexical Annotations

    Authors: Shirley Anugrah Hayati, Kyumin Park, Dheeraj Rajagopal, Lyle Ungar, Dongyeop Kang

    Abstract: Large pre-trained language models have achieved impressive results on various style classification tasks, but they often learn spurious domain-specific words to make predictions (Hayati et al., 2021). While human explanation highlights stylistic tokens as important features for this task, we observe that model explanations often do not align with them. To tackle this issue, we introduce StyLEx, a…

    Submitted 14 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: EACL 2023

  8. arXiv:2112.06050  [pdf, other]

    cs.HC

    Real-Time Detection of Crowded Buses via Mobile Phones

    Authors: Alex Haig, Shirley Anugrah Hayati, Anthony Tomasic

    Abstract: Automated passenger counting (APC) technology is central to many aspects of the public transit experience. APC information informs public transit planners about utilization in a public transit system and operations about dynamic fluctuations in demand. Perhaps most importantly, APC information provides one metric to the rider experience - standing during a long ride because of a crowded vehicle is…

    Submitted 11 December, 2021; originally announced December 2021.

    Journal ref: Poster Session, The Transportation Research Board (TRB) 98th Annual Meeting, 2019

  9. arXiv:2109.02738  [pdf, other]

    cs.CL

    Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica

    Authors: Shirley Anugrah Hayati, Dongyeop Kang, Lyle Ungar

    Abstract: People convey their intention and attitude through linguistic styles of the text that they write. In this study, we investigate lexicon usages across styles throughout two lenses: human perception and machine word importance, since words differ in the strength of the stylistic cues that they provide. To collect labels of human perception, we curate a new dataset, Hummingbird, on top of benchmarkin…

    Submitted 12 November, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP 2021 Main Conference, updated typos and Appendix

  10. arXiv:2105.00825  [pdf, other]

    cs.CL cs.AI

    DEUX: An Attribute-Guided Framework for Sociable Recommendation Dialog Systems

    Authors: Yu Li, Shirley Anugrah Hayati, Weiyan Shi, Zhou Yu

    Abstract: It is important for sociable recommendation dialog systems to perform as both on-task content and social content to engage users and gain their favor. In addition to understand the user preferences and provide a satisfying recommendation, such systems must be able to generate coherent and natural social conversations to the user. Traditional dialog state tracking cannot be applied to such systems…

    Submitted 16 April, 2021; originally announced May 2021.

  11. arXiv:2009.14306  [pdf, other]

    cs.CL

    INSPIRED: Toward Sociable Recommendation Dialog Systems

    Authors: Shirley Anugrah Hayati, Dongyeop Kang, Qingxiaoyang Zhu, Weiyan Shi, Zhou Yu

    Abstract: In recommendation dialogs, humans commonly disclose their preference and make recommendations in a friendly manner. However, this is a challenge when developing a sociable recommendation dialog system, due to the lack of dialog dataset annotated with such sociable strategies. Therefore, we present INSPIRED, a new dataset of 1,001 human-human dialogs for movie recommendation with measures for succe…

    Submitted 8 October, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: Accepted as a long paper at EMNLP 2020, corrected typos

  12. arXiv:2004.13203  [pdf, other]

    cs.CL

    A Summary of the First Workshop on Language Technology for Language Documentation and Revitalization

    Authors: Graham Neubig, Shruti Rijhwani, Alexis Palmer, Jordan MacKenzie, Hilaria Cruz, Xinjian Li, Matthew Lee, Aditi Chaudhary, Luke Gessler, Steven Abney, Shirley Anugrah Hayati, Antonios Anastasopoulos, Olga Zamaraeva, Emily Prud'hommeaux, Jennette Child, Sara Child, Rebecca Knowles, Sarah Moeller, Jeffrey Micher, Yiyuan Li, Sydney Zink, Mengzhou Xia, Roshan S Sharma, Patrick Littell

    Abstract: Despite recent advances in natural language processing and other language technology, the application of such technology to language documentation and conservation has been limited. In August 2019, a workshop was held at Carnegie Mellon University in Pittsburgh to attempt to bring together language community members, documentary linguists, and technologists to discuss how to bridge this gap and cr…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: Accepted at SLTU-CCURL 2020

  13. arXiv:1808.10025  [pdf, other]

    cs.CL

    Retrieval-Based Neural Code Generation

    Authors: Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig

    Abstract: In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to memorize large and complex structures. We introduce ReCode, a method based on subtree retrieval that makes it possible to explicitly reference existing code example…

    Submitted 29 August, 2018; originally announced August 2018.

    Comments: This paper is accepted in EMNLP 2018. It has 6 pages