
Showing 1–14 of 14 results for author: Zayats, V

Searching in archive cs.
  1. arXiv:2405.19316  [pdf, other]

    cs.LG cs.CL

    Robust Preference Optimization through Reward Model Distillation

    Authors: Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Pete Shaw, Jonathan Berant

    Abstract: Language model (LM) post-training (or alignment) involves maximizing a reward function that is derived from preference annotations. Direct Preference Optimization (DPO) is a popular offline alignment method that trains a policy directly on preference data without the need to train a reward model or apply reinforcement learning. However, typical preference datasets have only a single, or at most a…

    Submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2405.18669  [pdf, other]

    cs.LG cs.AI cs.CL eess.AS

    Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

    Authors: Vicky Zayats, Peter Chen, Melissa Ferrari, Dirk Padfield

    Abstract: Integrating multiple generative foundation models, especially those trained on different modalities, into something greater than the sum of its parts poses significant challenges. Two key hurdles are the availability of aligned data (concepts that contain similar meaning but are expressed differently in different modalities), and effectively leveraging unimodal representations in cross-domain gener…

    Submitted 31 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Under review at NeurIPS

  3. arXiv:2306.12925  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS stat.ML

    AudioPaLM: A Large Language Model That Can Speak and Listen

    Authors: Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, et al. (5 additional authors not shown)

    Abstract: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: Technical report

  4. arXiv:2305.12029  [pdf, other]

    cs.CL cs.AI cs.LG

    MultiTurnCleanup: A Benchmark for Multi-Turn Spoken Conversational Transcript Cleanup

    Authors: Hua Shen, Vicky Zayats, Johann C. Rocholl, Daniel D. Walker, Dirk Padfield

    Abstract: Current disfluency detection models focus on individual utterances each from a single speaker. However, numerous discontinuity phenomena in spoken conversational transcripts occur across multiple turns, hampering human readability and the performance of downstream NLP tasks. This study addresses these phenomena by proposing an innovative Multi-Turn Cleanup task for spoken conversational transcript…

    Submitted 27 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 main conference. Dataset: https://github.com/huashen218/MultiTurnCleanup

  5. arXiv:2205.00620  [pdf, other]

    cs.CL

    Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection

    Authors: Angelica Chen, Vicky Zayats, Daniel D. Walker, Dirk Padfield

    Abstract: In modern interactive speech-based systems, speech is consumed and transcribed incrementally prior to having disfluencies removed. This post-processing step is crucial for producing clean transcripts and high performance on downstream tasks (e.g. machine translation). However, most current state-of-the-art NLP models such as the Transformer operate non-incrementally, potentially causing unacceptab…

    Submitted 1 May, 2022; originally announced May 2022.

    Comments: To be published at NAACL 2022

  6. arXiv:2109.06952  [pdf, other]

    cs.CL cs.SD eess.AS

    Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech

    Authors: Katrin Tomanek, Vicky Zayats, Dirk Padfield, Kara Vaillancourt, Fadi Biadsy

    Abstract: Automatic Speech Recognition (ASR) systems are often optimized to work best for speakers with canonical speech patterns. Unfortunately, these systems perform poorly when tested on atypical speech and heavily accented speech. It has previously been shown that personalization through model fine-tuning substantially improves performance. However, maintaining such large models per speaker is costly an…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted to EMNLP 2021

  7. arXiv:2104.10769  [pdf, ps, other]

    cs.CL

    Disfluency Detection with Unlabeled Data and Small BERT Models

    Authors: Johann C. Rocholl, Vicky Zayats, Daniel D. Walker, Noah B. Murad, Aaron Schneider, Daniel J. Liebling

    Abstract: Disfluency detection models now approach high accuracy on English text. However, little exploration has been done in improving the size and inference time of the model. At the same time, automatic speech recognition (ASR) models are moving from server-side inference to local, on-device inference. Supporting models in the transcription pipeline (like disfluency detection) must follow suit. In this…

    Submitted 27 July, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: INTERSPEECH 2021

  8. arXiv:2101.10573  [pdf, other]

    cs.CL

    Representations for Question Answering from Documents with Tables and Text

    Authors: Vicky Zayats, Kristina Toutanova, Mari Ostendorf

    Abstract: Tables in Web documents are pervasive and can be directly used to answer many of the queries searched on the Web, motivating their integration in question answering. Very often information presented in tables is succinct and hard to interpret with standard language representations. On the other hand, tables often appear within textual context, such as an article describing the table. Using the inf…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: To appear at EACL 2021

  9. arXiv:1904.04398  [pdf, other]

    cs.CL

    Disfluencies and Human Speech Transcription Errors

    Authors: Vicky Zayats, Trang Tran, Richard Wright, Courtney Mansfield, Mari Ostendorf

    Abstract: This paper explores contexts associated with errors in transcription of spontaneous speech, shedding light on human perception of disfluencies and other conversational speech phenomena. A new version of the Switchboard corpus is provided with disfluency annotations for careful speech transcripts, together with results showing the impact of transcription errors on evaluation of automatic disfluency…

    Submitted 8 April, 2019; originally announced April 2019.

    Comments: Submitted to INTERSPEECH 2019

  10. arXiv:1904.04388  [pdf, other]

    cs.CL cs.AI

    Giving Attention to the Unexpected: Using Prosody Innovations in Disfluency Detection

    Authors: Vicky Zayats, Mari Ostendorf

    Abstract: Disfluencies in spontaneous speech are known to be associated with prosodic disruptions. However, most algorithms for disfluency detection use only word transcripts. Integrating prosodic cues has proved difficult because of the many sources of variability affecting the acoustic correlates. This paper introduces a new approach to extracting acoustic-prosodic cues using text-based distributional pre…

    Submitted 8 April, 2019; originally announced April 2019.

    Comments: Accepted at NAACL-HLT 2019

  11. arXiv:1811.07236  [pdf, other]

    cs.CL cs.AI

    Robust cross-domain disfluency detection with pattern match networks

    Authors: Vicky Zayats, Mari Ostendorf

    Abstract: In this paper we introduce a novel pattern match neural network architecture that uses neighbor similarity scores as features, eliminating the need for feature engineering in a disfluency detection task. We evaluate the approach in disfluency detection for four different speech genres, showing that the approach is as effective as hand-engineered pattern match features when used on in-domain data a…

    Submitted 17 November, 2018; originally announced November 2018.

    Comments: This paper was submitted to EMNLP 2018 and was rejected. Our EMNLP submission is posted here to establish concurrency with "Disfluency Detection using Auto-Correlational Neural Networks" by P. Lou, P. Anderson, M. Johnson which was submitted to EMNLP at the same time

  12. arXiv:1704.02080  [pdf, other]

    cs.CL

    Conversation Modeling on Reddit using a Graph-Structured LSTM

    Authors: Vicky Zayats, Mari Ostendorf

    Abstract: This paper presents a novel approach for modeling threaded discussions on social media using a graph-structured bidirectional LSTM which represents both hierarchical and temporal conversation structure. In experiments with a task of predicting popularity of comments in Reddit discussions, the proposed model outperforms a node-independent architecture for different sets of input features. Analyses…

    Submitted 6 April, 2017; originally announced April 2017.

    Comments: Submitted to TACL

  13. arXiv:1604.03209  [pdf, other]

    cs.CL

    Disfluency Detection using a Bidirectional LSTM

    Authors: Vicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi

    Abstract: We introduce a new approach for disfluency detection using a Bidirectional Long Short-Term Memory neural network (BLSTM). In addition to the word sequence, the model takes as input pattern match features that were developed to reduce sensitivity to vocabulary size in training, which lead to improved performance over the word sequence alone. The BLSTM takes advantage of explicit repair states in ad…

    Submitted 11 April, 2016; originally announced April 2016.

    Comments: Submitted to INTERSPEECH 2016

  14. arXiv:1507.02205  [pdf, other]

    cs.CL cs.SI

    Talking to the crowd: What do people react to in online discussions?

    Authors: Aaron Jaech, Victoria Zayats, Hao Fang, Mari Ostendorf, Hannaneh Hajishirzi

    Abstract: This paper addresses the question of how language use affects community reaction to comments in online discussion forums, and the relative importance of the message vs. the messenger. A new comment ranking task is proposed based on community annotated karma in Reddit discussions, which controls for topic and timing of comments. Experimental work with discussion threads from six subreddits shows th…

    Submitted 16 August, 2015; v1 submitted 8 July, 2015; originally announced July 2015.