
Showing 1–50 of 79 results for author: Black, A W

Searching in archive cs.
  1. arXiv:2310.10803 [pdf, other]

    cs.CL eess.AS

    SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT

    Authors: Cheol Jun Cho, Abdelrahman Mohamed, Shang-Wen Li, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Data-driven unit discovery in self-supervised learning (SSL) of speech has embarked on a new era of spoken language processing. Yet, the discovered units often remain in phonetic space and the units beyond phonemes are largely underexplored. Here, we demonstrate that a syllabic organization emerges in learning sentence-level representation of speech. In particular, we adopt "self-distillation" obj…

    Submitted 16 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  2. arXiv:2310.10788 [pdf, other]

    eess.AS cs.CL

    Self-Supervised Models of Speech Infer Universal Articulatory Kinematics

    Authors: Cheol Jun Cho, Abdelrahman Mohamed, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Self-Supervised Learning (SSL) based models of speech have shown remarkable performance on a range of downstream tasks. These state-of-the-art models have remained blackboxes, but many recent studies have begun "probing" models like HuBERT, to correlate their internal representations to different aspects of speech. In this paper, we show "inference of articulatory kinematics" as fundamental proper…

    Submitted 16 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  3. arXiv:2302.06774 [pdf, other]

    eess.AS cs.SD

    Speaker-Independent Acoustic-to-Articulatory Speech Inversion

    Authors: Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

    Abstract: To build speech processing methods that can handle speech as naturally as humans, researchers have explored multiple ways of building an invertible mapping from speech to an interpretable space. The articulatory space is a promising inversion target, since this space captures the mechanics of speech production. To this end, we build an acoustic-to-articulatory inversion (AAI) model that leverages…

    Submitted 24 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  4. arXiv:2210.16498 [pdf, other]

    eess.AS cs.SD

    Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization

    Authors: Jiachen Lian, Alan W Black, Yi**g Lu, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli

    Abstract: Articulatory representation learning is the fundamental research in modeling neural speech production system. Our previous work has established a deep paradigm to decompose the articulatory kinematics data into gestures, which explicitly model the phonological and linguistic structure encoded with human speech production mechanism, and corresponding gestural scores. We continue with this line of w…

    Submitted 20 February, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: Accepted to 2023 ICASSP. Camera Ready

  5. arXiv:2210.15734 [pdf, other]

    cs.CL cs.SD eess.AS

    Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

    Authors: Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe

    Abstract: End-to-end spoken language understanding (SLU) systems are gaining popularity over cascaded approaches due to their simplicity and ability to avoid error propagation. However, these systems model sequence labeling as a sequence prediction task causing a divergence from its well-established token-level tagging formulation. We build compositional end-to-end SLU systems that explicitly separate the a…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022 Findings. Our code and models will be publicly available as part of the ESPnet-SLU toolkit: https://github.com/espnet/espnet and the release can be followed here: https://github.com/espnet/espnet/pull/4735

  6. arXiv:2210.15272 [pdf, ps, other]

    eess.AS cs.SD eess.SP

    A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville Distribution

    Authors: Yisi Liu, Peter Wu, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Estimation of fundamental frequency (F0) in voiced segments of speech signals, also known as pitch tracking, plays a crucial role in pitch synchronous speech analysis, speech synthesis, and speech manipulation. In this paper, we capitalize on the high time and frequency resolution of the pseudo Wigner-Ville distribution (PWVD) and propose a new PWVD-based pitch estimation method. We devise an effi…

    Submitted 27 October, 2022; originally announced October 2022.

  7. arXiv:2210.05200 [pdf, other]

    cs.CL cs.SD eess.AS

    CTC Alignments Improve Autoregressive Translation

    Authors: Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

    Abstract: Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment. However for translation, CTC exhibits clear limitations due to the contextual and non-monotonic nature of the task and thus lags behind attentional decoder approaches in terms of translation quality. In this work, we argue that CT…

    Submitted 11 October, 2022; originally announced October 2022.

  8. arXiv:2209.06337 [pdf, other]

    eess.AS cs.SD q-bio.QM

    Deep Speech Synthesis from Articulatory Representations

    Authors: Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

    Abstract: In the articulatory synthesis task, speech is synthesized from input features containing information about the physical behavior of the human vocal tract. This task provides a promising direction for speech synthesis research, as the articulatory space is compact, smooth, and interpretable. Current works have highlighted the potential for deep learning models to perform articulatory synthesis. How…

    Submitted 13 September, 2022; originally announced September 2022.

  9. arXiv:2209.02842 [pdf, other]

    cs.CL

    ASR2K: Speech Recognition for Around 2000 Languages without Audio

    Authors: Xinjian Li, Florian Metze, David R Mortensen, Alan W Black, Shinji Watanabe

    Abstract: Most recent speech recognition models rely on large supervised datasets, which are unavailable for many low-resource languages. In this work, we present a speech recognition pipeline that does not require any audio for the target language. The only assumption is that we have access to raw text datasets or a set of n-gram statistics. Our speech pipeline consists of three components: acoustic, pronu…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: INTERSPEECH 2022

  10. arXiv:2207.00688 [pdf, other]

    cs.CL cs.SD eess.AS

    Building African Voices

    Authors: Perez Ogayo, Graham Neubig, Alan W Black

    Abstract: Modern speech synthesis techniques can produce natural-sounding speech given sufficient high-quality data and compute resources. However, such data is not readily available for many languages. This paper focuses on speech synthesis for low-resourced African languages, from corpus creation to sharing and deploying the Text-to-Speech (TTS) systems. We first create a set of general-purpose instructio…

    Submitted 1 July, 2022; originally announced July 2022.

  11. arXiv:2205.11686 [pdf, other]

    cs.CL cs.CV

    On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization

    Authors: Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasović

    Abstract: Combining the visual modality with pretrained language models has been surprisingly effective for simple descriptive tasks such as image captioning. More general text generation, however, remains elusive. We take a step back and ask: How do these models work for more complex generative tasks, i.e. conditioning on both text and images? Are multimodal models simply visually adapted language models, or…

    Submitted 22 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: v2: EMNLP Findings 2022 accepted paper camera-ready version. 9 pages main, 2 pages appendix

  12. arXiv:2204.00465 [pdf, other]

    eess.AS cs.AI eess.SP

    Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition

    Authors: Jiachen Lian, Alan W Black, Louis Goldstein, Gopala Krishna Anumanchipalli

    Abstract: Most of the research on data-driven speech representation learning has focused on raw audios in an end-to-end manner, paying little attention to their internal phonological or gestural structure. This work, investigating the speech representations derived from articulatory kinematics signals, uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data…

    Submitted 20 June, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: Accepted to 2022 Interspeech. Code is publicly available at https://github.com/Berkeley-Speech-Group/ema_gesture

  13. arXiv:2111.14706 [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

    Authors: Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

    Abstract: As Automatic Speech Recognition (ASR) systems are getting better, there is an increasing interest in using the ASR output to do downstream Natural Language Processing (NLP) tasks. However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks. Hence, there is a need to build an open source standard that can b…

    Submitted 3 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Accepted at ICASSP 2022 (5 pages)

  14. arXiv:2111.01326 [pdf, other]

    eess.AS cs.CL cs.SD

    Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

    Authors: Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W Black

    Abstract: Speech processing systems currently do not support the vast majority of languages, in part due to the lack of data in low-resource languages. Cross-lingual transfer offers a compelling way to help bridge this digital divide by incorporating high-resource data into low-resource systems. Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-re…

    Submitted 1 November, 2021; originally announced November 2021.

  15. arXiv:2111.01231 [pdf, other]

    cs.CL

    Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

    Authors: Parul Chopra, Sai Krishna Rallabandi, Alan W Black, Khyathi Raghavi Chandu

    Abstract: Code-switching (CS), a ubiquitous phenomenon due to the ease of communication it offers in multilingual communities, still remains an understudied problem in language processing. The primary reasons behind this are: (1) minimal efforts in leveraging large pretrained multilingual models, and (2) the lack of annotated data. The distinguishing case of low performance of multilingual models in CS is th…

    Submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted at EMNLP Findings 2021

  16. arXiv:2111.00610 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units

    Authors: Anurag Katakkar, Alan W Black

    Abstract: Language models (LMs) for text data have been studied extensively for their usefulness in language generation and other downstream tasks. However, language modelling purely in the speech domain is still a relatively unexplored topic, with traditional speech LMs often depending on auxiliary text LMs for learning distributional aspects of the language. For the English language, these LMs treat words…

    Submitted 31 October, 2021; originally announced November 2021.

  17. arXiv:2110.09264 [pdf, other]

    cs.CL cs.SD eess.AS

    Intent Classification Using Pre-trained Language Agnostic Embeddings For Low Resource Languages

    Authors: Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W Black, Rajiv Ratn Shah

    Abstract: Building Spoken Language Understanding (SLU) systems that do not rely on language specific Automatic Speech Recognition (ASR) is an important yet less explored problem in language processing. In this paper, we present a comparative study aimed at employing a pre-trained acoustic model to perform SLU in low resource scenarios. Specifically, we use three different embeddings extracted using Allosaur…

    Submitted 18 April, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

  18. arXiv:2110.06263 [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Speech Summarization using Restricted Self-Attention

    Authors: Roshan Sharma, Shruti Palaskar, Alan W Black, Florian Metze

    Abstract: Speech summarization is typically performed by using a cascade of speech recognition and text summarization models. End-to-end modeling of speech summarization models is challenging due to memory and compute constraints arising from long input audio sequences. Recent work in document summarization has inspired methods to reduce the complexity of self-attentions, which enables transformer models to…

    Submitted 24 January, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted at ICASSP 2022

  19. arXiv:2106.15065 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

    Authors: Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W Black

    Abstract: Decomposable tasks are complex and comprise a hierarchy of sub-tasks. Spoken intent prediction, for example, combines automatic speech recognition and natural language understanding. Existing benchmarks, however, typically hold out examples for only the surface-level sub-task. As a result, models with similar performance on these benchmarks may have unobserved performance differences on the oth…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: INTERSPEECH 2021

  20. arXiv:2106.06004 [pdf, other]

    cs.CL

    CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing

    Authors: Sai Muralidhar Jayanthi, Kavya Nerella, Khyathi Raghavi Chandu, Alan W Black

    Abstract: The NLP community has witnessed steep progress in a variety of tasks across the realms of monolingual and multilingual language processing recently. These successes, in conjunction with the proliferating mixed language interactions on social media have boosted interest in modeling code-mixed texts. In this work, we present CodemixedNLP, an open-source library with the goals of bringing together th…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted at the Fifth Workshop on Computational Approaches to Linguistic Code-Switching-CALCS 2021

  21. arXiv:2106.02192 [pdf, other]

    cs.CL

    Grounding 'Grounding' in NLP

    Authors: Khyathi Raghavi Chandu, Yonatan Bisk, Alan W Black

    Abstract: The NLP community has seen substantial recent interest in grounding to facilitate interaction between language technologies and the world. However, as a community, we use the term broadly to reference any linking of text to data or non-textual modality. In contrast, Cognitive Science more formally defines "grounding" as the process of establishing what mutual information is required for successful…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: 24 pages

  22. arXiv:2104.12714 [pdf, other]

    cs.CL

    Focused Attention Improves Document-Grounded Generation

    Authors: Shrimai Prabhumoye, Kazuma Hashimoto, Yingbo Zhou, Alan W Black, Ruslan Salakhutdinov

    Abstract: Document grounded generation is the task of using the information provided in a document to improve text generation. This work focuses on two different document grounded generation tasks: Wikipedia Update Generation task and Dialogue response generation. Our work introduces two novel adaptations of large scale pre-trained encoder-decoder models focusing on building context driven representation of…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: Accepted at North American Chapter of the Association for Computational Linguistics (NAACL) 2021

  23. arXiv:2104.01287 [pdf, other]

    cs.CL

    Intent Recognition and Unsupervised Slot Identification for Low Resourced Spoken Dialog Systems

    Authors: Akshat Gupta, Olivia Deng, Akruti Kushwaha, Saloni Mittal, William Zeng, Sai Krishna Rallabandi, Alan W Black

    Abstract: Intent Recognition and Slot Identification are crucial components in spoken language understanding (SLU) systems. In this paper, we present a novel approach towards both these tasks in the context of low resourced and unwritten languages. We present an acoustic based SLU system that converts speech to its phonetic transcription using a universal phone recognition system. We build a word-free natur…

    Submitted 28 September, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  24. arXiv:2103.14797 [pdf, other]

    cs.CL cs.LG

    Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

    Authors: Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi, Alan W Black

    Abstract: Sentiment analysis is an important task in understanding social media content like customer reviews, Twitter and Facebook feeds etc. In multilingual communities around the world, a large amount of social media text is characterized by the presence of Code-Switching. Thus, it has become important to build models that can handle code-switched data. However, annotated code-switched data is scarce and…

    Submitted 1 October, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

  25. arXiv:2102.08345 [pdf, other]

    cs.CL

    NoiseQA: Challenge Set Evaluation for User-Centric Question Answering

    Authors: Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze, Eduard Hovy, Alan W Black

    Abstract: When Question-Answering (QA) systems are deployed in the real world, users query them through a variety of interfaces, such as speaking to voice assistants, typing questions into a search engine, or even translating questions to languages supported by the QA system. While there has been significant community attention devoted to identifying correct answers in passages assuming a perfectly formed q…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: EACL 2021

  26. arXiv:2012.00876 [pdf, other]

    cs.CL eess.AS

    Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios

    Authors: Peter Wu, Yifan Zhong, Alan W Black

    Abstract: Existing multilingual speech NLP works focus on a relatively small subset of languages, and thus current linguistic understanding of languages predominantly stems from classical approaches. In this work, we propose a method to analyze language similarity using deep learning. Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with classical language fam…

    Submitted 1 December, 2020; originally announced December 2020.

  27. arXiv:2011.03646 [pdf, other]

    cs.CL cs.AI

    Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages

    Authors: Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W Black

    Abstract: With recent advancements in language technologies, humans are now speaking to devices. Increasing the reach of spoken language technologies requires building systems in local languages. A major bottleneck here is the underlying data-intensive parts that make up such systems, including automatic speech recognition (ASR) systems that require large amounts of labelled data. With the aim of aiding de…

    Submitted 19 February, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

  28. arXiv:2010.16411 [pdf, ps, other]

    cs.CL

    Mere account mein kitna balance hai? -- On building voice enabled Banking Services for Multilingual Communities

    Authors: Akshat Gupta, Sai Krishna Rallabandi, Alan W Black

    Abstract: Tremendous progress in speech and language processing has brought language technologies closer to daily human life. Voice technology has the potential to act as a horizontal enabling layer across all aspects of digitization. It is especially beneficial to rural communities in scenarios like a pandemic. In this work we present our initial exploratory work towards one such direction -- building voic…

    Submitted 8 October, 2020; originally announced October 2020.

  29. arXiv:2010.10472 [pdf, other]

    cs.CL

    Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages

    Authors: Yiyuan Li, Antonios Anastasopoulos, Alan W Black

    Abstract: Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict and large corpora are usually required to collect enough examples. This work shows a comparison of a neural model and character language models with varying amounts of target language data. Our usage scenario is interactive correction with nearly zero amounts of training examples, impro…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: 9 pages

  30. arXiv:2010.07279 [pdf, other]

    cs.CL

    Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey

    Authors: Khyathi Raghavi Chandu, Alan W Black

    Abstract: Neural text generation metamorphosed into several critical natural language applications ranging from text completion to free form narrative generation. In order to progress research in text generation, it is critical to absorb the existing research works and position ourselves in this massively growing field. Specifically, this paper surveys the fundamental components of modeling approaches relay…

    Submitted 25 March, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: 16 pages

  31. arXiv:2010.04658 [pdf, other]

    cs.CL

    Case Study: Deontological Ethics in NLP

    Authors: Shrimai Prabhumoye, Brendon Boldt, Ruslan Salakhutdinov, Alan W Black

    Abstract: Recent work in natural language processing (NLP) has focused on ethical challenges such as understanding and mitigating bias in data and algorithms; identifying objectionable content like hate speech, stereotypes and offensive language; and building frameworks for better system design and data handling practices. However, there has been little discussion about the ethical foundations that underlie…

    Submitted 12 April, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted at North American Chapter of the Association for Computational Linguistics (NAACL) 2021

  32. arXiv:2008.04820 [pdf, other]

    cs.CL cs.IR cs.LG

    LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification

    Authors: Sopan Khosla, Rishabh Joshi, Ritam Dutt, Alan W Black, Yulia Tsvetkov

    Abstract: In this paper we describe our submission for the task of Propaganda Span Identification in news articles. We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda. The "multi-granular" model incorporates linguistic knowledge at various levels of text granularity, including word, sentence and docum…

    Submitted 20 August, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

  33. arXiv:2007.12948 [pdf, ps, other]

    eess.AS cs.LG cs.SD stat.ML

    Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

    Authors: Amrith Setlur, Barnabas Poczos, Alan W Black

    Abstract: This paper extends recent work on nonlinear Independent Component Analysis (ICA) by introducing a theoretical framework for nonlinear Independent Subspace Analysis (ISA) in the presence of auxiliary variables. Observed high dimensional acoustic features like log Mel spectrograms can be considered as surface level manifestations of nonlinear transformations over individual multivariate sources of i…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: To be presented at Interspeech 2020

  34. arXiv:2006.05986 [pdf, other]

    cs.CL cs.AI cs.LG

    ClarQ: A large-scale and diverse dataset for Clarification Question Generation

    Authors: Vaibhav Kumar, Alan W. Black

    Abstract: Question answering and conversational systems are often baffled and need help clarifying certain ambiguities. However, limitations of existing datasets hinder the development of large-scale models capable of generating and utilising clarification questions. In order to overcome these limitations, we devise a novel bootstrapping framework (based on self-supervision) that assists in the creation of…

    Submitted 11 June, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Accepted at ACL 2020

  35. arXiv:2005.13962 [pdf, other]

    cs.CL

    A Corpus for Large-Scale Phonetic Typology

    Authors: Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W Black, Jason Eisner

    Abstract: A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions. We present VoxClamantis v1.0, the first large-scale corpus for phonetic typology, with aligned segments and estimated phoneme-level labels in 690 readings spanning 635 languages, along with acoustic-phonetic measures of vowels and sibilants. Access to such data can greatly…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: Accepted to ACL2020

  36. arXiv:2005.13681 [pdf, other]

    cs.CL cs.SD eess.AS

    Phone Features Improve Speech Translation

    Authors: Elizabeth Salesky, Alan W Black

    Abstract: End-to-end models for speech translation (ST) more tightly couple speech recognition (ASR) and machine translation (MT) than a traditional cascade of separate ASR and MT models, with simpler model architectures and the potential for reduced error propagation. Their performance is often assumed to be superior, though in many conditions this is not yet the case. We compare cascaded and end-to-end mo…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: Accepted to ACL2020

  37. arXiv:2005.01822 [pdf, other]

    cs.CL

    Exploring Controllable Text Generation Techniques

    Authors: Shrimai Prabhumoye, Alan W Black, Ruslan Salakhutdinov

    Abstract: Neural controllable text generation is an important area gaining attention due to its plethora of applications. Although there is a large body of prior work in controllable text generation, there is no unifying theme. In this work, we provide a new schema of the pipeline of the generation process by classifying it into five modules. The control of attributes in the generation process requires modi…

    Submitted 30 October, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

    Comments: Will be published at COLING 2020

  38. arXiv:2005.00458 [pdf, other]

    cs.CL

    Style Variation as a Vantage Point for Code-Switching

    Authors: Khyathi Raghavi Chandu, Alan W Black

    Abstract: Code-Switching (CS) is a common phenomenon observed in several bilingual and multilingual communities, thereby attaining prevalence in digital and social media platforms. This increasing prominence demands the need to model CS languages for critical downstream tasks. A major problem in this domain is the dearth of annotated data and of substantial corpora to train large scale neural models. Generat…

    Submitted 1 May, 2020; originally announced May 2020.

  39. arXiv:2005.00432 [pdf, ps, other]

    cs.CL

    Topological Sort for Sentence Ordering

    Authors: Shrimai Prabhumoye, Ruslan Salakhutdinov, Alan W Black

    Abstract: Sentence ordering is the task of arranging the sentences of a given text in the correct order. Recent work using deep neural networks for this task has framed it as a sequence prediction problem. In this paper, we propose a new framing of this task as a constraint solving problem and introduce a new technique to solve it. Additionally, we propose a human evaluation for this task. The results on bo…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: Will be published at the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) 2020

  40. arXiv:2004.14257 [pdf, other]

    cs.CL

    Politeness Transfer: A Tag and Generate Approach

    Authors: Aman Madaan, Amrith Setlur, Tanmay Parekh, Barnabas Poczos, Graham Neubig, Yiming Yang, Ruslan Salakhutdinov, Alan W Black, Shrimai Prabhumoye

    Abstract: This paper introduces a new task of politeness transfer which involves converting non-polite sentences to polite sentences while preserving the meaning. We also provide a dataset of more than 1.39 instances automatically labeled for politeness to encourage benchmark evaluations on this new task. We design a tag and generate pipeline that identifies stylistic attributes and subsequently generates a…

    Submitted 1 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: To appear at ACL 2020

  41. arXiv:2004.08031 [pdf, other]

    cs.CL

    AlloVera: A Multilingual Allophone Database

    Authors: David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig

    Abstract: We introduce a new resource, AlloVera, which provides mappings from 218 allophones to phonemes for 14 languages. Phonemes are contrastive phonological units, and allophones are their various concrete realizations, which are predictable from phonological context. While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a uni…

    Submitted 16 April, 2020; originally announced April 2020.

    Comments: 8 pages, LREC 2020

  42. arXiv:2002.11800 [pdf, other]

    cs.CL cs.SD eess.AS

    Universal Phone Recognition with a Multilingual Allophone System

    Authors: Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W Black, Florian Metze

    Abstract: Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages. Multilingual acoustic models, however, generally ignore the difference between phonemes (sounds that can support lexical contrasts in a particular language) and their corresponding phones (the sounds that are actually spoken, which are language independent). This c…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: ICASSP 2020

  43. arXiv:2002.11781  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Zero-shot Learning for Automatic Phonemic Transcription

    Authors: Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W Black, Florian Metze

    Abstract: Automatic phonemic transcription tools are useful for low-resource language documentation. However, due to the lack of training sets, only a tiny fraction of languages have phonemic transcription tools. Fortunately, multilingual acoustic modeling provides a solution given limited audio training data. A more challenging problem is to build phonemic transcribers for languages with zero training data…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: AAAI 2020

  44. arXiv:2001.03521  [pdf, ps, other]

    cs.CL cs.AI

    Towards Minimal Supervision BERT-based Grammar Error Correction

    Authors: Yiyuan Li, Antonios Anastasopoulos, Alan W Black

    Abstract: Current grammatical error correction (GEC) models typically treat the task as sequence generation, which requires large amounts of annotated data and limits their applicability in data-limited settings. We try to incorporate contextual information from a pre-trained language model to leverage annotation and benefit multilingual scenarios. Results show strong potential of Bidirectional Encoder Represe…

    Submitted 10 January, 2020; originally announced January 2020.

  45. arXiv:1912.01772  [pdf, ps, other]

    cs.CL

    A Resource for Computational Experiments on Mapudungun

    Authors: Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo M. Vega, Antonios Anastasopoulos, Lori Levin, Alan W Black

    Abstract: We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers. We provide 142 hours of culturally significant conversations in the domain of medical treatment. The conversations are fully transcribed and translated into Spanish. The transcriptions also include annotations for code-switching and non-stand…

    Submitted 4 April, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: accepted at LREC 2020

  46. arXiv:1911.00841  [pdf, other]

    cs.CL

    Question Answering for Privacy Policies: Combining Computational and Legal Perspectives

    Authors: Abhilasha Ravichander, Alan W Black, Shomir Wilson, Thomas Norton, Norman Sadeh

    Abstract: Privacy policies are long and complex documents that are difficult for users to read and understand, and yet, they have legal effects on how user data is collected, managed and used. Ideally, we would like to empower users to inform themselves about issues that matter to them, and enable them to selectively explore those issues. We present PrivacyQA, a corpus consisting of 1750 questions about the…

    Submitted 3 November, 2019; originally announced November 2019.

    Comments: EMNLP 2019

  47. arXiv:1909.13426  [pdf, other]

    cs.CL

    A Dynamic Strategy Coach for Effective Negotiation

    Authors: Yiheng Zhou, He He, Alan W Black, Yulia Tsvetkov

    Abstract: Negotiation is a complex activity involving strategic reasoning, persuasion, and psychology. An average person is often far from an expert in negotiation. Our goal is to help humans become better negotiators through a machine-in-the-loop approach that combines the machine's advantage at data-driven decision-making with the human's language generation ability. We consider a bargaining scenario where a…

    Submitted 29 September, 2019; originally announced September 2019.

    Comments: In Proceedings of SigDial 2019

  48. arXiv:1909.13425  [pdf, other]

    cs.CL

    Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History

    Authors: Yiheng Zhou, Yulia Tsvetkov, Alan W Black, Zhou Yu

    Abstract: We study non-collaborative dialogs, where two agents have a conflict of interest but must strategically communicate to reach an agreement (e.g., negotiation). This setting poses new challenges for modeling dialog history because the dialog's outcome relies not only on the semantic intent, but also on tactics that convey the intent. We propose to model both semantic and tactic history using finite…

    Submitted 29 September, 2019; originally announced September 2019.

    Comments: Unpublished preprint

  49. arXiv:1909.09699  [pdf, other]

    cs.CL cs.LG stat.ML

    Induction and Reference of Entities in a Visual Story

    Authors: Ruo-Ping Dong, Khyathi Raghavi Chandu, Alan W Black

    Abstract: We are enveloped by stories of visual interpretations in our everyday lives. The way we narrate a story often comprises two stages: forming a central mind map of entities and then weaving a story around them. A contributing factor to coherence is not just basing the story on these entities but also referring to them using appropriate terms to avoid repetition. In this paper, we addr…

    Submitted 14 September, 2019; originally announced September 2019.

    Comments: 9 pages, 4 figures, 3 tables

  50. arXiv:1909.01322  [pdf, other]

    cs.CL cs.HC

    CMU GetGoing: An Understandable and Memorable Dialog System for Seniors

    Authors: Shikib Mehri, Alan W Black, Maxine Eskenazi

    Abstract: Voice-based technologies are typically developed for the average user, and thus generally not tailored to the specific needs of any subgroup of the population, like seniors. This paper presents CMU GetGoing, an accessible trip planning dialog system designed for senior users. The GetGoing system design is described in detail, with particular attention to the senior-tailored features. A user study…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: Accepted to the Dialog for Good (DiGo) workshop (http://dialogforgood.org) at SIGDial 2019