Search | arXiv e-print repository

Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents

Authors: John L. Zhou, Weizhe Hong, Jonathan C. Kao

Abstract: Emergent cooperation among self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents. Instead, naïve reinforcement learning algorithms typically converge to Pareto-dominated outcomes in even the simplest of social dilemmas. An emerging class of opponent-shaping methods have demonstrated the ability to reach prosocial outcomes by influencing the learning of other agents. However, they rely on higher-order derivatives through the predicted learning step of other agents or learning meta-game dynamics, which in turn rely on stringent assumptions over opponent learning rules or exponential sample complexity, respectively. To provide a learning rule-agnostic and sample-efficient alternative, we introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of an opponent's actions on their returns. This approach effectively seeks to modify other agents' $Q$-values by increasing their return following beneficial actions (with respect to the Reciprocator) and decreasing it after detrimental actions, guiding them towards mutually beneficial actions without attempting to directly shape policy updates. We show that Reciprocators can be used to promote cooperation in a variety of temporally extended social dilemmas during simultaneous learning.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 9 pages, 4 figures

arXiv:2406.01538 [pdf, other]

What Are Large Language Models Mapping to in the Brain? A Case Against Over-Reliance on Brain Scores

Authors: Ebrahim Feghhi, Nima Hadidi, Bryan Song, Idan A. Blank, Jonathan C. Kao

Abstract: Given the remarkable capabilities of large language models (LLMs), there has been a growing interest in evaluating their similarity to the human brain. One approach towards quantifying this similarity is by measuring how well a model predicts neural signals, also called "brain score". Internal representations from LLMs achieve state-of-the-art brain scores, leading to speculation that they share computational principles with human language processing. This inference is only valid if the subset of neural activity predicted by LLMs reflects core elements of language processing. Here, we question this assumption by analyzing three neural datasets used in an impactful study on LLM-to-brain mappings, with a particular focus on an fMRI dataset where participants read short passages. We first find that when using shuffled train-test splits, as done in previous studies with these datasets, a trivial feature that encodes temporal autocorrelation not only outperforms LLMs but also accounts for the majority of neural variance that LLMs explain. We therefore use contiguous splits moving forward. Second, we explain the surprisingly high brain scores of untrained LLMs by showing they do not account for additional neural variance beyond two simple features: sentence length and sentence position. This undermines evidence used to claim that the transformer architecture biases computations to be more brain-like. Third, we find that brain scores of trained LLMs on this dataset can largely be explained by sentence length, position, and pronoun-dereferenced static word embeddings; a small, additional amount is explained by sense-specific embeddings and contextual representations of sentence structure. We conclude that over-reliance on brain scores can lead to over-interpretations of similarity between LLMs and brains, and emphasize the importance of deconstructing what LLMs are mapping to in neural signals.

Submitted 20 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages, 4 figures in the main paper

arXiv:2405.16557 [pdf, other]

Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning

Authors: Chun-Kai Huang, Yi-Hsien Hsieh, Ta-Jung Chien, Li-Cheng Chien, Shao-Hua Sun, Tung-Hung Su, Jia-Horng Kao, Che Lin

Abstract: Multivariate time series (MTS) data, when sampled irregularly and asynchronously, often present extensive missing values. Conventional methodologies for MTS analysis tend to rely on temporal embeddings based on timestamps that necessitate subsequent imputations, yet these imputed values frequently deviate substantially from their actual counterparts, thereby compromising prediction accuracy. Furthermore, these methods typically fail to provide robust initial embeddings for values infrequently observed or even absent within the training set, posing significant challenges to model generalizability. In response to these challenges, we propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token, effectively bypassing the need for imputation. SCANE regularizes the traits of distinct feature embeddings and enhances representational learning through a scalable embedding mechanism. Coupling SCANE with the Transformer Encoder architecture, we develop the Scalable nUMerical eMbeddIng Transformer (SUMMIT), which is engineered to deliver precise predictive outputs for MTS characterized by prevalent missing entries. Our experimental validation, conducted across three disparate electronic health record (EHR) datasets marked by elevated missing value frequencies, confirms the superior performance of SUMMIT over contemporary state-of-the-art approaches addressing similar challenges. These results substantiate the efficacy of SCANE and SUMMIT, underscoring their potential applicability across a broad spectrum of MTS data analytical tasks.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2312.05187 [pdf, other]

Seamless: Multilingual Expressive and Streaming Speech Translation

Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model-SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time. The contributions to this work are publicly released and accessible at https://github.com/facebookresearch/seamless_communication

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2310.08795 [pdf, other]

Mitigating Bias for Question Answering Models by Tracking Bias Influence

Authors: Mingyu Derek Ma, Jiun-Yu Kao, Arpit Gupta, Yu-Hsiang Lin, Wenbo Zhao, Tagyoung Chung, Wei Wang, Kai-Wei Chang, Nanyun Peng

Abstract: Models of various NLP tasks have been shown to exhibit stereotypes, and the bias in the question answering (QA) models is especially harmful as the output answers might be directly consumed by the end users. There have been datasets to evaluate bias in QA models, while bias mitigation technique for the QA models is still under-explored. In this work, we propose BMBI, an approach to mitigate the bias of multiple-choice QA models. Based on the intuition that a model would lean to be more biased if it learns from a biased example, we measure the bias level of a query instance by observing its influence on another instance. If the influenced instance is more biased, we derive that the query instance is biased. We then use the bias level detected as an optimization objective to form a multi-task learning setting in addition to the original QA task. We further introduce a new bias evaluation metric to quantify bias in a comprehensive and sensitive way. We show that our method could be applied to multiple QA formulations across multiple bias categories. It can significantly reduce the bias level in all 9 bias categories in the BBQ dataset while maintaining comparable QA accuracy.

Submitted 17 June, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

Comments: To appear at NAACL 2024 main conference

arXiv:2308.15437 [pdf, ps, other]

doi 10.1103/PhysRevA.108.032414

Existence of Pauli-like stabilizers for every quantum error-correcting code

Authors: Jhih-Yuan Kao, Hsi-Sheng Goan

Abstract: The Pauli stabilizer formalism is perhaps the most thoroughly studied means of procuring quantum error-correcting codes, whereby the code is obtained through commutative Pauli operators and "stabilized" by them. In this work we will show that every quantum error-correcting code, including Pauli stabilizer codes and subsystem codes, has a similar structure, in that the code can be stabilized by commutative "Paulian" operators which share many features with Pauli operators and which form a Paulian stabilizer group. By facilitating a controlled gate we can measure these Paulian operators to acquire the error syndrome. Examples concerning codeword stabilized codes and bosonic codes will be presented; specifically, one of the examples has been demonstrated experimentally and the observable for detecting the error turns out to be Paulian, thereby showing the potential utility of this approach. This work provides a possible approach to implement error-correcting codes and to find new codes.

Submitted 29 August, 2023; originally announced August 2023.

Comments: 20 pages (including 7 appendices); to appear in Phys. Rev. A

Journal ref: Physical Review A 108, 032414 (2023)

arXiv:2308.11596 [pdf, other]

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, et al. (43 additional authors not shown)

Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication

Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

ACM Class: I.2.7

arXiv:2301.10915 [pdf, other]

Parameter-Efficient Low-Resource Dialogue State Tracking by Prompt Tuning

Authors: Mingyu Derek Ma, Jiun-Yu Kao, Shuyang Gao, Arpit Gupta, Di Jin, Tagyoung Chung, Nanyun Peng

Abstract: Dialogue state tracking (DST) is an important step in dialogue management to keep track of users' beliefs. Existing works fine-tune all language model (LM) parameters to tackle the DST task, which requires significant data and computing resources for training and hosting. The cost grows exponentially in the real-world deployment where dozens of fine-tuned LM are used for different domains and tasks. To reduce parameter size and better utilize cross-task shared information, we propose to use soft prompt token embeddings to learn task properties. Without tuning LM parameters, our method drastically reduces the number of parameters needed to less than 0.5% of prior works while achieves better low-resource DST performance.

Submitted 29 May, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: In the INTERSPEECH 2023, and the Second Workshop on Efficient Natural Language and Speech Processing (ENLSP) at NeurIPS 2022

arXiv:2301.10606 [pdf, other]

A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

Authors: Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, Peng-Jen Chen

Abstract: Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy. Existing research in expressive S2ST is limited, typically focusing on a single expressivity aspect at a time. Likewise, this research area lacks standard evaluation protocols and well-curated benchmark datasets. In this work, we propose a holistic cascade system for expressive S2ST, combining multiple prosody transfer techniques previously considered only in isolation. We curate a benchmark expressivity test set in the TV series domain and explored a second dataset in the audiobook domain. Finally, we present a human evaluation protocol to assess multiple expressive dimensions across speech pairs. Experimental results indicate that bi-lingual annotators can assess the quality of expressive preservation in S2ST systems, and the holistic modeling approach outperforms single-aspect systems. Audio samples can be accessed through our demo webpage: https://facebookresearch.github.io/speech_translation/cascade_expressive_s2st.

Submitted 25 January, 2023; originally announced January 2023.

Comments: This is the full version of our submission to ICASSP 2023

arXiv:2212.08486 [pdf, other]

BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

Authors: Mingda Chen, Paul-Ambroise Duquenne, Pierre Andrews, Justine Kao, Alexandre Mourachko, Holger Schwenk, Marta R. Costa-jussà

Abstract: End-to-End speech-to-speech translation (S2ST) is generally evaluated with text-based metrics. This means that generated speech has to be automatically transcribed, making the evaluation dependent on the availability and quality of automatic speech recognition (ASR) systems. In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. BLASER leverages a multilingual multimodal encoder to directly encode the speech segments for source input, translation output and reference into a shared embedding space and computes a score of the translation quality that can be used as a proxy to human evaluation. To evaluate our approach, we construct training and evaluation sets from more than 40k human annotations covering seven language directions. The best results of BLASER are achieved by training with supervision from human rating scores. We show that when evaluated at the sentence level, BLASER correlates significantly better with human judgment compared to ASR-dependent metrics including ASR-SENTBLEU in all translation directions and ASR-COMET in five of them. Our analysis shows combining speech and text as inputs to BLASER does not increase the correlation with human scores, but best correlations are achieved when using speech, which motivates the goal of our research. Moreover, we show that using ASR for references is detrimental for text-based metrics.

Submitted 16 December, 2022; originally announced December 2022.

ACM Class: I.2.7

arXiv:2211.06474 [pdf, other]

Speech-to-Speech Translation For A Real-world Unwritten Language

Authors: Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee

Abstract: We study speech-to-speech translation (S2ST) that translates speech from one language into another language and focuses on building systems to support languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study, and present an end-to-end solution from training data collection, modeling choices to benchmark dataset release. First, we present efforts on creating human annotated data, automatically mining data from large unlabeled speech datasets, and adopting pseudo-labeling to produce weakly supervised data. On the modeling, we take advantage of recent advances in applying self-supervised discrete representations as target for prediction in S2ST and show the effectiveness of leveraging additional text supervision from Mandarin, a language similar to Hokkien, in model training. Finally, we release an S2ST benchmark set to facilitate future research in this field. The demo can be found at https://huggingface.co/spaces/facebook/Hokkien_Translation .

Submitted 11 November, 2022; originally announced November 2022.

arXiv:2205.12239 [pdf, other]

Gacs-Korner Common Information Variational Autoencoder

Authors: Michael Kleinman, Alessandro Achille, Stefano Soatto, Jonathan Kao

Abstract: We propose a notion of common information that allows one to quantify and separate the information that is shared between two random variables from the information that is unique to each. Our notion of common information is defined by an optimization problem over a family of functions and recovers the Gács-Körner common information as a special case. Importantly, our notion can be approximated empirically using samples from the underlying data distribution. We then provide a method to partition and quantify the common and unique information using a simple modification of a traditional variational auto-encoder. Empirically, we demonstrate that our formulation allows us to learn semantically meaningful common and unique factors of variation even on high-dimensional data such as images and videos. Moreover, on datasets where ground-truth latent factors are known, we show that we can accurately quantify the common information between the random variables.

Submitted 5 November, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: Accepted to NeurIPS 2023

arXiv:2109.12211 [pdf, other]

Style Control for Schema-Guided Natural Language Generation

Authors: Alicia Y. Tsai, Shereen Oraby, Vittorio Perera, Jiun-Yu Kao, Yuheng Du, Anjali Narayan-Chen, Tagyoung Chung, Dilek Hakkani-Tur

Abstract: Natural Language Generation (NLG) for task-oriented dialogue systems focuses on communicating specific content accurately, fluently, and coherently. While these attributes are crucial for a successful dialogue, it is also desirable to simultaneously accomplish specific stylistic goals, such as response length, point-of-view, descriptiveness, sentiment, formality, and empathy. In this work, we focus on stylistic control and evaluation for schema-guided NLG, with joint goals of achieving both semantic and stylistic control. We experiment in detail with various controlled generation methods for large pretrained language models: specifically, conditional training, guided fine-tuning, and guided decoding. We discuss their advantages and limitations, and evaluate them with a broad range of automatic and human evaluation metrics. Our results show that while high style accuracy and semantic correctness are easier to achieve for more lexically-defined styles with conditional training, stylistic control is also achievable for more semantically complex styles using discriminator-based guided decoding methods. The results also suggest that methods that are more scalable (with less hyper-parameters tuning) and that disentangle content generation and stylistic variations are more effective at achieving semantic correctness and style accuracy.

Submitted 24 September, 2021; originally announced September 2021.

Comments: Accepted at the 3rd Workshop on NLP for ConvAI at EMNLP '21

arXiv:2104.09088 [pdf, other]

Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems

Authors: Anish Acharya, Suranjit Adhikari, Sanchit Agarwal, Vincent Auvray, Nehal Belgamwar, Arijit Biswas, Shubhra Chandra, Tagyoung Chung, Maryam Fazel-Zarandi, Raefer Gabriel, Shuyang Gao, Rahul Goel, Dilek Hakkani-Tur, Jan Jezabek, Abhay Jha, Jiun-Yu Kao, Prakash Krishnan, Peter Ku, Anuj Goyal, Chien-Wei Lin, Qing Liu, Arindam Mandal, Angeliki Metallinou, Vishal Naik, Yi Pan, et al. (6 additional authors not shown)

Abstract: Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation. Training each component requires annotations which are hard to obtain for every new domain, limiting scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and do not scale either. End-to-End dialogue systems, on the other hand, do not require module-specific annotations but need a large amount of data for training. To overcome these problems, in this demo, we present Alexa Conversations, a new approach for building goal-oriented dialogue systems that is scalable, extensible as well as data efficient. The components of this system are trained in a data-driven manner, but instead of collecting annotated conversations for training, we generate them using a novel dialogue simulator based on a few seed dialogues and specifications of APIs and entities provided by the developer. Our approach provides out-of-the-box support for natural conversational phenomena like entity sharing across turns or users changing their mind during conversation without requiring developers to provide any such dialogue flows. We exemplify our approach using a simple pizza ordering task and showcase its value in reducing the developer burden for creating a robust experience. Finally, we evaluate our system using a typical movie ticket booking task and show that the dialogue simulator is an essential component of the system that leads to over 50% improvement in turn-level action signature prediction accuracy.

Submitted 19 April, 2021; originally announced April 2021.

Journal ref: NAACL 2021 System Demonstrations Track

arXiv:2010.02459 [pdf, other]

Usable Information and Evolution of Optimal Representations During Training

Authors: Michael Kleinman, Alessandro Achille, Daksh Idnani, Jonathan C. Kao

Abstract: We introduce a notion of usable information contained in the representation learned by a deep network, and use it to study how optimal representations for the task emerge during training. We show that the implicit regularization coming from training with Stochastic Gradient Descent with a high learning-rate and small batch size plays an important role in learning minimal sufficient representations for the task. In the process of arriving at a minimal sufficient representation, we find that the content of the representation changes dynamically during training. In particular, we find that semantically meaningful but ultimately irrelevant information is encoded in the early transient dynamics of training, before being later discarded. In addition, we evaluate how perturbing the initial part of training impacts the learning dynamics and the resulting representations. We show these effects on both perceptual decision-making tasks inspired by neuroscience literature, as well as on standard image classification tasks.

Submitted 28 February, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: ICLR 2021

arXiv:2005.14408 [pdf, other]

doi 10.18653/v1/2021.naacl-industry.25

Noise Robust Named Entity Understanding for Voice Assistants

Authors: Deepak Muralidharan, Joel Ruben Antony Moniz, Sida Gao, Xiao Yang, Justine Kao, Stephen Pulman, Atish Kothari, Ray Shen, Yinying Pan, Vivek Kaul, Mubarak Seyed Ibrahim, Gang Xiang, Nan Dun, Yidan Zhou, Andy O, Yuan Zhang, Pooja Chitkara, Xuan Wang, Alkesh Patel, Kushal Tayal, Roger Zheng, Peter Grasch, Jason D. Williams, Lin Li

Abstract: Named Entity Recognition (NER) and Entity Linking (EL) play an essential role in voice assistant interaction, but are challenging due to the special difficulties associated with spoken user queries. In this paper, we propose a novel architecture that jointly solves the NER and EL tasks by combining them in a joint reranking module. We show that our proposed framework improves NER accuracy by up to 3.13% and EL accuracy by up to 3.6% in F1 score. The features used also lead to better accuracies in other natural language understanding tasks, such as domain classification and semantic parsing.

Submitted 10 August, 2021; v1 submitted 29 May, 2020; originally announced May 2020.

Comments: NAACL 2021 Industry Track

MSC Class: 68T50 ACM Class: I.2.7

arXiv:1910.00458 [pdf, other]

MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension

Authors: Di Jin, Shuyang Gao, Jiun-Yu Kao, Tagyoung Chung, Dilek Hakkani-tur

Abstract: Machine Reading Comprehension (MRC) for question answering (QA), which aims to answer a question given the relevant context passages, is an important way to test the ability of intelligence systems to understand human language. Multiple-Choice QA (MCQA) is one of the most difficult tasks in MRC because it often requires more advanced reading comprehension skills such as logical reasoning, summarization, and arithmetic operations, compared to the extractive counterpart where answers are usually spans of text within given passages. Moreover, most existing MCQA datasets are small in size, making the learning task even harder. We introduce MMM, a Multi-stage Multi-task learning framework for Multi-choice reading comprehension. Our method involves two sequential stages: coarse-tuning stage using out-of-domain datasets and multi-task learning stage using a larger in-domain dataset to help model generalize better with limited data. Furthermore, we propose a novel multi-step attention network (MAN) as the top-level classifier for this task. We demonstrate MMM significantly advances the state-of-the-art on four representative MCQA datasets.

Submitted 18 November, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: Accepted by AAAI 2020

arXiv:1909.09143 [pdf, ps, other]

Leveraging User Engagement Signals For Entity Labeling in a Virtual Assistant

Authors: Deepak Muralidharan, Justine Kao, Xiao Yang, Lin Li, Lavanya Viswanathan, Mubarak Seyed Ibrahim, Kevin Luikens, Stephen Pulman, Ashish Garg, Atish Kothari, Jason Williams

Abstract: Personal assistant AI systems such as Siri, Cortana, and Alexa have become widely used as a means to accomplish tasks through natural language commands. However, components in these systems generally rely on supervised machine learning algorithms that require large amounts of hand-annotated training data, which is expensive and time consuming to collect. The ability to incorporate unsupervised, weakly supervised, or distantly supervised data holds significant promise in overcoming this bottleneck. In this paper, we describe a framework that leverages user engagement signals (user behaviors that demonstrate a positive or negative response to content) to automatically create granular entity labels for training data augmentation. Strategies such as multi-task learning and validation using an external knowledge base are employed to incorporate the engagement annotated data and to boost the model's accuracy on a sequence labeling task. Our results show that learning from data automatically labeled by user engagement signals achieves significant accuracy gains in a production deep learning system, when measured on both the sequence labeling task as well as on user facing results produced by the system end-to-end. We believe this is the first use of user engagement signals to help generate training data for a sequence labeling task on a large scale, and can be applied in practical settings to speed up new feature deployment when little human annotated data is available.

Submitted 18 September, 2019; originally announced September 2019.

Comments: NeurIPS 2018 Conversational AI Workshop

arXiv:1908.11404 [pdf]

Active Learning for Domain Classification in a Commercial Spoken Personal Assistant

Authors: Xi C. Chen, Adithya Sagar, Justine T. Kao, Tony Y. Li, Christopher Klein, Stephen Pulman, Ashish Garg, Jason D. Williams

Abstract: We describe a method for selecting relevant new training data for the LSTM-based domain selection component of our personal assistant system. Adding more annotated training data for any ML system typically improves accuracy, but only if it provides examples not already adequately covered in the existing data. However, obtaining, selecting, and labeling relevant data is expensive. This work presents a simple technique that automatically identifies new helpful examples suitable for human annotation. Our experimental results show that the proposed method, compared with random-selection and entropy-based methods, leads to higher accuracy improvements given a fixed annotation budget. Although developed and tested in the setting of a commercial intelligent assistant, the technique is of wider applicability.

Submitted 29 August, 2019; originally announced August 2019.

arXiv:1702.06780 [pdf, other]

Joint Spectrum Reuse and Power Control for Multi-Sharing Device-to-Device Communication

Authors: Kuo-Yi Chen, Jung-Chun Kao, Si-An Ciou, Shih-Han Lin

Abstract: Compared to current mobile networks, next-generation mobile networks are expected to support higher numbers of simultaneously connected devices and to achieve higher system spectrum efficiency and lower power consumption. To achieve these goals, we study the multi-sharing device-to-device (D2D) communication, which allows any cellular user equipment to share its radio resource with multiple D2D devices. We jointly consider resource block reuse and power control and then develop the MISS algorithm. Simulation results show that MISS performs very well in terms of transmission power consumption, system throughput, and the number of permitted D2D devices.

Submitted 22 February, 2017; originally announced February 2017.

arXiv:0808.2417 [pdf, ps, other]

On NFAs Where All States are Final, Initial, or Both

Authors: Jui-Yi Kao, Narad Rampersad, Jeffrey Shallit

Abstract: We examine questions involving nondeterministic finite automata where all states are final, initial, or both initial and final. First, we prove hardness results for the nonuniversality and inequivalence problems for these NFAs. Next, we characterize the languages accepted. Finally, we discuss some state complexity problems involving such automata.

Submitted 3 July, 2009; v1 submitted 18 August, 2008; originally announced August 2008.

Comments: submitted

arXiv:0710.4728 [pdf]

Energy-Aware Routing for E-Textile Applications

Authors: Jung-Chun Kao, Radu Marculescu

Abstract: As the scale of electronic devices shrinks, "electronic textiles" (e-textiles) will make possible a wide variety of novel applications which are currently unfeasible. Due to the wearability concerns, low-power techniques are critical for e-textile applications. In this paper, we address the issue of the energy-aware routing for e-textile platforms and propose an efficient algorithm to solve it. The platform we consider consists of dedicated components for e-textiles, including computational modules, dedicated transmission lines and thin-film batteries on fiber substrates. Furthermore, we derive an analytical upper bound for the achievable number of jobs completed over all possible routing strategies. From a practical standpoint, for the Advanced Encryption Standard (AES) cipher, the routing technique we propose achieves about fifty percent of this analytical upper bound. Moreover, compared to the non-energy-aware counterpart, our routing technique increases the number of encryption jobs completed by one order of magnitude.

Submitted 25 October, 2007; originally announced October 2007.

Comments: Submitted on behalf of EDAA (http://www.edaa.com/)

Journal ref: In Design, Automation and Test in Europe - DATE'05, Munich, Germany (2005)

arXiv:0708.3224 [pdf, ps, other]

The Frobenius Problem in a Free Monoid

Authors: Jui-Yi Kao, Jeffrey Shallit, Zhi Xu

Abstract: The classical Frobenius problem is to compute the largest number g not representable as a non-negative integer linear combination of non-negative integers x_1, x_2, ..., x_k, where gcd(x_1, x_2, ..., x_k) = 1. In this paper we consider generalizations of the Frobenius problem to the noncommutative setting of a free monoid. Unlike the commutative case, where the bound on g is quadratic, we are able to show exponential or subexponential behavior for an analogue of g, depending on the particular measure chosen.

Submitted 23 August, 2007; originally announced August 2007.

Comments: 19 pages; preliminary announcement

ACM Class: F.4.3

Showing 1–23 of 23 results for author: Kao, J