Search | arXiv e-print repository

Prompt-based Pseudo-labeling Strategy for Sample-Efficient Semi-Supervised Extractive Summarization

Authors: Gaurav Sahu, Olga Vechtomova, Issam H. Laradji

Abstract: Semi-supervised learning (SSL) is a widely used technique in scenarios where labeled data is scarce and unlabeled data is abundant. While SSL is popular for image and text classification, it is relatively underexplored for the task of extractive text summarization. Standard SSL methods follow a teacher-student paradigm to first train a classification model and then use the classifier's confidence… ▽ More Semi-supervised learning (SSL) is a widely used technique in scenarios where labeled data is scarce and unlabeled data is abundant. While SSL is popular for image and text classification, it is relatively underexplored for the task of extractive text summarization. Standard SSL methods follow a teacher-student paradigm to first train a classification model and then use the classifier's confidence values to select pseudo-labels for the subsequent training cycle; however, such classifiers are not suitable to measure the accuracy of pseudo-labels as they lack specific tuning for evaluation, which leads to confidence values that fail to capture the semantics and correctness of the generated summary. To address this problem, we propose a prompt-based pseudo-labeling strategy with LLMs that picks unlabeled examples with more accurate pseudo-labels than using just the classifier's probability outputs. Our approach also includes a relabeling mechanism that improves the quality of pseudo-labels. We evaluate our method on three text summarization datasets: TweetSumm, WikiHow, and ArXiv/PubMed. We empirically show that a prompting-based LLM that scores and generates pseudo-labels outperforms existing SSL methods on ROUGE-1, ROUGE-2, and ROUGE-L scores on all the datasets. Furthermore, our method achieves competitive L-Eval scores (evaluation with LLaMa-3) as a fully supervised method in a data-scarce setting and outperforms fully supervised method in a data-abundant setting. △ Less

Submitted 1 July, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 8 pages, 6 figures, 3 tables

arXiv:2310.14192 [pdf, other]

PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation

Authors: Gaurav Sahu, Olga Vechtomova, Dzmitry Bahdanau, Issam H. Laradji

Abstract: Data augmentation is a widely used technique to address the problem of text classification when there is a limited amount of training data. Recent work often tackles this problem using large language models (LLMs) like GPT3 that can generate new examples given already available ones. In this work, we propose a method to generate more helpful augmented data by utilizing the LLM's abilities to follo… ▽ More Data augmentation is a widely used technique to address the problem of text classification when there is a limited amount of training data. Recent work often tackles this problem using large language models (LLMs) like GPT3 that can generate new examples given already available ones. In this work, we propose a method to generate more helpful augmented data by utilizing the LLM's abilities to follow instructions and perform few-shot classifications. Our specific PromptMix method consists of two steps: 1) generate challenging text augmentations near class boundaries; however, generating borderline examples increases the risk of false positives in the dataset, so we 2) relabel the text augmentations using a prompting-based LLM classifier to enhance the correctness of labels in the generated data. We evaluate the proposed method in challenging 2-shot and zero-shot settings on four text classification datasets: Banking77, TREC6, Subjectivity (SUBJ), and Twitter Complaints. Our experiments show that generating and, crucially, relabeling borderline examples facilitates the transfer of knowledge of a massive LLM like GPT3.5-turbo into smaller and cheaper classifiers like DistilBERT$_{base}$ and BERT$_{base}$. Furthermore, 2-shot PromptMix outperforms multiple 5-shot data augmentation methods on the four datasets. Our code is available at https://github.com/ServiceNow/PromptMix-EMNLP-2023. △ Less

Submitted 22 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 (Long paper)

arXiv:2212.09947 [pdf, other]

Future Sight: Dynamic Story Generation with Large Pretrained Language Models

Authors: Brian D. Zimmerman, Gaurav Sahu, Olga Vechtomova

Abstract: Recent advances in deep learning research, such as transformers, have bolstered the ability for automated agents to generate creative texts similar to those that a human would write. By default, transformer decoders can only generate new text with respect to previously generated text. The output distribution of candidate tokens at any position is conditioned on previously selected tokens using a s… ▽ More Recent advances in deep learning research, such as transformers, have bolstered the ability for automated agents to generate creative texts similar to those that a human would write. By default, transformer decoders can only generate new text with respect to previously generated text. The output distribution of candidate tokens at any position is conditioned on previously selected tokens using a self-attention mechanism to emulate the property of autoregression. This is inherently limiting for tasks such as controllable story generation where it may be necessary to condition on future plot events when writing a story. In this work, we propose Future Sight, a method for finetuning a pretrained generative transformer on the task of future conditioning. Transformer decoders are typically pretrained on the task of completing a context, one token at a time, by means of self-attention. Future Sight additionally enables a decoder to attend to an encoded future plot event. This motivates the decoder to expand on the context in a way that logically concludes with the provided future. During inference, the future plot event can be written by a human author to steer the narrative being generated in a certain direction. We evaluate the efficacy of our approach on a story generation task with human evaluators. △ Less

Submitted 19 December, 2022; originally announced December 2022.

Comments: 9 pages, 1 figure, 4 tables

arXiv:2210.15638 [pdf, other]

LyricJam Sonic: A Generative System for Real-Time Composition and Musical Improvisation

Authors: Olga Vechtomova, Gaurav Sahu

Abstract: Electronic music artists and sound designers have unique workflow practices that necessitate specialized approaches for develo** music information retrieval and creativity support tools. Furthermore, electronic music instruments, such as modular synthesizers, have near-infinite possibilities for sound creation and can be combined to create unique and complex audio paths. The process of discoveri… ▽ More Electronic music artists and sound designers have unique workflow practices that necessitate specialized approaches for develo** music information retrieval and creativity support tools. Furthermore, electronic music instruments, such as modular synthesizers, have near-infinite possibilities for sound creation and can be combined to create unique and complex audio paths. The process of discovering interesting sounds is often serendipitous and impossible to replicate. For this reason, many musicians in electronic genres record audio output at all times while they work in the studio. Subsequently, it is difficult for artists to rediscover audio segments that might be suitable for use in their compositions from thousands of hours of recordings. In this paper, we describe LyricJam Sonic -- a novel creative tool for musicians to rediscover their previous recordings, re-contextualize them with other recordings, and create original live music compositions in real-time. A bi-modal AI-driven approach uses generated lyric lines to find matching audio clips from the artist's past studio recordings, and uses them to generate new lyric lines, which in turn are used to find other clips, thus creating a continuous and evolving stream of music and lyrics. The intent is to keep the artists in a state of creative flow conducive to music creation rather than taking them into an analytical/critical state of deliberately searching for past audio segments. The system can run in either a fully autonomous mode without user input, or in a live performance mode, where the artist plays live music, while the system "listens" and creates a continuous stream of music and lyrics in response. △ Less

Submitted 27 October, 2022; originally announced October 2022.

Comments: 15 pages, 9 figures, 2 tables

arXiv:2106.01960 [pdf, other]

LyricJam: A system for generating lyrics for live instrumental music

Authors: Olga Vechtomova, Gaurav Sahu, Dhruv Kumar

Abstract: We describe a real-time system that receives a live audio stream from a jam session and generates lyric lines that are congruent with the live music being played. Two novel approaches are proposed to align the learned latent spaces of audio and text representations that allow the system to generate novel lyric lines matching live instrumental music. One approach is based on adversarial alignment o… ▽ More We describe a real-time system that receives a live audio stream from a jam session and generates lyric lines that are congruent with the live music being played. Two novel approaches are proposed to align the learned latent spaces of audio and text representations that allow the system to generate novel lyric lines matching live instrumental music. One approach is based on adversarial alignment of latent representations of audio and lyrics, while the other approach learns to transfer the topology from the music latent space to the lyric latent space. A user study with music artists using the system showed that the system was useful not only in lyric composition, but also encouraged the artists to improvise and find new musical expressions. Another user study demonstrated that users preferred the lines generated using the proposed methods to the lines generated by a baseline model. △ Less

Submitted 3 June, 2021; originally announced June 2021.

Comments: Accepted to International Conference on Computational Creativity (ICCC) 2021 [Oral]

arXiv:2105.01129 [pdf, other]

Towards A Multi-agent System for Online Hate Speech Detection

Authors: Gaurav Sahu, Robin Cohen, Olga Vechtomova

Abstract: This paper envisions a multi-agent system for detecting the presence of hate speech in online social media platforms such as Twitter and Facebook. We introduce a novel framework employing deep learning techniques to coordinate the channels of textual and im-age processing. Our experimental results aim to demonstrate the effectiveness of our methods for classifying online content, training the prop… ▽ More This paper envisions a multi-agent system for detecting the presence of hate speech in online social media platforms such as Twitter and Facebook. We introduce a novel framework employing deep learning techniques to coordinate the channels of textual and im-age processing. Our experimental results aim to demonstrate the effectiveness of our methods for classifying online content, training the proposed neural network model to effectively detect hateful instances in the input. We conclude with a discussion of how our system may be of use to provide recommendations to users who are managing online social networks, showcasing the immense potential of intelligent multi-agent systems towards delivering social good. △ Less

Submitted 3 May, 2021; originally announced May 2021.

Comments: Accepted to the 2nd International Workshop on Autonomous Agents for Social Good (AASG), AAMAS, 2021

arXiv:2011.00416 [pdf, other]

Deep Learning for Text Style Transfer: A Survey

Authors: Di **, Zhi**g **, Zhiting Hu, Olga Vechtomova, Rada Mihalcea

Abstract: Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing, and recently has re-gained significant attention thanks to the promising performance brought by deep neural models. In this paper, we present a… ▽ More Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing, and recently has re-gained significant attention thanks to the promising performance brought by deep neural models. In this paper, we present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data. We also provide discussions on a variety of important topics regarding the future development of this task. Our curated paper list is at https://github.com/zhi**g-**/Text_Style_Transfer_Survey △ Less

Submitted 16 December, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

Comments: Computational Linguistics Journal 2022

arXiv:2009.14375 [pdf, other]

Generation of lyrics lines conditioned on music audio clips

Authors: Olga Vechtomova, Gaurav Sahu, Dhruv Kumar

Abstract: We present a system for generating novel lyrics lines conditioned on music audio. A bimodal neural network model learns to generate lines conditioned on any given short audio clip. The model consists of a spectrogram variational autoencoder (VAE) and a text VAE. Both automatic and human evaluations demonstrate effectiveness of our model in generating lines that have an emotional impact matching a… ▽ More We present a system for generating novel lyrics lines conditioned on music audio. A bimodal neural network model learns to generate lines conditioned on any given short audio clip. The model consists of a spectrogram variational autoencoder (VAE) and a text VAE. Both automatic and human evaluations demonstrate effectiveness of our model in generating lines that have an emotional impact matching a given audio clip. The system is intended to serve as a creativity tool for songwriters. △ Less

Submitted 29 September, 2020; originally announced September 2020.

Comments: Accepted to First Workshop on NLP for Music and Audio (NLP4MusA) at ISMIR 2020

arXiv:2006.09639 [pdf, other]

Iterative Edit-Based Unsupervised Sentence Simplification

Authors: Dhruv Kumar, Lili Mou, Lukasz Golab, Olga Vechtomova

Abstract: We present a novel iterative, edit-based approach to unsupervised sentence simplification. Our model is guided by a scoring function involving fluency, simplicity, and meaning preservation. Then, we iteratively perform word and phrase-level edits on the complex sentence. Compared with previous approaches, our model does not require a parallel training set, but is more controllable and interpretabl… ▽ More We present a novel iterative, edit-based approach to unsupervised sentence simplification. Our model is guided by a scoring function involving fluency, simplicity, and meaning preservation. Then, we iteratively perform word and phrase-level edits on the complex sentence. Compared with previous approaches, our model does not require a parallel training set, but is more controllable and interpretable. Experiments on Newsela and WikiLarge datasets show that our approach is nearly as effective as state-of-the-art supervised approaches. △ Less

Submitted 16 June, 2020; originally announced June 2020.

Comments: The paper has been accepted to ACL 2020

arXiv:2005.01791 [pdf, other]

Discrete Optimization for Unsupervised Sentence Summarization with Word-Level Extraction

Authors: Raphael Schumann, Lili Mou, Yao Lu, Olga Vechtomova, Katja Markert

Abstract: Automatic sentence summarization produces a shorter version of a sentence, while preserving its most important information. A good summary is characterized by language fluency and high information overlap with the source sentence. We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics. We search for a high-scoring summary b… ▽ More Automatic sentence summarization produces a shorter version of a sentence, while preserving its most important information. A good summary is characterized by language fluency and high information overlap with the source sentence. We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics. We search for a high-scoring summary by discrete optimization. Our proposed method achieves a new state-of-the art for unsupervised sentence summarization according to ROUGE scores. Additionally, we demonstrate that the commonly reported ROUGE F1 metric is sensitive to summary length. Since this is unwillingly exploited in recent work, we emphasize that future evaluation should explicitly group summarization systems by output length brackets. △ Less

Submitted 4 May, 2020; originally announced May 2020.

Comments: Accepted at ACL 2020

arXiv:2004.10809 [pdf, other]

Polarized-VAE: Proximity Based Disentangled Representation Learning for Text Generation

Authors: Vikash Balasubramanian, Ivan Kobyzev, Hareesh Bahuleyan, Ilya Shapiro, Olga Vechtomova

Abstract: Learning disentangled representations of real-world data is a challenging open problem. Most previous methods have focused on either supervised approaches which use attribute labels or unsupervised approaches that manipulate the factorization in the latent space of models such as the variational autoencoder (VAE) by training with task-specific losses. In this work, we propose polarized-VAE, an app… ▽ More Learning disentangled representations of real-world data is a challenging open problem. Most previous methods have focused on either supervised approaches which use attribute labels or unsupervised approaches that manipulate the factorization in the latent space of models such as the variational autoencoder (VAE) by training with task-specific losses. In this work, we propose polarized-VAE, an approach that disentangles select attributes in the latent space based on proximity measures reflecting the similarity between data points with respect to these attributes. We apply our method to disentangle the semantics and syntax of sentences and carry out transfer experiments. Polarized-VAE outperforms the VAE baseline and is competitive with state-of-the-art approaches, while being more a general framework that is applicable to other attribute disentanglement tasks. △ Less

Submitted 24 January, 2021; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: Camera Ready for EACL 2021

arXiv:1911.03828 [pdf, other]

Stylized Text Generation Using Wasserstein Autoencoders with a Mixture of Gaussian Prior

Authors: Amirpasha Ghabussi, Lili Mou, Olga Vechtomova

Abstract: Wasserstein autoencoders are effective for text generation. They do not however provide any control over the style and topic of the generated sentences if the dataset has multiple classes and includes different topics. In this work, we present a semi-supervised approach for generating stylized sentences. Our model is trained on a multi-class dataset and learns the latent representation of the sent… ▽ More Wasserstein autoencoders are effective for text generation. They do not however provide any control over the style and topic of the generated sentences if the dataset has multiple classes and includes different topics. In this work, we present a semi-supervised approach for generating stylized sentences. Our model is trained on a multi-class dataset and learns the latent representation of the sentences using a mixture of Gaussian prior without any adversarial losses. This allows us to generate sentences in the style of a specified class or multiple classes by sampling from their corresponding prior distributions. Moreover, we can train our model on relatively small datasets and learn the latent representation of a specified class by adding external data with other styles/classes to our dataset. While a simple WAE or VAE cannot generate diverse sentences in this case, generated sentences with our approach are diverse, fluent, and preserve the style and the content of the desired classes. △ Less

Submitted 9 November, 2019; originally announced November 2019.

arXiv:1911.03821 [pdf, other]

Adaptive Fusion Techniques for Multimodal Data

Authors: Gaurav Sahu, Olga Vechtomova

Abstract: Effective fusion of data from multiple modalities, such as video, speech, and text, is challenging due to the heterogeneous nature of multimodal data. In this paper, we propose adaptive fusion techniques that aim to model context from different modalities effectively. Instead of defining a deterministic fusion operation, such as concatenation, for the network, we let the network decide "how" to co… ▽ More Effective fusion of data from multiple modalities, such as video, speech, and text, is challenging due to the heterogeneous nature of multimodal data. In this paper, we propose adaptive fusion techniques that aim to model context from different modalities effectively. Instead of defining a deterministic fusion operation, such as concatenation, for the network, we let the network decide "how" to combine a given set of multimodal features more effectively. We propose two networks: 1) Auto-Fusion, which learns to compress information from different modalities while preserving the context, and 2) GAN-Fusion, which regularizes the learned latent space given context from complementing modalities. A quantitative evaluation on the tasks of multimodal machine translation and emotion recognition suggests that our lightweight, adaptive networks can better model context from other modalities than existing methods, many of which employ massive transformer-based networks. △ Less

Submitted 26 January, 2021; v1 submitted 9 November, 2019; originally announced November 2019.

Comments: Camera-ready version for EACL 2021

arXiv:1911.03817 [pdf, other]

Adversarial Learning on the Latent Space for Diverse Dialog Generation

Authors: Kashif Khan, Gaurav Sahu, Vikash Balasubramanian, Lili Mou, Olga Vechtomova

Abstract: Generating relevant responses in a dialog is challenging, and requires not only proper modeling of context in the conversation but also being able to generate fluent sentences during inference. In this paper, we propose a two-step framework based on generative adversarial nets for generating conditioned responses. Our model first learns a meaningful representation of sentences by autoencoding and… ▽ More Generating relevant responses in a dialog is challenging, and requires not only proper modeling of context in the conversation but also being able to generate fluent sentences during inference. In this paper, we propose a two-step framework based on generative adversarial nets for generating conditioned responses. Our model first learns a meaningful representation of sentences by autoencoding and then learns to map an input query to the response representation, which is in turn decoded as a response sentence. Both quantitative and qualitative evaluations show that our model generates more fluent, relevant, and diverse responses than existing state-of-the-art methods. △ Less

Submitted 3 November, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

Comments: Accepted to COLING 2020

arXiv:1907.05789 [pdf, other]

Generating Sentences from Disentangled Syntactic and Semantic Spaces

Authors: Yu Bao, Hao Zhou, Shujian Huang, Lei Li, Lili Mou, Olga Vechtomova, Xinyu Dai, Jiajun Chen

Abstract: Variational auto-encoders (VAEs) are widely used in natural language generation due to the regularization of the latent space. However, generating sentences from the continuous latent space does not explicitly model the syntactic information. In this paper, we propose to generate sentences from disentangled syntactic and semantic spaces. Our proposed method explicitly models syntactic information… ▽ More Variational auto-encoders (VAEs) are widely used in natural language generation due to the regularization of the latent space. However, generating sentences from the continuous latent space does not explicitly model the syntactic information. In this paper, we propose to generate sentences from disentangled syntactic and semantic spaces. Our proposed method explicitly models syntactic information in the VAE's latent space by using the linearized tree sequence, leading to better performance of language generation. Additionally, the advantage of sampling in the disentangled syntactic and semantic latent spaces enables us to perform novel applications, such as the unsupervised paraphrase generation and syntax-transfer generation. Experimental results show that our proposed model achieves similar or better performance in various tasks, compared with state-of-the-art related work. △ Less

Submitted 6 July, 2019; originally announced July 2019.

Comments: 11 pages, accepted in ACL-2019

arXiv:1903.12136 [pdf, other]

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks

Authors: Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin

Abstract: In the natural language processing literature, neural networks are becoming increasingly deeper and complex. The recent poster child of this trend is the deep language representation model, which includes BERT, ELMo, and GPT. These developments have led to the conviction that previous-generation, shallower neural networks for language understanding are obsolete. In this paper, however, we demonstr… ▽ More In the natural language processing literature, neural networks are becoming increasingly deeper and complex. The recent poster child of this trend is the deep language representation model, which includes BERT, ELMo, and GPT. These developments have led to the conviction that previous-generation, shallower neural networks for language understanding are obsolete. In this paper, however, we demonstrate that rudimentary, lightweight neural networks can still be made competitive without architecture changes, external training data, or additional input features. We propose to distill knowledge from BERT, a state-of-the-art language representation model, into a single-layer BiLSTM, as well as its siamese counterpart for sentence-pair tasks. Across multiple datasets in paraphrasing, natural language inference, and sentiment classification, we achieve comparable results with ELMo, while using roughly 100 times fewer parameters and 15 times less inference time. △ Less

Submitted 28 March, 2019; originally announced March 2019.

Comments: 8 pages, 2 figures; first three authors contributed equally

arXiv:1812.08318 [pdf]

Generating lyrics with variational autoencoder and multi-modal artist embeddings

Authors: Olga Vechtomova, Hareesh Bahuleyan, Amirpasha Ghabussi, Vineet John

Abstract: We present a system for generating song lyrics lines conditioned on the style of a specified artist. The system uses a variational autoencoder with artist embeddings. We propose the pre-training of artist embeddings with the representations learned by a CNN classifier, which is trained to predict artists based on MEL spectrograms of their song clips. This work is the first step towards combining a… ▽ More We present a system for generating song lyrics lines conditioned on the style of a specified artist. The system uses a variational autoencoder with artist embeddings. We propose the pre-training of artist embeddings with the representations learned by a CNN classifier, which is trained to predict artists based on MEL spectrograms of their song clips. This work is the first step towards combining audio and text modalities of songs for generating lyrics conditioned on the artist's style. Our preliminary results suggest that there is a benefit in initializing artists' embeddings with the representations learned by a spectrogram classifier. △ Less

Submitted 19 December, 2018; originally announced December 2018.

Comments: 5 pages, 5 tables, 1 figure

arXiv:1808.04339 [pdf, other]

Disentangled Representation Learning for Non-Parallel Text Style Transfer

Authors: Vineet John, Lili Mou, Hareesh Bahuleyan, Olga Vechtomova

Abstract: This paper tackles the problem of disentangling the latent variables of style and content in language models. We propose a simple yet effective approach, which incorporates auxiliary multi-task and adversarial objectives, for label prediction and bag-of-words prediction, respectively. We show, both qualitatively and quantitatively, that the style and content are indeed disentangled in the latent s… ▽ More This paper tackles the problem of disentangling the latent variables of style and content in language models. We propose a simple yet effective approach, which incorporates auxiliary multi-task and adversarial objectives, for label prediction and bag-of-words prediction, respectively. We show, both qualitatively and quantitatively, that the style and content are indeed disentangled in the latent space. This disentangled latent representation learning method is applied to style transfer on non-parallel corpora. We achieve substantially better results in terms of transfer accuracy, content preservation and language fluency, in comparison to previous state-of-the-art approaches. △ Less

Submitted 10 September, 2018; v1 submitted 13 August, 2018; originally announced August 2018.

Comments: 11 pages, 7 figures, 6 tables; Preliminary work rejected by EMNLP-18

MSC Class: 68T50 ACM Class: I.2.7

arXiv:1806.08462 [pdf, other]

Stochastic Wasserstein Autoencoder for Probabilistic Sentence Generation

Authors: Hareesh Bahuleyan, Lili Mou, Hao Zhou, Olga Vechtomova

Abstract: The variational autoencoder (VAE) imposes a probabilistic distribution (typically Gaussian) on the latent space and penalizes the Kullback--Leibler (KL) divergence between the posterior and prior. In NLP, VAEs are extremely difficult to train due to the problem of KL collapsing to zero. One has to implement various heuristics such as KL weight annealing and word dropout in a carefully engineered m… ▽ More The variational autoencoder (VAE) imposes a probabilistic distribution (typically Gaussian) on the latent space and penalizes the Kullback--Leibler (KL) divergence between the posterior and prior. In NLP, VAEs are extremely difficult to train due to the problem of KL collapsing to zero. One has to implement various heuristics such as KL weight annealing and word dropout in a carefully engineered manner to successfully train a VAE for text. In this paper, we propose to use the Wasserstein autoencoder (WAE) for probabilistic sentence generation, where the encoder could be either stochastic or deterministic. We show theoretically and empirically that, in the original WAE, the stochastically encoded Gaussian distribution tends to become a Dirac-delta function, and we propose a variant of WAE that encourages the stochasticity of the encoder. Experimental results show that the latent space learned by WAE exhibits properties of continuity and smoothness as in VAEs, while simultaneously achieving much higher BLEU scores for sentence reconstruction. △ Less

Submitted 12 April, 2019; v1 submitted 21 June, 2018; originally announced June 2018.

Comments: Accepted by NAACL-HLT 2019

arXiv:1712.08207 [pdf, other]

Variational Attention for Sequence-to-Sequence Models

Authors: Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, Pascal Poupart

Abstract: The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which in turn is decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent sp… ▽ More The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which in turn is decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as Gaussian distributed random variables. Results on two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon as it increases the diversity of generated sentences. △ Less

Submitted 21 June, 2018; v1 submitted 21 December, 2017; originally announced December 2017.

Comments: In Proceedings of COLING 2018. Also accepted by TADGM Workshop@ICML 2018 for presentation

arXiv:1707.09448 [pdf, ps, other]

Sentiment Analysis on Financial News Headlines using Training Dataset Augmentation

Authors: Vineet John, Olga Vechtomova

Abstract: This paper discusses the approach taken by the UWaterloo team to arrive at a solution for the Fine-Grained Sentiment Analysis problem posed by Task 5 of SemEval 2017. The paper describes the document vectorization and sentiment score prediction techniques used, as well as the design and implementation decisions taken while building the system for this task. The system uses text vectorization model… ▽ More This paper discusses the approach taken by the UWaterloo team to arrive at a solution for the Fine-Grained Sentiment Analysis problem posed by Task 5 of SemEval 2017. The paper describes the document vectorization and sentiment score prediction techniques used, as well as the design and implementation decisions taken while building the system for this task. The system uses text vectorization models, such as N-gram, TF-IDF and paragraph embeddings, coupled with regression model variants to predict the sentiment scores. Amongst the methods examined, unigrams and bigrams coupled with simple linear regression obtained the best baseline accuracy. The paper also explores data augmentation methods to supplement the training dataset. This system was designed for Subtask 2 (News Statements and Headlines). △ Less

Submitted 28 July, 2017; originally announced July 2017.

Comments: 5 pages

MSC Class: 68T50 ACM Class: I.2.7

Journal ref: Association for Computational Linguistics, Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 869-873

Showing 1–21 of 21 results for author: Vechtomova, O