Search | arXiv e-print repository

doi 10.18653/v1/2023.acl-short.59

LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning

Authors: Amirhossein Abaskohi, Sascha Rothe, Yadollah Yaghoobzadeh

Abstract: In recent years, there has been significant progress in developing pre-trained language models for NLP. However, these models often struggle when fine-tuned on small datasets. To address this issue, researchers have proposed various adaptation approaches. Prompt-based tuning is arguably the most common way, especially for larger models. Previous research shows that adding contrastive learning to prompt-based fine-tuning is effective because it helps the model generate embeddings that are more distinguishable between classes, and it can also be more sample-efficient as the model learns from positive and negative examples simultaneously. One of the most important components of contrastive learning is data augmentation, but unlike in computer vision, effective data augmentation for NLP is still challenging. This paper proposes LM-CPPF, Contrastive Paraphrasing-guided Prompt-based Fine-tuning of Language Models, which leverages prompt-based few-shot paraphrasing using generative language models, especially large language models such as GPT-3 and OPT-175B, for data augmentation. Our experiments on multiple text classification benchmarks show that this augmentation method outperforms other methods, such as easy data augmentation, back translation, and multiple templates.

Submitted 5 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 10 pages, 1 figure, 8 tables, 1 algorithm. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics

Journal ref: https://aclanthology.org/2023.acl-short

arXiv:2209.15469

Zero-Shot Retrieval with Search Agents and Hybrid Environments

Authors: Michelle Chen Huebscher, Christian Buck, Massimiliano Ciaramita, Sascha Rothe

Abstract: Learning to search is the task of building artificial agents that learn to autonomously use a search box to find information. So far, it has been shown that current language models can learn symbolic query reformulation policies, in combination with traditional term-based retrieval, but fall short of outperforming neural retrievers. We extend the previous learning to search setup to a hybrid environment, which accepts discrete query refinement operations, after a first-pass retrieval step via a dual encoder. Experiments on the BEIR task show that search agents, trained via behavioral cloning, outperform the underlying search system based on a combined dual encoder retriever and cross encoder reranker. Furthermore, we find that simple heuristic Hybrid Retrieval Environments (HRE) can improve baseline performance by several nDCG points. The search agent based on HRE (HARE) matches state-of-the-art performance, balanced in both zero-shot and in-domain evaluations, via interpretable actions, and at twice the speed.

Submitted 29 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.
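The HARE abstract reports gains in nDCG points. As a rough illustration of that metric, here is a minimal nDCG@k sketch, assuming linear gains and a log2(rank + 1) discount; the official BEIR evaluation tooling may differ in details, and the function names are hypothetical.

```python
import math
from typing import Sequence

def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains: Sequence[float], judged_gains: Sequence[float], k: int = 10) -> float:
    """nDCG@k in [0, 1].

    ranked_gains: relevance labels of the retrieved documents, in ranked order.
    judged_gains: relevance labels of all judged documents for the query,
                  used to build the ideal ranking.
    """
    ideal = dcg_at_k(sorted(judged_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0

# The only relevant document is ranked third instead of first.
print(ndcg_at_k([0, 0, 1], [1, 0, 0], k=10))  # 0.5
```

Scores are typically reported as 100 × nDCG@10, so a difference of "several nDCG points" corresponds to a few hundredths on the 0 to 1 scale used here.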

arXiv:2204.12553

Generative 3D Animation Pipelines: Automating Facial Retargeting Workflows

Authors: Julius Girbig, Changkun Ou, Sylvia Rothe

Abstract: Design tools in the 3D industry, while powerful, are still tedious and labor-intensive when it comes to bringing a creative idea for a visual effect to life. In this position paper, we discuss how an infamous form of generative synthetic media, deepfakes, could be put to use and embedded into common sophisticated 3D workflows to reduce user workloads in areas such as 3D model editing, material design, and character animation. As a case study, we also prototyped a tool to address the retargeting problem in character animation. Although deepfakes themselves have received a negative public image, the results of our interviews with field experts are unexpectedly positive regarding our tool, which utilizes deepfake algorithms. Lastly, we discuss our experience and observed design practices for putting deepfakes to good use, including how potential misuses can be avoided directly by design, how this design changes user interactions, and the open issues that follow.

Submitted 26 April, 2022; originally announced April 2022.

Comments: 4 pages, 1 figure

ACM Class: I.3.7

arXiv:2203.02064

doi 10.34133/research.0065

Securing Data in Multimode Fibers by Exploiting Mode-Dependent Light Propagation Effects

Authors: Stefan Rothe, Karl-Ludwig Besser, David Krause, Robert Kuschmierz, Nektarios Koukourakis, Eduard Jorswieck, Jürgen W. Czarske

Abstract: Multimode fibers hold great promise to advance data rates in optical communications but come with the challenge of compensating for modal crosstalk and mode-dependent losses, which result in strong distortions. The holographic measurement of the transmission matrix enables not only correcting distortions but also harnessing these effects to create a confidential data connection between legitimate communication parties, Alice and Bob. The feasibility of this physical-layer-security-based approach is demonstrated experimentally for the first time on a multimode fiber link to which the eavesdropper Eve is physically coupled. Once the proper structured light field is launched at Alice's side, the message can be delivered to Bob while, simultaneously, decipherment by the illegitimate wiretapper Eve is prevented. Within a real communication scenario, we implement wiretap codes and demonstrate confidentiality by quantifying the level of secrecy. Compared to an uncoded data transmission, the amount of securely exchanged data is enhanced by a factor of 538. The complex light transport phenomena that have long been considered limiting, and that have restricted the widespread use of multimode fiber, are exploited to open new perspectives on information security in spatial multiplexing communication systems.

Submitted 10 March, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: 14 pages, 8 figures

Journal ref: Research, vol. 6: 0065, Jan. 2023

arXiv:2109.00527

Boosting Search Engines with Interactive Agents

Authors: Leonard Adolphs, Benjamin Boerschinger, Christian Buck, Michelle Chen Huebscher, Massimiliano Ciaramita, Lasse Espeholt, Thomas Hofmann, Yannic Kilcher, Sascha Rothe, Pier Giuseppe Sessa, Lierni Sestorain Saralegui

Abstract: This paper presents the first successful steps in designing search agents that learn meta-strategies for iterative query refinement in information-seeking tasks. Our approach uses machine reading to guide the selection of refinement terms from aggregated search results. Agents are then empowered with simple but effective search operators to exert fine-grained and transparent control over queries and search results. We develop a novel way of generating synthetic search sessions, which leverages the power of transformer-based language models through (self-)supervised learning. We also present a reinforcement learning agent with dynamically constrained actions that learns interactive search strategies from scratch. Our search agents obtain retrieval and answer quality performance comparable to recent neural methods, using only a traditional term-based BM25 ranking function and interpretable discrete reranking and filtering actions.

Submitted 7 June, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

Comments: Published in Transactions on Machine Learning Research (06/2022)
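The agents in this paper rely on a traditional term-based BM25 ranking function. For readers unfamiliar with it, a minimal Okapi BM25 scorer might look as follows; the values k1 = 1.2 and b = 0.75 and the Lucene-style IDF are common defaults assumed here, not settings taken from the paper, and `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence

def bm25_scores(query: Sequence[str], docs: Sequence[Sequence[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequencies
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which never goes negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [["neural", "retrieval"], ["bm25", "term", "retrieval"], ["cooking", "tips"]]
print(bm25_scores(["bm25", "retrieval"], docs))
```

The second document matches both query terms and scores highest; the third matches none and scores zero. The appeal in the paper's setting is that every score is an interpretable sum over query terms.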

arXiv:2106.03830

A Simple Recipe for Multilingual Grammatical Error Correction

Authors: Sascha Rothe, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn

Abstract: This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once these models are fine-tuned on language-specific supervised sets, we surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German, and Russian. Having established a new set of baselines for GEC, we make our results easily reproducible and accessible by releasing the cLang-8 dataset. It is produced by using our best model, which we call gT5, to clean the targets of the widely used yet noisy Lang-8 dataset. cLang-8 greatly simplifies typical GEC training pipelines composed of multiple fine-tuning stages: we demonstrate that performing a single fine-tuning step on cLang-8 with off-the-shelf language models yields further accuracy improvements over an already top-performing gT5 model for English.

Submitted 9 August, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

arXiv:2105.11921

Focus Attention: Promoting Faithfulness and Diversity in Summarization

Authors: Rahul Aralikatte, Shashi Narayan, Joshua Maynez, Sascha Rothe, Ryan McDonald

Abstract: Professional summaries are written with document-level information, such as the theme of the document, in mind. This is in contrast with most seq2seq decoders, which simultaneously learn to focus on salient content while deciding what to generate at each decoding step. With the motivation to narrow this gap, we introduce Focus Attention Mechanism, a simple yet effective method to encourage decoders to proactively generate tokens that are similar or topical to the input document. Further, we propose a Focus Sampling method to enable generation of diverse summaries, an area currently understudied in summarization. When evaluated on the BBC extreme summarization task, two state-of-the-art models augmented with Focus Attention generate summaries that are closer to the target and more faithful to their input documents, outperforming their vanilla counterparts on ROUGE and multiple faithfulness measures. We also empirically demonstrate that Focus Sampling is more effective in generating diverse and faithful summaries than top-k or nucleus sampling-based decoding methods.

Submitted 25 May, 2021; originally announced May 2021.

Comments: ACL 2021

arXiv:2105.03137

doi 10.1109/ISWCS49558.2021.9562176

Achievable Physical-Layer Secrecy in Multi-Mode Fiber Channels using Artificial Noise

Authors: Eduard Jorswieck, Andrew Lonnstrom, Karl-Ludwig Besser, Stefan Rothe, Juergen W. Czarske

Abstract: Reliable and secure communication is an important aspect of modern fiber optic communication. In this work, we consider a multi-mode fiber (MMF) channel wiretapped by an eavesdropper. We assume the transmitter knows the legitimate channel but has only statistical knowledge of the eavesdropper's channel. We propose a transmission scheme with artificial noise (AN) for such a channel. In particular, we formulate the corresponding optimization problem, which aims to maximize the average secrecy rate, and develop an algorithm to solve it. We apply this algorithm to actual measured MMF channels. As real fiber measurements show, positive average secrecy rates can be achieved for a 55-mode MMF with the proper use of AN. Furthermore, the gain compared to standard precoding and power allocation schemes is illustrated.

Submitted 7 May, 2021; originally announced May 2021.

Comments: 5 pages, 2 figures
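To make the notion of an average secrecy rate with artificial noise concrete, the sketch below uses a deliberately simplified scalar model, not the paper's measured 55-mode MMF channels or its algorithm. Assumptions: a fraction phi of the transmit power carries data, the remaining power is artificial noise that does not reach Bob but does reach Eve, and Eve's channel gain is known only through samples (statistical knowledge). All function names and the power-split model are hypothetical.

```python
import math
import random
from typing import Sequence

def secrecy_rate(snr_bob: float, snr_eve: float) -> float:
    """Secrecy rate in bit/s/Hz: [log2(1 + SNR_Bob) - log2(1 + SNR_Eve)]^+."""
    return max(0.0, math.log2(1 + snr_bob) - math.log2(1 + snr_eve))

def average_secrecy_rate(phi: float, power: float, gain_bob: float,
                         eve_gains: Sequence[float], noise_var: float = 1.0) -> float:
    """Average over sampled Eve gains when a fraction phi of the power carries data.

    Toy assumption: the artificial noise (power (1 - phi) * power) lies in the null
    space of Bob's channel, so it only adds interference at Eve.
    """
    snr_bob = phi * power * gain_bob / noise_var
    total = 0.0
    for g_eve in eve_gains:
        sinr_eve = phi * power * g_eve / (noise_var + (1 - phi) * power * g_eve)
        total += secrecy_rate(snr_bob, sinr_eve)
    return total / len(eve_gains)

def best_power_split(power: float, gain_bob: float, eve_gains: Sequence[float],
                     steps: int = 100) -> float:
    """Grid search for the data-power fraction phi in (0, 1] maximizing the average rate."""
    grid = [i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda phi: average_secrecy_rate(phi, power, gain_bob, eve_gains))

random.seed(0)
# Exponentially distributed power gains (Rayleigh fading), an illustrative assumption.
eve = [random.expovariate(1.0) for _ in range(1000)]
phi = best_power_split(power=100.0, gain_bob=1.0, eve_gains=eve)
print(phi, average_secrecy_rate(phi, 100.0, 1.0, eve), average_secrecy_rate(1.0, 100.0, 1.0, eve))
```

In this toy setting, phi = 1 means no artificial noise, so comparing the last two printed values shows whether diverting power to AN pays off. The paper instead applies its optimization algorithm to measured MMF channels.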

arXiv:2010.01054

Unsupervised Text Style Transfer with Padded Masked Language Models

Authors: Eric Malmi, Aliaksei Severyn, Sascha Rothe

Abstract: We propose Masker, an unsupervised text-editing method for style transfer. To tackle cases when no parallel source-target pairs are available, we train masked language models (MLMs) for both the source and the target domain. Then we find the text spans where the two models disagree the most in terms of likelihood. This allows us to identify the source tokens to delete to transform the source text to match the style of the target domain. The deleted tokens are replaced using the target MLM, and by using a padded MLM variant, we avoid having to predetermine the number of inserted tokens. Our experiments on sentence fusion and sentiment transfer demonstrate that Masker performs competitively in a fully unsupervised setting. Moreover, in low-resource settings, it improves supervised methods' accuracy by over 10 percentage points when pre-training them on silver training data generated by Masker.

Submitted 2 October, 2020; originally announced October 2020.

Comments: EMNLP 2020
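Masker's first step finds the span on which the source-domain and target-domain MLMs disagree most. The sketch below simplifies this to per-token log-likelihoods and does not reproduce the paper's padded-MLM span scoring: it returns the contiguous span, up to a hypothetical `max_len`, that maximizes the summed difference of source minus target log-likelihood, preferring shorter spans on ties. The scores in the example are made up.

```python
from typing import Sequence, Tuple

def most_disagreeing_span(src_logp: Sequence[float], tgt_logp: Sequence[float],
                          max_len: int = 4) -> Tuple[int, int]:
    """Return a half-open span [start, end) maximizing sum(src_logp - tgt_logp).

    A large positive score means the tokens look typical of the source style but
    unlikely under the target style, making them candidates for deletion.
    """
    if len(src_logp) != len(tgt_logp) or not src_logp:
        raise ValueError("need equal-length, non-empty score lists")
    diffs = [s - t for s, t in zip(src_logp, tgt_logp)]
    best_score, best_span = float("-inf"), (0, 1)
    for start in range(len(diffs)):
        score = 0.0
        for end in range(start + 1, min(start + max_len, len(diffs)) + 1):
            score += diffs[end - 1]
            shorter = (end - start) < (best_span[1] - best_span[0])
            if score > best_score or (score == best_score and shorter):
                best_score, best_span = score, (start, end)
    return best_span

tokens = ["the", "food", "was", "terrible", "."]
src = [-1.0, -1.9, -1.3, -1.0, -0.6]  # made-up log-likelihoods under a "negative" MLM
tgt = [-0.8, -2.0, -1.0, -6.0, -0.5]  # made-up log-likelihoods under a "positive" MLM
start, end = most_disagreeing_span(src, tgt)
print(tokens[start:end])  # ['terrible']
```

The selected span would then be deleted and refilled by the target-domain MLM.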

arXiv:2005.11216

A Generative Approach to Titling and Clustering Wikipedia Sections

Authors: Anjalie Field, Sascha Rothe, Simon Baumgartner, Cong Yu, Abe Ittycheriah

Abstract: We evaluate the performance of transformer encoders with various decoders for information organization through a new task: generation of section headings for Wikipedia articles. Our analysis shows that decoders containing attention mechanisms over the encoder output achieve high-scoring results by generating extractive text. In contrast, a decoder without attention better facilitates semantic encoding and can be used to generate section embeddings. We additionally introduce a new loss function, which further encourages the decoder to generate high-quality embeddings.

Submitted 22 May, 2020; originally announced May 2020.

Comments: Accepted to WNGT Workshop at ACL 2020

arXiv:1909.08535

Physical Layer Security in Multimode Fiber Optical Networks

Authors: Stefan Rothe, Nektarios Koukourakis, Hannes Radner, Andrew Lonnstrom, Eduard Jorswieck, Jürgen W. Czarske

Abstract: Inverse precoding algorithms in multimode-fiber-based communication networks are used to exploit mode-dependent losses on the physical layer. This creates an asymmetry between the legitimate (Bob) and illegitimate (Eve) receivers of messages, resulting in a significant SNR advantage for Bob. In combination with dynamic mode channel changes, Eve has no chance to reconstruct a sent message, even in a worst-case scenario in which she is almighty. This is the first time that physical layer security in a fiber optical network has been investigated on the basis of measured transmission matrices. These results show that messages can be sent securely with conventional communication techniques. Shifting the task of securing data from software to hardware has the potential to bring about a scientific paradigm shift. The introduced technique is a step towards the development of cyber-physical systems.

Submitted 12 September, 2019; originally announced September 2019.

arXiv:1909.01187

Encode, Tag, Realize: High-Precision Text Editing

Authors: Eric Malmi, Sebastian Krause, Sascha Rothe, Daniil Mirylenka, Aliaksei Severyn

Abstract: We propose LaserTagger, a sequence tagging approach that casts text generation as a text editing task. Target texts are reconstructed from the inputs using three main edit operations: keeping a token, deleting it, and adding a phrase before the token. To predict the edit operations, we propose a novel model, which combines a BERT encoder with an autoregressive Transformer decoder. This approach is evaluated on English text on four tasks: sentence fusion, sentence splitting, abstractive summarization, and grammar correction. LaserTagger achieves new state-of-the-art results on three of these tasks, performs comparably to a set of strong seq2seq baselines with a large number of training examples, and outperforms them when the number of examples is limited. Furthermore, we show that at inference time tagging can be more than two orders of magnitude faster than comparable seq2seq models, making it more attractive for running in a live environment.

Submitted 3 September, 2019; originally announced September 2019.

Comments: EMNLP 2019
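The Realize step applies the predicted tags to the source tokens. A minimal sketch, assuming each tag pairs a KEEP/DELETE decision with an optional phrase inserted before the token; the `Tag` type and `realize` function are illustrative, not the released LaserTagger code.

```python
from typing import List, NamedTuple, Sequence

class Tag(NamedTuple):
    """A LaserTagger-style edit tag for one source token (illustrative)."""
    keep: bool              # True = KEEP the token, False = DELETE it
    added_phrase: str = ""  # phrase to insert before the token ("" = none)

def realize(tokens: Sequence[str], tags: Sequence[Tag]) -> str:
    """Apply one tag per source token and join the output with spaces."""
    if len(tokens) != len(tags):
        raise ValueError("expected exactly one tag per source token")
    output: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.added_phrase:
            output.append(tag.added_phrase)
        if tag.keep:
            output.append(token)
    return " ".join(output)

KEEP, DELETE = Tag(True), Tag(False)
source = "Turing was born in 1912 . Turing died in 1954 .".split()
# Sentence fusion: drop the first period, replace the repeated subject with "and".
tags = [KEEP] * 5 + [DELETE, Tag(False, "and")] + [KEEP] * 4
print(realize(source, tags))  # Turing was born in 1912 and died in 1954 .
```

Because the output is assembled from source tokens plus a small set of phrases, the model only has to predict one tag per input token, which is what makes tagging much faster than autoregressive seq2seq decoding.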

arXiv:1907.12461

doi 10.1162/tacl_a_00313

Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Authors: Sascha Rothe, Shashi Narayan, Aliaksei Severyn

Abstract: Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple benchmarks while saving significant amounts of compute time. So far the focus has been mainly on Natural Language Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints. Our models achieve new state-of-the-art results on Machine Translation, Text Summarization, Sentence Splitting, and Sentence Fusion.

Submitted 16 April, 2020; v1 submitted 29 July, 2019; originally announced July 2019.

Comments: To be published in Transactions of the Association for Computational Linguistics (TACL)

arXiv:1809.08731

Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!

Authors: Katharina Kann, Sascha Rothe, Katja Filippova

Abstract: Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though word-overlap metrics like ROUGE are computed with the help of hand-written references, our referenceless methods obtain a significantly higher correlation with human fluency scores on a benchmark dataset of compressed sentences. Finally, we present ROUGE-LM, a reference-based metric which is a natural extension of WPSLOR to the case of available references. We show that ROUGE-LM yields a significantly higher correlation with human judgments than all baseline metrics, including WPSLOR on its own.

Submitted 23 September, 2018; originally announced September 2018.

Comments: Accepted to CoNLL 2018
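SLOR normalizes a language model's log-probability of a sentence by subtracting its unigram log-probability and dividing by length: SLOR(S) = (ln p_M(S) - ln p_u(S)) / |S|, where p_u(S) is the product of the unigram probabilities of the tokens in S. A minimal sketch, assuming the LM log-probability is computed elsewhere and unigram probabilities come from raw corpus counts; both function names are hypothetical, and for WPSLOR the tokens would be WordPieces rather than words.

```python
import math
from collections import Counter
from typing import List, Sequence

def unigram_logprobs(tokens: Sequence[str], counts: Counter) -> List[float]:
    """ln p(t) for each token from raw corpus counts (no smoothing: unseen tokens raise)."""
    total = sum(counts.values())
    return [math.log(counts[t] / total) for t in tokens]

def slor(lm_logprob: float, token_unigram_logprobs: Sequence[float]) -> float:
    """SLOR(S) = (ln p_M(S) - sum over tokens of ln p_u(t)) / |S|."""
    if not token_unigram_logprobs:
        raise ValueError("sentence must contain at least one token")
    return (lm_logprob - sum(token_unigram_logprobs)) / len(token_unigram_logprobs)

counts = Counter({"the": 50, "cat": 5, "sat": 3, "mat": 2})  # toy corpus counts
tokens = ["the", "cat", "sat"]
# ln p_M(S) would come from a trained LM; -6.0 is a made-up value.
print(slor(-6.0, unigram_logprobs(tokens, counts)))
```

Subtracting the unigram term keeps rare but fluent words from being penalized, and dividing by |S| removes the bias against longer sentences that raw LM log-probabilities have.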

arXiv:1711.11383

Learning to Learn from Weak Supervision by Full Supervision

Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

Abstract: In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network (the learner) and a confidence network (the meta-learner). The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to control the magnitude of the gradient updates to the target network using the scores provided by the second confidence network, which is trained on a small amount of supervised data. This prevents weight updates computed from noisy labels from harming the quality of the target network model.

Submitted 30 November, 2017; originally announced November 2017.

Comments: Accepted at NIPS Workshop on Meta-Learning (MetaLearn 2017), Long Beach, CA, USA

arXiv:1711.00313

Avoiding Your Teacher's Mistakes: Training Neural Networks with Controlled Weak Supervision

Authors: Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps

Abstract: Training deep neural networks requires massive amounts of training data, but for many tasks only limited labeled data is available. This makes weak supervision attractive, using weak or noisy signals like the output of heuristic methods or user click-through data for training. In a semi-supervised setting, we can use a large set of data with weak labels to pretrain a neural network and then fine-tune the parameters with a small amount of data with true labels. This feels intuitively sub-optimal, as these two independent stages leave the model unaware of the varying label quality. What if we could somehow inform the model about the label quality? In this paper, we propose a semi-supervised learning method where we train two neural networks in a multi-task fashion: a "target network" and a "confidence network". The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to weight the gradient updates to the target network using the scores provided by the second confidence network, which is trained on a small amount of supervised data. This prevents weight updates computed from noisy labels from harming the quality of the target network model. We evaluate our learning strategy on two different tasks: document ranking and sentiment classification. The results demonstrate that our approach not only enhances the performance compared to the baselines but also speeds up the learning process from weak labels.

Submitted 7 December, 2017; v1 submitted 1 November, 2017; originally announced November 2017.
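The idea shared by this paper and "Learning to Learn from Weak Supervision by Full Supervision" is to scale each weakly labeled example's gradient by a confidence score. A minimal sketch for a linear regressor with squared loss, assuming the confidence scores are already given (in the papers they come from the separately trained confidence network); `weighted_sgd_step` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, weak label)

def weighted_sgd_step(w: Sequence[float], batch: Sequence[Example],
                      confidences: Sequence[float], lr: float = 0.1) -> List[float]:
    """One SGD step on 0.5 * (w.x - y_weak)^2, scaling each example's gradient
    by its confidence in [0, 1] so that likely-noisy labels move w less."""
    grad = [0.0] * len(w)
    for (x, y_weak), c in zip(batch, confidences):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y_weak  # derivative of the squared loss with respect to pred
        for i, xi in enumerate(x):
            grad[i] += c * err * xi
    return [wi - lr * gi / len(batch) for wi, gi in zip(w, grad)]

batch = [([1.0], 1.0), ([1.0], 100.0)]  # the second weak label is an outlier
print(weighted_sgd_step([0.0], batch, confidences=[1.0, 1.0]))  # outlier dominates the step
print(weighted_sgd_step([0.0], batch, confidences=[1.0, 0.0]))  # outlier is ignored
```

Weighting the gradient rather than discarding examples lets the model still learn from weak labels the confidence network trusts, while suppressing updates from those it does not.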

arXiv:1708.03418

Learning to Attend, Copy, and Generate for Session-Based Query Suggestion

Authors: Mostafa Dehghani, Sascha Rothe, Enrique Alfonseca, Pascal Fleury

Abstract: Users try to articulate their complex information needs during search sessions by reformulating their queries. To make this process more effective, search engines provide related queries to help users in specifying the information need in their search process. In this paper, we propose a customized sequence-to-sequence model for session-based query suggestion. In our model, we employ a query-aware attention mechanism to capture the structure of the session context. This enables us to control the scope of the session from which we infer the suggested next query, which helps not only to handle the noisy data but also to automatically detect session boundaries. Furthermore, we observe that, based on the user query reformulation behavior, within a single session a large portion of query terms is retained from the previously submitted queries and consists of mostly infrequent or unseen terms that are usually not included in the vocabulary. We therefore empower the decoder of our model to access the source words from the session context during decoding by incorporating a copy mechanism. Moreover, we propose evaluation metrics to assess the quality of the generative models for query suggestion. We conduct an extensive set of experiments and analysis. The results suggest that our model outperforms the baselines both in generating queries and in scoring candidate queries for the task of query suggestion.

Submitted 13 November, 2017; v1 submitted 10 August, 2017; originally announced August 2017.

Comments: Accepted to be published at The 26th ACM International Conference on Information and Knowledge Management (CIKM2017)
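The copy mechanism lets the decoder emit infrequent or unseen query terms directly from the session context. The abstract does not give the exact formulation; the sketch below assumes a common pointer-generator-style mixture, P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention weights on source positions holding w), with a hypothetical function name and made-up probabilities.

```python
from collections import defaultdict
from typing import Dict, Sequence

def copy_mixture(p_gen: float, vocab_probs: Dict[str, float],
                 attention: Sequence[float], source_tokens: Sequence[str]) -> Dict[str, float]:
    """Combine a generation distribution with a copy distribution over the source."""
    final: Dict[str, float] = defaultdict(float)
    for word, p in vocab_probs.items():
        final[word] += p_gen * p
    # Attention mass on each source position is routed to the token found there,
    # so out-of-vocabulary session terms can still receive probability.
    for weight, token in zip(attention, source_tokens):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)

session = ["flights", "to", "reykjavik"]  # "reykjavik" is outside the toy vocabulary
dist = copy_mixture(0.5, {"cheap": 0.6, "flights": 0.4}, [0.2, 0.1, 0.7], session)
print(max(dist, key=dist.get))  # reykjavik
```

If both input distributions sum to one, so does the mixture, and a term like "reykjavik" that the generator alone could never produce becomes the most likely next word.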

arXiv:1602.07572

doi 10.18653/v1/N16-1091

Ultradense Word Embeddings by Orthogonal Transformation

Authors: Sascha Rothe, Sebastian Ebert, Hinrich Schütze

Abstract: Embeddings are generic representations that are useful for many NLP tasks. In this paper, we introduce DENSIFIER, a method that learns an orthogonal transformation of the embedding space that focuses the information relevant for a task in an ultradense subspace of a dimensionality that is smaller by a factor of 100 than the original space. We show that ultradense embeddings generated by DENSIFIER reach the state of the art on a lexicon creation task in which words are annotated with three types of lexical information: sentiment, concreteness, and frequency. On the SemEval2015 10B sentiment analysis task, we show that no information is lost when the ultradense subspace is used, but training is an order of magnitude more efficient due to the compactness of the ultradense space.

Submitted 8 May, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

arXiv:1507.01127

doi 10.3115/v1/P15-1173

AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes

Authors: Sascha Rothe, Hinrich Schütze

Abstract: We present AutoExtend, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The synset/lexeme embeddings obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet as a lexical resource, but AutoExtend can be easily applied to other resources like Freebase. AutoExtend achieves state-of-the-art performance on word similarity and word sense disambiguation tasks.

Submitted 4 July, 2015; originally announced July 2015.

Showing 1–19 of 19 results for author: Rothe, S