-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love,
et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals to complete their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization
Authors:
Alexandra Chronopoulou,
Jonas Pfeiffer,
Joshua Maynez,
Xinyi Wang,
Sebastian Ruder,
Priyanka Agrawal
Abstract:
Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there are around 7,000 languages in the world, and many of these languages lack labeled data for real-world language generation tasks. In this paper, we propose to improve zero-shot cross-lingual transfer by composing language- or task-specialized parameters. Our method composes language and task PEFT modules via element-wise arithmetic operations to leverage unlabeled data and English labeled data. We extend our approach to cases where labeled data from more languages is available and propose to arithmetically compose PEFT modules trained on languages related to the target. Empirical results on summarization demonstrate that our method is an effective strategy that obtains consistent gains using minimal training of PEFT modules.
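As a rough illustration of "composing PEFT modules via element-wise arithmetic", the minimal sketch below treats each module as a dictionary of flat parameter lists and forms a weighted element-wise sum. The helper `compose_modules`, the parameter names `lora_A`/`lora_B`, and all values are hypothetical; the abstract does not state which coefficients or signs the paper actually uses.

```python
from typing import Dict, List

# Each PEFT module is modeled as {parameter name: flat list of floats},
# a stand-in for real adapter or LoRA tensors.
Module = Dict[str, List[float]]


def compose_modules(modules: List[Module], weights: List[float]) -> Module:
    """Element-wise weighted sum: sum_k weights[k] * modules[k] (hypothetical helper).

    All modules must share parameter names and shapes. A negative weight
    "subtracts" a module, as in task-arithmetic-style compositions.
    """
    if not modules or len(modules) != len(weights):
        raise ValueError("need at least one module and exactly one weight per module")
    composed: Module = {}
    for name, first in modules[0].items():
        if any(len(m[name]) != len(first) for m in modules):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        composed[name] = [
            sum(w * m[name][i] for m, w in zip(modules, weights))
            for i in range(len(first))
        ]
    return composed


# Toy values: a task module trained on English labeled data and a language
# module trained on unlabeled target-language text, mixed with equal weights.
task_en = {"lora_A": [0.25, -0.5], "lora_B": [1.0, 0.0]}
lang_tgt = {"lora_A": [0.75, 0.0], "lora_B": [0.0, 2.0]}
print(compose_modules([task_en, lang_tgt], [0.5, 0.5]))
# {'lora_A': [0.5, -0.25], 'lora_B': [0.5, 1.0]}
```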
Submitted 15 November, 2023;
originally announced November 2023.
-
On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss
Authors:
Yihong Liu,
Alexandra Chronopoulou,
Hinrich Schütze,
Alexander Fraser
Abstract:
Although unsupervised neural machine translation (UNMT) has achieved success in many language pairs, the copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs, especially when low-resource languages are involved. We find this issue is closely related to an unexpected copying behavior during online back-translation (BT). In this work, we propose a simple but effective training schedule that incorporates a language discriminator loss. The loss imposes constraints on the intermediate translation so that the translation is in the desired language. By conducting extensive experiments on different language pairs, including similar and distant, high and low-resource languages, we find that our method alleviates the copying problem, thus improving the translation performance on low-resource languages.
Submitted 4 June, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters
Authors:
Proyag Pal,
Brian Thompson,
Yogesh Virkar,
Prashant Mathur,
Alexandra Chronopoulou,
Marcello Federico
Abstract:
To translate speech for automatic dubbing, machine translation needs to be isochronous, i.e. translated speech needs to be aligned with the source in terms of speech durations. We introduce target factors in a transformer model to predict durations jointly with target language phoneme sequences. We also introduce auxiliary counters to help the decoder to keep track of the timing information while generating target phonemes. We show that our model improves translation quality and isochrony compared to previous work where the translation model is instead trained to predict interleaved sequences of phonemes and durations.
Submitted 22 May, 2023;
originally announced May 2023.
-
Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation
Authors:
Wen Lai,
Alexandra Chronopoulou,
Alexander Fraser
Abstract:
Despite advances in multilingual neural machine translation (MNMT), we argue that there are still two major challenges in this area: data imbalance and representation degeneration. The data imbalance problem refers to the imbalance in the amount of parallel corpora for all language pairs, especially for long-tail languages (i.e., very low-resource languages). The representation degeneration problem refers to the problem of encoded tokens tending to appear only in a small subspace of the full space available to the MNMT model. To solve these two issues, we propose Bi-ACL, a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of the MNMT model. We define two modules, named bidirectional autoencoder and bidirectional contrastive learning, which we combine with an online constrained beam search and a curriculum learning sampling strategy. Extensive experiments show that our proposed method is more effective both in long-tail languages and in high-resource languages. We also demonstrate that our approach is capable of transferring knowledge between domains and languages in zero-shot scenarios.
Submitted 24 October, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Optimal Sampling for Estimation of Fractional Brownian Motion
Authors:
Xiang Cui,
Alexandra Chronopoulou
Abstract:
In this paper, we focus on multiple sampling problems for the estimation of the fractional Brownian motion when the maximum number of samples is limited, extending existing results in the literature in a non-Markovian framework. Two classes of sampling schemes are proposed: a deterministic scheme and a level-triggered scheme. For the deterministic sampling scheme, the sampling times are selected beforehand and do not depend on the process trajectory. For the level-triggered sampling scheme, the sampling times are the times when the process crosses predetermined thresholds. The sampling times are selected sequentially in time and depend on the process trajectory. For each of the schemes, we derive the optimal sampling times by minimizing the aggregate squared error distortion. We then show that the optimal sampling strategies heavily depend on the dependence structure of the process.
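To make the level-triggered scheme concrete, the sketch below records a sample whenever an already-discretized path moves into a different band [k·delta, (k+1)·delta), that is, crosses at least one of the equally spaced thresholds k·delta, and stops once the sample budget is used up. The function name, the equal spacing of thresholds, and the rule of recording the first grid time after a crossing are assumptions for illustration; the toy path is hand-written rather than a simulated fractional Brownian motion, and the optimal thresholds derived in the paper are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def level_triggered_samples(
    times: Sequence[float],
    path: Sequence[float],
    delta: float,
    max_samples: int,
) -> List[Tuple[float, float]]:
    """Return (time, value) pairs at grid times where the path enters a new band.

    Bands are [k*delta, (k+1)*delta); entering a different band means at least one
    threshold k*delta was crossed since the previous grid time. Sampling stops once
    max_samples samples have been recorded.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if len(times) != len(path):
        raise ValueError("times and path must have the same length")
    samples: List[Tuple[float, float]] = []
    if not path or max_samples <= 0:
        return samples
    prev_band = math.floor(path[0] / delta)
    for t, x in zip(times[1:], path[1:]):
        band = math.floor(x / delta)
        if band != prev_band:
            samples.append((t, x))
            if len(samples) >= max_samples:
                break
        prev_band = band
    return samples


# Toy, hand-written path on a unit time grid (not a simulated fBm).
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
path = [0.1, 0.3, 0.6, 0.4, -0.2, 1.1]
print(level_triggered_samples(times, path, delta=0.5, max_samples=3))
# [(2.0, 0.6), (3.0, 0.4), (4.0, -0.2)]
```

A path that jumps across several thresholds within one grid step yields a single sample here; a finer time grid reduces that effect.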
Submitted 15 April, 2023;
originally announced April 2023.
-
Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing
Authors:
Alexandra Chronopoulou,
Brian Thompson,
Prashant Mathur,
Yogesh Virkar,
Surafel M. Lakew,
Marcello Federico
Abstract:
Automatic dubbing (AD) is the task of translating the original speech in a video into target language speech. The new target language speech should satisfy isochrony; that is, the new speech should be time aligned with the original video, including mouth movements, pauses, hand gestures, etc. In this paper, we propose training a model that directly optimizes both the translation as well as the speech duration of the generated translations. We show that this system generates speech that better matches the timing of the original speech, compared to prior work, while simplifying the system architecture.
Submitted 24 February, 2023;
originally announced February 2023.
-
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
Authors:
Alexandra Chronopoulou,
Matthew E. Peters,
Alexander Fraser,
Jesse Dodge
Abstract:
Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. One parameter-efficient adaptation method is to train an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel domain at test time. In this paper, we introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains. Our approach is embarrassingly parallel: first, we train a set of domain-specific adapters; then, for each novel domain, we determine which adapters should be averaged at test time. We present extensive experiments showing that AdapterSoup consistently improves performance on new domains without extra training. We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results. We explore various approaches for choosing which adapters to combine, such as text clustering and semantic similarity. We find that using clustering leads to the most competitive results on novel domains.
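A minimal sketch of the weight-space averaging step, assuming adapters are stored as dictionaries of flat parameter lists: the adapters whose training domains are most similar to the new domain (here by cosine similarity of hypothetical domain embedding vectors) are averaged uniformly. The names `adapter_soup` and `average_adapters`, the embeddings, and the top-k selection rule are illustrative assumptions; the abstract reports that clustering-based selection worked best, which this sketch does not implement.

```python
import math
from typing import Dict, List

Adapter = Dict[str, List[float]]  # parameter name -> flat list of weights


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_adapters(adapters: List[Adapter]) -> Adapter:
    """Uniform weight-space average of adapters sharing parameter names and shapes."""
    n = len(adapters)
    return {
        name: [sum(a[name][i] for a in adapters) / n for i in range(len(values))]
        for name, values in adapters[0].items()
    }


def adapter_soup(
    adapters: Dict[str, Adapter],
    domain_embeddings: Dict[str, List[float]],
    query_embedding: List[float],
    k: int,
) -> Adapter:
    """Average the k adapters whose domain embeddings are closest to the query."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(
        adapters,
        key=lambda d: cosine(domain_embeddings[d], query_embedding),
        reverse=True,
    )
    return average_adapters([adapters[d] for d in ranked[:k]])


adapters = {
    "news": {"w": [1.0, 2.0]},
    "reviews": {"w": [3.0, 4.0]},
    "code": {"w": [100.0, 100.0]},
}
embeddings = {"news": [1.0, 0.0], "reviews": [0.8, 0.6], "code": [0.0, 1.0]}
print(adapter_soup(adapters, embeddings, query_embedding=[1.0, 0.1], k=2))
# {'w': [2.0, 3.0]}
```

No adapter is retrained here; only the choice of which ones to average depends on the new domain.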
Submitted 28 March, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
$m^4Adapter$: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter
Authors:
Wen Lai,
Alexandra Chronopoulou,
Alexander Fraser
Abstract:
Multilingual neural machine translation models (MNMT) yield state-of-the-art performance when evaluated on data from a domain and language pair seen at training time. However, when an MNMT model is used to translate under domain shift or to a new language pair, performance drops dramatically. We consider a very challenging scenario: adapting the MNMT model both to a new domain and to a new language pair at the same time. In this paper, we propose $m^4Adapter$ (Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter), which combines domain and language knowledge using meta-learning with adapters. We present results showing that our approach is a parameter-efficient solution which effectively adapts a model to both a new language pair and a new domain, while outperforming other adapter methods. An ablation study also shows that our approach more effectively transfers domain knowledge across different languages and language information across different domains.
Submitted 21 October, 2022;
originally announced October 2022.
-
Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation
Authors:
Alexandra Chronopoulou,
Dario Stojanovski,
Alexander Fraser
Abstract:
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks. Self-supervised pretrained models are often fine-tuned on parallel data from one or multiple language pairs for machine translation. Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive. Training a new adapter on each language pair or training a single adapter on all language pairs without updating the pretrained model has been proposed as a parameter-efficient alternative. However, the former does not permit any sharing between languages, while the latter shares parameters for all languages and is susceptible to negative interference. In this paper, we propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer. Our approach outperforms related baselines, yielding higher translation scores on average when translating from English to 17 different low-resource languages. We also show that language-family adapters provide an effective method to translate to languages unseen during pretraining.
Submitted 29 March, 2023; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Efficient Hierarchical Domain Adaptation for Pretrained Language Models
Authors:
Alexandra Chronopoulou,
Matthew E. Peters,
Jesse Dodge
Abstract:
The remarkable success of large language models has been driven by dense models trained on massive unlabeled, unstructured corpora. These corpora typically contain text from diverse, heterogeneous sources, but information about the source of the text is rarely used during training. Transferring their knowledge to a target domain is typically done by continuing training in-domain. In this paper, we introduce a method to permit domain adaptation to many diverse domains using a computationally efficient adapter approach. Our method is based on the observation that textual domains are partially overlapping, and we represent domains as a hierarchical tree structure where each node in the tree is associated with a set of adapter weights. When combined with a frozen pretrained language model, this approach enables parameter sharing among related domains, while avoiding negative interference between unrelated ones. Experimental results with GPT-2 and a large fraction of the 100 most represented websites in C4 show across-the-board improvements in-domain. We additionally provide an inference time algorithm for a held-out domain and show that averaging over multiple paths through the tree enables further gains in generalization, while adding only a marginal cost to inference.
Submitted 3 May, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation
Authors:
Alexandra Chronopoulou,
Dario Stojanovski,
Alexander Fraser
Abstract:
Successful methods for unsupervised neural machine translation (UNMT) employ cross-lingual pretraining via self-supervision, often in the form of a masked language modeling or a sequence generation task, which requires the model to align the lexical- and high-level representations of the two languages. While cross-lingual pretraining works for similar languages with abundant corpora, it performs poorly in low-resource and distant languages. Previous research has shown that this is because the representations are not sufficiently aligned. In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings. Empirical results demonstrate improved performance both on UNMT (up to 4.5 BLEU) and bilingual lexicon induction using our method compared to a UNMT baseline.
Submitted 14 April, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task
Authors:
Alexandra Chronopoulou,
Dario Stojanovski,
Viktor Hangya,
Alexander Fraser
Abstract:
This paper describes the submission of LMU Munich to the WMT 2020 unsupervised shared task, in two language directions, German<->Upper Sorbian. Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al. (2020), using a monolingual pretrained language generation model (on German) and fine-tuning it on both German and Upper Sorbian, before initializing a UNMT model, which is trained with online backtranslation. Pseudo-parallel data obtained from an unsupervised statistical machine translation (USMT) system is used to fine-tune the UNMT model. We also apply BPE-Dropout to the low resource (Upper Sorbian) data to obtain a more robust system. We additionally experiment with residual adapters and find them useful in the Upper Sorbian->German direction. We explore sampling during backtranslation and curriculum learning to use SMT translations in a more principled way. Finally, we ensemble our best-performing systems and reach a BLEU score of 32.4 on German->Upper Sorbian and 35.2 on Upper Sorbian->German.
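BPE-Dropout (Provilkov et al., 2020) stochastically skips merges while segmenting a word, so the same word can receive different subword segmentations across training passes. The sketch below assumes a merge table mapping symbol pairs to priorities (lower is applied first) and drops each candidate merge with probability `p` at every step, stopping when no merge survives. The function name and the toy merge table are hypothetical, and the dropout rate used in the submission is not given in the abstract.

```python
import random
from typing import Dict, List, Optional, Tuple


def bpe_dropout_segment(
    word: str,
    merge_ranks: Dict[Tuple[str, str], int],
    p: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Segment `word` with BPE merges, skipping each candidate merge with prob. p.

    merge_ranks maps a symbol pair to its priority (lower = learned earlier).
    With p == 0 this reduces to ordinary greedy BPE segmentation.
    """
    rng = rng or random.Random()
    symbols = list(word)
    while len(symbols) > 1:
        # Candidate merges present in the current segmentation; each one is
        # independently dropped with probability p at this step.
        candidates = [
            (merge_ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in merge_ranks and rng.random() >= p
        ]
        if not candidates:
            break
        _, i = min(candidates)  # highest-priority, then leftmost, surviving merge
        symbols[i : i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Toy merge table (hypothetical).
merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2, ("low", "er"): 3}
print(bpe_dropout_segment("lower", merges, p=0.0))  # ['lower']
print(bpe_dropout_segment("lower", merges, p=1.0))  # ['l', 'o', 'w', 'e', 'r']
print(bpe_dropout_segment("lower", merges, p=0.1, rng=random.Random(0)))
```

BPE-Dropout is normally applied only when segmenting training data; at test time segmentation is deterministic (`p=0`).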
Submitted 25 October, 2020;
originally announced October 2020.
-
Domain Adversarial Fine-Tuning as an Effective Regularizer
Authors:
Giorgos Vernikos,
Katerina Margatina,
Alexandra Chronopoulou,
Ion Androutsopoulos
Abstract:
In Natural Language Processing (NLP), pretrained language models (LMs) that are transferred to downstream tasks have been recently shown to achieve state-of-the-art results. However, standard fine-tuning can degrade the general-domain representations captured during pretraining. To address this issue, we introduce a new regularization technique, AFTER: domain Adversarial Fine-Tuning as an Effective Regularizer. Specifically, we complement the task-specific loss used during fine-tuning with an adversarial objective. This additional loss term is related to an adversarial classifier that aims to discriminate between in-domain and out-of-domain text representations. In-domain refers to the labeled dataset of the task at hand, while out-of-domain refers to unlabeled data from a different domain. Intuitively, the adversarial classifier acts as a regularizer which prevents the model from overfitting to the task-specific domain. Empirical results on various natural language understanding tasks show that AFTER leads to improved performance compared to standard fine-tuning.
Submitted 5 October, 2020; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT
Authors:
Alexandra Chronopoulou,
Dario Stojanovski,
Alexander Fraser
Abstract:
Using a language model (LM) pretrained on two languages with large monolingual data in order to initialize an unsupervised neural machine translation (UNMT) system yields state-of-the-art results. When limited data is available for one language, however, this method leads to poor translations. We present an effective approach that reuses an LM that is pretrained only on the high-resource language. The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model. To reuse the pretrained LM, we have to modify its predefined vocabulary, to account for the new language. We therefore propose a novel vocabulary extension method. Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq), yielding more than +8.3 BLEU points for all four translation directions.
Submitted 6 October, 2020; v1 submitted 16 September, 2020;
originally announced September 2020.
-
An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models
Authors:
Alexandra Chronopoulou,
Christos Baziotis,
Alexandros Potamianos
Abstract:
A growing number of state-of-the-art transfer learning methods employ language models pretrained on large generic corpora. In this paper we present a conceptually simple and effective transfer learning approach that addresses the problem of catastrophic forgetting. Specifically, we combine the task-specific optimization function with an auxiliary language model objective, which is adjusted during the training process. This preserves language regularities captured by language models, while enabling sufficient adaptation for solving the target task. Our method does not require pretraining or finetuning separate components of the network and we train our models end-to-end in a single step. We present results on a variety of challenging affective and text classification tasks, surpassing well-established transfer learning methods with a greater level of complexity.
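The combined objective can be written as a task loss plus a weighted auxiliary language-model loss, L = L_task + γ(t)·L_LM, where the weight γ is adjusted as training proceeds. The sketch below assumes an exponential decay γ(t) = γ0·d^t per epoch with γ0 = 0.5 and d = 0.5; the schedule, these values, and the function names are illustrative assumptions, and in a real setup both losses would be framework tensors rather than floats.

```python
def aux_lm_weight(epoch: int, gamma0: float = 0.5, decay: float = 0.5) -> float:
    """Weight on the auxiliary LM loss, assumed here to decay exponentially per epoch."""
    return gamma0 * decay ** epoch


def combined_loss(task_loss: float, lm_loss: float, epoch: int) -> float:
    """Task loss plus the (shrinking) auxiliary language-model loss."""
    return task_loss + aux_lm_weight(epoch) * lm_loss


# Fixed toy losses, to show how the LM term's influence fades over epochs.
for epoch in range(4):
    print(epoch, aux_lm_weight(epoch), combined_loss(1.0, 4.0, epoch))
# 0 0.5 3.0
# 1 0.25 2.0
# 2 0.125 1.5
# 3 0.0625 1.25
```

Early on, the LM term keeps the pretrained representations close to their original language-modeling behavior; as γ shrinks, the task loss dominates and the model can specialize.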
Submitted 31 May, 2019; v1 submitted 27 February, 2019;
originally announced February 2019.
-
NTUA-SLP at IEST 2018: Ensemble of Neural Transfer Methods for Implicit Emotion Classification
Authors:
Alexandra Chronopoulou,
Aikaterini Margatina,
Christos Baziotis,
Alexandros Potamianos
Abstract:
In this paper we present our approach to tackle the Implicit Emotion Shared Task (IEST) organized as part of WASSA 2018 at EMNLP 2018. Given a tweet from which a certain word has been removed, we are asked to predict the emotion of the missing word. In this work, we experiment with neural Transfer Learning (TL) methods. Our models are based on LSTM networks, augmented with a self-attention mechanism. We use the weights of various pretrained models for initializing specific layers of our networks. We leverage a large collection of unlabeled Twitter messages for pretraining word2vec word embeddings and a set of diverse language models. Moreover, we utilize a sentiment analysis dataset for pretraining a model which encodes emotion-related information. The submitted model consists of an ensemble of the aforementioned TL models. Our team ranked 3rd out of 30 participants, achieving an F1 score of 0.703.
Submitted 3 September, 2018;
originally announced September 2018.
-
A Customer Choice Model with HALO Effect
Authors:
Reza Yousefi Maragheh,
Alexandra Chronopoulou,
James Mario Davis
Abstract:
In this paper, we propose an extension to the multinomial logit (MNL) model, the Halo MNL, that takes into account the interaction effects among products in an assortment. In particular, this model incorporates pairwise interactions of items in an effort to describe positive/negative effects among products that are present/absent in the assortment. Furthermore, we are interested in establishing sufficient conditions for identifiability, in order to build robust estimation methods. Under strict identifiability conditions, we use maximum likelihood to estimate the model parameters, for which we derive closed formulas. We also perform simulation experiments in order to numerically evaluate our method, study the accuracy of the estimators and compare it with the MNL. Finally, we fit our model to the Hotel Chain dataset in Bodea et al., and we compare it with MNL in terms of efficiency, accuracy and robustness. We conclude that for rich enough datasets the model that includes interaction effects performs better in terms of how well it fits the data.
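For reference, the plain MNL model with a no-purchase option assigns item i in assortment S the probability exp(u_i) / (1 + Σ_{j∈S} exp(u_j)). The sketch below adds one plausible form of pairwise interaction: each offered item k shifts the utility of item i by h[(k, i)]. This parameterization, the function name `halo_mnl_probs`, and the toy numbers are assumptions for illustration; they model only effects of products that are present, and the paper's exact specification (including effects of absent products) may differ.

```python
import math
from typing import Dict, Iterable, Tuple


def halo_mnl_probs(
    assortment: Iterable[str],
    utility: Dict[str, float],
    halo: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Choice probabilities over the assortment plus a no-purchase option "none".

    The adjusted utility of item i is utility[i] plus halo[(k, i)] for every
    other offered item k (missing pairs contribute 0). With an empty `halo`
    this is the plain MNL model. The key "none" is reserved.
    """
    items = list(assortment)
    adjusted = {
        i: utility[i] + sum(halo.get((k, i), 0.0) for k in items if k != i)
        for i in items
    }
    # The no-purchase option has utility 0, i.e. weight exp(0) = 1.
    denom = 1.0 + sum(math.exp(v) for v in adjusted.values())
    probs = {i: math.exp(v) / denom for i, v in adjusted.items()}
    probs["none"] = 1.0 / denom
    return probs


utility = {"a": 0.0, "b": 0.0}
plain = halo_mnl_probs(["a", "b"], utility, halo={})
# Offering b alongside a doubles a's attraction weight (a positive halo effect).
with_halo = halo_mnl_probs(["a", "b"], utility, halo={("b", "a"): math.log(2.0)})
print({k: round(v, 3) for k, v in plain.items()})      # {'a': 0.333, 'b': 0.333, 'none': 0.333}
print({k: round(v, 3) for k, v in with_halo.items()})  # {'a': 0.5, 'b': 0.25, 'none': 0.25}
```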
Submitted 4 May, 2018;
originally announced May 2018.
-
NTUA-SLP at SemEval-2018 Task 1: Predicting Affective Content in Tweets with Deep Attentive RNNs and Transfer Learning
Authors:
Christos Baziotis,
Nikos Athanasiou,
Alexandra Chronopoulou,
Athanasia Kolovou,
Georgios Paraskevopoulos,
Nikolaos Ellinas,
Shrikanth Narayanan,
Alexandros Potamianos
Abstract:
In this paper we present the deep-learning models that we submitted to the SemEval-2018 Task 1 competition: "Affect in Tweets". We participated in all subtasks for English tweets. We propose a Bi-LSTM architecture equipped with a multi-layer self-attention mechanism. The attention mechanism improves the model performance and allows us to identify salient words in tweets, as well as gain insight into the models, making them more interpretable. Our model utilizes a set of word2vec word embeddings trained on a large collection of 550 million Twitter messages, augmented by a set of word affective features. Due to the limited amount of task-specific training data, we opted for a transfer learning approach by pretraining the Bi-LSTMs on the dataset of SemEval 2017, Task 4A. The proposed approach ranked 1st in Subtask E "Multi-Label Emotion Classification", 2nd in Subtask A "Emotion Intensity Regression" and achieved competitive results in other subtasks.
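The attention step can be sketched as softmax-weighted pooling over per-token hidden states: each state h_t gets a score, the scores are normalized with a softmax, and the weighted sum forms the tweet representation, while the weights themselves indicate which words the model treats as salient. This single-layer version scores with a plain dot product against a hypothetical vector `v`; models of this kind often apply a learned projection with a tanh nonlinearity before scoring, and the paper's multi-layer mechanism is not reproduced here.

```python
import math
from typing import List, Tuple


def attention_pool(
    hidden: List[List[float]], v: List[float]
) -> Tuple[List[float], List[float]]:
    """Softmax-weighted sum of per-token hidden states.

    score_t = v . h_t, alpha = softmax(scores), pooled = sum_t alpha_t * h_t.
    Returns (pooled vector, attention weights).
    """
    if not hidden:
        raise ValueError("need at least one hidden state")
    scores = [sum(a * b for a, b in zip(v, h)) for h in hidden]
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]
    dim = len(hidden[0])
    pooled = [sum(alpha[t] * hidden[t][d] for t in range(len(hidden))) for d in range(dim)]
    return pooled, alpha


# Toy "Bi-LSTM outputs" for a 3-token tweet (hypothetical values).
hidden = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
pooled, alpha = attention_pool(hidden, v=[2.0, 0.0])
print(alpha)  # the first token, which aligns with v, receives the largest weight
```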
Submitted 18 April, 2018;
originally announced April 2018.
-
Sequential Monte Carlo for fractional Stochastic Volatility Models
Authors:
Alexandra Chronopoulou,
Konstantinos Spiliopoulos
Abstract:
In this paper we consider a fractional stochastic volatility model, that is, a model in which the volatility may exhibit long-range dependent or rough/antipersistent behavior. We propose a dynamic sequential Monte Carlo methodology, applicable to both long-memory and antipersistent processes, to estimate the volatility as well as the unknown parameters of the model. We establish a central limit theorem for the state and parameter filters, and we study asymptotic properties (consistency and asymptotic normality) of the filter. We illustrate our results with a simulation study and apply our method to estimating the volatility and the parameters of a long-range dependent model for S&P 500 data.
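To make the sequential Monte Carlo idea concrete, here is a bootstrap particle filter for a standard, short-memory stochastic volatility model, x_t = phi x_{t-1} + sigma eta_t and y_t = exp(x_t / 2) eps_t. This is a deliberate simplification and not the paper's method: in a fractional model the volatility transition typically depends on the whole past path, and the paper also estimates the parameters, whereas here phi and sigma are assumed known. Function and parameter names are illustrative.

```python
import math
import random

def bootstrap_filter(ys, phi, sigma, n_particles=500, seed=0):
    """Filtered means E[x_t | y_1..y_t] for an AR(1) log-volatility model (assumed, non-fractional)."""
    rng = random.Random(seed)
    # Start from the stationary law of the AR(1) state (requires |phi| < 1).
    sd0 = sigma / math.sqrt(1.0 - phi ** 2)
    particles = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    means = []
    for y in ys:
        # 1. Propagate each particle through the state transition.
        particles = [phi * x + sigma * rng.gauss(0.0, 1.0) for x in particles]
        # 2. Weight by the observation density N(y; 0, exp(x)), computed in log space.
        logw = [-0.5 * (x + y * y * math.exp(-x)) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # 3. Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means

# Simulate synthetic data from the same model, then filter it.
rng = random.Random(1)
phi, sigma, x, xs, ys = 0.95, 0.3, 0.0, [], []
for _ in range(200):
    x = phi * x + sigma * rng.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(math.exp(x / 2) * rng.gauss(0.0, 1.0))
est = bootstrap_filter(ys, phi, sigma)
```

Weighting in log space and subtracting the maximum avoids underflow when a few particles dominate; the propagate, weight, resample loop is the core that fractional variants extend.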
Submitted 25 February, 2017; v1 submitted 11 August, 2015;
originally announced August 2015.
-
Maximum likelihood estimation for small noise multiscale diffusions
Authors:
Konstantinos Spiliopoulos,
Alexandra Chronopoulou
Abstract:
We study the problem of parameter estimation for stochastic differential equations with small noise and fast oscillating parameters. Depending on how fast the noise intensity goes to zero relative to the homogenization parameter, we consider three different regimes. For each regime, we construct the maximum likelihood estimator and study its consistency and asymptotic normality. A simulation study for the first-order Langevin equation with a two-scale potential is also provided.
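The multiscale, small-noise setting is beyond a short snippet, but the basic ingredient, a maximum likelihood estimator of a drift parameter from a discretely observed diffusion, can be sketched in the simplest case. Assume dX_t = -theta X_t dt + sigma dW_t (an Ornstein-Uhlenbeck process with no fast oscillating term). Maximizing the Girsanov log-likelihood gives theta_hat = -(integral of X dX) / (integral of X^2 dt), approximated below by left-point sums. This is an illustration only, not the estimator constructed in the paper; the function names are hypothetical.

```python
import math
import random

def ou_drift_mle(xs, dt):
    """Discretized continuous-time MLE of theta in dX = -theta X dt + sigma dW.
    Uses Ito (left-point) sums: theta_hat = -sum X_i dX_i / (sum X_i^2 dt)."""
    num = sum(x0 * (x1 - x0) for x0, x1 in zip(xs, xs[1:]))
    den = sum(x0 * x0 for x0 in xs[:-1]) * dt
    return -num / den

def simulate_ou(theta, sigma, x0, dt, n, seed=0):
    """Euler-Maruyama path of the OU process with n steps of size dt."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(x - theta * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs

path = simulate_ou(theta=2.0, sigma=0.5, x0=1.0, dt=0.01, n=50_000)
print(ou_drift_mle(path, dt=0.01))  # close to 2.0 for a long observation window
```

Note that sigma does not appear in the estimator: for a continuously observed diffusion it cancels from the likelihood equation, which is part of why the noise intensity enters the paper's analysis through the asymptotic regimes instead.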
Submitted 18 February, 2015; v1 submitted 27 January, 2013;
originally announced January 2013.
-
On inference for fractional differential equations
Authors:
Alexandra Chronopoulou,
Samy Tindel
Abstract:
Using Malliavin calculus tools and approximation results, we show how to compute a maximum-likelihood-type estimator for a rather general differential equation driven by a fractional Brownian motion with Hurst parameter H>1/2. Rates of convergence for the approximation task are provided, and numerical experiments show that our procedure yields good estimates.
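Numerical experiments with fBm-driven equations need sample paths of fractional Brownian motion. One standard exact method, assumed here since the abstract does not say which simulator was used, is to Cholesky-factor the covariance of unit-variance fractional Gaussian noise, gamma(k) = 0.5 (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)), and cumulatively sum the correlated increments. The O(n^3) factorization restricts this sketch to short paths; the function names are hypothetical.

```python
import math
import random

def fgn_autocov(k, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at lag k."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + abs(k - 1) ** (2 * hurst))

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def simulate_fbm(n, hurst, T=1.0, seed=0):
    """fBm on the grid t_i = i T / n, i = 0..n, by exact Cholesky simulation."""
    rng = random.Random(seed)
    cov = [[fgn_autocov(i - j, hurst) for j in range(n)] for i in range(n)]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    increments = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    # Self-similarity: increments over a step of length T/n have std (T/n)^H.
    scale = (T / n) ** hurst
    path = [0.0]
    for inc in increments:
        path.append(path[-1] + scale * inc)
    return path

path = simulate_fbm(n=100, hurst=0.7)
```

For H > 1/2 the lag-1 covariance 2^(2H-1) - 1 is positive, which is the persistent, long-memory regime the abstract considers; at H = 1/2 it vanishes and the path is ordinary Brownian motion.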
Submitted 20 April, 2011;
originally announced April 2011.
-
Optimal sequential change-detection for fractional diffusion-type processes
Authors:
Alexandra Chronopoulou,
Georgios Fellouris
Abstract:
We consider the problem of detecting an abrupt change in the distribution of a sequentially observed stochastic process. We establish the optimality of the CUSUM test with respect to a modified version of Lorden's criterion for arbitrary processes with continuous paths and apply this general result to the special case of fractional diffusion-type processes. As a by-product, we show that the CUSUM test optimizes Lorden's original criterion when a fractional Brownian motion with Hurst index H adopts a polynomial drift term with exponent H + 1/2 after the change.
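For intuition, the CUSUM procedure in its textbook discrete-time form (Page's recursion) is sketched below for i.i.d. Gaussian observations whose mean shifts from mu0 to mu1 with known variance: W_n = max(0, W_{n-1} + LLR(X_n)), with an alarm at the first n where W_n >= h. In the paper's setting (continuous-path processes, fractional diffusions) the per-observation log-likelihood ratio is replaced by the likelihood ratio of the observed process; the Gaussian model, threshold h, and function name here are illustrative assumptions.

```python
def cusum_alarm(xs, mu0, mu1, sigma, h):
    """Return the first index n at which the CUSUM statistic reaches h, or None."""
    w = 0.0
    for n, x in enumerate(xs):
        # Log-likelihood ratio of N(mu1, sigma^2) versus N(mu0, sigma^2) for one observation.
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
        # Reset at zero: evidence for "no change" cannot accumulate below 0.
        w = max(0.0, w + llr)
        if w >= h:
            return n
    return None

# Noise-free toy stream: mean 0 for 50 steps, then mean 1.
xs = [0.0] * 50 + [1.0] * 50
print(cusum_alarm(xs, mu0=0.0, mu1=1.0, sigma=1.0, h=5.0))  # → 59
```

After the change each observation adds 0.5 to the statistic, so the threshold 5 is crossed ten observations later; raising h lengthens the time to false alarm at the cost of a longer detection delay, the trade-off that Lorden's criterion formalizes.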
Submitted 12 July, 2012; v1 submitted 2 February, 2011;
originally announced February 2011.
-
Variations and Hurst index estimation for a Rosenblatt process using longer filters
Authors:
Alexandra Chronopoulou,
Ciprian Tudor,
Frederi Viens
Abstract:
The Rosenblatt process is a self-similar non-Gaussian process that lives in the second Wiener chaos and occurs as the limit of correlated random sequences in so-called "non-central limit theorems". It has the same covariance as fractional Brownian motion. We study the asymptotic distribution of the quadratic variations of the Rosenblatt process based on long filters, including filters based on high-order finite-difference and wavelet-based schemes. We find exact formulas for the limiting distributions, which we then use to devise strongly consistent estimators of the self-similarity parameter $H$. Unlike the case of fractional Brownian motion, no matter how high the filter orders are, the estimators are never asymptotically normal, converging instead in mean square to the observed value of the Rosenblatt process at time 1.
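The idea behind filter-based estimators of $H$ can be illustrated in the simpler Gaussian case. For a self-similar process with stationary increments, applying a filter a whose coefficients sum to zero, dilated by a factor m, gives a quadratic variation whose expectation scales like m^(2H); comparing dilations m = 1 and m = 2 then suggests H_hat = log(V(2)/V(1)) / (2 log 2). The ratio form and function names below are assumptions for illustration, checked on Brownian motion (H = 1/2); the sketch does not reproduce the paper's Rosenblatt asymptotics, where such estimators are not asymptotically normal.

```python
import math
import random

def filtered_qv(path, filt, m):
    """Mean of (sum_q filt[q] * path[i + q*m])^2 over all valid starting indices i."""
    span = (len(filt) - 1) * m
    vals = []
    for i in range(len(path) - span):
        s = sum(a * path[i + q * m] for q, a in enumerate(filt))
        vals.append(s * s)
    return sum(vals) / len(vals)

def hurst_from_qv(path, filt=(1.0, -2.0, 1.0)):
    """Ratio estimator: E[V(m)] is proportional to m^(2H) for zero-sum filters."""
    if abs(sum(filt)) > 1e-12:
        raise ValueError("filter coefficients must sum to zero")
    v1 = filtered_qv(path, filt, 1)
    v2 = filtered_qv(path, filt, 2)
    return math.log(v2 / v1) / (2 * math.log(2))

# Check on Brownian motion (H = 1/2), built from i.i.d. Gaussian steps.
rng = random.Random(0)
bm = [0.0]
for _ in range(20_000):
    bm.append(bm[-1] + rng.gauss(0.0, 1.0))
print(hurst_from_qv(bm))               # second-order difference filter, close to 0.5
print(hurst_from_qv(bm, (1.0, -1.0)))  # first-order increments, close to 0.5
```

Longer zero-sum filters, such as higher-order differences or wavelet filters, leave the m^(2H) scaling intact while changing the correlation structure of the filtered terms, which is where their asymptotic behavior differs.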
Submitted 16 December, 2009;
originally announced December 2009.
-
Self-similarity parameter estimation and reproduction property for non-Gaussian Hermite processes
Authors:
Alexandra Chronopoulou,
Frederi Viens,
Ciprian Tudor
Abstract:
We consider the class of all Hermite processes $(Z_{t}^{(q,H)})_{t\in [0,1]}$ of order $q\in \mathbf{N}^{\ast}$ and with Hurst parameter $H\in (\frac{1}{2},1)$. The process $Z^{(q,H)}$ is $H$-self-similar, has stationary increments, and exhibits long-range dependence identical to that of fractional Brownian motion (fBm). For $q=1$, $Z^{(1,H)}$ is fBm, which is Gaussian; for $q=2$, $Z^{(2,H)}$ is the Rosenblatt process, which lives in the second Wiener chaos; for any $q>2$, $Z^{(q,H)}$ is a process in the $q$th Wiener chaos. We study the variations of $Z^{(q,H)}$ for any $q$ using multiple Wiener-Itô stochastic integrals and Malliavin calculus. We prove a reproduction property for this class of processes, in the sense that the terms appearing in the chaotic decomposition of their variations give rise to other Hermite processes of different orders and with different Hurst parameters. We apply our results to construct a strongly consistent estimator for the self-similarity parameter $H$ from discrete observations of $Z^{(q,H)}$; after appropriate normalization, this estimator is proved to be asymptotically distributed like a Rosenblatt random variable (the value at time $1$ of a Rosenblatt process) with self-similarity parameter $1+2(H-1)/q$.
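As a worked check of the last formula, the self-similarity parameter $H' = 1+2(H-1)/q$ of the limiting Rosenblatt variable can be bounded and evaluated directly; this is only arithmetic on the abstract's expression, and the values of $q$ and $H$ below are chosen for illustration. Since $H-1\in(-\frac{1}{2},0)$, the parameter always lands in the range $(\frac{1}{2},1)$ where a Rosenblatt process is defined, once $q\ge 2$:

$$
H' = 1 + \frac{2(H-1)}{q}, \qquad H \in \left(\tfrac{1}{2}, 1\right) \;\Longrightarrow\; H' \in \left(1 - \tfrac{1}{q},\, 1\right) \subseteq \left(\tfrac{1}{2}, 1\right) \quad \text{for } q \ge 2,
$$

$$
q = 2:\; H' = H; \qquad q = 3,\ H = 0.7:\; H' = 1 - \tfrac{0.6}{3} = 0.8; \qquad q = 4,\ H = 0.6:\; H' = 1 - \tfrac{0.8}{4} = 0.8.
$$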
Submitted 18 June, 2010; v1 submitted 8 July, 2008;
originally announced July 2008.