-
M2QA: Multi-domain Multilingual Question Answering
Authors:
Leon Engländer,
Hannah Sterz,
Clifton Poth,
Jonas Pfeiffer,
Ilia Kuznetsov,
Iryna Gurevych
Abstract:
Generalization and robustness to input variation are core desiderata of machine learning research. Language varies along several axes, most importantly, language instance (e.g. French) and domain (e.g. news). While adapting NLP models to new languages within a single domain, or to new domains within a single language, is widely studied, research in joint adaptation is hampered by the lack of evaluation datasets. This prevents the transfer of NLP systems from well-resourced languages and domains to non-dominant language-domain combinations. To address this gap, we introduce M2QA, a multi-domain multilingual question answering benchmark. M2QA includes 13,500 SQuAD 2.0-style question-answer instances in German, Turkish, and Chinese for the domains of product reviews, news, and creative writing. We use M2QA to explore cross-lingual cross-domain performance of fine-tuned models and state-of-the-art LLMs and investigate modular approaches to domain and language adaptation. We witness 1) considerable performance variations across domain-language combinations within model classes and 2) considerable performance drops between source and target language-domain combinations across all model sizes. We demonstrate that M2QA is far from solved, and new methods to effectively transfer both linguistic and domain-specific information are necessary. We make M2QA publicly available at https://github.com/UKPLab/m2qa.
Submitted 1 July, 2024;
originally announced July 2024.
-
Resonant Ion Radiation Scattering and the Integrated Atomic Cross-Section as applied to Binary Star Shock Fronts
Authors:
Raymond J. Pfeiffer
Abstract:
The current literature is rather vague regarding how to calculate the exact numerical value of the resonant ion scattering cross-section that should be used for a specific bandpass of finite width. Such a value was needed in order to calculate the ion and mass densities in the shock fronts of hot, close binary star systems. This was done based on a modeling of ultraviolet wind-line profiles, using IUE spectra. Therefore, a numerical integration has been carried out, in wavelength-space, of the exact expression for the cross-section over two band-passes of astrophysical interest. The exact expression employed was that derived from a solution of the Abraham-Lorentz equation. The numerical results depend on the resonant wavelength, which is taken to be at the center of the bandpass. Most texts on the subject derive an expression for the scattering cross-section in frequency-space, based on the assumption that the radiation reaction term in the Abraham-Lorentz equation may be approximated by a resistive term. The integral of this cross-section over the entire spectrum is independent of the resonant frequency, except for the transition probability. This has limited practical use when dealing with fluxes measured in a bandpass of finite width expressed in wavelength units and scattering is the only mechanism for producing the observed fluxes. Such is the case when dealing with the low densities encountered in stellar winds and shock fronts.
Integrated cross-sections that depend on the resonant wavelength are used to determine the number and mass densities of C IV and N V ions in the shock fronts found in some hot, eclipsing binary star systems, for which several IUE spectra have been obtained over a Keplerian orbital period. This then leads to a determination of the mass density and total mass in the shock once the volume of the shock is determined.
Submitted 21 March, 2024;
originally announced March 2024.
-
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Authors:
Clifton Poth,
Hannah Sterz,
Indraneil Paul,
Sukannya Purkayastha,
Leon Engländer,
Timo Imhof,
Ivan Vulić,
Sebastian Ruder,
Iryna Gurevych,
Jonas Pfeiffer
Abstract:
We introduce Adapters, an open-source library that unifies parameter-efficient and modular transfer learning in large language models. By integrating 10 diverse adapter methods into a unified interface, Adapters offers ease of use and flexible configuration. Our library allows researchers and practitioners to leverage adapter modularity through composition blocks, enabling the design of complex adapter setups. We demonstrate the library's efficacy by evaluating its performance against full fine-tuning on various NLP tasks. Adapters provides a powerful tool for addressing the challenges of conventional fine-tuning paradigms and promoting more efficient and modular transfer learning. The library is available via https://adapterhub.ml/adapters.
Submitted 18 November, 2023;
originally announced November 2023.
-
First, Do No Harm: Algorithms, AI, and Digital Product Liability
Authors:
Marc J. Pfeiffer
Abstract:
The ethical imperative for technology should be first, do no harm. But digital innovations like AI and social media increasingly enable societal harms, from bias to misinformation. As these technologies grow ubiquitous, we need solutions to address unintended consequences. This report proposes a model to incentivize developers to prevent foreseeable algorithmic harms. It does this by expanding negligence and product liability laws. Digital product developers would be incentivized to mitigate potential algorithmic risks before deployment to protect themselves and investors. Standards and penalties would be set proportional to harm. Insurers would require harm mitigation during development in order to obtain coverage. This shifts tech ethics from move fast and break things to first, do no harm. Details would need careful refinement between stakeholders to enact reasonable guardrails without stifling innovation. Policy and harm prevention frameworks would likely evolve over time. Similar accountability schemes have helped address workplace, environmental, and product safety. Introducing algorithmic harm negligence liability would acknowledge the real societal costs of unethical tech. The timing is right for reform. This proposal provides a model to steer the digital revolution toward human rights and dignity. Harm prevention must be prioritized over reckless growth. Vigorous liability policies are essential to stop technologists from breaking things.
Submitted 17 November, 2023;
originally announced November 2023.
-
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization
Authors:
Alexandra Chronopoulou,
Jonas Pfeiffer,
Joshua Maynez,
Xinyi Wang,
Sebastian Ruder,
Priyanka Agrawal
Abstract:
Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there are 7000 languages in the world and many of these languages lack labeled data for real-world language generation tasks. In this paper, we propose to improve zero-shot cross-lingual transfer by composing language or task specialized parameters. Our method composes language and task PEFT modules via element-wise arithmetic operations to leverage unlabeled data and English labeled data. We extend our approach to cases where labeled data from more languages is available and propose to arithmetically compose PEFT modules trained on languages related to the target. Empirical results on summarization demonstrate that our method is an effective strategy that obtains consistent gains using minimal training of PEFT modules.
Submitted 15 November, 2023;
originally announced November 2023.
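The element-wise composition of language and task PEFT modules described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the parameter names (`lora_A`, `lora_B`), the toy values, and the weighting scheme are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def combine(a: Params, b: Params, weight_a: float = 1.0, weight_b: float = 1.0) -> Params:
    """Element-wise weighted sum of two PEFT modules with identical shapes."""
    if a.keys() != b.keys():
        raise ValueError("modules must have the same parameter names")
    out: Params = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        out[name] = [weight_a * x + weight_b * y for x, y in zip(a[name], b[name])]
    return out


# Toy modules with made-up values (not real weights).
task_en = {"lora_A": [0.5, -1.0], "lora_B": [2.0, 0.0]}
lang_de = {"lora_A": [1.5, 1.0], "lora_B": [0.0, 4.0]}

# Equal-weight combination of an English task module and a German language module.
task_de = combine(task_en, lang_de, 0.5, 0.5)
print(task_de)
```

The weights `0.5, 0.5` are an arbitrary choice for the example; in practice the mixing coefficients would presumably be tuned.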
-
Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation
Authors:
Benjamin Minixhofer,
Jonas Pfeiffer,
Ivan Vulić
Abstract:
Many NLP pipelines split text into sentences as one of the crucial preprocessing steps. Prior sentence segmentation tools either rely on punctuation or require a considerable amount of sentence-segmented training data: both central assumptions might fail when porting sentence segmenters to diverse languages on a massive scale. In this work, we thus introduce a multilingual punctuation-agnostic sentence segmentation method, currently covering 85 languages, trained in a self-supervised fashion on unsegmented text, by making use of newline characters which implicitly perform segmentation into paragraphs. We further propose an approach that adapts our method to the segmentation in a given corpus by using only a small number (64-256) of sentence-segmented examples. The main results indicate that our method outperforms all the prior best sentence-segmentation tools by an average of 6.1% F1 points. Furthermore, we demonstrate that proper sentence segmentation has a point: the use of a (powerful) sentence segmenter makes a considerable difference for a downstream application such as machine translation (MT). By using our method to match sentence segmentation to the segmentation used during training of MT models, we achieve an average improvement of 2.3 BLEU points over the best prior segmentation tool, as well as massive gains over a trivial segmenter that splits text into equally sized blocks.
Submitted 30 May, 2023;
originally announced May 2023.
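The idea of using newline characters as free segmentation labels, mentioned in the abstract above, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function name `newline_labels` and the per-character 0/1 labeling are assumptions.

```python
from typing import List, Tuple


def newline_labels(raw: str) -> Tuple[str, List[int]]:
    """Strip newlines and mark each kept character that was followed by one."""
    chars: List[str] = []
    labels: List[int] = []
    for ch in raw:
        if ch == "\n":
            if labels:
                labels[-1] = 1  # the previous kept character ends a block
            continue
        chars.append(ch)
        labels.append(0)
    return "".join(chars), labels


text, labels = newline_labels("First block\nSecond block\n")
boundaries = [i for i, flag in enumerate(labels) if flag]
print(text)
print(boundaries)
```

A model trained to predict these labels from the newline-free text never sees punctuation-based rules, which is the sense in which such supervision is punctuation-agnostic.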
-
mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations
Authors:
Jonas Pfeiffer,
Francesco Piccinno,
Massimo Nicosia,
Xinyi Wang,
Machel Reid,
Sebastian Ruder
Abstract:
Multilingual sequence-to-sequence models perform poorly with increased language coverage and fail to consistently generate text in the correct target language in few-shot settings. To address these challenges, we propose mmT5, a modular multilingual sequence-to-sequence model. mmT5 utilizes language-specific modules during pre-training, which disentangle language-specific information from language-agnostic information. We identify representation drift during fine-tuning as a key limitation of modular generative models and develop strategies that enable effective zero-shot transfer. Our model outperforms mT5 at the same parameter sizes by a large margin on representative natural language understanding and generation tasks in 40+ languages. Compared to mT5, mmT5 raises the rate of generating text in the correct language under zero-shot settings from 7% to 99%, thereby greatly alleviating the source language hallucination problem.
Submitted 23 May, 2023;
originally announced May 2023.
-
CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
Authors:
Benjamin Minixhofer,
Jonas Pfeiffer,
Ivan Vulić
Abstract:
While many languages possess processes of joining two or more words to create compound words, previous studies have been typically limited only to languages with excessively productive compound formation (e.g., German, Dutch) and there is no public dataset containing compound and non-compound words across a large number of languages. In this work, we systematically study decompounding, the task of splitting compound words into their constituents, at a wide scale. We first address the data gap by introducing a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary. We then use this dataset to evaluate an array of Large Language Models (LLMs) on the decompounding task. We find that LLMs perform poorly, especially on words which are tokenized unfavorably by subword tokenization. We thus introduce a novel methodology to train dedicated models for decompounding. The proposed two-stage procedure relies on a fully self-supervised objective in the first stage, while the second, supervised learning stage optionally fine-tunes the model on the annotated Wiktionary data. Our self-supervised models outperform the prior best unsupervised decompounding models by 13.9% accuracy on average. Our fine-tuned models outperform all prior (language-specific) decompounding tools. Furthermore, we use our models to leverage decompounding during the creation of a subword tokenizer, which we refer to as CompoundPiece. CompoundPiece tokenizes compound words more favorably on average, leading to improved performance on decompounding over an otherwise equivalent model using SentencePiece tokenization.
Submitted 23 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Romanization-based Large-scale Adaptation of Multilingual Language Models
Authors:
Sukannya Purkayastha,
Sebastian Ruder,
Jonas Pfeiffer,
Iryna Gurevych,
Ivan Vulić
Abstract:
Large multilingual pretrained language models (mPLMs) have become the de facto state of the art for cross-lingual transfer in NLP. However, their large-scale deployment to many languages, besides pretraining data scarcity, is also hindered by the increase in vocabulary size and limitations in their parameter budget. In order to boost the capacity of mPLMs to deal with low-resource and unseen languages, we explore the potential of leveraging transliteration on a massive scale. In particular, we explore the UROMAN transliteration tool, which provides mappings from UTF-8 to Latin characters for all the writing systems, enabling inexpensive romanization for virtually any language. We first focus on establishing how UROMAN compares against other language-specific and manually curated transliterators for adapting multilingual PLMs. We then study and compare a plethora of data- and parameter-efficient strategies for adapting the mPLMs to romanized and non-romanized corpora of 14 diverse low-resource languages. Our results reveal that UROMAN-based transliteration can offer strong performance for many languages, with particular gains achieved in the most challenging setups: on languages with unseen scripts and with limited training data without any vocabulary augmentation. Further analyses reveal that an improved tokenizer based on romanized data can even outperform non-transliteration-based methods in the majority of languages.
Submitted 18 April, 2023;
originally announced April 2023.
-
Modular Deep Learning
Authors:
Jonas Pfeiffer,
Sebastian Ruder,
Ivan Vulić,
Edoardo Maria Ponti
Abstract:
Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer labelled examples. Nonetheless, it remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference and that generalise systematically to non-identically distributed tasks. Modular deep learning has emerged as a promising solution to these challenges. In this framework, units of computation are often implemented as autonomous parameter-efficient modules. Information is conditionally routed to a subset of modules and subsequently aggregated. These properties enable positive transfer and systematic generalisation by separating computation from routing and updating modules locally. We offer a survey of modular architectures, providing a unified view over several threads of research that evolved independently in the scientific literature. Moreover, we explore various additional purposes of modularity, including scaling language models, causal inference, programme induction, and planning in reinforcement learning. Finally, we report various concrete applications where modularity has been successfully deployed such as cross-lingual and cross-modal knowledge transfer. Talks and projects related to this survey are available at https://www.modulardeeplearning.com/.
Submitted 27 January, 2024; v1 submitted 22 February, 2023;
originally announced February 2023.
-
FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing
Authors:
Chen Cecilia Liu,
Jonas Pfeiffer,
Ivan Vulić,
Iryna Gurevych
Abstract:
Standard fine-tuning of language models typically performs well on in-distribution data, but suffers with generalization to distribution shifts. In this work, we aim to improve the generalization of adapter-based cross-lingual task transfer where such cross-language distribution shifts are imminent. We investigate scheduled unfreezing algorithms -- originally proposed to mitigate catastrophic forgetting in transfer learning -- for fine-tuning task adapters. Our experiments show that scheduled unfreezing methods close the gap to full fine-tuning and achieve stronger cross-lingual transfer performance, suggesting that these methods can go beyond just mitigating catastrophic forgetting. Next, aiming to understand these empirical findings, we investigate the learning dynamics of scheduled unfreezing using Fisher Information. Our experiments reveal that scheduled unfreezing induces different learning dynamics compared to standard fine-tuning, and provide evidence that the dynamics of Fisher Information during training correlate with cross-lingual generalization performance. We additionally propose a general scheduled unfreezing algorithm that achieves an average of 2 points improvement over four datasets compared to standard fine-tuning and provides empirical evidence for a theory-based justification of the heuristic unfreezing schedule for adapter training.
Submitted 4 April, 2024; v1 submitted 13 January, 2023;
originally announced January 2023.
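To make the notion of scheduled unfreezing in the abstract above concrete, here is a minimal sketch of one simple variant: a linear, top-down schedule that unfreezes one more layer every fixed number of steps. This is an assumed baseline schedule for illustration only; it is not the Fisher-Information-based algorithm the paper proposes, and the function and parameter names are hypothetical.

```python
from typing import List


def trainable_layers(step: int, num_layers: int, steps_per_layer: int) -> List[int]:
    """Indices of layers that are trainable at `step`; the top layer is unfrozen first."""
    if steps_per_layer <= 0:
        raise ValueError("steps_per_layer must be positive")
    # One layer is trainable from the start; one more is added every `steps_per_layer` steps.
    unfrozen = min(num_layers, 1 + step // steps_per_layer)
    return list(range(num_layers - unfrozen, num_layers))


for step in (0, 100, 250, 1000):
    print(step, trainable_layers(step, num_layers=4, steps_per_layer=100))
```

In a training loop, the returned indices would decide which layers' parameters receive gradient updates at that step, with all other layers kept frozen.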
-
One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks
Authors:
Gregor Geigle,
Chen Cecilia Liu,
Jonas Pfeiffer,
Iryna Gurevych
Abstract:
Current multimodal models, aimed at solving Vision and Language (V+L) tasks, predominantly repurpose Vision Encoders (VE) as feature extractors. While many VEs -- of different architectures, trained on different data and objectives -- are publicly available, they are not designed for the downstream V+L tasks. Nonetheless, most current work assumes that a \textit{single} pre-trained VE can serve as a general-purpose encoder. In this work, we focus on analysis and aim to understand whether the information stored within different VEs is complementary, i.e. if providing the model with features from multiple VEs can improve the performance on a target task, and how they are combined. We exhaustively experiment with three popular VEs on six downstream V+L tasks and analyze the attention and VE-dropout patterns. Our analyses suggest that diverse VEs complement each other, resulting in improved downstream V+L task performance, where the improvements are not due to simple ensemble effects (i.e. the performance does not always improve when increasing the number of encoders). We demonstrate that future VEs, which are not \textit{repurposed}, but explicitly \textit{designed} for V+L tasks, have the potential of improving performance on the target V+L tasks.
Submitted 8 June, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Authors:
Jonas Pfeiffer,
Naman Goyal,
Xi Victoria Lin,
Xian Li,
James Cross,
Sebastian Riedel,
Mikel Artetxe
Abstract:
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (X-Mod) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
Submitted 12 May, 2022;
originally announced May 2022.
-
UKP-SQUARE: An Online Platform for Question Answering Research
Authors:
Tim Baumgärtner,
Kexin Wang,
Rachneet Sachdeva,
Max Eichler,
Gregor Geigle,
Clifton Poth,
Hannah Sterz,
Haritz Puerto,
Leonardo F. R. Ribeiro,
Jonas Pfeiffer,
Nils Reimers,
Gözde Gül Şahin,
Iryna Gurevych
Abstract:
Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that are of different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and setups (e.g., with or without retrieval). Despite having a large number of powerful, specialized QA pipelines (which we refer to as Skills) that consider a single domain, model or setup, there exists no framework where users can easily explore and compare such pipelines and can extend them according to their needs. To address this issue, we present UKP-SQUARE, an extensible online QA platform for researchers which allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests. In addition, QA researchers can develop, manage, and share their custom Skills using our microservices that support a wide range of models (Transformers, Adapters, ONNX), datastores and retrieval techniques (e.g., sparse and dense). UKP-SQUARE is available on https://square.ukp-lab.de.
Submitted 28 March, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Delving Deeper into Cross-lingual Visual Question Answering
Authors:
Chen Liu,
Jonas Pfeiffer,
Anna Korhonen,
Ivan Vulić,
Iryna Gurevych
Abstract:
Visual question answering (VQA) is one of the crucial vision-and-language tasks. Yet, existing VQA research has mostly focused on the English language, due to a lack of suitable evaluation resources. Previous work on cross-lingual VQA has reported poor zero-shot transfer performance of current multilingual multimodal Transformers with large gaps to monolingual performance, without any deeper analysis. In this work, we delve deeper into the different aspects of cross-lingual VQA, aiming to understand the impact of 1) modeling methods and choices, including architecture, inductive bias, fine-tuning; 2) learning biases: including question types and modality biases in cross-lingual setups. The key results of our analysis are: 1) We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance, yielding +10 accuracy points over existing methods. 2) We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers, and identify question types that are the most difficult to improve on. 3) We provide an analysis of modality biases present in training data and models, revealing why zero-shot performance gaps remain for certain question types and languages.
Submitted 8 June, 2023; v1 submitted 15 February, 2022;
originally announced February 2022.
-
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Authors:
Emanuele Bugliarello,
Fangyu Liu,
Jonas Pfeiffer,
Siva Reddy,
Desmond Elliott,
Edoardo Maria Ponti,
Ivan Vulić
Abstract:
Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together - by both aggregating pre-existing datasets and creating new ones - visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups. Based on the evaluation of the available state-of-the-art models, we find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks. Moreover, downstream performance is partially explained by the amount of available unlabelled textual data for pretraining, and only weakly by the typological distance of target-source languages. We hope to encourage future research efforts in this area by releasing the benchmark to the community.
Submitted 17 July, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Masked LARk: Masked Learning, Aggregation and Reporting worKflow
Authors:
Joseph J. Pfeiffer III,
Denis Charles,
Davis Gilton,
Young Hun Jung,
Mehul Parsana,
Erik Anderson
Abstract:
Today, many web advertising data flows involve passive cross-site tracking of users. Enabling such a mechanism through the usage of third party tracking cookies (3PC) exposes sensitive user data to a large number of parties, with little oversight on how that data can be used. Thus, most browsers are moving towards removal of 3PC in subsequent browser iterations. In order to substantially improve end-user privacy while allowing sites to continue to sustain their business through ad funding, new privacy-preserving primitives need to be introduced.
In this paper, we discuss a new proposal, called Masked LARk, for aggregation of user engagement measurement and model training that prevents cross-site tracking, while remaining (a) flexible, for engineering development and maintenance, (b) secure, in the sense that cross-site tracking and tracing are blocked and (c) open for continued model development and training, allowing advertisers to serve relevant ads to interested users. We introduce a secure multi-party compute (MPC) protocol that utilizes "helper" parties to train models, so that once data leaves the browser, no downstream system can individually construct a complete picture of the user activity. For training, our key innovation is through the usage of masking, or the obfuscation of the true labels, while still allowing a gradient to be accurately computed in aggregate over a batch of data. Our protocol only utilizes light cryptography, at such a level that an interested yet inexperienced reader can understand the core algorithm. We develop helper endpoints that implement this system, and give example usage of training in PyTorch.
Submitted 27 October, 2021;
originally announced October 2021.
-
xGQA: Cross-Lingual Visual Question Answering
Authors:
Jonas Pfeiffer,
Gregor Geigle,
Aishwarya Kamath,
Jan-Martin O. Steitz,
Stefan Roth,
Ivan Vulić,
Iryna Gurevych
Abstract:
Recent advances in multimodal vision and language modeling have predominantly focused on the English language, mostly due to the lack of multilingual multimodal datasets to steer modeling efforts. In this work, we address this gap and provide xGQA, a new multilingual evaluation benchmark for the visual question answering task. We extend the established English GQA dataset to 7 typologically diverse languages, enabling us to detect and explore crucial challenges in cross-lingual visual question answering. We further propose new adapter-based approaches to adapt multimodal transformer-based models to become multilingual, and -- vice versa -- multilingual models to become multimodal. Our proposed methods outperform current state-of-the-art multilingual multimodal models (e.g., M3P) in zero-shot cross-lingual settings, but the accuracy remains low across the board; a performance drop of around 38 accuracy points in target languages showcases the difficulty of zero-shot cross-lingual transfer for this task. Our results suggest that simple cross-lingual transfer of multimodal models yields latent multilingual multimodal misalignment, calling for more sophisticated methods for vision and multilingual language modeling.
Submitted 17 March, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.
-
TxT: Crossmodal End-to-End Learning with Transformers
Authors:
Jan-Martin O. Steitz,
Jonas Pfeiffer,
Iryna Gurevych,
Stefan Roth
Abstract:
Reasoning over multiple modalities, e.g. in Visual Question Answering (VQA), requires an alignment of semantic concepts across domains. Despite the widespread success of end-to-end learning, today's multimodal pipelines by and large leverage pre-extracted, fixed features from object detectors, typically Faster R-CNN, as representations of the visual world. The obvious downside is that the visual representation is not specifically tuned to the multimodal task at hand. At the same time, while transformer-based object detectors have gained popularity, they have not been employed in today's multimodal pipelines. We address both shortcomings with TxT, a transformer-based crossmodal pipeline that enables fine-tuning both language and visual components on the downstream task in a fully end-to-end manner. We overcome existing limitations of transformer-based detectors for multimodal reasoning regarding the integration of global context and their scalability. Our transformer-based multimodal model achieves considerable gains from end-to-end learning for multimodal question answering.
Submitted 9 September, 2021;
originally announced September 2021.
-
Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation
Authors:
Leonardo F. R. Ribeiro,
Jonas Pfeiffer,
Yue Zhang,
Iryna Gurevych
Abstract:
Recent work on multilingual AMR-to-text generation has exclusively focused on data augmentation strategies that utilize silver AMR. However, this assumes a high quality of generated AMRs, potentially limiting the transferability to the target task. In this paper, we investigate different techniques for automatically generating AMR annotations, where we aim to study which source of information yields better multilingual results. Our models trained on gold AMR with silver (machine translated) sentences outperform approaches which leverage generated silver AMR. We find that combining both complementary sources of information further improves multilingual AMR-to-text generation. Our models surpass the previous state of the art for German, Italian, Spanish, and Chinese by a large margin.
Submitted 8 September, 2021;
originally announced September 2021.
-
AdapterHub Playground: Simple and Flexible Few-Shot Learning with Adapters
Authors:
Tilman Beck,
Bela Bohlender,
Christina Viehmann,
Vincent Hane,
Yanik Adamson,
Jaber Khuri,
Jonas Brossmann,
Jonas Pfeiffer,
Iryna Gurevych
Abstract:
The open-access dissemination of pretrained language models through online repositories has led to a democratization of state-of-the-art natural language processing (NLP) research. This also allows people outside of NLP to use such models and adapt them to specific use-cases. However, a certain amount of technical proficiency is still required which is an entry barrier for users who want to apply these models to a certain task but lack the necessary knowledge or resources. In this work, we aim to overcome this gap by providing a tool which allows researchers to leverage pretrained models without writing a single line of code. Built upon the parameter-efficient adapter modules for transfer learning, our AdapterHub Playground provides an intuitive interface, allowing the usage of adapters for prediction, training and analysis of textual data for a variety of NLP tasks. We present the tool's architecture and demonstrate its advantages with prototypical use-cases, where we show that predictive performance can easily be increased in a few-shot learning scenario. Finally, we evaluate its usability in a user study. We provide the code and a live interface at https://adapter-hub.github.io/playground.
Submitted 19 April, 2022; v1 submitted 18 August, 2021;
originally announced August 2021.
-
What to Pre-Train on? Efficient Intermediate Task Selection
Authors:
Clifton Poth,
Jonas Pfeiffer,
Andreas Rücklé,
Iryna Gurevych
Abstract:
Intermediate task fine-tuning has been shown to culminate in large transfer gains across many NLP tasks. With an abundance of candidate datasets as well as pre-trained language models, it has become infeasible to run the cross-product of all combinations to find the best transfer setting. In this work we first establish that similar sequential fine-tuning gains can be achieved in adapter settings, and subsequently consolidate previously proposed methods that efficiently identify beneficial tasks for intermediate transfer learning. We experiment with a diverse set of 42 intermediate and 11 target English classification, multiple choice, question answering, and sequence tagging tasks. Our results show that efficient embedding-based methods that rely solely on the respective datasets outperform computationally expensive few-shot fine-tuning approaches. Our best methods achieve an average Regret@3 of less than 1% across all target tasks, demonstrating that we are able to efficiently identify the best datasets for intermediate training.
Submitted 10 September, 2021; v1 submitted 16 April, 2021;
originally announced April 2021.
-
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Authors:
Gregor Geigle,
Jonas Pfeiffer,
Nils Reimers,
Ivan Vulić,
Iryna Gurevych
Abstract:
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image. While offering unmatched retrieval performance, such models: 1) are typically pretrained from scratch and thus less scalable, 2) suffer from huge retrieval latency and inefficiency issues, which makes them impractical in realistic applications. To address these crucial gaps towards both improved and efficient cross-modal retrieval, we propose a novel fine-tuning framework that turns any pretrained text-image multi-modal model into an efficient retrieval model. The framework is based on a cooperative retrieve-and-rerank approach which combines: 1) twin networks (i.e., a bi-encoder) to separately encode all items of a corpus, enabling efficient initial retrieval, and 2) a cross-encoder component for a more nuanced (i.e., smarter) ranking of the retrieved small set of items. We also propose to jointly fine-tune the two components with shared weights, yielding a more parameter-efficient model. Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
Submitted 18 February, 2022; v1 submitted 22 March, 2021;
originally announced March 2021.
-
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Authors:
Phillip Rust,
Jonas Pfeiffer,
Ivan Vulić,
Sebastian Ruder,
Iryna Gurevych
Abstract:
In this work, we provide a systematic and comprehensive empirical comparison of pretrained multilingual language models versus their monolingual counterparts with regard to their monolingual task performance. We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks. We first aim to establish, via fair and controlled comparisons, if a gap between the multilingual and the corresponding monolingual representation of that language exists, and subsequently investigate the reason for any performance difference. To disentangle conflating factors, we train new monolingual models on the same data, with monolingually and multilingually trained tokenizers. We find that while the pretraining data size is an important factor, a designated monolingual tokenizer plays an equally important role in the downstream performance. Our results show that languages that are adequately represented in the multilingual model's vocabulary exhibit negligible performance decreases over their monolingual counterparts. We further find that replacing the original multilingual tokenizer with the specialized monolingual tokenizer improves the downstream performance of the multilingual model for almost every task and language.
Submitted 1 June, 2021; v1 submitted 31 December, 2020;
originally announced December 2020.
-
UNKs Everywhere: Adapting Multilingual Language Models to New Scripts
Authors:
Jonas Pfeiffer,
Ivan Vulić,
Iryna Gurevych,
Sebastian Ruder
Abstract:
Massively multilingual language models such as multilingual BERT offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks. However, due to limited capacity and large differences in pretraining data sizes, there is a profound performance gap between resource-rich and resource-poor target languages. The ultimate challenge is dealing with under-resourced languages not covered at all by the models and written in scripts unseen during pretraining. In this work, we propose a series of novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts. Relying on matrix factorization, our methods capitalize on the existing latent knowledge about multiple languages already available in the pretrained model's embedding matrix. Furthermore, we show that learning of the new dedicated embedding matrix in the target language can be improved by leveraging a small number of vocabulary items (i.e., the so-called lexically overlapping tokens) shared between mBERT's and target language vocabulary. Our adaptation techniques offer substantial performance gains for languages with unseen scripts. We also demonstrate that they can yield improvements for low-resource languages written in scripts covered by the pretrained model.
Submitted 10 September, 2021; v1 submitted 31 December, 2020;
originally announced December 2020.
-
AdapterDrop: On the Efficiency of Adapters in Transformers
Authors:
Andreas Rücklé,
Gregor Geigle,
Max Glockner,
Tilman Beck,
Jonas Pfeiffer,
Nils Reimers,
Iryna Gurevych
Abstract:
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements. Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters. In this paper, we propose AdapterDrop, removing adapters from lower transformer layers during training and inference, which incorporates concepts from all three directions. We show that AdapterDrop can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performances. We further prune adapters from AdapterFusion, which improves the inference efficiency while maintaining the task performances entirely.
Submitted 5 October, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
Causal Transfer Random Forest: Combining Logged Data and Randomized Experiments for Robust Prediction
Authors:
Shuxi Zeng,
Murat Ali Bayir,
Joesph J. Pfeiffer III,
Denis Charles,
Emre Kiciman
Abstract:
It is often critical for prediction models to be robust to distributional shifts between training and testing data. From a causal perspective, the challenge is to distinguish the stable causal relationships from the unstable spurious correlations across shifts. We describe a causal transfer random forest (CTRF) that combines existing training data with a small amount of data from a randomized experiment to train a model which is robust to the feature shifts and therefore transfers to a new targeting distribution. Theoretically, we justify the robustness of the approach against feature shifts with the knowledge from causal learning. Empirically, we evaluate the CTRF using both synthetic data experiments and real-world experiments in the Bing Ads platform, including a click prediction task and in the context of an end-to-end counterfactual optimization system. The proposed CTRF produces robust predictions and outperforms most baseline methods compared in the presence of feature shifts.
Submitted 14 January, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Causal Inference in the Presence of Interference in Sponsored Search Advertising
Authors:
Razieh Nabi,
Joel Pfeiffer,
Murat Ali Bayir,
Denis Charles,
Emre Kıcıman
Abstract:
In classical causal inference, inferring cause-effect relations from data relies on the assumption that units are independent and identically distributed. This assumption is violated in settings where units are related through a network of dependencies. An example of such a setting is ad placement in sponsored search advertising, where the clickability of a particular ad is potentially influenced by where it is placed and where other ads are placed on the search result page. In such scenarios, confounding arises due to not only the individual ad-level covariates but also the placements and covariates of other ads in the system. In this paper, we leverage the language of causal inference in the presence of interference to model interactions among the ads. Quantification of such interactions allows us to better understand the click behavior of users, which in turn impacts the revenue of the host search engine and enhances user satisfaction. We illustrate the utility of our formalization through experiments carried out on the ad placement system of the Bing search engine.
Submitted 14 October, 2020;
originally announced October 2020.
-
MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale
Authors:
Andreas Rücklé,
Jonas Pfeiffer,
Iryna Gurevych
Abstract:
We study the zero-shot transfer capabilities of text matching models on a massive scale, by self-supervised training on 140 source domains from community question answering forums in English. We investigate the model performances on nine benchmarks of answer selection and question similarity tasks, and show that all 140 models transfer surprisingly well, where the large majority of models substantially outperforms common IR baselines. We also demonstrate that considering a broad selection of source domains is crucial for obtaining the best zero-shot transfer performances, which contrasts with the standard procedure that merely relies on the largest and most similar domains. In addition, we extensively study how to best combine multiple source domains. We propose to combine self-supervised with supervised multi-task learning on all available source domains. Our best zero-shot transfer model considerably outperforms in-domain BERT and the previous state of the art on six benchmarks. Fine-tuning of our model with in-domain data results in additional large gains and achieves the new state of the art on all nine benchmarks.
Submitted 2 October, 2020;
originally announced October 2020.
-
AdapterHub: A Framework for Adapting Transformers
Authors:
Jonas Pfeiffer,
Andreas Rücklé,
Clifton Poth,
Aishwarya Kamath,
Ivan Vulić,
Sebastian Ruder,
Kyunghyun Cho,
Iryna Gurevych
Abstract:
The current modus operandi in NLP involves downloading and fine-tuning pre-trained models consisting of millions or billions of parameters. Storing and sharing such large trained models is expensive, slow, and time-consuming, which impedes progress towards more general and versatile NLP methods that learn from and for many tasks. Adapters -- small learnt bottleneck layers inserted within each layer of a pre-trained model -- ameliorate this issue by avoiding full fine-tuning of the entire model. However, sharing and integrating adapter layers is not straightforward. We propose AdapterHub, a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages. The framework, built on top of the popular HuggingFace Transformers library, enables extremely easy and quick adaptations of state-of-the-art pre-trained models (e.g., BERT, RoBERTa, XLM-R) across tasks and languages. Downloading, sharing, and training adapters is as seamless as possible using minimal changes to the training scripts and a specialized infrastructure. Our framework enables scalable and easy access to sharing of task-specific models, particularly in low-resource scenarios. AdapterHub includes all recent adapter architectures and can be found at https://AdapterHub.ml.
Submitted 6 October, 2020; v1 submitted 15 July, 2020;
originally announced July 2020.
-
The Gauss2++ Model -- A Comparison of Different Measure Change Specifications for a Consistent Risk Neutral and Real World Calibration
Authors:
Christoph Berninger,
Julian Pfeiffer
Abstract:
Especially in the insurance industry, interest rate models play a crucial role, e.g. to calculate the insurance company's liabilities, performance scenarios or risk measures. A prominent candidate is the 2-Additive-Factor Gaussian Model (Gauss2++), in a different representation also known as the 2-Factor Hull-White model. In this paper, we propose a framework to estimate the model such that it can be applied under the risk neutral and the real world measure in a consistent manner. We first show that any progressive and square-integrable function can be used to specify the change of measure without losing the analytic tractability of e.g. zero-coupon bond prices in both worlds. We further propose two time dependent candidates, which are easy to calibrate: a step and a linear function. They represent two variants of our framework and distinguish between a short and a long term risk premium, which makes it possible to regularize the interest rates in the long horizon. We apply both variants to historical data and show that they indeed produce realistic and much more stable long term interest rate forecasts than the usage of a constant function. This stability over time would translate to performance scenarios of e.g. interest rate sensitive funds and risk measures.
Submitted 14 June, 2020;
originally announced June 2020.
-
Low Resource Multi-Task Sequence Tagging -- Revisiting Dynamic Conditional Random Fields
Authors:
Jonas Pfeiffer,
Edwin Simpson,
Iryna Gurevych
Abstract:
We compare different models for low resource multi-task sequence tagging that leverage dependencies between label sequences for different tasks. Our analysis is aimed at datasets where each example has labels for multiple tasks. Current approaches use either a separate model for each task or standard multi-task learning to learn shared feature representations. However, these approaches ignore correlations between label sequences, which can provide important information in settings with small training datasets. To analyze which scenarios can profit from modeling dependencies between labels in different tasks, we revisit dynamic conditional random fields (CRFs) and combine them with deep neural networks. We compare single-task, multi-task and dynamic CRF setups for three diverse datasets at both sentence and document levels in English and German low resource scenarios. We show that including silver labels from pretrained part-of-speech taggers as auxiliary tasks can improve performance on downstream tasks. We find that especially in low-resource scenarios, the explicit modeling of inter-dependencies between task predictions outperforms single-task as well as standard multi-task models.
Submitted 1 May, 2020;
originally announced May 2020.
-
AdapterFusion: Non-Destructive Task Composition for Transfer Learning
Authors:
Jonas Pfeiffer,
Aishwarya Kamath,
Andreas Rücklé,
Kyunghyun Cho,
Iryna Gurevych
Abstract:
Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. To address these shortcomings, we propose AdapterFusion, a new two stage learning algorithm that leverages knowledge from multiple tasks. First, in the knowledge extraction stage we learn task specific parameters called adapters, that encapsulate the task-specific information. We then combine the adapters in a separate knowledge composition step. We show that by separating the two stages, i.e., knowledge extraction and knowledge composition, the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner. We empirically evaluate AdapterFusion on 16 diverse NLU tasks, and find that it effectively combines various types of knowledge at different layers of the model. We show that our approach outperforms traditional strategies such as full fine-tuning as well as multi-task learning. Our code and adapters are available at AdapterHub.ml.
Submitted 26 January, 2021; v1 submitted 1 May, 2020;
originally announced May 2020.
-
MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
Authors:
Jonas Pfeiffer,
Ivan Vulić,
Iryna Gurevych,
Sebastian Ruder
Abstract:
The main goal behind state-of-the-art pre-trained multilingual models such as multilingual BERT and XLM-R is enabling and bootstrapping NLP applications in low-resource languages through zero-shot or few-shot cross-lingual transfer. However, due to limited model capacity, their transfer performance is the weakest exactly on such low-resource languages and languages unseen during pre-training. We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations. In addition, we introduce a novel invertible adapter architecture and a strong baseline method for adapting a pre-trained multilingual model to a new language. MAD-X outperforms the state of the art in cross-lingual transfer across a representative set of typologically diverse languages on named entity recognition and causal commonsense reasoning, and achieves competitive results on question answering. Our code and adapters are available at AdapterHub.ml.
Submitted 6 October, 2020; v1 submitted 30 April, 2020;
originally announced May 2020.
-
Fine-Tuned Neural Models for Propaganda Detection at the Sentence and Fragment levels
Authors:
Tariq Alhindi,
Jonas Pfeiffer,
Smaranda Muresan
Abstract:
This paper presents the CUNLP submission for the NLP4IF 2019 shared-task on FineGrained Propaganda Detection. Our system finished 5th out of 26 teams on the sentence-level classification task and 5th out of 11 teams on the fragment-level classification task based on our scores on the blind test set. We present our models, a discussion of our ablation studies and experiments, and an analysis of our performance on all eighteen propaganda techniques present in the corpus of the shared task.
Submitted 21 October, 2019;
originally announced October 2019.
-
What do Deep Networks Like to Read?
Authors:
Jonas Pfeiffer,
Aishwarya Kamath,
Iryna Gurevych,
Sebastian Ruder
Abstract:
Recent research towards understanding neural networks probes models in a top-down manner, but is only able to identify model tendencies that are known a priori. We propose Susceptibility Identification through Fine-Tuning (SIFT), a novel abstractive method that uncovers a model's preferences without imposing any prior. By fine-tuning an autoencoder with the gradients from a fixed classifier, we are able to extract propensities that characterize different kinds of classifiers in a bottom-up manner. We further leverage the SIFT architecture to rephrase sentences in order to predict the opposing class of the ground truth label, uncovering potential artifacts encoded in the fixed classification model. We evaluate our method on three diverse tasks with four different models. We contrast the propensities of the models as well as reproduce artifacts reported in the literature.
Submitted 10 September, 2019;
originally announced September 2019.
-
FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning
Authors:
Jonas Pfeiffer,
Christian M. Meyer,
Claudia Schulz,
Jan Kiesewetter,
Jan Zottmann,
Michael Sailer,
Elisabeth Bauer,
Frank Fischer,
Martin R. Fischer,
Iryna Gurevych
Abstract:
Our proposed system FAMULUS helps students learn to diagnose based on automatic feedback in virtual patient simulations, and it supports instructors in labeling training data.
Diagnosing is an exceptionally difficult skill to obtain but vital for many different professions (e.g., medical doctors, teachers).
Previous case simulation systems are limited to multiple-choice questions and thus cannot give constructive individualized feedback on a student's diagnostic reasoning process.
Given initially only limited data, we leverage a (replaceable) NLP model to both support experts in their further data annotation with automatic suggestions, and we provide automatic feedback for students.
We argue that because the central model consistently improves, our interactive approach encourages both students and instructors to recurrently use the tool, and thus accelerate the speed of data creation and annotation.
We show results from two user studies on diagnostic reasoning in medicine and teacher education and outline how our system can be extended to further use cases.
Submitted 29 August, 2019;
originally announced August 2019.
-
Unbiased Estimation of the Value of an Optimized Policy
Authors:
Elon Portugaly,
Joseph J. Pfeiffer III
Abstract:
Randomized trials, also known as A/B tests, are used to select between two policies: a control and a treatment. Given a corresponding set of features, we can ideally learn an optimized policy P that maps the A/B test data features to action space and optimizes reward. However, although A/B testing provides an unbiased estimator for the value of deploying B (i.e., switching from policy A to B), direct application of those samples to learn the optimized policy P generally does not provide an unbiased estimator of the value of P as the samples were observed when constructing P. In situations where the cost and risks associated with deploying a policy are high, such an unbiased estimator is highly desirable.
We present a procedure for learning optimized policies and getting unbiased estimates for the value of deploying them. We wrap any policy learning procedure with a bagging process and obtain out-of-bag policy inclusion decisions for each sample. We then prove that the inverse-propensity-weighting effect estimator is unbiased when applied to the optimized subset. Likewise, we apply the same idea to obtain out-of-bag unbiased per-sample value estimates of the measurement that is independent of the randomized treatment, and use these estimates to build an unbiased doubly-robust effect estimator. Lastly, we empirically show that even when the average treatment effect is negative we can find a positive optimized policy.
Submitted 7 June, 2018;
originally announced June 2018.
-
Modeling and Simultaneously Removing Bias via Adversarial Neural Networks
Authors:
John Moore,
Joel Pfeiffer,
Kai Wei,
Rishabh Iyer,
Denis Charles,
Ran Gilad-Bachrach,
Levi Boyles,
Eren Manavoglu
Abstract:
In real world systems, the predictions of deployed Machine Learned models affect the training data available to build subsequent models. This introduces a bias in the training data that needs to be addressed. Existing solutions to this problem attempt to resolve the problem by either casting this in the reinforcement learning framework or by quantifying the bias and re-weighting the loss functions. In this work, we develop a novel Adversarial Neural Network (ANN) model, an alternative approach which creates a representation of the data that is invariant to the bias. We take the Paid Search auction as our working example and ad display position features as the confounding features for this setting. We show the success of this approach empirically on both synthetic data as well as real world paid search auction data from a major search engine.
Submitted 18 April, 2018;
originally announced April 2018.
-
Area Rate Evaluation based on Spatial Clustering of massive MIMO Channel Measurements
Authors:
Maximilian Arnold,
Johannes Pfeiffer,
Stephan ten Brink
Abstract:
Channel models for massive MIMO are typically based on matrices with complex Gaussian entries, extended by the Kronecker and Weichselberger model. One reason for observing a gap between modeled and actual channel behavior is the absence of spatial consistency in many such models, that is, spatial correlations over an area in the x, y-dimensions are not accounted for, making it difficult to study, e.g., area-throughput measures. In this paper, we propose an algorithm that can distinguish between regions of non-line-of-sight (NLoS) and line-of-sight (LoS) via a rank-metric criterion combined with a spiral search. With a k-means clustering algorithm a throughput per region (i.e., cluster) can be calculated, leading to what we refer to as "area-throughput". For evaluating the proposed orthogonality clustering scheme we use a simple filtered MIMO channel model which is spatially consistent, with known degrees of freedom. Moreover, we employ actual (spatially consistent) area channel measurements based on spatial sampling using a spider antenna and show that the proposed algorithm can be used to estimate the degrees of freedom, and, subsequently, the number of users that maximizes the throughput per square meter.
Submitted 16 April, 2018;
originally announced April 2018.
-
Evidence of a two-stage melting of Wigner solids in two dimensions
Authors:
Jian Huang,
Talbot Knighton,
Zhe Wu,
Alessandro Serafin,
J. S. Xia,
L. N. Pfeiffer,
K. W. West
Abstract:
Two-dimensional (2D) solid-liquid transition (SLT)~\cite{Mermin1966Absence,Mermin1968Crystalline,Kosterlitz1972Long} concerns fundamental concepts of long-range correlations vital to magnetism, superconductivity, superfluidity, and topological matters. A long sought-after example is the melting of a Wigner Crystal (WC)~\cite{Wigner1934Interaction} of electrons. Detection efforts have targeted distinctive collective modes such as pinning by disorder, resonant-frequency absorption of Goldstone modes, and melting transition. However, only one-step second-order melting of softly-pinned modes was reported. Without the evidence of genuine pinning as exhibited in the charge density waves (CDWs)~\cite{PinningCDW}, these modes are likely intermediate phases which are only part of a complete SLT. Verifying whether there is a WC-intermediate phase transition will not only provide solid proof of a WC, but will also unveil the nature of the SLT in relation to the two-stage Kosterlitz-Thouless (KT) model~\cite{Kosterlitz1972Long,Kosterlitz1973Ordering,Halperin1978Theory,Nelson1979Dislocation,hexatic_ceperley,hexatic_nelson,Young1979Melting}. Through transport studies of ultra-dilute high-purity 2D systems, this work presents evidence for not only a WC, but also for a two-stage WC-liquid SLT mediated by a first-order WC-intermediate phase transition.
Submitted 6 June, 2017;
originally announced June 2017.
-
Sums of Squares on the Hypercube
Authors:
Grigoriy Blekherman,
João Gouveia,
James Pfeiffer
Abstract:
Let X be a finite set of points in R^n. A polynomial p nonnegative on X can be written as a sum of squares of rational functions modulo the vanishing ideal I(X). From the point of view of applications, such as polynomial optimization, we are interested in rational function representations of small degree. We derive a general upper bound in terms of the Hilbert function of X, and we show that this upper bound is tight for the case of quadratic functions on the hypercube C={0,1}^n, a very well studied case in combinatorial optimization. Using the lower bounds for C we construct a family of globally nonnegative quartic polynomials, which are not sums of squares of rational functions of small degree. To our knowledge this is the first construction for Hilbert's 17th problem of a family of polynomials of bounded degree which need increasing degrees in rational function representations as the number of variables n goes to infinity. We note that the representation theory of the symmetric group S_n plays a crucial role in our proofs of the lower bounds.
Submitted 17 February, 2014;
originally announced February 2014.
-
The representation theory of generalized hyperoctahedral groups
Authors:
William McGovern,
James Pfeiffer
Abstract:
We give an explicit decomposition of $\hbox{Ind}(1)_{B_n}^{S_{2n}}$, following Barbasch and Vogan [1]. We define two natural generalizations of $B_n$, and extend the proof in [1] to recursively compute these decompositions. Although the decompositions do not appear to follow a simple pattern, we prove enough of their structure to show that they are almost never multiplicity-free.
Submitted 30 January, 2014;
originally announced January 2014.
-
A Semidefinite Approach to the $K_i$ Cover Problem
Authors:
João Gouveia,
James Pfeiffer
Abstract:
We apply theta body relaxations to the $K_i$-cover problem and show polynomial time solvability for certain classes of graphs. In particular, we give an effective relaxation where all $K_i$-$p$-hole facets are valid, and study its relation to an open question of Conforti et al. For the triangle free problem, we show for $K_n$ that the theta body relaxations do not converge by $(n-2)/4$ steps; we also prove for all $G$ an integrality gap of 2 for the second theta body.
Submitted 1 February, 2014; v1 submitted 31 October, 2012;
originally announced November 2012.
-
High-Resolution Imaging and Optical Control of Bose-Einstein Condensates in an Atom Chip Magnetic Trap
Authors:
Evan A. Salim,
Seth C. Caliga,
Jonathan B. Pfeiffer,
Dana Z. Anderson
Abstract:
A high-resolution projection and imaging system for ultracold atoms is implemented using a compound silicon and glass atom chip. The atom chip is metalized to enable magnetic trapping while glass regions enable high numerical aperture optical access to atoms residing in the magnetic trap about 100 microns below the chip surface. The atom chip serves as a wall of the vacuum system, which enables the use of commercial microscope components for projection and imaging. Holographically generated light patterns are used to optically slice a cigar-shaped magnetic trap into separate regions; this has been used to simultaneously generate up to four Bose-condensates. Using fluorescence techniques we have demonstrated in-trap imaging resolution down to 2.5 microns.
Submitted 23 August, 2012;
originally announced August 2012.
-
Bootstrap percolation on the Hamming torus
Authors:
Janko Gravner,
Christopher Hoffman,
James Pfeiffer,
David Sivakoff
Abstract:
The Hamming torus of dimension $d$ is the graph with vertices $\{1,\dots,n\}^d$ and an edge between any two vertices that differ in a single coordinate. Bootstrap percolation with threshold $θ$ starts with a random set of open vertices, to which every vertex belongs independently with probability $p$, and at each time step the open set grows by adjoining every vertex with at least $θ$ open neighbors. We assume that $n$ is large and that $p$ scales as $n^{-α}$ for some $α>1$, and study the probability that an $i$-dimensional subgraph ever becomes open. For large $θ$, we prove that the critical exponent $α$ is about $1+d/θ$ for $i=1$, and about $1+2/θ+Θ(θ^{-3/2})$ for $i\ge2$. Our small $θ$ results are mostly limited to $d=3$, where we identify the critical $α$ in many cases and, when $θ=3$, compute exactly the critical probability that the entire graph is eventually open.
Submitted 23 January, 2015; v1 submitted 23 February, 2012;
originally announced February 2012.
-
Fast Generation of Large Scale Social Networks with Clustering
Authors:
Joseph J. Pfeiffer III,
Timothy La Fond,
Sebastian Moreno,
Jennifer Neville
Abstract:
A key challenge within the social network literature is the problem of network generation - that is, how can we create synthetic networks that match characteristics traditionally found in most real world networks? Important characteristics that are present in social networks include a power law degree distribution, small diameter and large amounts of clustering; however, most current network generators, such as the Chung Lu and Kronecker models, largely ignore the clustering present in a graph and choose to focus on preserving other network statistics, such as the power law distribution. Models such as the exponential random graph model have a transitivity parameter, but are computationally difficult to learn, making scaling to large real world networks intractable. In this work, we propose an extension to the Chung Lu random graph model, the Transitive Chung Lu (TCL) model, which incorporates the notion of a random transitive edge. That is, with some probability it will choose to connect to a node exactly two hops away, having been introduced to a 'friend of a friend'. In all other cases it will follow the standard Chung Lu model, selecting a 'random surfer' from anywhere in the graph according to the given invariant distribution. We prove TCL's expected degree distribution is equal to the degree distribution of the original graph, while being able to capture the clustering present in the network. The single parameter required by our model can be learned in seconds on graphs with millions of edges, while networks can be generated in time that is linear in the number of edges. We demonstrate the performance of TCL on four real-world social networks, including an email dataset with hundreds of thousands of nodes and millions of edges, showing TCL generates graphs that match the degree distribution, clustering coefficients and hop plots of the original networks.
Submitted 21 February, 2012;
originally announced February 2012.
-
Methods to Determine Node Centrality and Clustering in Graphs with Uncertain Structure
Authors:
Joseph J. Pfeiffer III,
Jennifer Neville
Abstract:
Much of the past work in network analysis has focused on analyzing discrete graphs, where binary edges represent the "presence" or "absence" of a relationship. Since traditional network measures (e.g., betweenness centrality) utilize a discrete link structure, complex systems must be transformed to this representation in order to investigate network properties. However, in many domains there may be uncertainty about the relationship structure and any uncertainty information would be lost in translation to a discrete representation. Uncertainty may arise in domains where there is moderating link information that cannot be easily observed, i.e., links become inactive over time but may not be dropped or observed links may not always correspond to a valid relationship. In order to represent and reason with these types of uncertainty, we move beyond the discrete graph framework and develop social network measures based on a probabilistic graph representation. More specifically, we develop measures of path length, betweenness centrality, and clustering coefficient---one set based on sampling and one based on probabilistic paths. We evaluate our methods on three real-world networks from Enron, Facebook, and DBLP, showing that our proposed methods more accurately capture salient effects without being susceptible to local noise, and that the resulting analysis produces a better understanding of the graph structure and the uncertainty resulting from its change over time.
Submitted 2 April, 2011;
originally announced April 2011.
-
Critical current diffraction pattern of SIFS Josephson junctions with step-like F-layer
Authors:
M. Weides,
U. Peralagu,
H. Kohlstedt,
J. Pfeiffer,
M. Kemmler,
C. Gürlich,
E. Goldobin,
D. Koelle,
R. Kleiner
Abstract:
We present the latest generation of superconductor-insulator-ferromagnet-superconductor Josephson tunnel junctions with a step-like thickness of the ferromagnetic (F) layer. The F-layer thicknesses $d_1$ and $d_2$ in both halves were varied to obtain different combinations of positive and negative critical current densities $j_{c,1}$ and $j_{c,2}$. The measured dependences of the critical current on applied magnetic field can be well described by a model which takes into account different critical current densities (obtained from reference junctions) and different net magnetization of the multidomain ferromagnetic layer in both halves.
Submitted 15 July, 2010; v1 submitted 26 June, 2010;
originally announced June 2010.
-
Escape Rate Measurements and Microwave Spectroscopy of 0, pi, and 0-pi ferromagnetic Josephson Tunnel Junctions
Authors:
J. Pfeiffer,
T. Gaber,
D. Koelle,
R. Kleiner,
E. Goldobin,
M. Weides,
H. Kohlstedt,
J. Lisenfeld,
A. K. Feofanov,
A. V. Ustinov
Abstract:
We present experimental studies of high quality underdamped 0, pi, and 0-pi ferromagnetic Josephson tunnel junctions of intermediate length L (lambda_J < L < 5 lambda_J, where lambda_J is the Josephson penetration depth). The junctions are fabricated as Nb/Al_2O_3/Cu_40Ni_60/Nb Superconductor-Insulator-Ferromagnet-Superconductor heterostructures. Using microwave spectroscopy, we have investigated the eigenfrequencies of 0, pi, and 0-pi Josephson junctions in the temperature range 1.9K...320mK. Harmonic, subharmonic and superharmonic pumping is observed in experiment, and the experimental data are compared with numerical simulations. Escape rate measurements without applied microwaves at temperatures T down to 20mK show that the width of the switching current histogram decreases with temperature and saturates below T=150mK. We analyze our data in the framework of the short junction model. The differences between experimental data and theoretical predictions are discussed.
Submitted 5 March, 2009;
originally announced March 2009.