-
Contrastive learning of T cell receptor representations
Authors:
Yuta Nagano,
Andrew Pyo,
Martina Milighetti,
James Henderson,
John Shawe-Taylor,
Benny Chain,
Andreas Tiffeau-Mayer
Abstract:
Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labelled TCR data remains sparse. In other domains, the pre-training of language models on unlabelled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein lan…
▽ More
Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labelled TCR data remains sparse. In other domains, the pre-training of language models on unlabelled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here we introduce a TCR language model called SCEPTR (Simple Contrastive Embedding of the Primary sequence of T cell Receptors), capable of data-efficient transfer learning. Through our model, we introduce a novel pre-training strategy combining autocontrastive learning and masked-language modelling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Automated Quantum Circuit Generation for Computing Inverse Hash Functions
Authors:
Elena R. Henderson,
Jessie M. Henderson,
William V. Oxford,
Mitchell A. Thornton
Abstract:
Several cryptographic systems depend upon the computational difficulty of reversing cryptographic hash functions. Robust hash functions transform inputs to outputs in such a way that the inputs cannot be later retrieved in a reasonable amount of time even if the outputs and the function that created them are known. Consequently, hash functions can be cryptographically secure, and they are employed…
▽ More
Several cryptographic systems depend upon the computational difficulty of reversing cryptographic hash functions. Robust hash functions transform inputs to outputs in such a way that the inputs cannot be later retrieved in a reasonable amount of time even if the outputs and the function that created them are known. Consequently, hash functions can be cryptographically secure, and they are employed in encryption, authentication, and other security methods. It has been suggested that such cryptographically-secure hash functions will play a critical role in the era of post-quantum cryptography (PQC), as they do in conventional systems. In this work, we introduce a procedure that leverages the principle of reversibility to generate circuits that invert hash functions. We provide a proof-of-concept implementation and describe methods that allow for scaling the hash function inversion approach. Specifically, we implement one manifestation of the algorithm as part of a more general automated quantum circuit synthesis, compilation, and optimization toolkit. We illustrate production of reversible circuits for crypto-hash functions that inherently provide the inverse of the function, and we describe data structures that increase the scalability of the hash function inversion approach.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
Limits on Inferring T-cell Specificity from Partial Information
Authors:
James Henderson,
Yuta Nagano,
Martina Milighetti,
Andreas Tiffeau-Mayer
Abstract:
A key challenge in molecular biology is to decipher the map** of protein sequence to function. To perform this map** requires the identification of sequence features most informative about function. Here, we quantify the amount of information (in bits) that T-cell receptor (TCR) sequence features provide about antigen specificity. We identify informative features by their degree of conservatio…
▽ More
A key challenge in molecular biology is to decipher the map** of protein sequence to function. To perform this map** requires the identification of sequence features most informative about function. Here, we quantify the amount of information (in bits) that T-cell receptor (TCR) sequence features provide about antigen specificity. We identify informative features by their degree of conservation among antigen-specific receptors relative to null expectations. We find that TCR specificity synergistically depends on the hypervariable regions of both receptor chains, with a degree of synergy that strongly depends on the ligand. Using a coincidence-based approach to measuring information enables us to directly bound the accuracy with which TCR specificity can be predicted from partial matches to reference sequences. We anticipate that our statistical framework will be of use for develo** machine learning models for TCR specificity prediction and for optimizing TCRs for cell therapies. The proposed coincidence-based information measures might find further applications in bounding the performance of pairwise classifiers in other fields.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
Designing a Photonic Physically Unclonable Function Having Resilience to Machine Learning Attacks
Authors:
Elena R. Henderson,
Jessie M. Henderson,
Hiva Shahoei,
William V. Oxford,
Eric C. Larson,
Duncan L. MacFarlane,
Mitchell A. Thornton
Abstract:
Physically unclonable functions (PUFs) are designed to act as device 'fingerprints.' Given an input challenge, the PUF circuit should produce an unpredictable response for use in situations such as root-of-trust applications and other hardware-level cybersecurity applications. PUFs are typically subcircuits present within integrated circuits (ICs), and while conventional IC PUFs are well-understoo…
▽ More
Physically unclonable functions (PUFs) are designed to act as device 'fingerprints.' Given an input challenge, the PUF circuit should produce an unpredictable response for use in situations such as root-of-trust applications and other hardware-level cybersecurity applications. PUFs are typically subcircuits present within integrated circuits (ICs), and while conventional IC PUFs are well-understood, several implementations have proven vulnerable to malicious exploits, including those perpetrated by machine learning (ML)-based attacks. Such attacks can be difficult to prevent because they are often designed to work even when relatively few challenge-response pairs are known in advance. Hence the need for both more resilient PUF designs and analysis of ML-attack susceptibility. Previous work has developed a PUF for photonic integrated circuits (PICs). A PIC PUF not only produces unpredictable responses given manufacturing-introduced tolerances, but is also less prone to electromagnetic radiation eavesdrop** attacks than a purely electronic IC PUF. In this work, we analyze the resilience of the proposed photonic PUF when subjected to ML-based attacks. Specifically, we describe a computational PUF model for producing the large datasets required for training ML attacks; we analyze the quality of the model; and we discuss the modeled PUF's susceptibility to ML-based attacks. We find that the modeled PUF generates distributions that resemble uniform white noise, explaining the exhibited resilience to neural-network-based attacks designed to exploit latent relationships between challenges and responses. Preliminary analysis suggests that the PUF exhibits similar resilience to generative adversarial networks, and continued development will show whether more-sophisticated ML approaches better compromise the PUF and -- if so -- how design modifications might improve resilience.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
A Photonic Physically Unclonable Function's Resilience to Multiple-Valued Machine Learning Attacks
Authors:
Jessie M. Henderson,
Elena R. Henderson,
Clayton A. Harper,
Hiva Shahoei,
William V. Oxford,
Eric C. Larson,
Duncan L. MacFarlane,
Mitchell A. Thornton
Abstract:
Physically unclonable functions (PUFs) identify integrated circuits using nonlinearly-related challenge-response pairs (CRPs). Ideally, the relationship between challenges and corresponding responses is unpredictable, even if a subset of CRPs is known. Previous work developed a photonic PUF offering improved security compared to non-optical counterparts. Here, we investigate this PUF's susceptibil…
▽ More
Physically unclonable functions (PUFs) identify integrated circuits using nonlinearly-related challenge-response pairs (CRPs). Ideally, the relationship between challenges and corresponding responses is unpredictable, even if a subset of CRPs is known. Previous work developed a photonic PUF offering improved security compared to non-optical counterparts. Here, we investigate this PUF's susceptibility to Multiple-Valued-Logic-based machine learning attacks. We find that approximately 1,000 CRPs are necessary to train models that predict response bits better than random chance. Given the significant challenge of acquiring a vast number of CRPs from a photonic PUF, our results demonstrate photonic PUF resilience against such attacks.
△ Less
Submitted 2 March, 2024;
originally announced March 2024.
-
Nonparametric Variational Regularisation of Pretrained Transformers
Authors:
Fabio Fehr,
James Henderson
Abstract:
The current paradigm of large-scale pre-training and fine-tuning Transformer large language models has lead to significant improvements across the board in natural language processing. However, such large models are susceptible to overfitting to their training data, and as a result the models perform poorly when the domain changes. Also, due to the model's scale, the cost of fine-tuning the model…
▽ More
The current paradigm of large-scale pre-training and fine-tuning Transformer large language models has lead to significant improvements across the board in natural language processing. However, such large models are susceptible to overfitting to their training data, and as a result the models perform poorly when the domain changes. Also, due to the model's scale, the cost of fine-tuning the model to the new domain is large. Nonparametric Variational Information Bottleneck (NVIB) has been proposed as a regulariser for training cross-attention in Transformers, potentially addressing the overfitting problem. We extend the NVIB framework to replace all types of attention functions in Transformers, and show that existing pretrained Transformers can be reinterpreted as Nonparametric Variational (NV) models using a proposed identity initialisation. We then show that changing the initialisation introduces a novel, information-theoretic post-training regularisation in the attention mechanism, which improves out-of-domain generalisation without any training. This success supports the hypothesis that pretrained Transformers are implicitly NV Bayesian models.
△ Less
Submitted 1 December, 2023;
originally announced December 2023.
-
Plug-and-Play Stability for Intracortical Brain-Computer Interfaces: A One-Year Demonstration of Seamless Brain-to-Text Communication
Authors:
Chaofei Fan,
Nick Hahn,
Foram Kamdar,
Donald Avansino,
Guy H. Wilson,
Leigh Hochberg,
Krishna V. Shenoy,
Jaimie M. Henderson,
Francis R. Willett
Abstract:
Intracortical brain-computer interfaces (iBCIs) have shown promise for restoring rapid communication to people with neurological disorders such as amyotrophic lateral sclerosis (ALS). However, to maintain high performance over time, iBCIs typically need frequent recalibration to combat changes in the neural recordings that accrue over days. This requires iBCI users to stop using the iBCI and engag…
▽ More
Intracortical brain-computer interfaces (iBCIs) have shown promise for restoring rapid communication to people with neurological disorders such as amyotrophic lateral sclerosis (ALS). However, to maintain high performance over time, iBCIs typically need frequent recalibration to combat changes in the neural recordings that accrue over days. This requires iBCI users to stop using the iBCI and engage in supervised data collection, making the iBCI system hard to use. In this paper, we propose a method that enables self-recalibration of communication iBCIs without interrupting the user. Our method leverages large language models (LMs) to automatically correct errors in iBCI outputs. The self-recalibration process uses these corrected outputs ("pseudo-labels") to continually update the iBCI decoder online. Over a period of more than one year (403 days), we evaluated our Continual Online Recalibration with Pseudo-labels (CORP) framework with one clinical trial participant. CORP achieved a stable decoding accuracy of 93.84% in an online handwriting iBCI task, significantly outperforming other baseline methods. Notably, this is the longest-running iBCI stability demonstration involving a human participant. Our results provide the first evidence for long-term stabilization of a plug-and-play, high-performance communication iBCI, addressing a major barrier for the clinical translation of iBCIs.
△ Less
Submitted 6 November, 2023;
originally announced November 2023.
-
Progress Metrics in DAG-based Consensus
Authors:
Quan Nguyen,
James Henderson,
Egor Lysenko
Abstract:
Lachesis protocol~\cite{lachesis2021} leverages a DAG of events to allow nodes to reach fast consensus of events. This work introduces DAG progress metrics to drive the nodes to emit new events more effectively. With these metrics, nodes can select event timing and can choose previous events as parents for their own new events. Our results show that our event timing and parent selection methods ca…
▽ More
Lachesis protocol~\cite{lachesis2021} leverages a DAG of events to allow nodes to reach fast consensus of events. This work introduces DAG progress metrics to drive the nodes to emit new events more effectively. With these metrics, nodes can select event timing and can choose previous events as parents for their own new events. Our results show that our event timing and parent selection methods can help reaching consensus quicker and thus can reduce lower time to finality significantly.
△ Less
Submitted 4 November, 2023;
originally announced November 2023.
-
Variational Autoencoders for Noise Reduction in Industrial LLRF Systems
Authors:
J. P. Edelen,
M. J. Henderson,
J. Einstein-Curtis,
C. C. Hall,
J. A. Diaz Cruz,
A. L. Edelen
Abstract:
Industrial particle accelerators inherently operate in much dirtier environments than typical research accelerators. This leads to an increase in noise both in the RF system and in other electronic systems. Combined with the fact that industrial accelerators are mass produced, there is less attention given to optimizing the performance of an individual system. As a result, industrial systems tend…
▽ More
Industrial particle accelerators inherently operate in much dirtier environments than typical research accelerators. This leads to an increase in noise both in the RF system and in other electronic systems. Combined with the fact that industrial accelerators are mass produced, there is less attention given to optimizing the performance of an individual system. As a result, industrial systems tend to under perform considering their hardware hardware capabilities. With the growing demand for accelerators for medical sterilization, food irradiation, cancer treatment, and imaging, improving the signal processing of these machines will increase the margin for the deployment of these systems. Our work is focusing on using machine learning techniques to reduce the noise of RF signals used for pulse-to-pulse feedback in industrial accelerators. We will review our algorithms, simulation results, and results working with measured data. We will then discuss next steps for deployment and testing on an industrial system.
△ Less
Submitted 7 November, 2023; v1 submitted 29 October, 2023;
originally announced November 2023.
-
Transformers as Graph-to-Graph Models
Authors:
James Henderson,
Alireza Mohammadshahi,
Andrei C. Coman,
Lesly Miculicich
Abstract:
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs…
▽ More
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pretrained Transformers. Adding iterative graph refinement provides a joint embedding of input, output, and latent graphs, allowing non-autoregressive graph prediction to optimise the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracies for modelling a variety of linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
Learning to Abstract with Nonparametric Variational Information Bottleneck
Authors:
Melika Behjati,
Fabio Fehr,
James Henderson
Abstract:
Learned representations at the level of characters, sub-words, words and sentences, have each contributed to advances in understanding different NLP tasks and linguistic phenomena. However, learning textual embeddings is costly as they are tokenization specific and require different models to be trained for each level of abstraction. We introduce a novel language representation model which can lea…
▽ More
Learned representations at the level of characters, sub-words, words and sentences, have each contributed to advances in understanding different NLP tasks and linguistic phenomena. However, learning textual embeddings is costly as they are tokenization specific and require different models to be trained for each level of abstraction. We introduce a novel language representation model which can learn to compress to different levels of abstraction at different layers of the same model. We apply Nonparametric Variational Information Bottleneck (NVIB) to stacked Transformer self-attention layers in the encoder, which encourages an information-theoretic compression of the representations through the model. We find that the layers within the model correspond to increasing levels of abstraction and that their representations are more linguistically informed. Finally, we show that NVIB compression results in a model which is more robust to adversarial perturbations.
△ Less
Submitted 26 October, 2023;
originally announced October 2023.
-
GADePo: Graph-Assisted Declarative Pooling Transformers for Document-Level Relation Extraction
Authors:
Andrei C. Coman,
Christos Theodoropoulos,
Marie-Francine Moens,
James Henderson
Abstract:
Document-level relation extraction typically relies on text-based encoders and hand-coded pooling heuristics to aggregate information learned by the encoder. In this paper, we leverage the intrinsic graph processing capabilities of the Transformer model and propose replacing hand-coded pooling methods with new tokens in the input, which are designed to aggregate information via explicit graph rela…
▽ More
Document-level relation extraction typically relies on text-based encoders and hand-coded pooling heuristics to aggregate information learned by the encoder. In this paper, we leverage the intrinsic graph processing capabilities of the Transformer model and propose replacing hand-coded pooling methods with new tokens in the input, which are designed to aggregate information via explicit graph relations in the computation of attention weights. We introduce a joint text-graph Transformer model and a graph-assisted declarative pooling (GADePo) specification of the input, which provides explicit and high-level instructions for information aggregation. GADePo allows the pooling process to be guided by domain-specific knowledge or desired outcomes but still learned by the Transformer, leading to more flexible and customisable pooling strategies. We evaluate our method across diverse datasets and models and show that our approach yields promising results that are consistently better than those achieved by the hand-coded pooling functions.
△ Less
Submitted 18 June, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
TESS: Text-to-Text Self-Conditioned Simplex Diffusion
Authors:
Rabeeh Karimi Mahabadi,
Hamish Ivison,
Jaesung Tae,
James Henderson,
Iz Beltagy,
Matthew E. Peters,
Arman Cohan
Abstract:
Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models to natural language remains challenging due to its discrete nature and the need for a large number of diffusion steps to generate text, making diffusion-based generation expensive. In this work, we propose Text-to-text Self-c…
▽ More
Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models to natural language remains challenging due to its discrete nature and the need for a large number of diffusion steps to generate text, making diffusion-based generation expensive. In this work, we propose Text-to-text Self-conditioned Simplex Diffusion (TESS), a text diffusion model that is fully non-autoregressive, employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the learned embedding space. Through extensive experiments on natural language understanding and generation tasks including summarization, text simplification, paraphrase generation, and question generation, we demonstrate that TESS outperforms state-of-the-art non-autoregressive models, requires fewer diffusion steps with minimal drop in performance, and is competitive with pretrained autoregressive sequence-to-sequence models. We publicly release our codebase at https://github.com/allenai/tess-diffusion.
△ Less
Submitted 20 February, 2024; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Automated Quantum Memory Compilation with Improved Dynamic Range
Authors:
Aviraj Sinha,
Elena R. Henderson,
Jessie M. Henderson,
Mitchell A. Thornton
Abstract:
Emerging quantum algorithms that process data require that classical input data be represented as a quantum state. These data-processing algorithms often follow the gate model of quantum computing--which requires qubits to be initialized to a basis state, typically $\lvert 0 \rangle$--and thus often employ state generation circuits to transform the initialized basis state to a data-representation…
▽ More
Emerging quantum algorithms that process data require that classical input data be represented as a quantum state. These data-processing algorithms often follow the gate model of quantum computing--which requires qubits to be initialized to a basis state, typically $\lvert 0 \rangle$--and thus often employ state generation circuits to transform the initialized basis state to a data-representation state. There are many ways to encode classical data in a qubit, and the oft-applied approach of basis encoding does not allow optimization to the extent that other variants do. In this work, we thus consider automatic synthesis of addressable, quantum read-only memory (QROM) circuits, which act as data-encoding state-generation circuits. We investigate three data encoding approaches, one of which we introduce to provide improved dynamic range and precision. We present experimental results that compare these encoding methods for QROM synthesis to better understand the implications of and applications for each.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question
Authors:
Alireza Mohammadshahi,
Thomas Scialom,
Majid Yazdani,
Pouya Yanki,
Angela Fan,
James Henderson,
Marzieh Saeidi
Abstract:
Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided refer…
▽ More
Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this paper, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering and a span scorer modules, using pre-trained models from existing literature, thus it can be used without any further training. We demonstrate that RQUGE has a higher correlation with human judgment without relying on the reference question. Additionally, RQUGE is shown to be more robust to several adversarial corruptions. Furthermore, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on synthetic data generated by a question generation model and re-ranked by RQUGE.
△ Less
Submitted 26 May, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Privacy Analysis of Samsung's Crowd-Sourced Bluetooth Location Tracking System
Authors:
Tingfeng Yu,
James Henderson,
Alwen Tiu,
Thomas Haines
Abstract:
We present a detailed privacy analysis of Samsung's Offline Finding (OF) protocol, which is part of Samsung's Find My Mobile (FMM) location tracking system for locating Samsung mobile devices, such as Samsung smartphones and Bluetooth trackers (Galaxy SmartTags). The OF protocol uses Bluetooth Low Energy (BLE) to broadcast a unique beacon for a lost device. This beacon is then picked up by nearby…
▽ More
We present a detailed privacy analysis of Samsung's Offline Finding (OF) protocol, which is part of Samsung's Find My Mobile (FMM) location tracking system for locating Samsung mobile devices, such as Samsung smartphones and Bluetooth trackers (Galaxy SmartTags). The OF protocol uses Bluetooth Low Energy (BLE) to broadcast a unique beacon for a lost device. This beacon is then picked up by nearby Samsung phones or tablets (the {\em finder} devices), which then forward the unique beacon, along with the location it was detected at, to a Samsung managed server. The owner of a lost device can then query the server to locate their device. We examine several security and privacy related properties of the OF protocol and its implementation, from the perspectives of the owner, the finder and the vendor. These include examining: the possibility of identifying the owner of a device through the Bluetooth data obtained from the device, the possibility for a malicious actor to perform unwanted tracking against a person by exploiting the OF network, the possibility for the vendor to de-anonymise location reports to determine the locations of the owners or the finders of lost devices, and the possibility for an attacker to compromise the integrity of the location reports. Our findings suggest that there are privacy risks on all accounts, arising from issues in the design and the implementation of the OF protocol.
△ Less
Submitted 26 October, 2022;
originally announced October 2022.
-
Quantum Algorithms for Geologic Fracture Networks
Authors:
Jessie M. Henderson,
Marianna Podzorova,
M. Cerezo,
John K. Golden,
Leonard Gleyzer,
Hari S. Viswanathan,
Daniel O'Malley
Abstract:
Solving large systems of equations is a challenge for modeling natural phenomena, such as simulating subsurface flow. To avoid systems that are intractable on current computers, it is often necessary to neglect information at small scales, an approach known as coarse-graining. For many practical applications, such as flow in porous, homogenous materials, coarse-graining offers a sufficiently-accur…
▽ More
Solving large systems of equations is a challenge for modeling natural phenomena, such as simulating subsurface flow. To avoid systems that are intractable on current computers, it is often necessary to neglect information at small scales, an approach known as coarse-graining. For many practical applications, such as flow in porous, homogenous materials, coarse-graining offers a sufficiently-accurate approximation of the solution. Unfortunately, fractured systems cannot be accurately coarse-grained, as critical network topology exists at the smallest scales, including topology that can push the network across a percolation threshold. Therefore, new techniques are necessary to accurately model important fracture systems. Quantum algorithms for solving linear systems offer a theoretically-exponential improvement over their classical counterparts, and in this work we introduce two quantum algorithms for fractured flow. The first algorithm, designed for future quantum computers which operate without error, has enormous potential, but we demonstrate that current hardware is too noisy for adequate performance. The second algorithm, designed to be noise resilient, already performs well for problems of small to medium size (order 10 to 1000 nodes), which we demonstrate experimentally and explain theoretically. We expect further improvements by leveraging quantum error mitigation and preconditioning.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Authors:
Alireza Mohammadshahi,
Vassilina Nikoulina,
Alexandre Berard,
Caroline Brun,
James Henderson,
Laurent Besacier
Abstract:
In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the "curse of multilinguality", these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introd…
▽ More
In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the "curse of multilinguality", these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introduce SMaLL-100, a distilled version of the M2M-100 (12B) model, a massively multilingual machine translation model covering 100 languages. We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages. We evaluate SMaLL-100 on different low-resource benchmarks: FLORES-101, Tatoeba, and TICO-19 and demonstrate that it outperforms previous massively multilingual models of comparable sizes (200-600M) while improving inference latency and memory usage. Additionally, our model achieves comparable results to M2M-100 (1.2B), while being 3.6x smaller and 4.3x faster at inference. Code and pre-trained models: https://github.com/alirezamshi/small100
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
FASTER-CE: Fast, Sparse, Transparent, and Robust Counterfactual Explanations
Authors:
Shubham Sharma,
Alan H. Gee,
Jette Henderson,
Joydeep Ghosh
Abstract:
Counterfactual explanations have substantially increased in popularity in the past few years as a useful human-centric way of understanding individual black-box model predictions. While several properties desired of high-quality counterfactuals have been identified in the literature, three crucial concerns: the speed of explanation generation, robustness/sensitivity and succinctness of explanation…
▽ More
Counterfactual explanations have substantially increased in popularity in the past few years as a useful human-centric way of understanding individual black-box model predictions. While several properties desired of high-quality counterfactuals have been identified in the literature, three crucial concerns: the speed of explanation generation, robustness/sensitivity and succinctness of explanations (sparsity) have been relatively unexplored. In this paper, we present FASTER-CE: a novel set of algorithms to generate fast, sparse, and robust counterfactual explanations. The key idea is to efficiently find promising search directions for counterfactuals in a latent space that is specified via an autoencoder. These directions are determined based on gradients with respect to each of the original input features as well as of the target, as estimated in the latent space. The ability to quickly examine combinations of the most promising gradient directions as well as to incorporate additional user-defined constraints allows us to generate multiple counterfactual explanations that are sparse, realistic, and robust to input manipulations. Through experiments on three datasets of varied complexities, we show that FASTER-CE is not only much faster than other state of the art methods for generating multiple explanations but also is significantly superior when considering a larger set of desirable (and often conflicting) properties. Specifically we present results across multiple performance metrics: sparsity, proximity, validity, speed of generation, and the robustness of explanations, to highlight the capabilities of the FASTER-CE family.
△ Less
Submitted 12 October, 2022;
originally announced October 2022.
-
FEAMOE: Fair, Explainable and Adaptive Mixture of Experts
Authors:
Shubham Sharma,
Jette Henderson,
Joydeep Ghosh
Abstract:
Three key properties that are desired of trustworthy machine learning models deployed in high-stakes environments are fairness, explainability, and an ability to account for various kinds of "drift". While drifts in model accuracy, for example due to covariate shift, have been widely investigated, drifts in fairness metrics over time remain largely unexplored. In this paper, we propose FEAMOE, a n…
▽ More
Three key properties that are desired of trustworthy machine learning models deployed in high-stakes environments are fairness, explainability, and an ability to account for various kinds of "drift". While drifts in model accuracy, for example due to covariate shift, have been widely investigated, drifts in fairness metrics over time remain largely unexplored. In this paper, we propose FEAMOE, a novel "mixture-of-experts" inspired framework aimed at learning fairer, more explainable/interpretable models that can also rapidly adjust to drifts in both the accuracy and the fairness of a classifier. We illustrate our framework for three popular fairness measures and demonstrate how drift can be handled with respect to these fairness constraints. Experiments on multiple datasets show that our framework as applied to a mixture of linear experts is able to perform comparably to neural networks in terms of accuracy while producing fairer models. We then use the large-scale HMDA dataset and show that while various models trained on HMDA demonstrate drift with respect to both accuracy and fairness, FEAMOE can ably handle these drifts with respect to all the considered fairness measures and maintain model accuracy as well. We also prove that the proposed framework allows for producing fast Shapley value explanations, which makes computationally efficient feature attribution based explanations of model decisions readily available via FEAMOE.
△ Less
Submitted 10 October, 2022;
originally announced October 2022.
-
Smart Visual Beacons with Asynchronous Optical Communications using Event Cameras
Authors:
Ziwei Wang,
Yonhon Ng,
Jack Henderson,
Robert Mahony
Abstract:
Event cameras are bio-inspired dynamic vision sensors that respond to changes in image intensity with a high temporal resolution, high dynamic range and low latency. These sensor characteristics are ideally suited to enable visual target tracking in concert with a broadcast visual communication channel for smart visual beacons with applications in distributed robotics. Visual beacons can be constr…
▽ More
Event cameras are bio-inspired dynamic vision sensors that respond to changes in image intensity with a high temporal resolution, high dynamic range and low latency. These sensor characteristics are ideally suited to enable visual target tracking in concert with a broadcast visual communication channel for smart visual beacons with applications in distributed robotics. Visual beacons can be constructed by high-frequency modulation of Light Emitting Diodes (LEDs) such as vehicle headlights, Internet of Things (IoT) LEDs, smart building lights, etc., that are already present in many real-world scenarios. The high temporal resolution characteristic of the event cameras allows them to capture visual signals at far higher data rates compared to classical frame-based cameras. In this paper, we propose a novel smart visual beacon architecture with both LED modulation and event camera demodulation algorithms. We quantitatively evaluate the relationship between LED transmission rate, communication distance and the message transmission accuracy for the smart visual beacon communication system that we prototyped. The proposed method achieves up to 4 kbps in an indoor environment and lossless transmission over a distance of 100 meters, at a transmission rate of 500 bps, in full sunlight, demonstrating the potential of the technology in an outdoor environment.
△ Less
Submitted 2 August, 2022;
originally announced August 2022.
-
A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck
Authors:
James Henderson,
Fabio Fehr
Abstract:
We propose a VAE for Transformers by develo** a variational information bottleneck regulariser for Transformer embeddings. We formalise the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components s…
▽ More
We propose a VAE for Transformers by develo** a variational information bottleneck regulariser for Transformer embeddings. We formalise the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components supported by nonparametric methods captures the variable number of vectors supported by attention, and the exchangeability of our nonparametric distributions captures the permutation invariance of attention. This allows NVIB to regularise the number of vectors accessible with attention, as well as the amount of information in individual vectors. By regularising the cross-attention of a Transformer encoder-decoder with NVIB, we propose a nonparametric variational autoencoder (NVAE). Initial experiments on training a NVAE on natural language text show that the induced embedding space has the desired properties of a VAE for Transformers.
△ Less
Submitted 12 August, 2022; v1 submitted 27 July, 2022;
originally announced July 2022.
-
Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers
Authors:
Luis Espinosa-Anke,
Alexander Shvets,
Alireza Mohammadshahi,
James Henderson,
Leo Wanner
Abstract:
Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collo…
▽ More
Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context. Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
△ Less
Submitted 23 May, 2022;
originally announced May 2022.
-
What Do Compressed Multilingual Machine Translation Models Forget?
Authors:
Alireza Mohammadshahi,
Vassilina Nikoulina,
Alexandre Berard,
Caroline Brun,
James Henderson,
Laurent Besacier
Abstract:
Recently, very large pre-trained models achieve state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques allow to drastically reduce the size of the models and therefore their inference time with negligible impact on top-tier metrics. However, the general performa…
▽ More
Recently, very large pre-trained models achieve state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques allow to drastically reduce the size of the models and therefore their inference time with negligible impact on top-tier metrics. However, the general performance averaged across multiple tasks and/or languages may hide a drastic performance drop on under-represented features, which could result in the amplification of biases encoded by the models. In this work, we assess the impact of compression methods on Multilingual Neural Machine Translation models (MNMT) for various language groups, gender, and semantic biases by extensive analysis of compressed models on different machine translation benchmarks, i.e. FLORES-101, MT-Gender, and DiBiMT. We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases. Interestingly, the removal of noisy memorization with compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages. Code: https://github.com/alirezamshi/bias-compressedMT
△ Less
Submitted 27 June, 2023; v1 submitted 22 May, 2022;
originally announced May 2022.
-
PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
Authors:
Rabeeh Karimi Mahabadi,
Luke Zettlemoyer,
James Henderson,
Marzieh Saeidi,
Lambert Mathias,
Veselin Stoyanov,
Majid Yazdani
Abstract:
Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as…
▽ More
Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points. PERFECT makes two key design choices: First, we show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning and reduce memory and storage costs by roughly factors of 5 and 100, respectively. Second, instead of using handcrafted verbalizers, we learn new multi-token label embeddings during fine-tuning, which are not tied to the model vocabulary and which allow us to avoid complex auto-regressive decoding. These embeddings are not only learnable from limited data but also enable nearly 100x faster training and inference. Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods. Our code is publicly available at https://github.com/facebookresearch/perfect.git.
△ Less
Submitted 25 April, 2022; v1 submitted 3 April, 2022;
originally announced April 2022.
-
Graph Refinement for Coreference Resolution
Authors:
Lesly Miculicich,
James Henderson
Abstract:
The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document-level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the g…
▽ More
The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document-level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the graph in a non-autoregressive manner, then iteratively refines it based on previous predictions, allowing global dependencies between decisions. The experimental results show improvements over various baselines, reinforcing the hypothesis that document-level information improves conference resolution.
△ Less
Submitted 30 March, 2022;
originally announced March 2022.
-
HyperMixer: An MLP-based Low Cost Alternative to Transformers
Authors:
Florian Mai,
Arnaud Pannatier,
Fabio Fehr,
Haolin Chen,
Francois Marelli,
Francois Fleuret,
James Henderson
Abstract:
Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token m…
▽ More
Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning.
△ Less
Submitted 13 November, 2023; v1 submitted 7 March, 2022;
originally announced March 2022.
-
Bag-of-Vectors Autoencoders for Unsupervised Conditional Text Generation
Authors:
Florian Mai,
James Henderson
Abstract:
Text autoencoders are often used for unsupervised conditional text generation by applying map**s in the latent space to change attributes to the desired values. Recently, Mai et al. (2020) proposed Emb2Emb, a method to learn these map**s in the embedding space of an autoencoder. However, their method is restricted to autoencoders with a single-vector embedding, which limits how much informatio…
▽ More
Text autoencoders are often used for unsupervised conditional text generation by applying map**s in the latent space to change attributes to the desired values. Recently, Mai et al. (2020) proposed Emb2Emb, a method to learn these map**s in the embedding space of an autoencoder. However, their method is restricted to autoencoders with a single-vector embedding, which limits how much information can be retained. We address this issue by extending their method to Bag-of-Vectors Autoencoders (BoV-AEs), which encode the text into a variable-size bag of vectors that grows with the size of the text, as in attention-based models. This allows to encode and reconstruct much longer texts than standard autoencoders. Analogous to conventional autoencoders, we propose regularization techniques that facilitate learning meaningful operations in the latent space. Finally, we adapt Emb2Emb for a training scheme that learns to map an input bag to an output bag, including a novel loss function and neural architecture. Our empirical evaluations on unsupervised sentiment transfer show that our method performs substantially better than a standard autoencoder.
△ Less
Submitted 4 February, 2023; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Imposing Relation Structure in Language-Model Embeddings Using Contrastive Learning
Authors:
Christos Theodoropoulos,
James Henderson,
Andrei C. Coman,
Marie-Francine Moens
Abstract:
Though language model text embeddings have revolutionized NLP research, their ability to capture high-level semantic information, such as relations between entities in text, is limited. In this paper, we propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure. Given a sentence (unstructured text) and its graph, we use contrastive…
▽ More
Though language model text embeddings have revolutionized NLP research, their ability to capture high-level semantic information, such as relations between entities in text, is limited. In this paper, we propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure. Given a sentence (unstructured text) and its graph, we use contrastive learning to impose relation-related structure on the token-level representations of the sentence obtained with a CharacterBERT (El Boukkouri et al.,2020) model. The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task using only a simple KNN classifier, thereby demonstrating the success of the proposed method. Additional visualization by a tSNE analysis shows the effectiveness of the learned representation space compared to baselines. Furthermore, we show that we can learn a different space for named entity recognition, again using a contrastive learning objective, and demonstrate how to successfully combine both representation spaces in an entity-relation task.
△ Less
Submitted 4 September, 2021; v1 submitted 2 September, 2021;
originally announced September 2021.
-
The DCU-EPFL Enhanced Dependency Parser at the IWPT 2021 Shared Task
Authors:
James Barry,
Alireza Mohammadshahi,
Joachim Wagner,
Jennifer Foster,
James Henderson
Abstract:
We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. The task involves parsing Enhanced UD graphs, which are an extension of the basic dependency trees designed to be more facilitative towards representing semantic structure. Evaluation is carried out on 29 treebanks in 17 languages and participants are required to parse the data from ea…
▽ More
We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. The task involves parsing Enhanced UD graphs, which are an extension of the basic dependency trees designed to be more facilitative towards representing semantic structure. Evaluation is carried out on 29 treebanks in 17 languages and participants are required to parse the data from each language starting from raw strings. Our approach uses the Stanza pipeline to preprocess the text files, XLMRoBERTa to obtain contextualized token representations, and an edge-scoring and labeling model to predict the enhanced graph. Finally, we run a post-processing script to ensure all of our outputs are valid Enhanced UD graphs. Our system places 6th out of 9 participants with a coarse Enhanced Labeled Attachment Score (ELAS) of 83.57. We carry out additional post-deadline experiments which include using Trankit for pre-processing, XLM-RoBERTa-LARGE, treebank concatenation, and multitask learning between a basic and an enhanced dependency parser. All of these modifications improve our initial score and our final system has a coarse ELAS of 88.04.
△ Less
Submitted 5 July, 2021;
originally announced July 2021.
-
Variational Information Bottleneck for Effective Low-Resource Fine-Tuning
Authors:
Rabeeh Karimi Mahabadi,
Yonatan Belinkov,
James Henderson
Abstract:
While large-scale pretrained language models have obtained impressive results when fine-tuned on a wide variety of tasks, they still often suffer from overfitting in low-resource scenarios. Since such models are general-purpose feature extractors, many of these features are inevitably irrelevant for a given target task. We propose to use Variational Information Bottleneck (VIB) to suppress irrelev…
▽ More
While large-scale pretrained language models have obtained impressive results when fine-tuned on a wide variety of tasks, they still often suffer from overfitting in low-resource scenarios. Since such models are general-purpose feature extractors, many of these features are inevitably irrelevant for a given target task. We propose to use Variational Information Bottleneck (VIB) to suppress irrelevant features when fine-tuning on low-resource target tasks, and show that our method successfully reduces overfitting. Moreover, we show that our VIB model finds sentence representations that are more robust to biases in natural language inference datasets, and thereby obtains better generalization to out-of-domain datasets. Evaluation on seven low-resource datasets in different tasks shows that our method significantly improves transfer learning in low-resource scenarios, surpassing prior work. Moreover, it improves generalization on 13 out of 15 out-of-domain natural language inference benchmarks. Our code is publicly available in https://github.com/rabeehk/vibert.
△ Less
Submitted 9 June, 2021;
originally announced June 2021.
-
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
Authors:
Rabeeh Karimi Mahabadi,
James Henderson,
Sebastian Ruder
Abstract:
Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent wor…
▽ More
Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning. In this work, we propose Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work. Compacter accomplishes this by building on top of ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers. Specifically, Compacter inserts task-specific weight matrices into a pretrained model's weights, which are computed efficiently as a sum of Kronecker products between shared "slow" weights and "fast" rank-one matrices defined per Compacter layer. By only training 0.047% of a pretrained model's parameters, Compacter performs on par with standard fine-tuning on GLUE and outperforms standard fine-tuning on SuperGLUE and low-resource settings. Our code is publicly available at~\url{https://github.com/rabeehk/compacter}.
△ Less
Submitted 27 November, 2021; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
Authors:
Rabeeh Karimi Mahabadi,
Sebastian Ruder,
Mostafa Dehghani,
James Henderson
Abstract:
State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing information across tasks. In this paper, we show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which co…
▽ More
State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing information across tasks. In this paper, we show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model. This parameter-efficient multi-task learning framework allows us to achieve the best of both worlds by sharing knowledge across tasks via hypernetworks while enabling the model to adapt to each individual task through task-specific adapters. Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task. We additionally demonstrate substantial performance improvements in few-shot domain generalization across a variety of tasks. Our code is publicly available in https://github.com/rabeehk/hyperformer.
△ Less
Submitted 8 June, 2021;
originally announced June 2021.
-
The Closer You Look, The More You Learn: A Grey-box Approach to Protocol State Machine Learning
Authors:
Chris McMahon Stone,
Sam L. Thomas,
Mathy Vanhoef,
James Henderson,
Nicolas Bailluet,
Tom Chothia
Abstract:
In this paper, we propose a new approach to infer state machine models from protocol implementations. Our method, STATEINSPECTOR, learns protocol states by using novel program analyses to combine observations of run-time memory and I/O. It requires no access to source code and only lightweight execution monitoring of the implementation under test. We demonstrate and evaluate STATEINSPECTOR's effec…
▽ More
In this paper, we propose a new approach to infer state machine models from protocol implementations. Our method, STATEINSPECTOR, learns protocol states by using novel program analyses to combine observations of run-time memory and I/O. It requires no access to source code and only lightweight execution monitoring of the implementation under test. We demonstrate and evaluate STATEINSPECTOR's effectiveness on numerous TLS and WPA/2 implementations. In the process, we show STATEINSPECTOR enables deeper state discovery, increased learning efficiency, and more insightful post-mortem analyses than existing approaches. Further to improved learning, our method led us to discover several concerning deviations from the standards and a high impact vulnerability in a prominent Wi-Fi implementation.
△ Less
Submitted 7 June, 2021; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Syntax-Aware Graph-to-Graph Transformer for Semantic Role Labelling
Authors:
Alireza Mohammadshahi,
James Henderson
Abstract:
Recent models have shown that incorporating syntactic knowledge into the semantic role labelling (SRL) task leads to a significant improvement. In this paper, we propose Syntax-aware Graph-to-Graph Transformer (SynG2G-Tr) model, which encodes the syntactic structure using a novel way to input graph relations as embeddings, directly into the self-attention mechanism of Transformer. This approach ad…
▽ More
Recent models have shown that incorporating syntactic knowledge into the semantic role labelling (SRL) task leads to a significant improvement. In this paper, we propose Syntax-aware Graph-to-Graph Transformer (SynG2G-Tr) model, which encodes the syntactic structure using a novel way to input graph relations as embeddings, directly into the self-attention mechanism of Transformer. This approach adds a soft bias towards attention patterns that follow the syntactic structure but also allows the model to use this information to learn alternative patterns. We evaluate our model on both span-based and dependency-based SRL datasets, and outperform previous alternative methods in both in-domain and out-of-domain settings, on CoNLL 2005 and CoNLL 2009 datasets.
△ Less
Submitted 2 June, 2023; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Inertial Collaborative Localisation for Autonomous Vehicles using a Minimum Energy Filter
Authors:
Jack Henderson,
Mohammad Zamani,
Robert Mahony,
Jochen Trumpf
Abstract:
Collaborative Localisation has been studied extensively in recent years as a way to improve pose estimation of unmanned aerial vehicles in challenging environments. However little attention has been paid toward advancing the underlying filter design beyond standard Extended Kalman Filter-based approaches. In this paper, we detail a discrete-time collaborative localisation filter using the determin…
▽ More
Collaborative Localisation has been studied extensively in recent years as a way to improve pose estimation of unmanned aerial vehicles in challenging environments. However little attention has been paid toward advancing the underlying filter design beyond standard Extended Kalman Filter-based approaches. In this paper, we detail a discrete-time collaborative localisation filter using the deterministic minimum-energy framework. The filter incorporates measurements from an inertial measurement unit and models the effects of sensor bias and gravitational acceleration. We present a simulation based on real-world vehicle trajectories and IMU data that demonstrates how collaborative localisation can improve performance over single-vehicle methods.
△ Less
Submitted 12 April, 2021;
originally announced April 2021.
-
Inducing Meaningful Units from Character Sequences with Dynamic Capacity Slot Attention
Authors:
Melika Behjati,
James Henderson
Abstract:
Characters do not convey meaning, but sequences of characters do. We propose an unsupervised distributional method to learn the abstract meaningful units in a sequence of characters. Rather than segmenting the sequence, our Dynamic Capacity Slot Attention model discovers continuous representations of the objects in the sequence, extending an architecture for object discovery in images. We train ou…
▽ More
Characters do not convey meaning, but sequences of characters do. We propose an unsupervised distributional method to learn the abstract meaningful units in a sequence of characters. Rather than segmenting the sequence, our Dynamic Capacity Slot Attention model discovers continuous representations of the objects in the sequence, extending an architecture for object discovery in images. We train our model on different languages and evaluate the quality of the obtained representations with forward and reverse probing classifiers. These experiments show that our model succeeds in discovering units which are similar to those proposed previously in form, content and level of abstraction, and which show promise for capturing meaningful information at a higher level of abstraction.
△ Less
Submitted 16 January, 2024; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Multi-Adversarial Learning for Cross-Lingual Word Embeddings
Authors:
Haozhou Wang,
James Henderson,
Paola Merlo
Abstract:
Generative adversarial networks (GANs) have succeeded in inducing cross-lingual word embeddings -- maps of matching words across languages -- without supervision. Despite these successes, GANs' performance for the difficult case of distant languages is still not satisfactory. These limitations have been explained by GANs' incorrect assumption that source and target embedding spaces are related by…
▽ More
Generative adversarial networks (GANs) have succeeded in inducing cross-lingual word embeddings -- maps of matching words across languages -- without supervision. Despite these successes, GANs' performance for the difficult case of distant languages is still not satisfactory. These limitations have been explained by GANs' incorrect assumption that source and target embedding spaces are related by a single linear map** and are approximately isomorphic. We assume instead that, especially across distant languages, the map** is only piece-wise linear, and propose a multi-adversarial learning method. This novel method induces the seed cross-lingual dictionary through multiple map**s, each induced to fit the map** for one subspace. Our experiments on unsupervised bilingual lexicon induction show that this method improves performance over previous single-map** methods, especially for distant languages.
△ Less
Submitted 25 August, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Plug and Play Autoencoders for Conditional Text Generation
Authors:
Florian Mai,
Nikolaos Pappas,
Ivan Montero,
Noah A. Smith,
James Henderson
Abstract:
Text autoencoders are commonly used for conditional generation tasks such as style transfer. We propose methods which are plug and play, where any pretrained autoencoder can be used, and only require learning a map** within the autoencoder's embedding space, training embedding-to-embedding (Emb2Emb). This reduces the need for labeled training data for the task and makes the training procedure mo…
▽ More
Text autoencoders are commonly used for conditional generation tasks such as style transfer. We propose methods which are plug and play, where any pretrained autoencoder can be used, and only require learning a map** within the autoencoder's embedding space, training embedding-to-embedding (Emb2Emb). This reduces the need for labeled training data for the task and makes the training procedure more efficient. Crucial to the success of this method is a loss term for kee** the mapped embedding on the manifold of the autoencoder and a map** which is trained to navigate the manifold by learning offset vectors. Evaluations on style transfer tasks both with and without sequence-to-sequence supervision show that our method performs better than or comparable to strong baselines while being up to four times faster.
△ Less
Submitted 12 October, 2020; v1 submitted 6 October, 2020;
originally announced October 2020.
-
A Minimum Energy Filter for Localisation of an Unmanned Aerial Vehicle
Authors:
Jack Henderson,
Mohammad Zamani,
Robert Mahony,
Jochen Trumpf
Abstract:
Accurate localisation of unmanned aerial vehicles is vital for the next generation of automation tasks. This paper proposes a minimum energy filter for velocity-aided pose estimation on the extended special Euclidean group. The approach taken exploits the Lie-group symmetry of the problem to combine Inertial Measurement Unit (IMU) sensor output with landmark measurements into a robust and high per…
▽ More
Accurate localisation of unmanned aerial vehicles is vital for the next generation of automation tasks. This paper proposes a minimum energy filter for velocity-aided pose estimation on the extended special Euclidean group. The approach taken exploits the Lie-group symmetry of the problem to combine Inertial Measurement Unit (IMU) sensor output with landmark measurements into a robust and high performance state estimate. We propose an asynchronous discrete-time implementation to fuse high bandwidth IMU with low bandwidth discrete-time landmark measurements typical of real-world scenarios. The filter's performance is demonstrated by simulation.
△ Less
Submitted 9 September, 2020;
originally announced September 2020.
-
Understanding collections of related datasets using dependent MMD coresets
Authors:
Sinead A. Williamson,
Jette Henderson
Abstract:
Understanding how two datasets differ can help us determine whether one dataset under-represents certain sub-populations, and provides insights into how well models will generalize across datasets. Representative points selected by a maximum mean discrepency (MMD) coreset can provide interpretable summaries of a single dataset, but are not easily compared across datasets. In this paper we introduc…
▽ More
Understanding how two datasets differ can help us determine whether one dataset under-represents certain sub-populations, and provides insights into how well models will generalize across datasets. Representative points selected by a maximum mean discrepency (MMD) coreset can provide interpretable summaries of a single dataset, but are not easily compared across datasets. In this paper we introduce dependent MMD coresets, a data summarization method for collections of datasets that facilitates comparison of distributions. We show that dependent MMD coresets are useful for understanding multiple related datasets and understanding model generalization between such datasets.
△ Less
Submitted 4 August, 2021; v1 submitted 24 June, 2020;
originally announced June 2020.
-
A Minimum Energy Filter for Distributed Multirobot Localisation
Authors:
Jack Henderson,
Jochen Trumpf,
Mohammad Zamani
Abstract:
We present a new approach to the cooperative localisation problem by applying the theory of minimum energy filtering. We consider the problem of estimating the pose of a group of mobile robots in an environment where robots can perceive fixed landmarks and neighbouring robots as well as share information with others over a communication channel. Whereas the vast majority of the existing literature…
▽ More
We present a new approach to the cooperative localisation problem by applying the theory of minimum energy filtering. We consider the problem of estimating the pose of a group of mobile robots in an environment where robots can perceive fixed landmarks and neighbouring robots as well as share information with others over a communication channel. Whereas the vast majority of the existing literature applies some variant of a Kalman Filter, we derive a set of filter equations for the global state estimate based on the principle of minimum energy filtering. We show how the filter equations can be decoupled and the calculations distributed among the robots in the network without requiring a central processing node. Finally, we provide a demonstration of the filter's performance in simulation.
△ Less
Submitted 14 May, 2020;
originally announced May 2020.
-
The Unstoppable Rise of Computational Linguistics in Deep Learning
Authors:
James Henderson
Abstract:
In this paper, we trace the history of neural networks applied to natural language understanding tasks, and identify key contributions which the nature of language has made to the development of neural network architectures. We focus on the importance of variable binding and its instantiation in attention-based models, and argue that Transformer is not a sequence model but an induced-structure mod…
▽ More
In this paper, we trace the history of neural networks applied to natural language understanding tasks, and identify key contributions which the nature of language has made to the development of neural network architectures. We focus on the importance of variable binding and its instantiation in attention-based models, and argue that Transformer is not a sequence model but an induced-structure model. This perspective leads to predictions of the challenges facing research in deep learning architectures for natural language understanding.
△ Less
Submitted 11 June, 2020; v1 submitted 13 May, 2020;
originally announced May 2020.
-
Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement
Authors:
Alireza Mohammadshahi,
James Henderson
Abstract:
We propose the Recursive Non-autoregressive Graph-to-Graph Transformer architecture (RNGTr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. We demonstrate the power and effectiveness of RNGTr on several dependency corpora, using a refinement model pre-trained with BER…
▽ More
We propose the Recursive Non-autoregressive Graph-to-Graph Transformer architecture (RNGTr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. We demonstrate the power and effectiveness of RNGTr on several dependency corpora, using a refinement model pre-trained with BERT. We also introduce Syntactic Transformer (SynTr), a non-recursive parser similar to our refinement model. RNGTr can improve the accuracy of a variety of initial parsers on 13 languages from the Universal Dependencies Treebanks, English and Chinese Penn Treebanks, and the German CoNLL2009 corpus, even improving over the new state-of-the-art results achieved by SynTr, significantly improving the state-of-the-art for all corpora tested.
△ Less
Submitted 10 November, 2020; v1 submitted 29 March, 2020;
originally announced March 2020.
-
Federating Recommendations Using Differentially Private Prototypes
Authors:
Mónica Ribero,
Jette Henderson,
Sinead Williamson,
Haris Vikalo
Abstract:
Machine learning methods allow us to make recommendations to users in applications across fields including entertainment, dating, and commerce, by exploiting similarities in users' interaction patterns. However, in domains that demand protection of personally sensitive data, such as medicine or banking, how can we learn such a model without accessing the sensitive data, and without inadvertently l…
▽ More
Machine learning methods allow us to make recommendations to users in applications across fields including entertainment, dating, and commerce, by exploiting similarities in users' interaction patterns. However, in domains that demand protection of personally sensitive data, such as medicine or banking, how can we learn such a model without accessing the sensitive data, and without inadvertently leaking private information? We propose a new federated approach to learning global and local private models for recommendation without collecting raw data, user statistics, or information about personal preferences. Our method produces a set of prototypes that allows us to infer global behavioral patterns, while providing differential privacy guarantees for users in any database of the system. By requiring only two rounds of communication, we both reduce the communication costs and avoid the excessive privacy loss associated with iterative procedures. We test our framework on synthetic data as well as real federated medical data and Movielens ratings data. We show local adaptation of the global model allows our method to outperform centralized matrix-factorization-based recommender system models, both in terms of accuracy of matrix reconstruction and in terms of relevance of the recommendations, while maintaining provable privacy guarantees. We also show that our method is more robust and is characterized by smaller variance than individual models learned by independent entities.
△ Less
Submitted 1 March, 2020;
originally announced March 2020.
-
Graph-to-Graph Transformer for Transition-based Dependency Parsing
Authors:
Alireza Mohammadshahi,
James Henderson
Abstract:
We propose the Graph2Graph Transformer architecture for conditioning on and predicting arbitrary graphs, and apply it to the challenging task of transition-based dependency parsing. After proposing two novel Transformer models of transition-based dependency parsing as strong baselines, we show that adding the proposed mechanisms for conditioning on and predicting graphs of Graph2Graph Transformer…
▽ More
We propose the Graph2Graph Transformer architecture for conditioning on and predicting arbitrary graphs, and apply it to the challenging task of transition-based dependency parsing. After proposing two novel Transformer models of transition-based dependency parsing as strong baselines, we show that adding the proposed mechanisms for conditioning on and predicting graphs of Graph2Graph Transformer results in significant improvements, both with and without BERT pre-training. The novel baselines and their integration with Graph2Graph Transformer significantly outperform the state-of-the-art in traditional transition-based dependency parsing on both English Penn Treebank, and 13 languages of Universal Dependencies Treebanks. Graph2Graph Transformer can be integrated with many previous structured prediction methods, making it easy to apply to a wide range of NLP tasks.
△ Less
Submitted 30 October, 2020; v1 submitted 8 November, 2019;
originally announced November 2019.
-
End-to-End Bias Mitigation by Modelling Biases in Corpora
Authors:
Rabeeh Karimi Mahabadi,
Yonatan Belinkov,
James Henderson
Abstract:
Several recent studies have shown that strong natural language understanding (NLU) models are prone to relying on unwanted dataset biases without learning the underlying task, resulting in models that fail to generalize to out-of-domain datasets and are likely to perform poorly in real-world scenarios. We propose two learning strategies to train neural models, which are more robust to such biases…
▽ More
Several recent studies have shown that strong natural language understanding (NLU) models are prone to relying on unwanted dataset biases without learning the underlying task, resulting in models that fail to generalize to out-of-domain datasets and are likely to perform poorly in real-world scenarios. We propose two learning strategies to train neural models, which are more robust to such biases and transfer better to out-of-domain datasets. The biases are specified in terms of one or more bias-only models, which learn to leverage the dataset biases. During training, the bias-only models' predictions are used to adjust the loss of the base model to reduce its reliance on biases by down-weighting the biased examples and focusing the training on the hard examples. We experiment on large-scale natural language inference and fact verification benchmarks, evaluating on out-of-domain datasets that are specifically designed to assess the robustness of models against known biases in the training data. Results show that our debiasing methods greatly improve robustness in all settings and better transfer to other textual entailment datasets. Our code and data are publicly available in \url{https://github.com/rabeehk/robust-nli}.
△ Less
Submitted 23 April, 2020; v1 submitted 13 September, 2019;
originally announced September 2019.
-
Partially-supervised Mention Detection
Authors:
Lesly Miculicich,
James Henderson
Abstract:
Learning to detect entity mentions without using syntactic information can be useful for integration and joint optimization with other tasks. However, it is common to have partially annotated data for this problem. Here, we investigate two approaches to deal with partial annotation of mentions: weighted loss and soft-target classification. We also propose two neural mention detection approaches: a…
▽ More
Learning to detect entity mentions without using syntactic information can be useful for integration and joint optimization with other tasks. However, it is common to have partially annotated data for this problem. Here, we investigate two approaches to deal with partial annotation of mentions: weighted loss and soft-target classification. We also propose two neural mention detection approaches: a sequence tagging, and an exhaustive search. We evaluate our methods with coreference resolution as a downstream task, using multitask learning. The results show that the recall and F1 score improve for all methods.
△ Less
Submitted 26 August, 2019;
originally announced August 2019.
-
Regularization Advantages of Multilingual Neural Language Models for Low Resource Domains
Authors:
Navid Rekabsaz,
Nikolaos Pappas,
James Henderson,
Banriskhem K. Khonglah,
Srikanth Madikeri
Abstract:
Neural language modeling (LM) has led to significant improvements in several applications, including Automatic Speech Recognition. However, they typically require large amounts of training data, which is not available for many domains and languages. In this study, we propose a multilingual neural language model architecture, trained jointly on the domain-specific data of several low-resource langu…
▽ More
Neural language modeling (LM) has led to significant improvements in several applications, including Automatic Speech Recognition. However, they typically require large amounts of training data, which is not available for many domains and languages. In this study, we propose a multilingual neural language model architecture, trained jointly on the domain-specific data of several low-resource languages. The proposed multilingual LM consists of language specific word embeddings in the encoder and decoder, and one language specific LSTM layer, plus two LSTM layers with shared parameters across the languages. This multilingual LM model facilitates transfer learning across the languages, acting as an extra regularizer in very low-resource scenarios. We integrate our proposed multilingual approach with a state-of-the-art highly-regularized neural LM, and evaluate on the conversational data domain for four languages over a range of training data sizes. Compared to monolingual LMs, the results show significant improvements of our proposed multilingual LM when the amount of available training data is limited, indicating the advantages of cross-lingual parameter sharing in very low-resource language modeling.
△ Less
Submitted 29 May, 2019;
originally announced June 2019.
-
CERTIFAI: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence models
Authors:
Shubham Sharma,
Jette Henderson,
Joydeep Ghosh
Abstract:
As artificial intelligence plays an increasingly important role in our society, there are ethical and moral obligations for both businesses and researchers to ensure that their machine learning models are designed, deployed, and maintained responsibly. These models need to be rigorously audited for fairness, robustness, transparency, and interpretability. A variety of methods have been developed t…
▽ More
As artificial intelligence plays an increasingly important role in our society, there are ethical and moral obligations for both businesses and researchers to ensure that their machine learning models are designed, deployed, and maintained responsibly. These models need to be rigorously audited for fairness, robustness, transparency, and interpretability. A variety of methods have been developed that focus on these issues in isolation, however, managing these methods in conjunction with model development can be cumbersome and timeconsuming. In this paper, we introduce a unified and model-agnostic approach to address these issues: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence models (CERTIFAI). Unlike previous methods in this domain, CERTIFAI is a general tool that can be applied to any black-box model and any type of input data. Given a model and an input instance, CERTIFAI uses a custom genetic algorithm to generate counterfactuals: instances close to the input that change the prediction of the model. We demonstrate how these counterfactuals can be used to examine issues of robustness, interpretability, transparency, and fairness. Additionally, we introduce CERScore, the first black-box model robustness score that performs comparably to methods that have access to model internals.
△ Less
Submitted 19 May, 2019;
originally announced May 2019.