Search | arXiv e-print repository

Contrastive learning of T cell receptor representations

Authors: Yuta Nagano, Andrew Pyo, Martina Milighetti, James Henderson, John Shawe-Taylor, Benny Chain, Andreas Tiffeau-Mayer

Abstract: Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labelled TCR data remains sparse. In other domains, the pre-training of language models on unlabelled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein lan… ▽ More Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labelled TCR data remains sparse. In other domains, the pre-training of language models on unlabelled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here we introduce a TCR language model called SCEPTR (Simple Contrastive Embedding of the Primary sequence of T cell Receptors), capable of data-efficient transfer learning. Through our model, we introduce a novel pre-training strategy combining autocontrastive learning and masked-language modelling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 19 pages, 17 figures

ACM Class: J.3; I.2.7

arXiv:2404.17142 [pdf, other]

Automated Quantum Circuit Generation for Computing Inverse Hash Functions

Authors: Elena R. Henderson, Jessie M. Henderson, William V. Oxford, Mitchell A. Thornton

Abstract: Several cryptographic systems depend upon the computational difficulty of reversing cryptographic hash functions. Robust hash functions transform inputs to outputs in such a way that the inputs cannot be later retrieved in a reasonable amount of time even if the outputs and the function that created them are known. Consequently, hash functions can be cryptographically secure, and they are employed… ▽ More Several cryptographic systems depend upon the computational difficulty of reversing cryptographic hash functions. Robust hash functions transform inputs to outputs in such a way that the inputs cannot be later retrieved in a reasonable amount of time even if the outputs and the function that created them are known. Consequently, hash functions can be cryptographically secure, and they are employed in encryption, authentication, and other security methods. It has been suggested that such cryptographically-secure hash functions will play a critical role in the era of post-quantum cryptography (PQC), as they do in conventional systems. In this work, we introduce a procedure that leverages the principle of reversibility to generate circuits that invert hash functions. We provide a proof-of-concept implementation and describe methods that allow for scaling the hash function inversion approach. Specifically, we implement one manifestation of the algorithm as part of a more general automated quantum circuit synthesis, compilation, and optimization toolkit. We illustrate production of reversible circuits for crypto-hash functions that inherently provide the inverse of the function, and we describe data structures that increase the scalability of the hash function inversion approach. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: 12 pages, 9 figures, 1 table

arXiv:2404.12565 [pdf, other]

Limits on Inferring T-cell Specificity from Partial Information

Authors: James Henderson, Yuta Nagano, Martina Milighetti, Andreas Tiffeau-Mayer

Abstract: A key challenge in molecular biology is to decipher the map** of protein sequence to function. To perform this map** requires the identification of sequence features most informative about function. Here, we quantify the amount of information (in bits) that T-cell receptor (TCR) sequence features provide about antigen specificity. We identify informative features by their degree of conservatio… ▽ More A key challenge in molecular biology is to decipher the map** of protein sequence to function. To perform this map** requires the identification of sequence features most informative about function. Here, we quantify the amount of information (in bits) that T-cell receptor (TCR) sequence features provide about antigen specificity. We identify informative features by their degree of conservation among antigen-specific receptors relative to null expectations. We find that TCR specificity synergistically depends on the hypervariable regions of both receptor chains, with a degree of synergy that strongly depends on the ligand. Using a coincidence-based approach to measuring information enables us to directly bound the accuracy with which TCR specificity can be predicted from partial matches to reference sequences. We anticipate that our statistical framework will be of use for develo** machine learning models for TCR specificity prediction and for optimizing TCRs for cell therapies. The proposed coincidence-based information measures might find further applications in bounding the performance of pairwise classifiers in other fields. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 24 pages, 15 figures

arXiv:2404.02440 [pdf, other]

Designing a Photonic Physically Unclonable Function Having Resilience to Machine Learning Attacks

Authors: Elena R. Henderson, Jessie M. Henderson, Hiva Shahoei, William V. Oxford, Eric C. Larson, Duncan L. MacFarlane, Mitchell A. Thornton

Abstract: Physically unclonable functions (PUFs) are designed to act as device 'fingerprints.' Given an input challenge, the PUF circuit should produce an unpredictable response for use in situations such as root-of-trust applications and other hardware-level cybersecurity applications. PUFs are typically subcircuits present within integrated circuits (ICs), and while conventional IC PUFs are well-understoo… ▽ More Physically unclonable functions (PUFs) are designed to act as device 'fingerprints.' Given an input challenge, the PUF circuit should produce an unpredictable response for use in situations such as root-of-trust applications and other hardware-level cybersecurity applications. PUFs are typically subcircuits present within integrated circuits (ICs), and while conventional IC PUFs are well-understood, several implementations have proven vulnerable to malicious exploits, including those perpetrated by machine learning (ML)-based attacks. Such attacks can be difficult to prevent because they are often designed to work even when relatively few challenge-response pairs are known in advance. Hence the need for both more resilient PUF designs and analysis of ML-attack susceptibility. Previous work has developed a PUF for photonic integrated circuits (PICs). A PIC PUF not only produces unpredictable responses given manufacturing-introduced tolerances, but is also less prone to electromagnetic radiation eavesdrop** attacks than a purely electronic IC PUF. In this work, we analyze the resilience of the proposed photonic PUF when subjected to ML-based attacks. Specifically, we describe a computational PUF model for producing the large datasets required for training ML attacks; we analyze the quality of the model; and we discuss the modeled PUF's susceptibility to ML-based attacks. We find that the modeled PUF generates distributions that resemble uniform white noise, explaining the exhibited resilience to neural-network-based attacks designed to exploit latent relationships between challenges and responses. Preliminary analysis suggests that the PUF exhibits similar resilience to generative adversarial networks, and continued development will show whether more-sophisticated ML approaches better compromise the PUF and -- if so -- how design modifications might improve resilience. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: 14 pages, 8 figures

arXiv:2403.01299 [pdf, other]

A Photonic Physically Unclonable Function's Resilience to Multiple-Valued Machine Learning Attacks

Authors: Jessie M. Henderson, Elena R. Henderson, Clayton A. Harper, Hiva Shahoei, William V. Oxford, Eric C. Larson, Duncan L. MacFarlane, Mitchell A. Thornton

Abstract: Physically unclonable functions (PUFs) identify integrated circuits using nonlinearly-related challenge-response pairs (CRPs). Ideally, the relationship between challenges and corresponding responses is unpredictable, even if a subset of CRPs is known. Previous work developed a photonic PUF offering improved security compared to non-optical counterparts. Here, we investigate this PUF's susceptibil… ▽ More Physically unclonable functions (PUFs) identify integrated circuits using nonlinearly-related challenge-response pairs (CRPs). Ideally, the relationship between challenges and corresponding responses is unpredictable, even if a subset of CRPs is known. Previous work developed a photonic PUF offering improved security compared to non-optical counterparts. Here, we investigate this PUF's susceptibility to Multiple-Valued-Logic-based machine learning attacks. We find that approximately 1,000 CRPs are necessary to train models that predict response bits better than random chance. Given the significant challenge of acquiring a vast number of CRPs from a photonic PUF, our results demonstrate photonic PUF resilience against such attacks. △ Less

Submitted 2 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 figures

arXiv:2312.00662 [pdf, other]

Nonparametric Variational Regularisation of Pretrained Transformers

Authors: Fabio Fehr, James Henderson

Abstract: The current paradigm of large-scale pre-training and fine-tuning Transformer large language models has lead to significant improvements across the board in natural language processing. However, such large models are susceptible to overfitting to their training data, and as a result the models perform poorly when the domain changes. Also, due to the model's scale, the cost of fine-tuning the model… ▽ More The current paradigm of large-scale pre-training and fine-tuning Transformer large language models has lead to significant improvements across the board in natural language processing. However, such large models are susceptible to overfitting to their training data, and as a result the models perform poorly when the domain changes. Also, due to the model's scale, the cost of fine-tuning the model to the new domain is large. Nonparametric Variational Information Bottleneck (NVIB) has been proposed as a regulariser for training cross-attention in Transformers, potentially addressing the overfitting problem. We extend the NVIB framework to replace all types of attention functions in Transformers, and show that existing pretrained Transformers can be reinterpreted as Nonparametric Variational (NV) models using a proposed identity initialisation. We then show that changing the initialisation introduces a novel, information-theoretic post-training regularisation in the attention mechanism, which improves out-of-domain generalisation without any training. This success supports the hypothesis that pretrained Transformers are implicitly NV Bayesian models. △ Less

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.03611 [pdf, other]

Plug-and-Play Stability for Intracortical Brain-Computer Interfaces: A One-Year Demonstration of Seamless Brain-to-Text Communication

Authors: Chaofei Fan, Nick Hahn, Foram Kamdar, Donald Avansino, Guy H. Wilson, Leigh Hochberg, Krishna V. Shenoy, Jaimie M. Henderson, Francis R. Willett

Abstract: Intracortical brain-computer interfaces (iBCIs) have shown promise for restoring rapid communication to people with neurological disorders such as amyotrophic lateral sclerosis (ALS). However, to maintain high performance over time, iBCIs typically need frequent recalibration to combat changes in the neural recordings that accrue over days. This requires iBCI users to stop using the iBCI and engag… ▽ More Intracortical brain-computer interfaces (iBCIs) have shown promise for restoring rapid communication to people with neurological disorders such as amyotrophic lateral sclerosis (ALS). However, to maintain high performance over time, iBCIs typically need frequent recalibration to combat changes in the neural recordings that accrue over days. This requires iBCI users to stop using the iBCI and engage in supervised data collection, making the iBCI system hard to use. In this paper, we propose a method that enables self-recalibration of communication iBCIs without interrupting the user. Our method leverages large language models (LMs) to automatically correct errors in iBCI outputs. The self-recalibration process uses these corrected outputs ("pseudo-labels") to continually update the iBCI decoder online. Over a period of more than one year (403 days), we evaluated our Continual Online Recalibration with Pseudo-labels (CORP) framework with one clinical trial participant. CORP achieved a stable decoding accuracy of 93.84% in an online handwriting iBCI task, significantly outperforming other baseline methods. Notably, this is the longest-running iBCI stability demonstration involving a human participant. Our results provide the first evidence for long-term stabilization of a plug-and-play, high-performance communication iBCI, addressing a major barrier for the clinical translation of iBCIs. △ Less

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.02339 [pdf, other]

Progress Metrics in DAG-based Consensus

Authors: Quan Nguyen, James Henderson, Egor Lysenko

Abstract: Lachesis protocol~\cite{lachesis2021} leverages a DAG of events to allow nodes to reach fast consensus of events. This work introduces DAG progress metrics to drive the nodes to emit new events more effectively. With these metrics, nodes can select event timing and can choose previous events as parents for their own new events. Our results show that our event timing and parent selection methods ca… ▽ More Lachesis protocol~\cite{lachesis2021} leverages a DAG of events to allow nodes to reach fast consensus of events. This work introduces DAG progress metrics to drive the nodes to emit new events more effectively. With these metrics, nodes can select event timing and can choose previous events as parents for their own new events. Our results show that our event timing and parent selection methods can help reaching consensus quicker and thus can reduce lower time to finality significantly. △ Less

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2311.02096 [pdf, other]

Variational Autoencoders for Noise Reduction in Industrial LLRF Systems

Authors: J. P. Edelen, M. J. Henderson, J. Einstein-Curtis, C. C. Hall, J. A. Diaz Cruz, A. L. Edelen

Abstract: Industrial particle accelerators inherently operate in much dirtier environments than typical research accelerators. This leads to an increase in noise both in the RF system and in other electronic systems. Combined with the fact that industrial accelerators are mass produced, there is less attention given to optimizing the performance of an individual system. As a result, industrial systems tend… ▽ More Industrial particle accelerators inherently operate in much dirtier environments than typical research accelerators. This leads to an increase in noise both in the RF system and in other electronic systems. Combined with the fact that industrial accelerators are mass produced, there is less attention given to optimizing the performance of an individual system. As a result, industrial systems tend to under perform considering their hardware hardware capabilities. With the growing demand for accelerators for medical sterilization, food irradiation, cancer treatment, and imaging, improving the signal processing of these machines will increase the margin for the deployment of these systems. Our work is focusing on using machine learning techniques to reduce the noise of RF signals used for pulse-to-pulse feedback in industrial accelerators. We will review our algorithms, simulation results, and results working with measured data. We will then discuss next steps for deployment and testing on an industrial system. △ Less

Submitted 7 November, 2023; v1 submitted 29 October, 2023; originally announced November 2023.

Comments: Talk presented at LLRF Workshop 2023 (LLRF2023, arXiv: 2310.03199)

Report number: LLRF2023/97

arXiv:2310.17936 [pdf, other]

Transformers as Graph-to-Graph Models

Authors: James Henderson, Alireza Mohammadshahi, Andrei C. Coman, Lesly Miculicich

Abstract: We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs… ▽ More We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pretrained Transformers. Adding iterative graph refinement provides a joint embedding of input, output, and latent graphs, allowing non-autoregressive graph prediction to optimise the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracies for modelling a variety of linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining. △ Less

Submitted 27 October, 2023; originally announced October 2023.

Comments: Accepted to Big Picture workshop at EMNLP 2023

arXiv:2310.17284 [pdf, other]

Learning to Abstract with Nonparametric Variational Information Bottleneck

Authors: Melika Behjati, Fabio Fehr, James Henderson

Abstract: Learned representations at the level of characters, sub-words, words and sentences, have each contributed to advances in understanding different NLP tasks and linguistic phenomena. However, learning textual embeddings is costly as they are tokenization specific and require different models to be trained for each level of abstraction. We introduce a novel language representation model which can lea… ▽ More Learned representations at the level of characters, sub-words, words and sentences, have each contributed to advances in understanding different NLP tasks and linguistic phenomena. However, learning textual embeddings is costly as they are tokenization specific and require different models to be trained for each level of abstraction. We introduce a novel language representation model which can learn to compress to different levels of abstraction at different layers of the same model. We apply Nonparametric Variational Information Bottleneck (NVIB) to stacked Transformer self-attention layers in the encoder, which encourages an information-theoretic compression of the representations through the model. We find that the layers within the model correspond to increasing levels of abstraction and that their representations are more linguistically informed. Finally, we show that NVIB compression results in a model which is more robust to adversarial perturbations. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of EMNLP 2023

arXiv:2308.14423 [pdf, other]

GADePo: Graph-Assisted Declarative Pooling Transformers for Document-Level Relation Extraction

Authors: Andrei C. Coman, Christos Theodoropoulos, Marie-Francine Moens, James Henderson

Abstract: Document-level relation extraction typically relies on text-based encoders and hand-coded pooling heuristics to aggregate information learned by the encoder. In this paper, we leverage the intrinsic graph processing capabilities of the Transformer model and propose replacing hand-coded pooling methods with new tokens in the input, which are designed to aggregate information via explicit graph rela… ▽ More Document-level relation extraction typically relies on text-based encoders and hand-coded pooling heuristics to aggregate information learned by the encoder. In this paper, we leverage the intrinsic graph processing capabilities of the Transformer model and propose replacing hand-coded pooling methods with new tokens in the input, which are designed to aggregate information via explicit graph relations in the computation of attention weights. We introduce a joint text-graph Transformer model and a graph-assisted declarative pooling (GADePo) specification of the input, which provides explicit and high-level instructions for information aggregation. GADePo allows the pooling process to be guided by domain-specific knowledge or desired outcomes but still learned by the Transformer, leading to more flexible and customisable pooling strategies. We evaluate our method across diverse datasets and models and show that our approach yields promising results that are consistently better than those achieved by the hand-coded pooling functions. △ Less

Submitted 18 June, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted to KnowledgeNLP workshop at ACL 2024

arXiv:2305.08379 [pdf, other]

TESS: Text-to-Text Self-Conditioned Simplex Diffusion

Authors: Rabeeh Karimi Mahabadi, Hamish Ivison, Jaesung Tae, James Henderson, Iz Beltagy, Matthew E. Peters, Arman Cohan

Abstract: Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models to natural language remains challenging due to its discrete nature and the need for a large number of diffusion steps to generate text, making diffusion-based generation expensive. In this work, we propose Text-to-text Self-c… ▽ More Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models to natural language remains challenging due to its discrete nature and the need for a large number of diffusion steps to generate text, making diffusion-based generation expensive. In this work, we propose Text-to-text Self-conditioned Simplex Diffusion (TESS), a text diffusion model that is fully non-autoregressive, employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the learned embedding space. Through extensive experiments on natural language understanding and generation tasks including summarization, text simplification, paraphrase generation, and question generation, we demonstrate that TESS outperforms state-of-the-art non-autoregressive models, requires fewer diffusion steps with minimal drop in performance, and is competitive with pretrained autoregressive sequence-to-sequence models. We publicly release our codebase at https://github.com/allenai/tess-diffusion. △ Less

Submitted 20 February, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: EACL 2024

arXiv:2211.09860 [pdf, other]

Automated Quantum Memory Compilation with Improved Dynamic Range

Authors: Aviraj Sinha, Elena R. Henderson, Jessie M. Henderson, Mitchell A. Thornton

Abstract: Emerging quantum algorithms that process data require that classical input data be represented as a quantum state. These data-processing algorithms often follow the gate model of quantum computing--which requires qubits to be initialized to a basis state, typically $\lvert 0 \rangle$--and thus often employ state generation circuits to transform the initialized basis state to a data-representation… ▽ More Emerging quantum algorithms that process data require that classical input data be represented as a quantum state. These data-processing algorithms often follow the gate model of quantum computing--which requires qubits to be initialized to a basis state, typically $\lvert 0 \rangle$--and thus often employ state generation circuits to transform the initialized basis state to a data-representation state. There are many ways to encode classical data in a qubit, and the oft-applied approach of basis encoding does not allow optimization to the extent that other variants do. In this work, we thus consider automatic synthesis of addressable, quantum read-only memory (QROM) circuits, which act as data-encoding state-generation circuits. We investigate three data encoding approaches, one of which we introduce to provide improved dynamic range and precision. We present experimental results that compare these encoding methods for QROM synthesis to better understand the implications of and applications for each. △ Less

Submitted 17 November, 2022; originally announced November 2022.

Comments: 14 pages, 9 figures, and 13 tables

arXiv:2211.01482 [pdf, other]

RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question

Authors: Alireza Mohammadshahi, Thomas Scialom, Majid Yazdani, Pouya Yanki, Angela Fan, James Henderson, Marzieh Saeidi

Abstract: Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided refer… ▽ More Existing metrics for evaluating the quality of automatically generated questions such as BLEU, ROUGE, BERTScore, and BLEURT compare the reference and predicted questions, providing a high score when there is a considerable lexical overlap or semantic similarity between the candidate and the reference questions. This approach has two major shortcomings. First, we need expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this paper, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering and a span scorer modules, using pre-trained models from existing literature, thus it can be used without any further training. We demonstrate that RQUGE has a higher correlation with human judgment without relying on the reference question. Additionally, RQUGE is shown to be more robust to several adversarial corruptions. Furthermore, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on synthetic data generated by a question generation model and re-ranked by RQUGE. △ Less

Submitted 26 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted to Findings of ACL 2023

arXiv:2210.14702 [pdf, other]

Privacy Analysis of Samsung's Crowd-Sourced Bluetooth Location Tracking System

Authors: Tingfeng Yu, James Henderson, Alwen Tiu, Thomas Haines

Abstract: We present a detailed privacy analysis of Samsung's Offline Finding (OF) protocol, which is part of Samsung's Find My Mobile (FMM) location tracking system for locating Samsung mobile devices, such as Samsung smartphones and Bluetooth trackers (Galaxy SmartTags). The OF protocol uses Bluetooth Low Energy (BLE) to broadcast a unique beacon for a lost device. This beacon is then picked up by nearby… ▽ More We present a detailed privacy analysis of Samsung's Offline Finding (OF) protocol, which is part of Samsung's Find My Mobile (FMM) location tracking system for locating Samsung mobile devices, such as Samsung smartphones and Bluetooth trackers (Galaxy SmartTags). The OF protocol uses Bluetooth Low Energy (BLE) to broadcast a unique beacon for a lost device. This beacon is then picked up by nearby Samsung phones or tablets (the {\em finder} devices), which then forward the unique beacon, along with the location it was detected at, to a Samsung managed server. The owner of a lost device can then query the server to locate their device. We examine several security and privacy related properties of the OF protocol and its implementation, from the perspectives of the owner, the finder and the vendor. These include examining: the possibility of identifying the owner of a device through the Bluetooth data obtained from the device, the possibility for a malicious actor to perform unwanted tracking against a person by exploiting the OF network, the possibility for the vendor to de-anonymise location reports to determine the locations of the owners or the finders of lost devices, and the possibility for an attacker to compromise the integrity of the location reports. Our findings suggest that there are privacy risks on all accounts, arising from issues in the design and the implementation of the OF protocol. △ Less

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.11685 [pdf, other]

doi 10.1038/s41598-023-29643-4

Quantum Algorithms for Geologic Fracture Networks

Authors: Jessie M. Henderson, Marianna Podzorova, M. Cerezo, John K. Golden, Leonard Gleyzer, Hari S. Viswanathan, Daniel O'Malley

Abstract: Solving large systems of equations is a challenge for modeling natural phenomena, such as simulating subsurface flow. To avoid systems that are intractable on current computers, it is often necessary to neglect information at small scales, an approach known as coarse-graining. For many practical applications, such as flow in porous, homogenous materials, coarse-graining offers a sufficiently-accur… ▽ More Solving large systems of equations is a challenge for modeling natural phenomena, such as simulating subsurface flow. To avoid systems that are intractable on current computers, it is often necessary to neglect information at small scales, an approach known as coarse-graining. For many practical applications, such as flow in porous, homogenous materials, coarse-graining offers a sufficiently-accurate approximation of the solution. Unfortunately, fractured systems cannot be accurately coarse-grained, as critical network topology exists at the smallest scales, including topology that can push the network across a percolation threshold. Therefore, new techniques are necessary to accurately model important fracture systems. Quantum algorithms for solving linear systems offer a theoretically-exponential improvement over their classical counterparts, and in this work we introduce two quantum algorithms for fractured flow. The first algorithm, designed for future quantum computers which operate without error, has enormous potential, but we demonstrate that current hardware is too noisy for adequate performance. The second algorithm, designed to be noise resilient, already performs well for problems of small to medium size (order 10 to 1000 nodes), which we demonstrate experimentally and explain theoretically. We expect further improvements by leveraging quantum error mitigation and preconditioning. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: 20 pages, 12 figures

Report number: LA-UR-22-29135

Journal ref: Sci Rep 13, 2906 (2023)

arXiv:2210.11621 [pdf, other]

SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages

Authors: Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier

Abstract: In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the "curse of multilinguality", these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introd… ▽ More In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the "curse of multilinguality", these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introduce SMaLL-100, a distilled version of the M2M-100 (12B) model, a massively multilingual machine translation model covering 100 languages. We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages. We evaluate SMaLL-100 on different low-resource benchmarks: FLORES-101, Tatoeba, and TICO-19 and demonstrate that it outperforms previous massively multilingual models of comparable sizes (200-600M) while improving inference latency and memory usage. Additionally, our model achieves comparable results to M2M-100 (1.2B), while being 3.6x smaller and 4.3x faster at inference. Code and pre-trained models: https://github.com/alirezamshi/small100 △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: Accepted to EMNLP 2022

Journal ref: https://aclanthology.org/2022.emnlp-main.571

arXiv:2210.06578 [pdf, other]

FASTER-CE: Fast, Sparse, Transparent, and Robust Counterfactual Explanations

Authors: Shubham Sharma, Alan H. Gee, Jette Henderson, Joydeep Ghosh

Abstract: Counterfactual explanations have substantially increased in popularity in the past few years as a useful human-centric way of understanding individual black-box model predictions. While several properties desired of high-quality counterfactuals have been identified in the literature, three crucial concerns: the speed of explanation generation, robustness/sensitivity and succinctness of explanation… ▽ More Counterfactual explanations have substantially increased in popularity in the past few years as a useful human-centric way of understanding individual black-box model predictions. While several properties desired of high-quality counterfactuals have been identified in the literature, three crucial concerns: the speed of explanation generation, robustness/sensitivity and succinctness of explanations (sparsity) have been relatively unexplored. In this paper, we present FASTER-CE: a novel set of algorithms to generate fast, sparse, and robust counterfactual explanations. The key idea is to efficiently find promising search directions for counterfactuals in a latent space that is specified via an autoencoder. These directions are determined based on gradients with respect to each of the original input features as well as of the target, as estimated in the latent space. The ability to quickly examine combinations of the most promising gradient directions as well as to incorporate additional user-defined constraints allows us to generate multiple counterfactual explanations that are sparse, realistic, and robust to input manipulations. Through experiments on three datasets of varied complexities, we show that FASTER-CE is not only much faster than other state of the art methods for generating multiple explanations but also is significantly superior when considering a larger set of desirable (and often conflicting) properties. Specifically we present results across multiple performance metrics: sparsity, proximity, validity, speed of generation, and the robustness of explanations, to highlight the capabilities of the FASTER-CE family. △ Less

Submitted 12 October, 2022; originally announced October 2022.

arXiv:2210.04995 [pdf, other]

FEAMOE: Fair, Explainable and Adaptive Mixture of Experts

Authors: Shubham Sharma, Jette Henderson, Joydeep Ghosh

Abstract: Three key properties that are desired of trustworthy machine learning models deployed in high-stakes environments are fairness, explainability, and an ability to account for various kinds of "drift". While drifts in model accuracy, for example due to covariate shift, have been widely investigated, drifts in fairness metrics over time remain largely unexplored. In this paper, we propose FEAMOE, a n… ▽ More Three key properties that are desired of trustworthy machine learning models deployed in high-stakes environments are fairness, explainability, and an ability to account for various kinds of "drift". While drifts in model accuracy, for example due to covariate shift, have been widely investigated, drifts in fairness metrics over time remain largely unexplored. In this paper, we propose FEAMOE, a novel "mixture-of-experts" inspired framework aimed at learning fairer, more explainable/interpretable models that can also rapidly adjust to drifts in both the accuracy and the fairness of a classifier. We illustrate our framework for three popular fairness measures and demonstrate how drift can be handled with respect to these fairness constraints. Experiments on multiple datasets show that our framework as applied to a mixture of linear experts is able to perform comparably to neural networks in terms of accuracy while producing fairer models. We then use the large-scale HMDA dataset and show that while various models trained on HMDA demonstrate drift with respect to both accuracy and fairness, FEAMOE can ably handle these drifts with respect to all the considered fairness measures and maintain model accuracy as well. We also prove that the proposed framework allows for producing fast Shapley value explanations, which makes computationally efficient feature attribution based explanations of model decisions readily available via FEAMOE. △ Less

Submitted 10 October, 2022; originally announced October 2022.

arXiv:2208.01710 [pdf, other]

Smart Visual Beacons with Asynchronous Optical Communications using Event Cameras

Authors: Ziwei Wang, Yonhon Ng, Jack Henderson, Robert Mahony

Abstract: Event cameras are bio-inspired dynamic vision sensors that respond to changes in image intensity with a high temporal resolution, high dynamic range and low latency. These sensor characteristics are ideally suited to enable visual target tracking in concert with a broadcast visual communication channel for smart visual beacons with applications in distributed robotics. Visual beacons can be constr… ▽ More Event cameras are bio-inspired dynamic vision sensors that respond to changes in image intensity with a high temporal resolution, high dynamic range and low latency. These sensor characteristics are ideally suited to enable visual target tracking in concert with a broadcast visual communication channel for smart visual beacons with applications in distributed robotics. Visual beacons can be constructed by high-frequency modulation of Light Emitting Diodes (LEDs) such as vehicle headlights, Internet of Things (IoT) LEDs, smart building lights, etc., that are already present in many real-world scenarios. The high temporal resolution characteristic of the event cameras allows them to capture visual signals at far higher data rates compared to classical frame-based cameras. In this paper, we propose a novel smart visual beacon architecture with both LED modulation and event camera demodulation algorithms. We quantitatively evaluate the relationship between LED transmission rate, communication distance and the message transmission accuracy for the smart visual beacon communication system that we prototyped. The proposed method achieves up to 4 kbps in an indoor environment and lossless transmission over a distance of 100 meters, at a transmission rate of 500 bps, in full sunlight, demonstrating the potential of the technology in an outdoor environment. △ Less

Submitted 2 August, 2022; originally announced August 2022.

Comments: 7 pages, 8 figures, accepted by IEEE International Conference on Intelligent Robots and Systems (IROS) 2022

arXiv:2207.13529 [pdf, other]

A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck

Authors: James Henderson, Fabio Fehr

Abstract: We propose a VAE for Transformers by develo** a variational information bottleneck regulariser for Transformer embeddings. We formalise the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components s… ▽ More We propose a VAE for Transformers by develo** a variational information bottleneck regulariser for Transformer embeddings. We formalise the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components supported by nonparametric methods captures the variable number of vectors supported by attention, and the exchangeability of our nonparametric distributions captures the permutation invariance of attention. This allows NVIB to regularise the number of vectors accessible with attention, as well as the amount of information in individual vectors. By regularising the cross-attention of a Transformer encoder-decoder with NVIB, we propose a nonparametric variational autoencoder (NVAE). Initial experiments on training a NVAE on natural language text show that the induced embedding space has the desired properties of a VAE for Transformers. △ Less

Submitted 12 August, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: 33 pages, 10 figures, 3 tables. First time this work has been made public

arXiv:2205.11456 [pdf, other]

Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers

Authors: Luis Espinosa-Anke, Alexander Shvets, Alireza Mohammadshahi, James Henderson, Leo Wanner

Abstract: Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collo… ▽ More Recognizing and categorizing lexical collocations in context is useful for language learning, dictionary compilation and downstream NLP. However, it is a challenging task due to the varying degrees of frozenness lexical collocations exhibit. In this paper, we put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context. Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French. △ Less

Submitted 23 May, 2022; originally announced May 2022.

Comments: Accepted to *SEM2022

arXiv:2205.10828 [pdf, other]

What Do Compressed Multilingual Machine Translation Models Forget?

Authors: Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier

Abstract: Recently, very large pre-trained models achieve state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques allow to drastically reduce the size of the models and therefore their inference time with negligible impact on top-tier metrics. However, the general performa… ▽ More Recently, very large pre-trained models achieve state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques allow to drastically reduce the size of the models and therefore their inference time with negligible impact on top-tier metrics. However, the general performance averaged across multiple tasks and/or languages may hide a drastic performance drop on under-represented features, which could result in the amplification of biases encoded by the models. In this work, we assess the impact of compression methods on Multilingual Neural Machine Translation models (MNMT) for various language groups, gender, and semantic biases by extensive analysis of compressed models on different machine translation benchmarks, i.e. FLORES-101, MT-Gender, and DiBiMT. We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases. Interestingly, the removal of noisy memorization with compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages. Code: https://github.com/alirezamshi/bias-compressedMT △ Less

Submitted 27 June, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

Comments: Accepted to Findings of EMNLP 2022, presented at WMT 2022

Journal ref: https://aclanthology.org/2022.findings-emnlp.317/

arXiv:2204.01172 [pdf, other]

PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models

Authors: Rabeeh Karimi Mahabadi, Luke Zettlemoyer, James Henderson, Marzieh Saeidi, Lambert Mathias, Veselin Stoyanov, Majid Yazdani

Abstract: Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as… ▽ More Current methods for few-shot fine-tuning of pretrained masked language models (PLMs) require carefully engineered prompts and verbalizers for each new task to convert examples into a cloze-format that the PLM can score. In this work, we propose PERFECT, a simple and efficient method for few-shot fine-tuning of PLMs without relying on any such handcrafting, which is highly effective given as few as 32 data points. PERFECT makes two key design choices: First, we show that manually engineered task prompts can be replaced with task-specific adapters that enable sample-efficient fine-tuning and reduce memory and storage costs by roughly factors of 5 and 100, respectively. Second, instead of using handcrafted verbalizers, we learn new multi-token label embeddings during fine-tuning, which are not tied to the model vocabulary and which allow us to avoid complex auto-regressive decoding. These embeddings are not only learnable from limited data but also enable nearly 100x faster training and inference. Experiments on a wide range of few-shot NLP tasks demonstrate that PERFECT, while being simple and efficient, also outperforms existing state-of-the-art few-shot learning methods. Our code is publicly available at https://github.com/facebookresearch/perfect.git. △ Less

Submitted 25 April, 2022; v1 submitted 3 April, 2022; originally announced April 2022.

Comments: ACL, 2022

arXiv:2203.16574 [pdf, other]

Graph Refinement for Coreference Resolution

Authors: Lesly Miculicich, James Henderson

Abstract: The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document-level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the g… ▽ More The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document-level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the graph in a non-autoregressive manner, then iteratively refines it based on previous predictions, allowing global dependencies between decisions. The experimental results show improvements over various baselines, reinforcing the hypothesis that document-level information improves conference resolution. △ Less

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2203.03691 [pdf, other]

HyperMixer: An MLP-based Low Cost Alternative to Transformers

Authors: Florian Mai, Arnaud Pannatier, Fabio Fehr, Haolin Chen, Francois Marelli, Francois Fleuret, James Henderson

Abstract: Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token m… ▽ More Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning. △ Less

Submitted 13 November, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: Published at ACL 2023

Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

arXiv:2110.07002 [pdf, other]

Bag-of-Vectors Autoencoders for Unsupervised Conditional Text Generation

Authors: Florian Mai, James Henderson

Abstract: Text autoencoders are often used for unsupervised conditional text generation by applying map**s in the latent space to change attributes to the desired values. Recently, Mai et al. (2020) proposed Emb2Emb, a method to learn these map**s in the embedding space of an autoencoder. However, their method is restricted to autoencoders with a single-vector embedding, which limits how much informatio… ▽ More Text autoencoders are often used for unsupervised conditional text generation by applying map**s in the latent space to change attributes to the desired values. Recently, Mai et al. (2020) proposed Emb2Emb, a method to learn these map**s in the embedding space of an autoencoder. However, their method is restricted to autoencoders with a single-vector embedding, which limits how much information can be retained. We address this issue by extending their method to Bag-of-Vectors Autoencoders (BoV-AEs), which encode the text into a variable-size bag of vectors that grows with the size of the text, as in attention-based models. This allows to encode and reconstruct much longer texts than standard autoencoders. Analogous to conventional autoencoders, we propose regularization techniques that facilitate learning meaningful operations in the latent space. Finally, we adapt Emb2Emb for a training scheme that learns to map an input bag to an output bag, including a novel loss function and neural architecture. Our empirical evaluations on unsupervised sentiment transfer show that our method performs substantially better than a standard autoencoder. △ Less

Submitted 4 February, 2023; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: Published at AACL 2022

Journal ref: In Proceedings of AACL/IJCNLP 2022, pages 468-488. Association of Computational Linguistics (2022)

arXiv:2109.00840 [pdf, other]

doi 10.18653/v1/2021.conll-1.27

Imposing Relation Structure in Language-Model Embeddings Using Contrastive Learning

Authors: Christos Theodoropoulos, James Henderson, Andrei C. Coman, Marie-Francine Moens

Abstract: Though language model text embeddings have revolutionized NLP research, their ability to capture high-level semantic information, such as relations between entities in text, is limited. In this paper, we propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure. Given a sentence (unstructured text) and its graph, we use contrastive… ▽ More Though language model text embeddings have revolutionized NLP research, their ability to capture high-level semantic information, such as relations between entities in text, is limited. In this paper, we propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure. Given a sentence (unstructured text) and its graph, we use contrastive learning to impose relation-related structure on the token-level representations of the sentence obtained with a CharacterBERT (El Boukkouri et al.,2020) model. The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task using only a simple KNN classifier, thereby demonstrating the success of the proposed method. Additional visualization by a tSNE analysis shows the effectiveness of the learned representation space compared to baselines. Furthermore, we show that we can learn a different space for named entity recognition, again using a contrastive learning objective, and demonstrate how to successfully combine both representation spaces in an entity-relation task. △ Less

Submitted 4 September, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

Comments: To be presented at CoNLL 2021

Journal ref: Conference: 2021 Proceedings of the 25th Conference on Computational Natural Language Learning

arXiv:2107.01982 [pdf, other]

The DCU-EPFL Enhanced Dependency Parser at the IWPT 2021 Shared Task

Authors: James Barry, Alireza Mohammadshahi, Joachim Wagner, Jennifer Foster, James Henderson

Abstract: We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. The task involves parsing Enhanced UD graphs, which are an extension of the basic dependency trees designed to be more facilitative towards representing semantic structure. Evaluation is carried out on 29 treebanks in 17 languages and participants are required to parse the data from ea… ▽ More We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. The task involves parsing Enhanced UD graphs, which are an extension of the basic dependency trees designed to be more facilitative towards representing semantic structure. Evaluation is carried out on 29 treebanks in 17 languages and participants are required to parse the data from each language starting from raw strings. Our approach uses the Stanza pipeline to preprocess the text files, XLMRoBERTa to obtain contextualized token representations, and an edge-scoring and labeling model to predict the enhanced graph. Finally, we run a post-processing script to ensure all of our outputs are valid Enhanced UD graphs. Our system places 6th out of 9 participants with a coarse Enhanced Labeled Attachment Score (ELAS) of 83.57. We carry out additional post-deadline experiments which include using Trankit for pre-processing, XLM-RoBERTa-LARGE, treebank concatenation, and multitask learning between a basic and an enhanced dependency parser. All of these modifications improve our initial score and our final system has a coarse ELAS of 88.04. △ Less

Submitted 5 July, 2021; originally announced July 2021.

Comments: Submitted to the IWPT 2021 Shared Task: From Raw Text to Enhanced Universal Dependencies: the Parsing Shared Task at IWPT 2021

arXiv:2106.05469 [pdf, other]

Variational Information Bottleneck for Effective Low-Resource Fine-Tuning

Authors: Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

Abstract: While large-scale pretrained language models have obtained impressive results when fine-tuned on a wide variety of tasks, they still often suffer from overfitting in low-resource scenarios. Since such models are general-purpose feature extractors, many of these features are inevitably irrelevant for a given target task. We propose to use Variational Information Bottleneck (VIB) to suppress irrelev… ▽ More While large-scale pretrained language models have obtained impressive results when fine-tuned on a wide variety of tasks, they still often suffer from overfitting in low-resource scenarios. Since such models are general-purpose feature extractors, many of these features are inevitably irrelevant for a given target task. We propose to use Variational Information Bottleneck (VIB) to suppress irrelevant features when fine-tuning on low-resource target tasks, and show that our method successfully reduces overfitting. Moreover, we show that our VIB model finds sentence representations that are more robust to biases in natural language inference datasets, and thereby obtains better generalization to out-of-domain datasets. Evaluation on seven low-resource datasets in different tasks shows that our method significantly improves transfer learning in low-resource scenarios, surpassing prior work. Moreover, it improves generalization on 13 out of 15 out-of-domain natural language inference benchmarks. Our code is publicly available in https://github.com/rabeehk/vibert. △ Less

Submitted 9 June, 2021; originally announced June 2021.

Comments: ICLR, 2021

arXiv:2106.04647 [pdf, other]

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

Authors: Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder

Abstract: Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent wor… ▽ More Adapting large-scale pretrained language models to downstream tasks via fine-tuning is the standard method for achieving state-of-the-art performance on NLP benchmarks. However, fine-tuning all weights of models with millions or billions of parameters is sample-inefficient, unstable in low-resource settings, and wasteful as it requires storing a separate copy of the model for each task. Recent work has developed parameter-efficient fine-tuning methods, but these approaches either still require a relatively large number of parameters or underperform standard fine-tuning. In this work, we propose Compacter, a method for fine-tuning large-scale language models with a better trade-off between task performance and the number of trainable parameters than prior work. Compacter accomplishes this by building on top of ideas from adapters, low-rank optimization, and parameterized hypercomplex multiplication layers. Specifically, Compacter inserts task-specific weight matrices into a pretrained model's weights, which are computed efficiently as a sum of Kronecker products between shared "slow" weights and "fast" rank-one matrices defined per Compacter layer. By only training 0.047% of a pretrained model's parameters, Compacter performs on par with standard fine-tuning on GLUE and outperforms standard fine-tuning on SuperGLUE and low-resource settings. Our code is publicly available at~\url{https://github.com/rabeehk/compacter}. △ Less

Submitted 27 November, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: accepted in NeurIPS, 2021

arXiv:2106.04489 [pdf, other]

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

Authors: Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, James Henderson

Abstract: State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing information across tasks. In this paper, we show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which co… ▽ More State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing information across tasks. In this paper, we show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model. This parameter-efficient multi-task learning framework allows us to achieve the best of both worlds by sharing knowledge across tasks via hypernetworks while enabling the model to adapt to each individual task through task-specific adapters. Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task. We additionally demonstrate substantial performance improvements in few-shot domain generalization across a variety of tasks. Our code is publicly available in https://github.com/rabeehk/hyperformer. △ Less

Submitted 8 June, 2021; originally announced June 2021.

Comments: accepted in ACL, 2021

arXiv:2106.02623 [pdf, other]

The Closer You Look, The More You Learn: A Grey-box Approach to Protocol State Machine Learning

Authors: Chris McMahon Stone, Sam L. Thomas, Mathy Vanhoef, James Henderson, Nicolas Bailluet, Tom Chothia

Abstract: In this paper, we propose a new approach to infer state machine models from protocol implementations. Our method, STATEINSPECTOR, learns protocol states by using novel program analyses to combine observations of run-time memory and I/O. It requires no access to source code and only lightweight execution monitoring of the implementation under test. We demonstrate and evaluate STATEINSPECTOR's effec… ▽ More In this paper, we propose a new approach to infer state machine models from protocol implementations. Our method, STATEINSPECTOR, learns protocol states by using novel program analyses to combine observations of run-time memory and I/O. It requires no access to source code and only lightweight execution monitoring of the implementation under test. We demonstrate and evaluate STATEINSPECTOR's effectiveness on numerous TLS and WPA/2 implementations. In the process, we show STATEINSPECTOR enables deeper state discovery, increased learning efficiency, and more insightful post-mortem analyses than existing approaches. Further to improved learning, our method led us to discover several concerning deviations from the standards and a high impact vulnerability in a prominent Wi-Fi implementation. △ Less

Submitted 7 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

arXiv:2104.07704 [pdf, other]

Syntax-Aware Graph-to-Graph Transformer for Semantic Role Labelling

Authors: Alireza Mohammadshahi, James Henderson

Abstract: Recent models have shown that incorporating syntactic knowledge into the semantic role labelling (SRL) task leads to a significant improvement. In this paper, we propose Syntax-aware Graph-to-Graph Transformer (SynG2G-Tr) model, which encodes the syntactic structure using a novel way to input graph relations as embeddings, directly into the self-attention mechanism of Transformer. This approach ad… ▽ More Recent models have shown that incorporating syntactic knowledge into the semantic role labelling (SRL) task leads to a significant improvement. In this paper, we propose Syntax-aware Graph-to-Graph Transformer (SynG2G-Tr) model, which encodes the syntactic structure using a novel way to input graph relations as embeddings, directly into the self-attention mechanism of Transformer. This approach adds a soft bias towards attention patterns that follow the syntactic structure but also allows the model to use this information to learn alternative patterns. We evaluate our model on both span-based and dependency-based SRL datasets, and outperform previous alternative methods in both in-domain and out-of-domain settings, on CoNLL 2005 and CoNLL 2009 datasets. △ Less

Submitted 2 June, 2023; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: Accepted to Rep4NLP at ACL 2023

arXiv:2104.05897 [pdf, other]

Inertial Collaborative Localisation for Autonomous Vehicles using a Minimum Energy Filter

Authors: Jack Henderson, Mohammad Zamani, Robert Mahony, Jochen Trumpf

Abstract: Collaborative Localisation has been studied extensively in recent years as a way to improve pose estimation of unmanned aerial vehicles in challenging environments. However little attention has been paid toward advancing the underlying filter design beyond standard Extended Kalman Filter-based approaches. In this paper, we detail a discrete-time collaborative localisation filter using the determin… ▽ More Collaborative Localisation has been studied extensively in recent years as a way to improve pose estimation of unmanned aerial vehicles in challenging environments. However little attention has been paid toward advancing the underlying filter design beyond standard Extended Kalman Filter-based approaches. In this paper, we detail a discrete-time collaborative localisation filter using the deterministic minimum-energy framework. The filter incorporates measurements from an inertial measurement unit and models the effects of sensor bias and gravitational acceleration. We present a simulation based on real-world vehicle trajectories and IMU data that demonstrates how collaborative localisation can improve performance over single-vehicle methods. △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: Submitted to IEEE 2021 Conference on Decision and Control (CDC2021)

arXiv:2102.01223 [pdf, other]

Inducing Meaningful Units from Character Sequences with Dynamic Capacity Slot Attention

Authors: Melika Behjati, James Henderson

Abstract: Characters do not convey meaning, but sequences of characters do. We propose an unsupervised distributional method to learn the abstract meaningful units in a sequence of characters. Rather than segmenting the sequence, our Dynamic Capacity Slot Attention model discovers continuous representations of the objects in the sequence, extending an architecture for object discovery in images. We train ou… ▽ More Characters do not convey meaning, but sequences of characters do. We propose an unsupervised distributional method to learn the abstract meaningful units in a sequence of characters. Rather than segmenting the sequence, our Dynamic Capacity Slot Attention model discovers continuous representations of the objects in the sequence, extending an architecture for object discovery in images. We train our model on different languages and evaluate the quality of the obtained representations with forward and reverse probing classifiers. These experiments show that our model succeeds in discovering units which are similar to those proposed previously in form, content and level of abstraction, and which show promise for capturing meaningful information at a higher level of abstraction. △ Less

Submitted 16 January, 2024; v1 submitted 1 February, 2021; originally announced February 2021.

Comments: Accepted to TMLR 2023

arXiv:2010.08432 [pdf, other]

Multi-Adversarial Learning for Cross-Lingual Word Embeddings

Authors: Haozhou Wang, James Henderson, Paola Merlo

Abstract: Generative adversarial networks (GANs) have succeeded in inducing cross-lingual word embeddings -- maps of matching words across languages -- without supervision. Despite these successes, GANs' performance for the difficult case of distant languages is still not satisfactory. These limitations have been explained by GANs' incorrect assumption that source and target embedding spaces are related by… ▽ More Generative adversarial networks (GANs) have succeeded in inducing cross-lingual word embeddings -- maps of matching words across languages -- without supervision. Despite these successes, GANs' performance for the difficult case of distant languages is still not satisfactory. These limitations have been explained by GANs' incorrect assumption that source and target embedding spaces are related by a single linear map** and are approximately isomorphic. We assume instead that, especially across distant languages, the map** is only piece-wise linear, and propose a multi-adversarial learning method. This novel method induces the seed cross-lingual dictionary through multiple map**s, each induced to fit the map** for one subspace. Our experiments on unsupervised bilingual lexicon induction show that this method improves performance over previous single-map** methods, especially for distant languages. △ Less

Submitted 25 August, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

arXiv:2010.02983 [pdf, other]

Plug and Play Autoencoders for Conditional Text Generation

Authors: Florian Mai, Nikolaos Pappas, Ivan Montero, Noah A. Smith, James Henderson

Abstract: Text autoencoders are commonly used for conditional generation tasks such as style transfer. We propose methods which are plug and play, where any pretrained autoencoder can be used, and only require learning a map** within the autoencoder's embedding space, training embedding-to-embedding (Emb2Emb). This reduces the need for labeled training data for the task and makes the training procedure mo… ▽ More Text autoencoders are commonly used for conditional generation tasks such as style transfer. We propose methods which are plug and play, where any pretrained autoencoder can be used, and only require learning a map** within the autoencoder's embedding space, training embedding-to-embedding (Emb2Emb). This reduces the need for labeled training data for the task and makes the training procedure more efficient. Crucial to the success of this method is a loss term for kee** the mapped embedding on the manifold of the autoencoder and a map** which is trained to navigate the manifold by learning offset vectors. Evaluations on style transfer tasks both with and without sequence-to-sequence supervision show that our method performs better than or comparable to strong baselines while being up to four times faster. △ Less

Submitted 12 October, 2020; v1 submitted 6 October, 2020; originally announced October 2020.

Comments: To be published in EMNLP 2020

arXiv:2009.04630 [pdf, other]

doi 10.1109/CDC42340.2020.9303730

A Minimum Energy Filter for Localisation of an Unmanned Aerial Vehicle

Authors: Jack Henderson, Mohammad Zamani, Robert Mahony, Jochen Trumpf

Abstract: Accurate localisation of unmanned aerial vehicles is vital for the next generation of automation tasks. This paper proposes a minimum energy filter for velocity-aided pose estimation on the extended special Euclidean group. The approach taken exploits the Lie-group symmetry of the problem to combine Inertial Measurement Unit (IMU) sensor output with landmark measurements into a robust and high per… ▽ More Accurate localisation of unmanned aerial vehicles is vital for the next generation of automation tasks. This paper proposes a minimum energy filter for velocity-aided pose estimation on the extended special Euclidean group. The approach taken exploits the Lie-group symmetry of the problem to combine Inertial Measurement Unit (IMU) sensor output with landmark measurements into a robust and high performance state estimate. We propose an asynchronous discrete-time implementation to fuse high bandwidth IMU with low bandwidth discrete-time landmark measurements typical of real-world scenarios. The filter's performance is demonstrated by simulation. △ Less

Submitted 9 September, 2020; originally announced September 2020.

Comments: To be presented at the 59th IEEE Conference on Decision and Control (CDC), 14-18 December 2020

arXiv:2006.14621 [pdf, other]

Understanding collections of related datasets using dependent MMD coresets

Authors: Sinead A. Williamson, Jette Henderson

Abstract: Understanding how two datasets differ can help us determine whether one dataset under-represents certain sub-populations, and provides insights into how well models will generalize across datasets. Representative points selected by a maximum mean discrepency (MMD) coreset can provide interpretable summaries of a single dataset, but are not easily compared across datasets. In this paper we introduc… ▽ More Understanding how two datasets differ can help us determine whether one dataset under-represents certain sub-populations, and provides insights into how well models will generalize across datasets. Representative points selected by a maximum mean discrepency (MMD) coreset can provide interpretable summaries of a single dataset, but are not easily compared across datasets. In this paper we introduce dependent MMD coresets, a data summarization method for collections of datasets that facilitates comparison of distributions. We show that dependent MMD coresets are useful for understanding multiple related datasets and understanding model generalization between such datasets. △ Less

Submitted 4 August, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

arXiv:2005.07303 [pdf, other]

doi 10.1016/j.ifacol.2020.12.1068

A Minimum Energy Filter for Distributed Multirobot Localisation

Authors: Jack Henderson, Jochen Trumpf, Mohammad Zamani

Abstract: We present a new approach to the cooperative localisation problem by applying the theory of minimum energy filtering. We consider the problem of estimating the pose of a group of mobile robots in an environment where robots can perceive fixed landmarks and neighbouring robots as well as share information with others over a communication channel. Whereas the vast majority of the existing literature… ▽ More We present a new approach to the cooperative localisation problem by applying the theory of minimum energy filtering. We consider the problem of estimating the pose of a group of mobile robots in an environment where robots can perceive fixed landmarks and neighbouring robots as well as share information with others over a communication channel. Whereas the vast majority of the existing literature applies some variant of a Kalman Filter, we derive a set of filter equations for the global state estimate based on the principle of minimum energy filtering. We show how the filter equations can be decoupled and the calculations distributed among the robots in the network without requiring a central processing node. Finally, we provide a demonstration of the filter's performance in simulation. △ Less

Submitted 14 May, 2020; originally announced May 2020.

Comments: To be published at 21st IFAC World Congress, Berlin, Germany, July 12-17, 2020

arXiv:2005.06420 [pdf, other]

The Unstoppable Rise of Computational Linguistics in Deep Learning

Authors: James Henderson

Abstract: In this paper, we trace the history of neural networks applied to natural language understanding tasks, and identify key contributions which the nature of language has made to the development of neural network architectures. We focus on the importance of variable binding and its instantiation in attention-based models, and argue that Transformer is not a sequence model but an induced-structure mod… ▽ More In this paper, we trace the history of neural networks applied to natural language understanding tasks, and identify key contributions which the nature of language has made to the development of neural network architectures. We focus on the importance of variable binding and its instantiation in attention-based models, and argue that Transformer is not a sequence model but an induced-structure model. This perspective leads to predictions of the challenges facing research in deep learning architectures for natural language understanding. △ Less

Submitted 11 June, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

Comments: 13 pages. Accepted for publication at ACL 2020, in the theme track

arXiv:2003.13118 [pdf, other]

doi 10.1162/tacl_a_00358

Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement

Authors: Alireza Mohammadshahi, James Henderson

Abstract: We propose the Recursive Non-autoregressive Graph-to-Graph Transformer architecture (RNGTr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. We demonstrate the power and effectiveness of RNGTr on several dependency corpora, using a refinement model pre-trained with BER… ▽ More We propose the Recursive Non-autoregressive Graph-to-Graph Transformer architecture (RNGTr) for the iterative refinement of arbitrary graphs through the recursive application of a non-autoregressive Graph-to-Graph Transformer and apply it to syntactic dependency parsing. We demonstrate the power and effectiveness of RNGTr on several dependency corpora, using a refinement model pre-trained with BERT. We also introduce Syntactic Transformer (SynTr), a non-recursive parser similar to our refinement model. RNGTr can improve the accuracy of a variety of initial parsers on 13 languages from the Universal Dependencies Treebanks, English and Chinese Penn Treebanks, and the German CoNLL2009 corpus, even improving over the new state-of-the-art results achieved by SynTr, significantly improving the state-of-the-art for all corpora tested. △ Less

Submitted 10 November, 2020; v1 submitted 29 March, 2020; originally announced March 2020.

Comments: Accepted to Transactions of the Association for Computational Linguistics (TACL) journal

arXiv:2003.00602 [pdf, other]

Federating Recommendations Using Differentially Private Prototypes

Authors: Mónica Ribero, Jette Henderson, Sinead Williamson, Haris Vikalo

Abstract: Machine learning methods allow us to make recommendations to users in applications across fields including entertainment, dating, and commerce, by exploiting similarities in users' interaction patterns. However, in domains that demand protection of personally sensitive data, such as medicine or banking, how can we learn such a model without accessing the sensitive data, and without inadvertently l… ▽ More Machine learning methods allow us to make recommendations to users in applications across fields including entertainment, dating, and commerce, by exploiting similarities in users' interaction patterns. However, in domains that demand protection of personally sensitive data, such as medicine or banking, how can we learn such a model without accessing the sensitive data, and without inadvertently leaking private information? We propose a new federated approach to learning global and local private models for recommendation without collecting raw data, user statistics, or information about personal preferences. Our method produces a set of prototypes that allows us to infer global behavioral patterns, while providing differential privacy guarantees for users in any database of the system. By requiring only two rounds of communication, we both reduce the communication costs and avoid the excessive privacy loss associated with iterative procedures. We test our framework on synthetic data as well as real federated medical data and Movielens ratings data. We show local adaptation of the global model allows our method to outperform centralized matrix-factorization-based recommender system models, both in terms of accuracy of matrix reconstruction and in terms of relevance of the recommendations, while maintaining provable privacy guarantees. We also show that our method is more robust and is characterized by smaller variance than individual models learned by independent entities. △ Less

Submitted 1 March, 2020; originally announced March 2020.

arXiv:1911.03561 [pdf, other]

doi 10.18653/v1/2020.findings-emnlp.294

Graph-to-Graph Transformer for Transition-based Dependency Parsing

Authors: Alireza Mohammadshahi, James Henderson

Abstract: We propose the Graph2Graph Transformer architecture for conditioning on and predicting arbitrary graphs, and apply it to the challenging task of transition-based dependency parsing. After proposing two novel Transformer models of transition-based dependency parsing as strong baselines, we show that adding the proposed mechanisms for conditioning on and predicting graphs of Graph2Graph Transformer… ▽ More We propose the Graph2Graph Transformer architecture for conditioning on and predicting arbitrary graphs, and apply it to the challenging task of transition-based dependency parsing. After proposing two novel Transformer models of transition-based dependency parsing as strong baselines, we show that adding the proposed mechanisms for conditioning on and predicting graphs of Graph2Graph Transformer results in significant improvements, both with and without BERT pre-training. The novel baselines and their integration with Graph2Graph Transformer significantly outperform the state-of-the-art in traditional transition-based dependency parsing on both English Penn Treebank, and 13 languages of Universal Dependencies Treebanks. Graph2Graph Transformer can be integrated with many previous structured prediction methods, making it easy to apply to a wide range of NLP tasks. △ Less

Submitted 30 October, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

Comments: Accepted to Findings of EMNLP 2020

arXiv:1909.06321 [pdf, other]

End-to-End Bias Mitigation by Modelling Biases in Corpora

Authors: Rabeeh Karimi Mahabadi, Yonatan Belinkov, James Henderson

Abstract: Several recent studies have shown that strong natural language understanding (NLU) models are prone to relying on unwanted dataset biases without learning the underlying task, resulting in models that fail to generalize to out-of-domain datasets and are likely to perform poorly in real-world scenarios. We propose two learning strategies to train neural models, which are more robust to such biases… ▽ More Several recent studies have shown that strong natural language understanding (NLU) models are prone to relying on unwanted dataset biases without learning the underlying task, resulting in models that fail to generalize to out-of-domain datasets and are likely to perform poorly in real-world scenarios. We propose two learning strategies to train neural models, which are more robust to such biases and transfer better to out-of-domain datasets. The biases are specified in terms of one or more bias-only models, which learn to leverage the dataset biases. During training, the bias-only models' predictions are used to adjust the loss of the base model to reduce its reliance on biases by down-weighting the biased examples and focusing the training on the hard examples. We experiment on large-scale natural language inference and fact verification benchmarks, evaluating on out-of-domain datasets that are specifically designed to assess the robustness of models against known biases in the training data. Results show that our debiasing methods greatly improve robustness in all settings and better transfer to other textual entailment datasets. Our code and data are publicly available in \url{https://github.com/rabeehk/robust-nli}. △ Less

Submitted 23 April, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

Comments: Accepted in ACL 2020 as a long paper

arXiv:1908.09507 [pdf, other]

Partially-supervised Mention Detection

Authors: Lesly Miculicich, James Henderson

Abstract: Learning to detect entity mentions without using syntactic information can be useful for integration and joint optimization with other tasks. However, it is common to have partially annotated data for this problem. Here, we investigate two approaches to deal with partial annotation of mentions: weighted loss and soft-target classification. We also propose two neural mention detection approaches: a… ▽ More Learning to detect entity mentions without using syntactic information can be useful for integration and joint optimization with other tasks. However, it is common to have partially annotated data for this problem. Here, we investigate two approaches to deal with partial annotation of mentions: weighted loss and soft-target classification. We also propose two neural mention detection approaches: a sequence tagging, and an exhaustive search. We evaluate our methods with coreference resolution as a downstream task, using multitask learning. The results show that the recall and F1 score improve for all methods. △ Less

Submitted 26 August, 2019; originally announced August 2019.

arXiv:1906.01496 [pdf, other]

Regularization Advantages of Multilingual Neural Language Models for Low Resource Domains

Authors: Navid Rekabsaz, Nikolaos Pappas, James Henderson, Banriskhem K. Khonglah, Srikanth Madikeri

Abstract: Neural language modeling (LM) has led to significant improvements in several applications, including Automatic Speech Recognition. However, they typically require large amounts of training data, which is not available for many domains and languages. In this study, we propose a multilingual neural language model architecture, trained jointly on the domain-specific data of several low-resource langu… ▽ More Neural language modeling (LM) has led to significant improvements in several applications, including Automatic Speech Recognition. However, they typically require large amounts of training data, which is not available for many domains and languages. In this study, we propose a multilingual neural language model architecture, trained jointly on the domain-specific data of several low-resource languages. The proposed multilingual LM consists of language specific word embeddings in the encoder and decoder, and one language specific LSTM layer, plus two LSTM layers with shared parameters across the languages. This multilingual LM model facilitates transfer learning across the languages, acting as an extra regularizer in very low-resource scenarios. We integrate our proposed multilingual approach with a state-of-the-art highly-regularized neural LM, and evaluate on the conversational data domain for four languages over a range of training data sizes. Compared to monolingual LMs, the results show significant improvements of our proposed multilingual LM when the amount of available training data is limited, indicating the advantages of cross-lingual parameter sharing in very low-resource language modeling. △ Less

Submitted 29 May, 2019; originally announced June 2019.

arXiv:1905.07857 [pdf, other]

doi 10.1145/3375627.3375812

CERTIFAI: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence models

Authors: Shubham Sharma, Jette Henderson, Joydeep Ghosh

Abstract: As artificial intelligence plays an increasingly important role in our society, there are ethical and moral obligations for both businesses and researchers to ensure that their machine learning models are designed, deployed, and maintained responsibly. These models need to be rigorously audited for fairness, robustness, transparency, and interpretability. A variety of methods have been developed t… ▽ More As artificial intelligence plays an increasingly important role in our society, there are ethical and moral obligations for both businesses and researchers to ensure that their machine learning models are designed, deployed, and maintained responsibly. These models need to be rigorously audited for fairness, robustness, transparency, and interpretability. A variety of methods have been developed that focus on these issues in isolation, however, managing these methods in conjunction with model development can be cumbersome and timeconsuming. In this paper, we introduce a unified and model-agnostic approach to address these issues: Counterfactual Explanations for Robustness, Transparency, Interpretability, and Fairness of Artificial Intelligence models (CERTIFAI). Unlike previous methods in this domain, CERTIFAI is a general tool that can be applied to any black-box model and any type of input data. Given a model and an input instance, CERTIFAI uses a custom genetic algorithm to generate counterfactuals: instances close to the input that change the prediction of the model. We demonstrate how these counterfactuals can be used to examine issues of robustness, interpretability, transparency, and fairness. Additionally, we introduce CERScore, the first black-box model robustness score that performs comparably to methods that have access to model internals. △ Less

Submitted 19 May, 2019; originally announced May 2019.

Showing 1–50 of 70 results for author: Henderson, J