-
Optirank: classification for RNA-Seq data with optimal ranking reference genes
Authors:
Paola Malsot,
Filipe Martins,
Didier Trono,
Guillaume Obozinski
Abstract:
Classification algorithms using RNA-Sequencing (RNA-Seq) data as input are used in a variety of biological applications. By nature, RNA-Seq data is subject to uncontrolled fluctuations both within and especially across datasets, which presents a major difficulty for a trained classifier to generalize to an external dataset. Replacing raw gene counts with the rank of gene counts inside an observation has proven effective to mitigate this problem. However, the rank of a feature is by definition relative to all other features, including highly variable features that introduce noise in the ranking. To address this problem and obtain more robust ranks, we propose a logistic regression model, optirank, which learns simultaneously the parameters of the model and the genes to use as a reference set in the ranking. We show the effectiveness of this method on simulated data. We also consider real classification tasks, which present different kinds of distribution shifts between train and test data. Those tasks concern a variety of applications, such as cancer of unknown primary classification, identification of specific gene signatures, and determination of cell type in single-cell RNA-Seq datasets. On those real tasks, optirank performs at least as well as the vanilla logistic regression on classical ranks, while producing sparser solutions. In addition, to increase the robustness against dataset shifts, we propose a multi-source learning scheme and demonstrate its effectiveness when used in combination with rank-based classifiers.
Submitted 11 January, 2023;
originally announced January 2023.
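To make the ranking idea in the optirank abstract concrete, here is a minimal sketch. It is not the authors' implementation: the function names, and the definition of a gene's rank as the number of genes with a strictly smaller count, are assumptions chosen for illustration. The second function restricts the comparison to a fixed reference set, which optirank would learn rather than take as input.

```python
def rank_within_observation(counts):
    """Rank of each gene = number of genes in the same observation
    with a strictly smaller count (assumed definition)."""
    return [sum(other < c for other in counts) for c in counts]


def rank_against_reference(counts, reference):
    """Rank of each gene relative to a reference subset only.

    `reference` is a list of gene indices; here it is given by hand,
    whereas the abstract describes learning it jointly with the model.
    """
    ref_counts = [counts[i] for i in reference]
    return [sum(r < c for r in ref_counts) for c in counts]


counts = [5, 100, 20, 3]
classical = rank_within_observation(counts)
# Ranks are unchanged by a global rescaling of the counts.
rescaled = rank_within_observation([10 * c for c in counts])
# Use only genes 0 and 3 as the reference set.
restricted = rank_against_reference(counts, reference=[0, 3])
```

Because ranks depend only on the ordering of counts within one observation, a monotone distortion of the counts leaves them unchanged, which is the robustness property the abstract relies on.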
-
A categorification of Quinn's finite total homotopy TQFT with application to TQFTs and once-extended TQFTs derived from strict omega-groupoids
Authors:
João Faria Martins,
Timothy Porter
Abstract:
We first revisit the construction of Quinn's Finite Total Homotopy TQFT, which depends on the choice of a homotopy finite space, $\boldsymbol{B}$. We build our construction directly from homotopy theoretical techniques, and hence, as in Quinn's original notes from 1995, the construction works in all dimensions.
Our aim in this is to provide background for giving in detail the construction of a once-extended TQFT categorifying Quinn's TQFT, in the form of a symmetric monoidal bifunctor from the bicategory of manifolds, cobordisms and extended cobordisms, initially to the symmetric monoidal bicategory of profunctors (enriched over vector spaces), and then to the Morita bicategory of algebras, bimodules and bimodule maps. These once-extended versions of Quinn's TQFT likewise are defined for all dimensions, and, as with the original version, depend on the choice of a homotopy finite space $\boldsymbol{B}$.
To show the utility of this approach, we explicitly compute both Quinn's finite total homotopy TQFT, and its extended version, for the case when $\boldsymbol{B}$ is the classifying space of a homotopically finite omega-groupoid, in this paper taking the form of a crossed complex, following Brown and Higgins.
The constructions in this paper include, in particular, the description of once-extended TQFTs derived from the classifying space of a finite strict 2-group, of relevance for modelling discrete higher gauge theory, but the techniques involved are considerably more general.
Submitted 6 January, 2023;
originally announced January 2023.
-
Python Code Generation by Asking Clarification Questions
Authors:
Haau-Sing Li,
Mohsen Mesgar,
André F. T. Martins,
Iryna Gurevych
Abstract:
Code generation from text requires understanding the user's intent from a natural language description and generating an executable code snippet that satisfies this intent. While recent pretrained language models demonstrate remarkable performance for this task, these models fail when the given natural language description is under-specified. In this work, we introduce a novel and more realistic setup for this task. We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions. Therefore, we collect and introduce a new dataset named CodeClarQA containing pairs of natural language descriptions and code with created synthetic clarification questions and answers. The empirical results of our evaluation of pretrained language model performance on code generation show that clarifications result in more precisely generated code, as shown by the substantial improvement of model performance in all evaluation metrics. Alongside this, our task and dataset introduce new challenges to the community, including when and what clarification questions should be asked. Our code and dataset are available on GitHub.
Submitted 26 May, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Optimal Transport for Unsupervised Hallucination Detection in Neural Machine Translation
Authors:
Nuno M. Guerreiro,
Pierre Colombo,
Pablo Piantanida,
André F. T. Martins
Abstract:
Neural machine translation (NMT) has become the de-facto standard in real-world machine translation applications. However, NMT models can unpredictably produce severely pathological translations, known as hallucinations, that seriously undermine user trust. It becomes thus crucial to implement effective preventive strategies to guarantee their proper functioning. In this paper, we address the problem of hallucination detection in NMT by following a simple intuition: as hallucinations are detached from the source content, they exhibit encoder-decoder attention patterns that are statistically different from those of good quality translations. We frame this problem with an optimal transport formulation and propose a fully unsupervised, plug-in detector that can be used with any attention-based NMT model. Experimental results show that our detector not only outperforms all previous model-based detectors, but is also competitive with detectors that employ large models trained on millions of samples.
Submitted 19 May, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Linear Rank Intersection Types
Authors:
Fábio Reis,
Sandra Alves,
Mário Florido
Abstract:
Non-idempotent intersection types provide quantitative information about typed programs, and have been used to obtain time and space complexity measures. Intersection type systems characterize termination, so restrictions need to be made in order to make typability decidable. One such restriction consists in using a notion of finite rank for the idempotent intersection types. In this work, we define a new notion of rank for the non-idempotent intersection types. We then define a novel type system and a type inference algorithm for the lambda-calculus, using the new notion of rank 2. In the second part of this work, we extend the type system and the type inference algorithm to use the quantitative properties of the non-idempotent intersection types to infer quantitative information related to resource usage.
Submitted 4 May, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
A quantum dot-based frequency multiplier
Authors:
G. A. Oakes,
L. Peri,
L. Cochrane,
F. Martins,
L. Hutin,
B. Bertrand,
M. Vinet,
A. Gomez Saiz,
C. J. B. Ford,
C. G. Smith,
M. F. Gonzalez-Zalba
Abstract:
Silicon offers the enticing opportunity to integrate hybrid quantum-classical computing systems on a single platform. For qubit control and readout, high-frequency signals are required. Therefore, devices that can facilitate their generation are needed. Here, we present a quantum dot-based radiofrequency multiplier operated at cryogenic temperatures. The device is based on the non-linear capacitance-voltage characteristics of quantum dot systems arising from their low-dimensional density of states. We implement the multiplier in a multi-gate silicon nanowire transistor using two complementary device configurations: a single quantum dot coupled to a charge reservoir and a coupled double quantum dot. We study the harmonic voltage conversion as a function of energy detuning, multiplication factor and harmonic phase noise and find near-ideal performance up to a multiplication factor of 10. Our results demonstrate a method for high-frequency conversion that could be readily integrated into silicon-based quantum computing systems and be applied to other semiconductors.
Submitted 25 November, 2022;
originally announced November 2022.
-
Improving abstractive summarization with energy-based re-ranking
Authors:
Diogo Pernes,
Afonso Mendes,
André F. T. Martins
Abstract:
Current abstractive summarization systems present important weaknesses which prevent their deployment in real-world applications, such as the omission of relevant information and the generation of factual inconsistencies (also known as hallucinations). At the same time, automatic evaluation metrics such as CTC scores have been recently proposed that exhibit a higher correlation with human judgments than traditional lexical-overlap metrics such as ROUGE. In this work, we intend to close the loop by leveraging the recent advances in summarization metrics to create quality-aware abstractive summarizers. Namely, we propose an energy-based model that learns to re-rank summaries according to one or a combination of these metrics. We experiment using several metrics to train our energy-based re-ranker and show that it consistently improves the scores achieved by the predicted summaries. Nonetheless, human evaluation results show that the re-ranking approach should be used with care for highly abstractive summaries, as the available metrics are not yet sufficiently reliable for this purpose.
Submitted 7 November, 2022; v1 submitted 27 October, 2022;
originally announced October 2022.
-
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task
Authors:
Ricardo Rei,
Marcos Treviso,
Nuno M. Guerreiro,
Chrysoula Zerva,
Ana C. Farinha,
Christine Maroti,
José G. C. de Souza,
Taisiya Glushkova,
Duarte M. Alves,
Alon Lavie,
Luisa Coheur,
André F. T. Martins
Abstract:
We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated on all three subtasks: (i) Sentence and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection. For all tasks we build on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi, and equipping it with a word-level sequence tagger and an explanation extractor. Our results suggest that incorporating references during pretraining improves performance across several language pairs on downstream tasks, and that jointly training with sentence and word-level objectives yields a further boost. Furthermore, combining attention and gradient information proved to be the top strategy for extracting good explanations of sentence-level QE models. Overall, our submissions achieved the best results for all three tasks for almost all language pairs by a considerable margin.
Submitted 13 September, 2022;
originally announced September 2022.
-
Machine Learning Partners in Criminal Networks
Authors:
Diego D. Lopes,
Bruno R. da Cunha,
Alvaro F. Martins,
Sebastian Goncalves,
Ervin K. Lenzi,
Quentin S. Hanley,
Matjaz Perc,
Haroldo V. Ribeiro
Abstract:
Recent research has shown that criminal networks have complex organizational structures, but whether this can be used to predict static and dynamic properties of criminal networks remains little explored. Here, by combining graph representation learning and machine learning methods, we show that structural properties of political corruption, police intelligence, and money laundering networks can be used to recover missing criminal partnerships, distinguish among different types of criminal and legal associations, as well as predict the total amount of money exchanged among criminal agents, all with outstanding accuracy. We also show that our approach can anticipate future criminal associations during the dynamic growth of corruption networks with significant accuracy. Thus, similar to evidence found at crime scenes, we conclude that structural patterns of criminal networks carry crucial information about illegal activities, which allows machine learning methods to predict missing information and even anticipate future criminal behavior.
Submitted 7 September, 2022;
originally announced September 2022.
-
Efficient Methods for Natural Language Processing: A Survey
Authors:
Marcos Treviso,
Ji-Ung Lee,
Tianchu Ji,
Betty van Aken,
Qingqing Cao,
Manuel R. Ciosici,
Michael Hassid,
Kenneth Heafield,
Sara Hooker,
Colin Raffel,
Pedro H. Martins,
André F. T. Martins,
Jessica Zosa Forde,
Peter Milder,
Edwin Simpson,
Noam Slonim,
Jesse Dodge,
Emma Strubell,
Niranjan Balasubramanian,
Leon Derczynski,
Iryna Gurevych,
Roy Schwartz
Abstract:
Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.
Submitted 24 March, 2023; v1 submitted 31 August, 2022;
originally announced September 2022.
-
Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
Authors:
Nuno M. Guerreiro,
Elena Voita,
André F. T. Martins
Abstract:
Although the problem of hallucinations in neural machine translation (NMT) has received some attention, research on this highly pathological phenomenon lacks solid ground. Previous work has been limited in several ways: it often resorts to artificial settings where the problem is amplified, it disregards some (common) types of hallucinations, and it does not validate the adequacy of detection heuristics. In this paper, we set foundations for the study of NMT hallucinations. First, we work in a natural setting, i.e., in-domain data with no artificial noise in either training or inference. Next, we annotate a dataset of over 3.4k sentences indicating different kinds of critical errors and hallucinations. Then, we turn to detection methods and both revisit methods used previously and propose using glass-box uncertainty-based detectors. Overall, we show that for preventive settings, (i) previously used methods are largely inadequate, (ii) sequence log-probability works best and performs on par with reference-based methods. Finally, we propose DeHallucinator, a simple method for alleviating hallucinations at test time that significantly reduces the hallucinatory rate. To ease future research, we release our annotated dataset for WMT18 German-English data, along with the model, training data, and code.
Submitted 5 March, 2023; v1 submitted 10 August, 2022;
originally announced August 2022.
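The sequence log-probability detector named in the abstract above can be illustrated with a short sketch. The length normalization, the function names, and the threshold value are assumptions made for this example; the abstract does not specify them.

```python
import math


def sequence_log_probability(token_probs):
    """Average log-probability the model assigned to its own output tokens.

    Dividing by the length is an assumption, made so that short and
    long translations are comparable.
    """
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def flag_hallucination(token_probs, threshold):
    """Flag a translation whose score falls below a tuned threshold."""
    return sequence_log_probability(token_probs) < threshold


confident = [0.9, 0.8, 0.95]   # hypothetical per-token probabilities
uncertain = [0.1, 0.05, 0.2]
flags = [flag_hallucination(p, threshold=-1.0) for p in (confident, uncertain)]
```

In practice the threshold would be chosen on held-out annotated data; the value used here is arbitrary.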
-
Boundedness of geometric invariants near a singularity which is a suspension of a singular curve
Authors:
Luciana F. Martins,
Kentaro Saji,
Samuel P. dos Santos,
Keisuke Teramoto
Abstract:
Near a singular point of a surface or a curve, geometric invariants diverge in general, and the orders of divergence, in particular the boundedness of these invariants, reflect the geometry of the surface and the curve. In this paper, we study the boundedness and orders of several geometric invariants near a singular point of a surface which is a suspension of a singular curve in the plane, and those of curves passing through the singular point. We evaluate the orders of the Gaussian and mean curvatures of the surface, and those of the geodesic curvature, normal curvature and geodesic torsion of the curve.
Submitted 21 December, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
Chunk-based Nearest Neighbor Machine Translation
Authors:
Pedro Henrique Martins,
Zita Marinho,
André F. T. Martins
Abstract:
Semi-parametric models, which augment generation with retrieval, have led to impressive results in language modeling and machine translation, due to their ability to retrieve fine-grained information from a datastore of examples. One of the most prominent approaches, $k$NN-MT, exhibits strong domain adaptation capabilities by retrieving tokens from domain-specific datastores \citep{khandelwal2020nearest}. However, $k$NN-MT requires an expensive retrieval operation for every single generated token, leading to a very low decoding speed (around 8 times slower than a parametric model). In this paper, we introduce a chunk-based $k$NN-MT model which retrieves chunks of tokens from the datastore, instead of a single token. We propose several strategies for incorporating the retrieved chunks into the generation process, and for selecting the steps at which the model needs to search for neighbors in the datastore. Experiments on machine translation in two settings, static and "on-the-fly" domain adaptation, show that the chunk-based $k$NN-MT model leads to significant speed-ups (up to 4 times) with only a small drop in translation quality.
Submitted 7 November, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Quality-Aware Decoding for Neural Machine Translation
Authors:
Patrick Fernandes,
António Farinhas,
Ricardo Rei,
José G. C. de Souza,
Perez Ogayo,
Graham Neubig,
André F. T. Martins
Abstract:
Despite the progress in machine translation quality estimation and evaluation in the last years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers around finding the most probable translation according to the model (MAP decoding), approximated with beam search. In this paper, we bring together these two lines of research and propose quality-aware decoding for NMT, by leveraging recent breakthroughs in reference-free and reference-based MT evaluation through various inference methods like $N$-best reranking and minimum Bayes risk decoding. We perform an extensive comparison of various possible candidate generation and ranking methods across four datasets and two model classes and find that quality-aware decoding consistently outperforms MAP-based decoding according both to state-of-the-art automatic metrics (COMET and BLEURT) and to human assessments. Our code is available at https://github.com/deep-spin/qaware-decode.
Submitted 2 May, 2022;
originally announced May 2022.
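The two inference methods named in the abstract above, $N$-best reranking and minimum Bayes risk (MBR) decoding, can be sketched as follows. This is an illustration only: `jaccard_utility` is a toy stand-in for a learned metric such as COMET or BLEURT, and all function names are hypothetical rather than taken from the linked repository.

```python
def jaccard_utility(hypothesis, reference):
    """Toy utility: word-set overlap (a stand-in for a learned MT metric)."""
    h, r = set(hypothesis.split()), set(reference.split())
    if not h and not r:
        return 1.0
    return len(h & r) / len(h | r)


def rerank(candidates, quality):
    """N-best reranking: keep the candidate with the highest quality score."""
    return max(candidates, key=quality)


def mbr_decode(candidates, utility=jaccard_utility):
    """MBR decoding: pick the candidate with the highest average utility
    when every other candidate is treated as a pseudo-reference."""
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


candidates = ["the cat sat", "the cat sat down", "the cat slept", "a dog ran"]
mbr_choice = mbr_decode(candidates)
# Sentence length is used as a placeholder for a reference-free quality score.
rerank_choice = rerank(candidates, quality=len)
```

MBR favors the candidate that agrees most with the rest of the pool, whereas reranking trusts a single scoring function; both replace the MAP criterion of picking the most probable translation.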
-
Efficient Machine Translation Domain Adaptation
Authors:
Pedro Henrique Martins,
Zita Marinho,
André F. T. Martins
Abstract:
Machine translation models struggle when translating out-of-domain text, which makes domain adaptation a topic of critical importance. However, most domain adaptation methods focus on fine-tuning or training the entire or part of the model on every new domain, which can be costly. On the other hand, semi-parametric models have been shown to successfully perform domain adaptation by retrieving examples from an in-domain datastore (Khandelwal et al., 2021). A drawback of these retrieval-augmented models, however, is that they tend to be substantially slower. In this paper, we explore several approaches to speed up nearest neighbor machine translation. We adapt the methods recently proposed by He et al. (2021) for language modeling, and introduce a simple but effective caching strategy that avoids performing retrieval when similar contexts have been seen before. Translation quality and runtimes for several domains show the effectiveness of the proposed solutions.
Submitted 26 April, 2022;
originally announced April 2022.
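The caching idea in the abstract above, skipping retrieval when a similar context has already been seen, might look like the sketch below. The class name, the Euclidean distance test, the linear scan over cached keys, and the threshold are all assumptions for illustration; the paper's actual strategy may differ.

```python
import math


class RetrievalCache:
    """Reuse earlier retrieval results for nearby context vectors."""

    def __init__(self, retrieve, threshold):
        self.retrieve = retrieve    # expensive datastore search (hypothetical)
        self.threshold = threshold  # maximum distance that counts as "similar"
        self.entries = []           # list of (context vector, cached result)
        self.hits = 0

    def lookup(self, query):
        # Linear scan for a cached context close enough to the query.
        for key, value in self.entries:
            if math.dist(key, query) <= self.threshold:
                self.hits += 1
                return value
        # Cache miss: run the real retrieval and remember the result.
        value = self.retrieve(query)
        self.entries.append((tuple(query), value))
        return value
```

A lookup for a context within `threshold` of a cached one returns the stored neighbors without calling `retrieve`, trading a small approximation error for fewer datastore searches.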
-
Learning to Scaffold: Optimizing Model Explanations for Teaching
Authors:
Patrick Fernandes,
Marcos Treviso,
Danish Pruthi,
André F. T. Martins,
Graham Neubig
Abstract:
Modern machine learning models are opaque, and as a result there is a burgeoning academic subfield on methods that explain these models' behavior. However, what is the precise goal of providing such explanations, and how can we demonstrate that explanations achieve this goal? Some research argues that explanations should help teach a student (either human or machine) to simulate the model being explained, and that the quality of explanations can be measured by the simulation accuracy of students on unexplained examples. In this work, leveraging meta-learning techniques, we extend this idea to improve the quality of the explanations themselves, specifically by optimizing explanations such that student models more effectively learn to simulate the original model. We train models on three natural language processing and computer vision tasks, and find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than ones produced with previous methods. Through human annotations and a user study, we further find that these learned explanations more closely align with how humans would explain the required decisions in these tasks. Our code is available at https://github.com/coderpat/learning-scaffold
Submitted 29 November, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Sharp inequalities for discrete and continuous multi-tiling, using the Bombieri-Siegel approach
Authors:
Michel Faleiros Martins,
Sinai Robins
Abstract:
Given a finite subset $F$ of integer points in $\mathbb Z^d$, it is of interest to seek conditions on $F$ that allow it to multi-tile $\mathbb Z^d$ by translations. In addition to the continuous multi-tiling results presented here, we also give analogous discrete applications to arithmetic combinatorics. Specifically, we give a discretized version of the Bombieri-Siegel formula, in the form of a finite sum of discrete covariograms, taken over any finite set of integer points in $\mathbb R^d$. As a consequence, we arrive at a new equivalent condition for multi-tiling $\mathbb Z^d$ by translating $F$ with a fixed integer sublattice.
Similar questions related to convex bodies have already been investigated extensively. In order to develop lattice sums of the cross covariogram for any two bounded sets $A, B\subset \mathbb R^d$, we prove a refined continuous version of the classical Bombieri-Siegel formula from the geometry of numbers. To achieve this goal, we use a variant of the Poisson Summation formula, adapted for continuous functions of compact support.
As an application of this refined Bombieri-Siegel formula, a new characterization of multi-tilings of Euclidean space by translations of a compact set by using a lattice is given. A further consequence is a spectral formula for the volume of any bounded measurable set.
Submitted 15 December, 2023; v1 submitted 18 April, 2022;
originally announced April 2022.
-
Disentangling Uncertainty in Machine Translation Evaluation
Authors:
Chrysoula Zerva,
Taisiya Glushkova,
Ricardo Rei,
André F. T. Martins
Abstract:
Trainable evaluation metrics for machine translation (MT) exhibit strong correlation with human judgements, but they are often hard to interpret and might produce unreliable scores under noisy or out-of-domain data. Recent work has attempted to mitigate this with simple uncertainty quantification techniques (Monte Carlo dropout and deep ensembles), however these techniques (as we show) are limited in several ways -- for example, they are unable to distinguish between different kinds of uncertainty, and they are time and memory consuming. In this paper, we propose more powerful and efficient uncertainty predictors for MT evaluation, and we assess their ability to target different sources of aleatoric and epistemic uncertainty. To this end, we develop and compare training objectives for the COMET metric to enhance it with an uncertainty prediction output, including heteroscedastic regression, divergence minimization, and direct uncertainty prediction. Our experiments show improved results on uncertainty prediction for the WMT metrics task datasets, with a substantial reduction in computational costs. Moreover, they demonstrate the ability of these predictors to address specific uncertainty causes in MT evaluation, such as low quality references and out-of-domain data.
Submitted 29 November, 2022; v1 submitted 13 April, 2022;
originally announced April 2022.
-
Universality of political corruption networks
Authors:
Alvaro F. Martins,
Bruno R. da Cunha,
Quentin S. Hanley,
Sebastian Goncalves,
Matjaz Perc,
Haroldo V. Ribeiro
Abstract:
Corruption crimes demand highly coordinated actions among criminal agents to succeed. But research dedicated to corruption networks is still in its infancy and indeed little is known about the properties of these networks. Here we present a comprehensive investigation of corruption networks related to political scandals in Spain and Brazil over nearly three decades. We show that corruption networks of both countries share universal structural and dynamical properties, including similar degree distributions, clustering and assortativity coefficients, modular structure, and a growth process that is marked by the coalescence of network components due to a few recidivist criminals. We propose a simple model that not only reproduces these empirical properties but reveals also that corruption networks operate near a critical recidivism rate below which the network is entirely fragmented and above which it is overly connected. Our research thus indicates that actions focused on decreasing corruption recidivism may substantially mitigate this type of organized crime.
Submitted 11 April, 2022;
originally announced April 2022.
-
BASiNETEntropy: an alignment-free method for classification of biological sequences through complex networks and entropy maximization
Authors:
Murilo Montanini Breve,
Matheus Henrique Pimenta-Zanon,
Fabrício Martins Lopes
Abstract:
The discovery of nucleic acids and the structure of DNA have brought considerable advances in the understanding of life. The development of next-generation sequencing technologies has led to a large-scale generation of data, for which computational methods have become essential for analysis and knowledge discovery. In particular, RNAs have received much attention because of the diversity of their functionalities in the organism and the discoveries of different classes with different functions in many biological processes. Therefore, the correct identification of RNA sequences is increasingly important to provide relevant information to understand the functioning of organisms. This work addresses this context by presenting a new method for the classification of biological sequences through complex networks and entropy maximization. The maximum entropy principle is proposed to identify the most informative edges about the RNA class, generating a filtered complex network. The proposed method was evaluated in the classification of different RNA classes from 13 species. The proposed method was compared to PLEK, CPC2 and BASiNET methods, outperforming all compared methods. BASiNETEntropy classified all RNA sequences with high accuracy and low standard deviation in results, showing assertiveness and robustness. The proposed method is implemented as open source in the R language and is freely available at https://cran.r-project.org/web/packages/BASiNETEntropy.
Submitted 24 March, 2022;
originally announced March 2022.
-
Fast high-fidelity single-shot readout of spins in silicon using a single-electron box
Authors:
G. A. Oakes,
V. N. Ciriano-Tejel,
D. Wise,
M. A. Fogarty,
T. Lundberg,
C. Lainé,
S. Schaal,
F. Martins,
D. J. Ibberson,
L. Hutin,
B. Bertrand,
N. Stelmashenko,
J. A. W. Robinson,
L. Ibberson,
A. Hashim,
I. Siddiqi,
A. Lee,
M. Vinet,
C. G. Smith,
J. J. L. Morton,
M. F. Gonzalez-Zalba
Abstract:
Three key metrics for readout systems in quantum processors are measurement speed, fidelity and footprint. Fast high-fidelity readout enables mid-circuit measurements, a necessary feature for many dynamic algorithms and quantum error correction, while a small footprint facilitates the design of scalable, highly-connected architectures with the associated increase in computing performance. Here, we present two complementary demonstrations of fast high-fidelity single-shot readout of spins in silicon quantum dots using a compact, dispersive charge sensor: a radio-frequency single-electron box. The sensor, despite requiring fewer electrodes than conventional detectors, performs at the state-of-the-art achieving spin read-out fidelity of 99.2% in less than 6 $μ$s. We demonstrate that low-loss high-impedance resonators, highly coupled to the sensing dot, in conjunction with Josephson parametric amplification are instrumental in achieving optimal performance. We quantify the benefit of Pauli spin blockade over spin-dependent tunneling to a reservoir, as the spin-to-charge conversion mechanism in these readout schemes. Our results place dispersive charge sensing at the forefront of readout methodologies for scalable semiconductor spin-based quantum processors.
Submitted 13 March, 2022;
originally announced March 2022.
-
Differentiable Causal Discovery Under Latent Interventions
Authors:
Gonçalo R. A. Faria,
André F. T. Martins,
Mário A. T. Figueiredo
Abstract:
Recent work has shown promising results in causal discovery by leveraging interventional data with gradient-based methods, even when the intervened variables are unknown. However, previous work assumes that the correspondence between samples and interventions is known, which is often unrealistic. We envision a scenario with an extensive dataset sampled from multiple intervention distributions and one observation distribution, but where we do not know which distribution originated each sample and how the intervention affected the system, \textit{i.e.}, interventions are entirely latent. We propose a method based on neural networks and variational inference that addresses this scenario by framing it as learning a shared causal graph among an infinite mixture (under a Dirichlet process prior) of intervention structural causal models. Experiments with synthetic and real data show that our approach and its semi-supervised variant are able to discover causal relations in this challenging scenario.
Submitted 4 March, 2022;
originally announced March 2022.
-
Spectroscopic evolution of very massive stars at Z = 1/2.5 Zsun
Authors:
F. Martins,
A. Palacios
Abstract:
Stars with masses in excess of 100 Msun are observed in the Local Universe, but they remain rare objects. Because of the shape of the mass function, they are expected to be present only in the most massive and youngest clusters. They may thus be formed in number in highly star-forming galaxies. Very massive stars (VMSs) experience strong stellar winds that are stronger than those of their less massive OB-type counterparts. These strong winds therefore need to be taken into account in evolutionary models and synthetic spectra to properly predict the appearance of VMS. We present evolutionary models computed with the code STAREVOL. They include a recent mass-loss recipe that is relevant for VMSs. We subsequently calculated atmosphere models and synthetic spectra along the resulting tracks with the code CMFGEN. We studied stars with masses between 150 and 400 Msun and focused on a metallicity Z=1/2.5Zsun. We studied the impact of our VMS spectra on the spectral energy distribution of young starbursts. We show that the optical and UV range is dominated by HeII 4686 and HeII 1640 emission for almost the entire main-sequence evolution of VMSs, in contrast to less massive stars. In the UV spectral range, carbon, nitrogen, and iron lines shape the spectra of VMSs, which appear for most of their evolution as WNh objects. The morphology of the synthetic spectra is similar to that of VMSs in the Large Magellanic Cloud. We show that stars with masses higher than 100 Msun emit nearly as much light as all other stars in young starbursts. The integrated UV spectrum of these starbursts is significantly affected by the presence of VMSs.
Submitted 28 February, 2022;
originally announced February 2022.
-
The Gaia-ESO Survey: The analysis of the hot-star spectra
Authors:
R. Blomme,
S. Daflon,
M. Gebran,
A. Herrero,
A. Lobel,
L. Mahy,
F. Martins,
T. Morel,
S. R. Berlanas,
A. Blazere,
Y. Fremat,
E. Gosset,
J. Maiz Apellaniz,
W. Santos,
T. Semaan,
S. Simon-Diaz,
D. Volpi,
G. Holgado,
F. Jimenez-Esteban,
M. F. Nieva,
N. Przybilla,
G. Gilmore,
S. Randich,
I. Negueruela,
T. Prusti
, et al. (22 additional authors not shown)
Abstract:
The Gaia-ESO Survey (GES) is a large public spectroscopic survey that has collected, over a period of 6 years, spectra of ~ 10^5 stars. This survey provides not only the reduced spectra, but also the stellar parameters and abundances resulting from the analysis of the spectra. The GES dataflow is organised in 19 working groups. Working group 13 (WG13) is responsible for the spectral analysis of the hottest stars (O, B and A type, with a formal cut-off of Teff > 7000 K) that were observed as part of GES. We present the procedures and techniques that have been applied to the reduced spectra, in order to determine the stellar parameters and abundances of these stars. The procedure used is similar to that of other working groups in GES. A number of groups (called `Nodes') each independently analyse the spectra, using their state-of-the-art techniques and codes. Specific for the analysis in WG13 is the large temperature range that is covered (Teff = 7000 - 50,000 K), requiring the use of different analysis codes. Most Nodes can therefore only handle part of the data. Quality checks are applied to the results of these Nodes by comparing them to benchmark stars, and by comparing them one to another. For each star the Node values are then homogenised into a single result: the recommended parameters and abundances. Eight Nodes each analysed (part of) the data. In total 17,693 spectra of 6462 stars were analysed, most of them in 37 open star clusters. The homogenisation led to stellar parameters for 5584 stars. Abundances were determined for a more limited number of stars. Elements studied are He, C, N, O, Ne, Mg, Al, Si and Sc. Abundances for at least one of those elements were determined for 292 stars. The hot-star data analysed here, as well as the Gaia-ESO Survey data in general, will be of considerable use in future studies of stellar evolution and open clusters.
Submitted 1 March, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Modeling Structure with Undirected Neural Networks
Authors:
Tsvetomila Mihaylova,
Vlad Niculae,
André F. T. Martins
Abstract:
Neural networks are powerful function estimators, leading to their status as a paradigm of choice for modeling structured data. However, unlike other structured representations that emphasize the modularity of the problem -- e.g., factor graphs -- neural networks are usually monolithic mappings from inputs to outputs, with a fixed computation order. This limitation prevents them from capturing different directions of computation and interaction between the modeled variables.
In this paper, we combine the representational strengths of factor graphs and of neural networks, proposing undirected neural networks (UNNs): a flexible framework for specifying computations that can be performed in any order. For particular choices, our proposed models subsume and extend many existing architectures: feed-forward, recurrent, self-attention networks, auto-encoders, and networks with implicit layers. We demonstrate the effectiveness of undirected neural architectures, both unstructured and structured, on a range of tasks: tree-constrained dependency parsing, convolutional image classification, and sequence completion with attention. By varying the computation order, we show how a single UNN can be used both as a classifier and a prototype generator, and how it can fill in missing parts of an input sequence, making them a promising field for further research.
Submitted 17 June, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Multipole-moment effects in ion-molecule reactions at low temperatures: part I -- Ion-dipole enhancement of the rate coefficients of the He$^+$ + NH$_3$ and He$^+$ + ND$_3$ reactions at collision energies near $0$ K
Authors:
Valentina Zhelyazkova,
Fernanda B. V. Martins,
Josef A. Agner,
Hansjürg Schmutz,
Frédéric Merkt
Abstract:
The energy dependence of the rates of the reactions between He$^+$ and ammonia (NY$_3$, Y= {H,D}), forming NY$_2^+$, Y and He as well as NY$^+$, Y$_2$ and He, and the corresponding product branching ratios have been measured at low collision energies between 0 and $k_{\mathrm{B}}\cdot40$ K using a recently developed merged-beam technique [Allmendinger {\it et al.}, ChemPhysChem {\bf 17}, 3596 (2016)]. To avoid heating of the ions by stray electric fields, the reactions are observed within the large orbit of a highly excited Rydberg electron. A beam of He Rydberg atoms was merged with a supersonic beam of ammonia using a curved surface-electrode Rydberg-Stark deflector, which was also used for adjusting the final velocity of the He Rydberg atoms, and thus the collision energy ($E_{\rm coll}$). A collision-energy resolution of about 200 mK was reached at the lowest $E_{\rm coll}$ values. The reaction rate coefficients exhibit a sharp increase at collision energies below $\sim k_{\mathrm{B}}\cdot5$ K and pronounced deviations from Langevin-capture behaviour. The experimental results are interpreted in terms of an adiabatic capture model describing the rotational-state-dependent orientation of the ammonia molecules by the electric field of the He$^+$ atom. The model faithfully describes the experimental observations. The enhancement of the reaction yields of both reactions observed at the lowest collision energies is attributed to high-field-seeking states which experience linear Stark shifts at low electric fields. Thermal capture rate constants are derived from the model for the temperature range between 0 and 10~K relevant for astrochemistry. Comparison of the calculated thermal capture rate coefficients with the absolute reaction rates measured above 27 K by Marquette {\it et al.} (Chem. Phys. Lett., 1985, {\bf 122}, 431) suggests that only 40\% of the close collisions are reactive.
Submitted 29 December, 2021;
originally announced December 2021.
-
Ion-molecule reactions below 1~K: Observation of a strong enhancement of the reaction rate of the ion-dipole reaction He$^+$+ CH$_3$F
Authors:
Valentina Zhelyazkova,
Fernanda B. V. Martins,
Josef A. Agner,
Hansjürg Schmutz,
Frédéric Merkt
Abstract:
The reaction between He$^+$ and CH$_3$F forming predominantly CH$_2^+$ and CHF$^+$ has been studied at collision energies $E_{\rm coll}$ between 0 and $k_{\rm B}\cdot 10$~K in a merged-beam apparatus. To avoid heating of the ions by stray electric fields, the reaction was observed within the orbit of a highly excited Rydberg electron. Supersonic beams of CH$_3$F and He($n$) Rydberg atoms with principal quantum number $n= 30$ and 35 were merged and their relative velocity tuned using a Rydberg-Stark decelerator and deflector, allowing an energy resolution of 150 mK. A strong enhancement of the reaction rate was observed below $E_{\rm coll}/k_{\rm B} = 1$~K. The experimental results are interpreted with an adiabatic capture model that accounts for the state-dependent orientation of the polar CH$_3$F molecules by the Stark effect as they approach the He$^+$ ion. The enhancement of the reaction rate at low collision energies is primarily attributed to para-CH$_3$F molecules in the $J=1,\,KM=1$ high-field-seeking states, which represent about 8\% of the population at the 6~K rotational temperature of the supersonic beam.
Submitted 22 December, 2021;
originally announced December 2021.
-
Dynamics of hole singlet triplet qubits with large g-factor differences
Authors:
Daniel Jirovec,
Philipp M. Mutter,
Andrea Hofmann,
Josip Kukucka,
Alessandro Crippa,
Frederico Martins,
Andrea Ballabio,
Daniel Chrastina,
Giovanni Isella,
Guido Burkard,
Georgios Katsaros
Abstract:
The spin-orbit interaction is the key element for electrically tunable spin qubits. Here we probe the effect of cubic Rashba spin-orbit interaction on mixing of the spin states by investigating singlet-triplet oscillations in a planar Ge hole double quantum dot. By varying the magnetic field direction we find an intriguing transformation of the funnel into a butterfly-shaped pattern. Landau-Zener sweeps disentangle the Zeeman mixing effect from the spin-orbit induced coupling and show that large singlet-triplet avoided crossings do not imply a strong spin-orbit interaction. Our work emphasizes the need for a complete knowledge of the energy landscape when working with hole spin qubits.
Submitted 9 November, 2021;
originally announced November 2021.
-
Revisiting Coulomb diamond signatures in quantum Hall interferometers
Authors:
N. Moreau,
S. Faniel,
F. Martins,
L. Desplanque,
X. Wallart,
S. Melinte,
V. Bayot,
B. Hackens
Abstract:
Coulomb diamonds are the archetypal signatures of Coulomb blockade, a well-known charging effect mainly observed in nanometer-sized "electronic islands" tunnel-coupled with charge reservoirs. Here, we identify apparent Coulomb diamond features in the scanning gate spectroscopy of a quantum point contact carved out of a semiconductor heterostructure, in the quantum Hall regime. Varying the scanning gate parameters and the magnetic field, the diamonds are found to smoothly evolve to checkerboard patterns. To explain this surprising behavior, we put forward a model which relies on the presence of a nanometer-sized Fabry-Pérot quantum Hall interferometer at the center of the constriction with tunable tunneling paths coupling the central part of the interferometer to the quantum Hall channels running along the device edges. Both types of signatures, diamonds and checkerboards, and the observed transition, are reproduced by simply varying the interferometer size and the transmission probabilities at the tunneling paths. The new proposed interpretation of diamond phenomenology will likely lead to revisit previous data, and opens the way towards engineering more complex interferometric devices with nanoscale dimensions.
Submitted 15 October, 2021;
originally announced October 2021.
-
Complex Network-Based Approach for Feature Extraction and Classification of Musical Genres
Authors:
Matheus Henrique Pimenta-Zanon,
Glaucia Maria Bressan,
Fabrício Martins Lopes
Abstract:
Musical genre classification has been a relevant research topic. The association between music and genres is fundamental for the media industry, which manages musical recommendation systems, and for music streaming services, whose content may be organized by genre. In this context, this work presents a feature extraction method for the automatic classification of musical genres, based on complex networks and their topological measurements. The proposed method initially converts the musical pieces into sequences of musical notes and then maps the sequences as complex networks. Topological measurements are extracted to characterize the network topology, composing a feature vector that is used for the classification of musical genres. The method was evaluated in the classification of 10 musical genres by adopting the GTZAN dataset and 8 musical genres by adopting the FMA dataset. The results were compared with methods in the literature. The proposed method outperformed all compared methods by presenting high accuracy and low standard deviation, showing its suitability for musical genre classification, which contributes to the media industry in the automatic classification with assertiveness and robustness. The proposed method is implemented as open source in the Python language and is freely available at https://github.com/omatheuspimenta/examinner.
Submitted 9 October, 2021;
originally announced October 2021.
-
Predicting Attention Sparsity in Transformers
Authors:
Marcos Treviso,
António Góis,
Patrick Fernandes,
Erick Fonseca,
André F. T. Martins
Abstract:
Transformers' quadratic complexity with respect to the input sequence length has motivated a body of work on efficient sparse approximations to softmax. An alternative path, used by entmax transformers, consists of having built-in exact sparse attention; however this approach still requires quadratic computation. In this paper, we propose Sparsefinder, a simple model trained to identify the sparsity pattern of entmax attention before computing it. We experiment with three variants of our method, based on distances, quantization, and clustering, on two tasks: machine translation (attention in the decoder) and masked language modeling (encoder-only). Our work provides a new angle to study model efficiency by doing extensive analysis of the tradeoff between the sparsity and recall of the predicted attention graph. This allows for detailed comparison between different models along their Pareto curves, important to guide future benchmarks for sparse attention models.
Submitted 21 April, 2022; v1 submitted 24 September, 2021;
originally announced September 2021.
-
When Does Translation Require Context? A Data-driven, Multilingual Exploration
Authors:
Patrick Fernandes,
Kayo Yin,
Emmy Liu,
André F. T. Martins,
Graham Neubig
Abstract:
Although proper handling of discourse significantly contributes to the quality of machine translation (MT), these improvements are not adequately measured in common translation quality metrics. Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation, however not in a fully systematic way. In this paper, we develop the Multilingual Discourse-Aware (MuDA) benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena in any given dataset. The choice of phenomena is inspired by a novel methodology to systematically identify translations requiring context. We confirm the difficulty of previously studied phenomena while uncovering others that were previously unaddressed. We find that common context-aware MT models make only marginal improvements over context-agnostic models, which suggests these models do not handle these ambiguities effectively. We release code and data for 14 language pairs to encourage the MT community to focus on accurately capturing discourse phenomena.
Submitted 27 June, 2023; v1 submitted 15 September, 2021;
originally announced September 2021.
-
Uncertainty-Aware Machine Translation Evaluation
Authors:
Taisiya Glushkova,
Chrysoula Zerva,
Ricardo Rei,
André F. T. Martins
Abstract:
Several neural-based metrics have been recently proposed to evaluate machine translation quality. However, all of them resort to point estimates, which provide limited information at segment level. This is made worse as they are trained on noisy, biased and scarce human judgements, often resulting in unreliable quality predictions. In this paper, we introduce uncertainty-aware MT evaluation and analyze the trustworthiness of the predicted quality. We combine the COMET framework with two uncertainty estimation methods, Monte Carlo dropout and deep ensembles, to obtain quality scores along with confidence intervals. We compare the performance of our uncertainty-aware MT evaluation methods across multiple language pairs from the QT21 dataset and the WMT20 metrics task, augmented with MQM annotations. We experiment with varying numbers of references and further discuss the usefulness of uncertainty-aware quality estimation (without references) to flag possibly critical translation mistakes.
Submitted 24 March, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.
-
SPECTRA: Sparse Structured Text Rationalization
Authors:
Nuno Miguel Guerreiro,
André F. T. Martins
Abstract:
Selective rationalization aims to produce decisions along with rationales (e.g., text highlights or word alignments between two sentences). Commonly, rationales are modeled as stochastic binary masks, requiring sampling-based gradient estimators, which complicates training and requires careful hyperparameter tuning. Sparse attention mechanisms are a deterministic alternative, but they lack a way to regularize the rationale extraction (e.g., to control the sparsity of a text highlight or the number of alignments). In this paper, we present a unified framework for deterministic extraction of structured explanations via constrained inference on a factor graph, forming a differentiable layer. Our approach greatly eases training and rationale regularization, generally outperforming previous work in terms of performance and plausibility of the extracted rationales. We further provide a comparative study of stochastic and deterministic methods for rationale extraction for classification and natural language inference tasks, jointly assessing their predictive power, quality of the explanations, and model variability.
Submitted 9 September, 2021;
originally announced September 2021.
-
Computational methods for differentially expressed gene analysis from RNA-Seq: an overview
Authors:
Juliana Costa-Silva,
Douglas S. Domingues,
David Menotti,
Mariangela Hungria,
Fabricio M Lopes
Abstract:
The analysis of differential gene expression from RNA-Seq data has become a standard in several research areas, mainly those involving bioinformatics. The computational analysis of these data involves many data types and file formats, and a wide variety of computational tools that can be applied alone or together as pipelines. This paper reviews the differential expression analysis pipeline, addressing its steps and their respective objectives, and the principal methods available in each step and their properties, giving an organized overview of this context. In particular, this review mainly addresses the aspects involved in differentially expressed gene (DEG) analysis from RNA sequencing data (RNA-Seq), considering the computational methods and their properties. In addition, a timeline of the evolution of computational methods for DEG analysis is presented and discussed, and the relationships between the main computational tools are presented as an interaction network. The challenges and gaps in DEG analysis are also discussed.
Submitted 8 September, 2021;
originally announced September 2021.
-
$\infty$-former: Infinite Memory Transformer
Authors:
Pedro Henrique Martins,
Zita Marinho,
André F. T. Martins
Abstract:
Transformers are unable to model long-term memories effectively, since the amount of computation they need to perform grows with the context length. While variations of efficient transformers have been proposed, they all have a finite memory capacity and are forced to drop old information. In this paper, we propose the $\infty$-former, which extends the vanilla transformer with an unbounded long-term memory. By making use of a continuous-space attention mechanism to attend over the long-term memory, the $\infty$-former's attention complexity becomes independent of the context length, trading off memory length with precision. In order to control where precision is more important, $\infty$-former maintains "sticky memories" being able to model arbitrarily long contexts while keeping the computation budget fixed. Experiments on a synthetic sorting task, language modeling, and document grounded dialogue generation demonstrate the $\infty$-former's ability to retain information from long sequences.
Submitted 25 March, 2022; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Sparse Communication via Mixed Distributions
Authors:
António Farinhas,
Wilker Aziz,
Vlad Niculae,
André F. T. Martins
Abstract:
Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability. Some existing approaches (such as the Gumbel-Softmax transformation) build continuous relaxations that are discrete approximations in the zero-temperature limit, while others (such as sparsemax transformations and the Hard Concrete distribution) produce discrete/continuous hybrids. In this paper, we build rigorous theoretical foundations for these hybrids, which we call "mixed random variables." Our starting point is a new "direct sum" base measure defined on the face lattice of the probability simplex. From this measure, we introduce new entropy and Kullback-Leibler divergence functions that subsume the discrete and differential cases and have interpretations in terms of code optimality. Our framework suggests two strategies for representing and sampling mixed random variables, an extrinsic ("sample-and-project") and an intrinsic one (based on face stratification). We experiment with both approaches on an emergent communication benchmark and on modeling MNIST and Fashion-MNIST data with variational auto-encoders with mixed latent variables.
Submitted 11 February, 2022; v1 submitted 5 August, 2021;
originally announced August 2021.
-
Sparse Continuous Distributions and Fenchel-Young Losses
Authors:
André F. T. Martins,
Marcos Treviso,
António Farinhas,
Pedro M. Q. Aguiar,
Mário A. T. Figueiredo,
Mathieu Blondel,
Vlad Niculae
Abstract:
Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $α$-entmax, and fusedmax), has led to distributions with varying support.
This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $Ω$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $Ω$ is a Tsallis negentropy with parameter $α$, we obtain ``deformed exponential families,'' which include $α$-entmax and sparsemax ($α=2$) as particular cases. For quadratic energy functions, the resulting densities are $β$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $Ω$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $α\in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.
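As a point of reference for the finite-domain case mentioned above, sparsemax (the $α=2$ member of the entmax family) can be sketched as a Euclidean projection onto the probability simplex. The following is a minimal pure-Python illustration, not code from the paper; the function name `sparsemax` and the list-based interface are assumptions made here for the example.

```python
def sparsemax(scores):
    """Project a list of real scores onto the probability simplex.

    Illustrative sketch of the standard sort-and-threshold procedure;
    unlike softmax, the result can contain exact zeros.
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z, start=1):
        cumsum += z_k
        # The k-th largest score stays in the support while this holds.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
        else:
            break
    # Shift by the threshold and clip negative values to zero.
    return [max(s - tau, 0.0) for s in scores]


if __name__ == "__main__":
    print(sparsemax([2.0, 1.0, 0.1]))
```

For the scores `[2.0, 1.0, 0.1]` this returns `[1.0, 0.0, 0.0]`: the two lower scores receive exactly zero probability, which is the "varying support" behaviour the abstract contrasts with softmax.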
Submitted 4 August, 2022; v1 submitted 4 August, 2021;
originally announced August 2021.
-
rSoccer: A Framework for Studying Reinforcement Learning in Small and Very Small Size Robot Soccer
Authors:
Felipe B. Martins,
Mateus G. Machado,
Hansenclever F. Bassani,
Pedro H. M. Braga,
Edna S. Barros
Abstract:
Reinforcement learning is an active research area with a vast number of applications in robotics, and the RoboCup competition is an interesting environment for studying and evaluating reinforcement learning methods. A known difficulty in applying reinforcement learning to robotics is the high number of experience samples required; training the agents in simulated environments and then transferring to the real world (sim-to-real) is a viable path. This article introduces an open-source simulator for the IEEE Very Small Size Soccer and the Small Size League optimized for reinforcement learning experiments. We also propose a framework for creating OpenAI Gym environments with a set of benchmark tasks for evaluating single-agent and multi-agent robot soccer skills. We then demonstrate the learning capabilities of two state-of-the-art reinforcement learning methods as well as their limitations in certain scenarios introduced in this framework. We believe this will make it easier for more teams to compete in these categories using end-to-end reinforcement learning approaches and further develop this research area.
Submitted 14 June, 2021;
originally announced June 2021.
-
Do Context-Aware Translation Models Pay the Right Attention?
Authors:
Kayo Yin,
Patrick Fernandes,
Danish Pruthi,
Aditi Chaudhary,
André F. T. Martins,
Graham Neubig
Abstract:
Context-aware machine translation models are designed to leverage contextual information, but often fail to do so. As a result, they inaccurately disambiguate pronouns and polysemous words that require context for resolution. In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words? Are models paying large amounts of attention to the same context? What if we explicitly train them to do so? To answer these questions, we introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations that professional translators found useful for pronoun disambiguation. Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words. Furthermore, we measure the degree of alignment between the model's attention scores and the supporting context from SCAT, and apply a guided attention strategy to encourage agreement between the two.
Submitted 7 August, 2021; v1 submitted 14 May, 2021;
originally announced May 2021.
-
Measuring and Increasing Context Usage in Context-Aware Machine Translation
Authors:
Patrick Fernandes,
Kayo Yin,
Graham Neubig,
André F. T. Martins
Abstract:
Recent work in neural machine translation has demonstrated both the necessity and feasibility of using inter-sentential context -- context from sentences other than those currently being translated. However, while many current methods present model architectures that theoretically can use this extra context, it is often not clear how much they do actually utilize it at translation time. In this paper, we introduce a new metric, conditional cross-mutual information, to quantify the usage of context by these models. Using this metric, we measure how much document-level machine translation systems use particular varieties of context. We find that target context is referenced more than source context, and that conditioning on a longer context has a diminishing effect on results. We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models. Experiments show that our method increases context usage and that this reflects on the translation quality according to metrics such as BLEU and COMET, as well as performance on anaphoric pronoun resolution and lexical cohesion contrastive datasets.
Submitted 2 June, 2021; v1 submitted 7 May, 2021;
originally announced May 2021.
-
On the maximum helium content of multiple populations in the globular cluster NGC6752
Authors:
Fabrice Martins,
William Chantereau,
Corinne Charbonnel
Abstract:
Multiple populations in globular clusters are usually explained by the formation of stars out of material with a chemical composition that is polluted to different degrees by the ejecta of short-lived, massive stars of various types. Among other things, these polluters differ by the amount of helium they spread in the surrounding medium. In this study we investigate whether the present-day photometric method used to infer the helium content of multiple populations indeed gives the true value or underestimates it by missing very He-rich, but rare stars. We focus on the specific case of NGC6752. We compute atmosphere models and synthetic spectra along isochrones produced for this cluster for a very broad range of He abundances covering the predictions of different pollution scenarios, including the extreme case of the fast-rotating massive star (FRMS) scenario. We calculate synthetic photometry in HST filters best suited to study the helium content. We subsequently build synthetic clusters with various distributions of stars. We finally determine the maximum helium mass fraction of these synthetic clusters using a method similar to that applied to observational data. We build toy models of clusters with various distributions of multiple populations and ensure that we are able to recover the input maximum Y. We then build synthetic clusters with the populations predicted by the FRMS scenario and find that while we slightly underestimate the maximum Y value, we are still able to detect stars much more He-rich than the current observed maximum Y. It is easier to determine the maximum Y on main sequence stars than on red giant branch stars, but qualitatively the results are unaffected by the sample choice. We show that in NGC6752 it is unlikely that stars more He-rich than the current observational limit of about 0.3 are present.
Submitted 28 April, 2021;
originally announced April 2021.
-
Multimodal Continuous Visual Attention Mechanisms
Authors:
António Farinhas,
André F. T. Martins,
Pedro M. Q. Aguiar
Abstract:
Visual attention mechanisms are a key component of neural network models for computer vision. By focusing on a discrete set of objects or image regions, these mechanisms identify the most relevant features and use them to build more powerful representations. Recently, continuous-domain alternatives to discrete attention models have been proposed, which exploit the continuity of images. These approaches model attention as simple unimodal densities (e.g. a Gaussian), making them less suitable to deal with images whose region of interest has a complex shape or is composed of multiple non-contiguous patches. In this paper, we introduce a new continuous attention mechanism that produces multimodal densities, in the form of mixtures of Gaussians. We use the EM algorithm to obtain a clustering of relevant regions in the image, and a description length penalty to select the number of components in the mixture. Our densities decompose as a linear combination of unimodal attention mechanisms, enabling closed-form Jacobians for the backpropagation step. Experiments on visual question answering in the VQA-v2 dataset show competitive accuracies and a selection of regions that mimics human attention more closely in VQA-HAT. We present several examples that suggest how multimodal attention maps are naturally more interpretable than their unimodal counterparts, showing the ability of our model to automatically segregate objects from ground in complex scenes.
Submitted 7 April, 2021;
originally announced April 2021.
-
Exactly solvable models for 2+1D topological phases derived from crossed modules of semisimple Hopf algebras
Authors:
Vincent Koppen,
João Faria Martins,
Paul Purdon Martin
Abstract:
We define an exactly solvable model for 2+1D topological phases of matter on a triangulated surface derived from a crossed module of semisimple finite-dimensional Hopf algebras, the `Hopf-algebraic higher Kitaev model'. This model generalizes both the Kitaev quantum double model for a semisimple Hopf algebra and the full higher Kitaev model derived from a 2-group, and can hence be interpreted as a Hopf-algebraic discrete higher gauge theory.
We construct a family of crossed modules of semisimple Hopf algebras, $(\mathscr{F}_{\mathbb{C}}(X) \otimes \mathbb{C}E \xrightarrow{\partial} \mathscr{F}_{\mathbb{C}}(Y) \rtimes \mathbb{C} G, \triangleright)$, that depends on four finite groups, $E,G,X$ and $Y$. We calculate the ground-state spaces of the resulting model on a triangulated surface when $G=E=\{1\}$ and when $Y=\{1\}$, prove that those ground-state spaces are canonically independent of the triangulations, and so depend only on the underlying surface; and moreover we find a 2+1D TQFT whose state spaces on surfaces give the ground-state spaces. These TQFTs are particular cases of Quinn's finite total homotopy TQFT and hence the state spaces assigned to surfaces are free vector spaces on sets of homotopy classes of maps from a surface to homotopy finite spaces, in this case obtained as classifying spaces of finite groupoids and finite crossed modules of groupoids.
We leave it as an open problem whether the ground-state space of the Hopf-algebraic higher Kitaev model on a triangulated surface is independent of the triangulation for general crossed modules of semisimple Hopf algebras, whether a TQFT always exists whose state space on a surface gives the ground-state space of the model, and whether the ground-state space of the model obtained from $E,G,X,Y$ can always be given a homotopical explanation.
Submitted 22 July, 2023; v1 submitted 6 April, 2021;
originally announced April 2021.
-
Reconciling the Discrete-Continuous Divide: Towards a Mathematical Theory of Sparse Communication
Authors:
André F. T. Martins
Abstract:
Neural networks and other machine learning models compute continuous representations, while humans communicate with discrete symbols. Reconciling these two forms of communication is desirable to generate human-readable interpretations or to learn discrete latent variable models, while maintaining end-to-end differentiability. Some existing approaches (such as the Gumbel-softmax transformation) build continuous relaxations that are discrete approximations in the zero-temperature limit, while others (such as sparsemax transformations and the hard concrete distribution) produce discrete/continuous hybrids. In this paper, we build rigorous theoretical foundations for these hybrids. Our starting point is a new "direct sum" base measure defined on the face lattice of the probability simplex. From this measure, we introduce a new entropy function that includes the discrete and differential entropies as particular cases, and has an interpretation in terms of code optimality, as well as two other information-theoretic counterparts that generalize the mutual information and Kullback-Leibler divergences. Finally, we introduce "mixed languages" as strings of hybrid symbols and a new mixed weighted finite state automaton that recognizes a class of regular mixed languages, generalizing closure properties of regular languages.
Submitted 1 April, 2021;
originally announced April 2021.
-
The effects of surface fossil magnetic fields on massive star evolution: III. The case of $τ$ Sco
Authors:
Z. Keszthelyi,
G. Meynet,
F. Martins,
A. de Koter,
A. David-Uraz
Abstract:
$τ$ Sco, a well-studied magnetic B-type star in the Upper Sco association, has a number of surprising characteristics. It rotates very slowly and shows nitrogen excess. Its surface magnetic field is much more complex than a purely dipolar configuration which is unusual for a magnetic massive star. We employ the CMFGEN radiative transfer code to determine the fundamental parameters and surface CNO and helium abundances. Then, we employ MESA and GENEC stellar evolution models accounting for the effects of surface magnetic fields. To reconcile $τ$ Sco's properties with single-star models, an increase is necessary in the efficiency of rotational mixing by a factor of 3 to 10 and in the efficiency of magnetic braking by a factor of 10. The spin down could be explained by assuming a magnetic field decay scenario. However, the simultaneous chemical enrichment challenges the single-star scenario. Previous works indeed suggested a stellar merger origin for $τ$ Sco. However, the merger scenario also faces similar challenges as our magnetic single-star models to explain $τ$ Sco's simultaneous slow rotation and nitrogen excess. In conclusion, the single-star channel seems less likely and versatile to explain these discrepancies, while the merger scenario and other potential binary-evolution channels still require further assessment as to whether they may self-consistently explain the observables of $τ$ Sco.
Submitted 23 April, 2021; v1 submitted 24 March, 2021;
originally announced March 2021.
-
Motion groupoids and mapping class groupoids
Authors:
Fiona Torzewska,
João Faria Martins,
Paul Purdon Martin
Abstract:
Here $\underline{M}$ denotes a pair $(M,A)$ of a manifold and a subset (e.g. $A=\partial M$ or $A=\emptyset$). We construct for each $\underline{M}$ its motion groupoid $\mathrm{Mot}_{\underline{M}}$, whose object set is the power set $ {\mathcal P} M$ of $M$, and whose morphisms are certain equivalence classes of continuous flows of the `ambient space' $M$, that fix $A$, acting on ${\mathcal P} M$. These groupoids generalise the classical definition of a motion group associated to a manifold $M$ and a submanifold $N$, which can be recovered by considering the automorphisms in $\mathrm{Mot}_{\underline{M}}$ of $N\in {\mathcal P} M$. We also construct the mapping class groupoid $\mathrm{MCG}_{\underline{M}}$ associated to a pair $\underline{M}$ with the same object class, whose morphisms are now equivalence classes of homeomorphisms of $M$, that fix $A$. We recover the classical definition of the mapping class group of a pair by taking automorphisms at the appropriate object. For each pair $\underline{M}$ we explicitly construct a functor $\mathsf{F}\colon \mathrm{Mot}_{\underline{M}} \to \mathrm{MCG}_{\underline{M}}$, which is the identity on objects, and prove that this is full and faithful, and hence an isomorphism, if $π_0$ and $π_1$ of the appropriate space of self-homeomorphisms of $M$ are trivial. In particular, we have an isomorphism in the physically important case $\underline{M}=([0,1]^n, \partial [0,1]^n)$, for any $n\in \mathbb{N}$. We show that the congruence relation used in the construction $\mathrm{Mot}_{\underline{M}}$ can be formulated entirely in terms of a level preserving isotopy relation on the trajectories of objects under flows -- worldlines (e.g. monotonic `tangles'). We examine several explicit examples of $\mathrm{Mot}_{\underline{M}}$ and $\mathrm{MCG}_{\underline{M}}$ demonstrating the utility of the constructions.
Submitted 6 September, 2023; v1 submitted 18 March, 2021;
originally announced March 2021.
-
Smoothing and Shrinking the Sparse Seq2Seq Search Space
Authors:
Ben Peters,
André F. T. Martins
Abstract:
Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences. While this setup has led to strong results in a variety of tasks, one unsatisfying aspect is its length bias: models give high scores to short, inadequate hypotheses and often make the empty string the argmax -- the so-called cat got your tongue problem. Recently proposed entmax-based sparse sequence-to-sequence models present a possible solution, since they can shrink the search space by assigning zero probability to bad hypotheses, but their ability to handle word-level tasks with transformers has never been tested. In this work, we show that entmax-based models effectively solve the cat got your tongue problem, removing a major source of model error for neural machine translation. In addition, we generalize label smoothing, a critical regularization technique, to the broader family of Fenchel-Young losses, which includes both cross-entropy and the entmax losses. Our resulting label-smoothed entmax loss models set a new state of the art on multilingual grapheme-to-phoneme conversion and deliver improvements and better calibration properties on cross-lingual morphological inflection and machine translation for 6 language pairs.
Submitted 18 March, 2021;
originally announced March 2021.
-
A Voronoi-tessellation-based approach for detection of coherent structures in sparsely-seeded flows
Authors:
F. A. C. Martins,
D. E. Rival
Abstract:
A novel algorithm to detect coherent structures with sparse Lagrangian particle tracking data, using Voronoi tessellation and techniques from spectral graph theory, is tested. Neighbouring tracer particles are naturally identified through the Voronoi tessellation of the tracers' distribution. The method examines the \textit{neighbouring time} of tracer trajectories, defined as the total flow time two Voronoi cells share a common Voronoi edge, by converting this information into a Cartesian distance in the graph representation of the Voronoi diagram. Coherence is assigned to groups of Voronoi cells whose neighbouring time remains high throughout the time interval of analysis. The technique is first tested on the two-dimensional synthetic data of a double-gyre flow, and then with challenging, large-scale three-dimensional Lagrangian particle tracking data behind a bluff body at high Reynolds number. The tested technique proves to be successful at identifying coherence with realistic experimental data. Specifically, it is shown that coherent tracer motion is identifiable for mean inter-particle distances of the order of the largest length scales in the flow.
Submitted 28 July, 2021; v1 submitted 17 March, 2021;
originally announced March 2021.
-
Study of energy response and resolution of the ATLAS Tile Calorimeter to hadrons of energies from 16 to 30 GeV
Authors:
Jalal Abdallah,
Stylianos Angelidakis,
Giorgi Arabidze,
Nikolay Atanov,
Johannes Bernhard,
Romeo Bonnefoy,
Jonathan Bossio,
Ryan Bouabid,
Fernando Carrio,
Tomas Davidek,
Michal Dubovsky,
Luca Fiorini,
Francisco Brandan Garcia Aparisi,
Tancredi Carli,
Alexander Gerbershagen,
Hazal Goksu,
Haleh Hadavand,
Siarhei Harkusha,
Dingane Hlaluku,
Michael James Hibbard,
Kevin Hildebrand,
Juansher Jejelava,
Andrey Kamenshchikov,
Stergios Kazakos,
Tomas Kello
, et al. (46 additional authors not shown)
Abstract:
Three spare modules of the ATLAS Tile Calorimeter were exposed to test beams from the Super Proton Synchrotron accelerator at CERN in 2017. The measurements of the energy response and resolution of the detector to positive pions, kaons, and protons with energy in the range 16 to 30 GeV are reported. The results have uncertainties of a few percent. They were compared to the predictions of the Geant4-based simulation program used in ATLAS to estimate the response of the detector to proton-proton events at the Large Hadron Collider. The determinations obtained using experimental and simulated data agree within the uncertainties.
Submitted 8 February, 2021;
originally announced February 2021.