Search | arXiv e-print repository

I Have an Attention Bridge to Sell You: Generalization Capabilities of Modular Translation Architectures

Authors: Timothee Mickus, Raúl Vázquez, Joseph Attieh

Abstract: Modularity is a paradigm of machine translation with the potential of bringing forth models that are large at training time and small during inference. Within this field of study, modular approaches, and in particular attention bridges, have been argued to improve the generalization capabilities of models by fostering language-independent representations. In the present paper, we study whether modularity affects translation quality; as well as how well modular architectures generalize across different evaluation scenarios. For a given computational budget, we find non-modular architectures to be always comparable or preferable to all modular designs we study.

Submitted 30 April, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

arXiv:2403.07726 [pdf, other]

SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

Authors: Timothee Mickus, Elaine Zosa, Raúl Vázquez, Teemu Vahtola, Jörg Tiedemann, Vincent Segonne, Alessandro Raganato, Marianna Apidianaki

Abstract: This paper presents the results of SHROOM, a shared task focused on detecting hallucinations: outputs from natural language generation (NLG) systems that are fluent, yet inaccurate. Such cases of overgeneration put in jeopardy many NLG applications, where correctness is often mission-critical. The shared task was conducted with a newly constructed dataset of 4000 model outputs labeled by 5 annotators each, spanning 3 NLP tasks: machine translation, paraphrase generation and definition modeling. The shared task was tackled by a total of 58 different users grouped in 42 teams, out of which 27 elected to write a system description paper; collectively, they submitted over 300 prediction sets on both tracks of the shared task. We observe a number of key trends in how this approach was tackled -- many participants rely on a handful of models, and often rely either on synthetic data for fine-tuning or zero-shot prompting strategies. While a majority of the teams did outperform our proposed baseline system, the performances of top-scoring systems are still consistent with a random handling of the more challenging items.

Submitted 29 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: SemEval 2024 shared task. Pre-review version

arXiv:2403.07544 [pdf, other]

MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki

Authors: Timothee Mickus, Stig-Arne Grönroos, Joseph Attieh, Michele Boggia, Ona De Gibert, Shaoxiong Ji, Niki Andreas Lopi, Alessandro Raganato, Raúl Vázquez, Jörg Tiedemann

Abstract: NLP in the age of monolithic large language models is approaching its limits in terms of size and information that can be handled. The trend goes to modularization, a necessary step into the direction of designing smaller sub-networks and components with specialized functionality. In this paper, we present the MAMMOTH toolkit: a framework designed for training massively multilingual modular machine translation systems at scale, initially derived from OpenNMT-py and then adapted to ensure efficient training across computation clusters. We showcase its efficiency across clusters of A100 and V100 NVIDIA GPUs, and discuss our design philosophy and plans for future information. The toolkit is publicly available online.

Submitted 12 March, 2024; originally announced March 2024.

Comments: Presented as a demo at EACL 2024

arXiv:2401.02511 [pdf, other]

Gain Scheduling with a Neural Operator for a Transport PDE with Nonlinear Recirculation

Authors: Maxence Lamarque, Luke Bhan, Rafael Vazquez, Miroslav Krstic

Abstract: To stabilize PDE models, control laws require space-dependent functional gains mapped by nonlinear operators from the PDE functional coefficients. When a PDE is nonlinear and its "pseudo-coefficient" functions are state-dependent, a gain-scheduling (GS) nonlinear design is the simplest approach to the design of nonlinear feedback. The GS version of PDE backstepping employs gains obtained by solving a PDE at each value of the state. Performing such PDE computations in real time may be prohibitive. The recently introduced neural operators (NO) can be trained to produce the gain functions, rapidly in real time, for each state value, without requiring a PDE solution. In this paper we introduce NOs for GS-PDE backstepping. GS controllers act on the premise that the state change is slow and, as a result, guarantee only local stability, even for ODEs. We establish local stabilization of hyperbolic PDEs with nonlinear recirculation using both a "full-kernel" approach and the "gain-only" approach to gain operator approximation. Numerical simulations illustrate stabilization and demonstrate speedup by three orders of magnitude over traditional PDE gain-scheduling. Code (Github) for the numerical implementation is published to enable exploration.

Submitted 4 January, 2024; originally announced January 2024.

Comments: 16 pages, 5 figures

arXiv:2310.14313 [pdf, other]

Arbitrary order spline representation of cohomology generators for isogeometric analysis of eddy current problems

Authors: Bernard Kapidani, Melina Merkel, Sebastian Schöps, Rafael Vázquez

Abstract: The eddy current problem has many relevant practical applications in science, ranging from non-destructive testing to magnetic confinement of plasma in fusion reactors. It arises when electrical conductors are immersed in an external time-varying magnetic field operating at frequencies for which electromagnetic wave propagation effects can be neglected. Popular formulations of the eddy current problem either use the magnetic vector potential or the magnetic scalar potential. They have individual advantages and disadvantages. One challenge is related to differential geometry: Scalar potential based formulations run into trouble when conductors are present in non-trivial topology, as approximation spaces must be then augmented with generators of the first cohomology group of the non-conducting domain. For all existing algorithms based on lowest order methods it is assumed that the extension of the graph-based algorithms to high-order approximations requires hierarchical bases for the curl-conforming discrete spaces. However, building on insight on de Rham complexes approximation with splines, we will show in the present submission that the hierarchical basis condition is not necessary. Algorithms based on spanning tree techniques can instead be adapted to work on an underlying hexahedral mesh arising from isomorphisms between spline spaces of differential forms and de Rham complexes on an auxiliary control mesh.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.06977 [pdf, other]

Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings

Authors: Timothee Mickus, Raúl Vázquez

Abstract: A recent body of work has demonstrated that Transformer embeddings can be linearly decomposed into well-defined sums of factors, that can in turn be related to specific network inputs or components. There is however still a dearth of work studying whether these mathematical reformulations are empirically meaningful. In the present work, we study representations from machine-translation decoders using two of such embedding decomposition methods. Our results indicate that, while decomposition-derived indicators effectively correlate with model performance, variation across different runs suggests a more nuanced take on this question. The high variability of our measurements indicates that geometry reflects model-specific characteristics more than it does sentence-specific computations, and that similar training conditions do not guarantee similar vector spaces.

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted to BlackBoxNLP 2023

arXiv:2212.01936 [pdf, other]

Democratizing Neural Machine Translation with OPUS-MT

Authors: Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja

Abstract: This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows. We discuss our on-going mission of increasing language coverage and translation quality, and also describe on-going work on the development of modular translation models and speed-optimized compact solutions for real-time translation on regular desktops and small devices.

Submitted 4 July, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

arXiv:2202.05771 [pdf, other]

doi 10.1109/TMAG.2022.3186247

Torque Computation with the Isogeometric Mortar Method for the Simulation of Electric Machines

Authors: Melina Merkel, Bernard Kapidani, Sebastian Schöps, Rafael Vázquez

Abstract: In this work isogeometric mortaring is used for the simulation of a six pole permanent magnet synchronous machine. Isogeometric mortaring is especially well suited for the efficient computation of rotating electric machines as it allows for an exact geometry representation for arbitrary rotation angles without the need of remeshing. The appropriate B-spline spaces needed for the solution of Maxwell's equations and the corresponding mortar spaces are introduced. Unlike in classical finite element methods their construction is straightforward in the isogeometric case. The torque in the machine is computed using two different methods, i.e., Arkkio's method and by using the Lagrange multipliers from the mortaring.

Submitted 11 February, 2022; originally announced February 2022.

Journal ref: IEEE Transactions on Magnetics, vol. 58, no. 9, Sept. 2022, Art no. 8107604

arXiv:2110.15860 [pdf, other]

doi 10.1016/j.cma.2022.114949

Tree-Cotree Decomposition of Isogeometric Mortared Spaces in H(curl) on Multi-Patch Domains

Authors: Bernard Kapidani, Melina Merkel, Sebastian Schöps, Rafael Vázquez

Abstract: When applying isogeometric analysis to engineering problems, one often deals with multi-patch spline spaces that have incompatible discretisations, e.g. in the case of moving objects. In such cases mortaring has been shown to be advantageous. This contribution discusses the appropriate B-spline spaces needed for the solution of Maxwell's equations in the functions space H(curl) and the corresponding mortar spaces. The main contribution of this paper is to show that in formulations requiring gauging, as in the vector potential formulation of magnetostatic equations, one can remove the discrete kernel subspace from the mortared spaces by the graph-theoretical concept of a tree-cotree decomposition. The tree-cotree decomposition is done based on the control mesh, it works for non-contractible domains, and it can be straightforwardly applied independently of the degree of the B-spline bases. Finally, the simulation workflow is demonstrated using a realistic model of a rotating permanent magnet synchronous machine.

Submitted 29 October, 2021; originally announced October 2021.

Journal ref: Computer Methods in Applied Mechanics and Engineering, Vol. 395, pp. 114949, 2022

arXiv:1906.04040 [pdf, other]

The University of Helsinki submissions to the WMT19 news translation task

Authors: Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann

Abstract: In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained both sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.

Submitted 10 June, 2019; originally announced June 2019.

Comments: To appear in WMT19

arXiv:1901.00759 [pdf, other]

Isogeometric Mortar Coupling for Electromagnetic Problems

Authors: Annalisa Buffa, Jacopo Corno, Carlo de Falco, Sebastian Schöps, Rafael Vázquez

Abstract: This paper discusses and analyses two domain decomposition approaches for electromagnetic problems that allow the combination of domains discretised by either Nédélec-type polynomial finite elements or spline-based isogeometric analysis. The first approach is a new isogeometric mortar method and the second one is based on a modal basis for the Lagrange multiplier space, called state-space concatenation in the engineering literature. Spectral correctness and in particular inf-sup stability of both approaches are analytically and numerically investigated. The new mortar method is shown to be unconditionally stable. Its construction of the discrete Lagrange multiplier space takes advantage of the high continuity of splines, and does not have an analogue for Nédélec finite elements. On the other hand, the approach with modal basis is easier to implement but relies on application knowledge to ensure stability and correctness.

Submitted 27 December, 2018; originally announced January 2019.

MSC Class: 35Q60; 49M27; 65D07; 68Q25; 68R10; 68U05; 78M10

arXiv:1811.00498 [pdf, other]

doi 10.18653/v1/W19-4305

Multilingual NMT with a language-independent attention bridge

Authors: Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann, Mathias Creutz

Abstract: In this paper, we propose a multilingual encoder-decoder architecture capable of obtaining multilingual sentence representations by means of incorporating an intermediate attention bridge that is shared across all languages. That is, we train the model with language-specific encoders and decoders that are connected via self-attention with a shared layer that we call attention bridge. This layer exploits the semantics from each language for performing translation and develops into a language-independent meaning representation that can efficiently be used for transfer learning. We present a new framework for the efficient development of multilingual NMT using this model and scheduled training. We have tested the approach in a systematic way with a multi-parallel data set. We show that the model achieves substantial improvements over strong bilingual models and that it also works well for zero-shot translation, which demonstrates its ability of abstraction and transfer learning.

Submitted 1 November, 2018; originally announced November 2018.

Journal ref: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019) Pages 33-39

arXiv:1811.00111 [pdf, other]

doi 10.1080/00207179.2018.1543896

On finite-time and fixed-time consensus algorithms for dynamic networks switching among disconnected digraphs

Authors: David Gómez-Gutiérrez, Carlos Renato Vázquez, Sergej Čelikovský, Juan Diego Sánchez-Torres, Javier Ruiz León

Abstract: The aim of this paper is to analyze a class of consensus algorithms with finite-time or fixed-time convergence for dynamic networks formed by agents with first-order dynamics. In particular, in the analyzed class a single evaluation of a nonlinear function of the consensus error is performed per each node. The classical assumption of switching among connected graphs is dropped here, allowing to represent failures and intermittent communications between agents. Thus, conditions to guarantee finite and fixed-time convergence, even while switching among disconnected graphs, are provided. Moreover, the algorithms of the considered class are shown to be computationally simpler than previously proposed finite-time consensus algorithms for dynamic networks, which is an important feature in scenarios with computationally limited nodes and energy efficiency requirements such as in sensor networks. The performance of the considered consensus algorithms is illustrated through simulations, comparing it to existing approaches for dynamic networks with finite-time and fixed-time convergence. It is shown that the settling time of the considered algorithms grows slower when the number of nodes increases than with other consensus algorithms for dynamic networks.

Submitted 25 June, 2021; v1 submitted 31 October, 2018; originally announced November 2018.

Comments: Please cite the publisher's version. For the publisher's version and full citation details see: https://doi.org/10.1080/00207179.2018.1543896 The following links provide access, for a limited time, to a free copy of the publisher's version: https://www.tandfonline.com/eprint/FSW8JJRVPHMXJ3XUUXZH/full?target=10.1080/00207179.2018.1543896

Journal ref: International Journal of Control, 93(9), 2120-2134, 2020

arXiv:1808.10802 [pdf, other]

The MeMAD Submission to the WMT18 Multimodal Translation Task

Authors: Stig-Arne Grönroos, Benoit Huet, Mikko Kurimo, Jorma Laaksonen, Bernard Merialdo, Phu Pham, Mats Sjöberg, Umut Sulubacak, Jörg Tiedemann, Raphael Troncy, Raúl Vázquez

Abstract: This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top scoring system for both English-to-German and English-to-French, according to the automatic metrics for flickr18. Our experiments show that the effect of the visual features in our system is small. Our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.

Submitted 3 September, 2018; v1 submitted 31 August, 2018; originally announced August 2018.

Comments: To appear in WMT18

arXiv:1709.06004 [pdf, other]

Recent Advances of Isogeometric Analysis in Computational Electromagnetics

Authors: Zeger Bontinck, Jacopo Corno, Herbert De Gersem, Stefan Kurz, Andreas Pels, Sebastian Schöps, Felix Wolf, Carlo de Falco, Jürgen Dölz, Rafael Vázquez, Ulrich Römer

Abstract: In this communication the advantages and drawbacks of the isogeometric analysis (IGA) are reviewed in the context of electromagnetic simulations. IGA extends the set of polynomial basis functions, commonly employed by the classical Finite Element Method (FEM). While identical to FEM with Nédélec's basis functions in the lowest order case, it is based on B-spline and Non-Uniform Rational B-spline basis functions. The main benefit of this is the exact representation of the geometry in the language of computer aided design (CAD) tools. This simplifies the meshing as the computational mesh is implicitly created by the engineer using the CAD tool. The curl- and div-conforming spline function spaces are recapitulated and the available software is discussed. Finally, several non-academic benchmark examples in two and three dimensions are shown which are used in optimization and uncertainty quantification workflows.

Submitted 18 September, 2017; originally announced September 2017.

Comments: submitted to the ICS Newsletter

MSC Class: 78A30; 78A40; 74F15; 65N30; 65N25 ACM Class: G.1.8; F.2.1; J.2

Showing 1–15 of 15 results for author: Vazquez, R