Showing 1–21 of 21 results for author: Martins, P H

  1. arXiv:2402.17733  [pdf, other]

    cs.CL

    Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

    Authors: Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, André F. T. Martins

    Abstract: While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and pa…

    Submitted 27 February, 2024; originally announced February 2024.

  2. arXiv:2402.00786  [pdf, other]

    cs.CL cs.LG

    CroissantLLM: A Truly Bilingual French-English Language Model

    Authors: Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro H. Martins, Antoni Bigata Casademunt, François Yvon, André F. T. Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo

    Abstract: We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware. To that end, we pioneer the approach of training an intrinsically bilingual model with a 1:1 English-to-French pretraining data ratio, a cust…

    Submitted 29 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  3. arXiv:2305.00955  [pdf, other]

    cs.CL cs.AI cs.LG

    Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

    Authors: Patrick Fernandes, Aman Madaan, Emmy Liu, António Farinhas, Pedro Henrique Martins, Amanda Bertsch, José G. C. de Souza, Shuyan Zhou, Tongshuang Wu, Graham Neubig, André F. T. Martins

    Abstract: Many recent advances in natural language generation have been fueled by training large language models on internet-scale data. However, this paradigm can lead to models that generate toxic, inaccurate, and unhelpful content, and automatic evaluation metrics often fail to identify these behaviors. As models become more capable, human feedback is an invaluable signal for evaluating and improving mod…

    Submitted 31 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Work in Progress

  4. arXiv:2211.04622  [pdf, ps, other]

    cond-mat.stat-mech

    Percolation in two-species antagonistic random sequential adsorption in two dimensions

    Authors: Paulo H. L. Martins, Ronald Dickman, Robert M. Ziff

    Abstract: We consider two-species random sequential adsorption (RSA) in which species A and B adsorb randomly on a lattice with the restriction that opposite species cannot occupy nearest-neighbor sites. When the probability $x_A$ of choosing an A particle for an adsorption trial reaches a critical value $0.626441(1)$, the A species percolates and/or the blocked sites X (those with at least one A and one B…

    Submitted 8 November, 2022; originally announced November 2022.

  5. arXiv:2209.00099  [pdf, other]

    cs.CL

    Efficient Methods for Natural Language Processing: A Survey

    Authors: Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

    Abstract: Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require few…

    Submitted 24 March, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

    Comments: Accepted at TACL, pre-publication version

  6. arXiv:2205.12230  [pdf, other]

    cs.CL

    Chunk-based Nearest Neighbor Machine Translation

    Authors: Pedro Henrique Martins, Zita Marinho, André F. T. Martins

    Abstract: Semi-parametric models, which augment generation with retrieval, have led to impressive results in language modeling and machine translation, due to their ability to retrieve fine-grained information from a datastore of examples. One of the most prominent approaches, $k$NN-MT, exhibits strong domain adaptation capabilities by retrieving tokens from domain-specific datastores \citep{khandelwal2020n…

    Submitted 7 November, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

  7. arXiv:2204.12608  [pdf, other]

    cs.CL

    Efficient Machine Translation Domain Adaptation

    Authors: Pedro Henrique Martins, Zita Marinho, André F. T. Martins

    Abstract: Machine translation models struggle when translating out-of-domain text, which makes domain adaptation a topic of critical importance. However, most domain adaptation methods focus on fine-tuning or training the entire or part of the model on every new domain, which can be costly. On the other hand, semi-parametric models have been shown to successfully perform domain adaptation by retrieving exam…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: Workshop Semiparametric Methods in NLP: Decoupling Logic from Knowledge

  8. arXiv:2109.00301  [pdf, other]

    cs.CL

    $\infty$-former: Infinite Memory Transformer

    Authors: Pedro Henrique Martins, Zita Marinho, André F. T. Martins

    Abstract: Transformers are unable to model long-term memories effectively, since the amount of computation they need to perform grows with the context length. While variations of efficient transformers have been proposed, they all have a finite memory capacity and are forced to drop old information. In this paper, we propose the $\infty$-former, which extends the vanilla transformer with an unbounded long-t…

    Submitted 25 March, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: ACL 2022

  9. arXiv:2102.01672  [pdf, other]

    cs.CL cs.AI cs.LG

    The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

    Authors: Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna Clinciu, Dipanjan Das, Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, et al. (31 additional authors not shown)

    Abstract: We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it…

    Submitted 1 April, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

  10. arXiv:2008.13037  [pdf, ps, other]

    cond-mat.stat-mech

    An entropic simulational study of the spin-$1$ Baxter-Wu model in a crystal field

    Authors: L. N. Jorge, P. H. L. Martins, C. J. DaSilva, L. S. Ferreira, A. A. Caparica

    Abstract: We investigate the critical behavior of the two-dimensional spin-$1$ Baxter-Wu model in a crystal field using entropic sampling simulations with the joint density of states. We obtain the temperature-crystal field phase diagram, which includes a tetracritical line ending at a pentacritical point. A finite-size scaling analysis of the maximum of the specific heat, while changing the crystal field a…

    Submitted 29 August, 2020; originally announced August 2020.

    Comments: 7 pages, 7 figures

  11. arXiv:2004.02644  [pdf, other]

    cs.CL

    Sparse Text Generation

    Authors: Pedro Henrique Martins, Zita Marinho, André F. T. Martins

    Abstract: Current state-of-the-art text generators build on powerful language models such as GPT-2, achieving impressive performance. However, to avoid degenerate text, they require sampling from a modified softmax, via temperature parameters or ad-hoc truncation techniques, as in top-$k$ or nucleus sampling. This creates a mismatch between training and testing conditions. In this paper, we use the recently…

    Submitted 5 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

  12. arXiv:2002.05556  [pdf, other]

    cs.CL cs.CV

    Sparse and Structured Visual Attention

    Authors: Pedro Henrique Martins, Vlad Niculae, Zita Marinho, André Martins

    Abstract: Visual attention mechanisms are widely used in multimodal tasks, as visual question answering (VQA). One drawback of softmax-based attention mechanisms is that they assign some probability mass to all image regions, regardless of their adjacency structure and of their relevance to the text. In this paper, to better link the image structure with the text, we replace the traditional softmax attentio…

    Submitted 8 July, 2021; v1 submitted 13 February, 2020; originally announced February 2020.

  13. arXiv:1907.08243  [pdf, other]

    cs.CL

    Joint Learning of Named Entity Recognition and Entity Linking

    Authors: Pedro Henrique Martins, Zita Marinho, André F. T. Martins

    Abstract: Named entity recognition (NER) and entity linking (EL) are two fundamentally related tasks, since in order to perform EL, first the mentions to entities have to be detected. However, most entity linking approaches disregard the mention detection part, assuming that the correct mentions have been previously detected. In this paper, we perform joint learning of NER and EL to leverage their relatedne…

    Submitted 18 July, 2019; originally announced July 2019.

  14. arXiv:1807.03053  [pdf, other]

    cs.CL cs.RO

    A deep learning approach for understanding natural language commands for mobile service robots

    Authors: Pedro Henrique Martins, Luís Custódio, Rodrigo Ventura

    Abstract: Using natural language to give instructions to robots is challenging, since natural language understanding is still largely an open problem. In this paper we address this problem by restricting our attention to commands modeled as one action, plus arguments (also known as slots). For action detection (also called intent detection) and slot filling various architectures of Recurrent Neural Networks…

    Submitted 9 July, 2018; originally announced July 2018.

  15. arXiv:1805.11459  [pdf, ps, other]

    cond-mat.stat-mech

    Adsorption of flexible polymer chains on a surface: Effects of different solvent conditions

    Authors: P. H. L. Martins, J. A. Plascak, M. Bachmann

    Abstract: Polymer chains undergoing a continuous adsorption-desorption transition are studied through extensive computer simulations. A three-dimensional self-avoiding walk lattice model of a polymer chain grafted onto a surface has been treated for different solvent conditions. We have used an advanced contact-density chain-growth algorithm, in which the density of contacts can be directly obtained. From t…

    Submitted 25 May, 2018; originally announced May 2018.

    Comments: 10 pages, 11 figures. arXiv admin note: text overlap with arXiv:1705.02645

    Journal ref: The Journal of Chemical Physics 148, 204901 (2018)

  16. Solvent-Dependent Critical Properties of Polymer Adsorption

    Authors: J. A. Plascak, Paulo H. L. Martins, Michael Bachmann

    Abstract: Advanced chain-growth computer simulation methodologies have been employed for a systematic statistical analysis of the critical behavior of a polymer adsorbing at a substrate. We use finite-size scaling techniques to investigate the solvent-quality dependence of critical exponents, critical temperature, and the structure of the phase diagram. Our study covers all solvent effects from the limit of…

    Submitted 7 May, 2017; originally announced May 2017.

    Comments: 6 pages, 5 figures

    Journal ref: Physical Review E 95, 050501(R) (2017)

  17. arXiv:1209.1818  [pdf, ps, other]

    cond-mat.stat-mech physics.comp-ph

    Probability distribution of the order parameter in the directed percolation universality class

    Authors: P. H. L. Martins

    Abstract: The probability distributions of the order parameter for two models in the directed percolation universality class were evaluated. Monte Carlo simulations have been performed for the one-dimensional generalized contact process and the Domany-Kinzel cellular automaton. In both cases, the density of active sites was chosen as the order parameter. The criticality of those models was obtained by solel…

    Submitted 9 September, 2012; originally announced September 2012.

    Comments: 6 pages, 4 figures

    Journal ref: P. H. L. Martins, Phys. Rev. E 85, 041110 (2012)

  18. arXiv:1209.1815  [pdf, ps, other]

    cond-mat.stat-mech physics.comp-ph

    Probability Distribution Function of the Order Parameter: Mixing Fields and Universality

    Authors: J. A. Plascak, P. H. L. Martins

    Abstract: We briefly review the use of the order parameter probability distribution function as a useful tool to obtain the critical properties of statistical mechanical models using computer Monte Carlo simulations. Some simple discrete spin magnetic systems on a lattice, such as Ising, general spin-$S$ Blume-Capel and Baxter-Wu, $Q$-state Potts, among other models, will be considered as examples. The impo…

    Submitted 9 September, 2012; originally announced September 2012.

    Comments: 14 pages, 13 figures, accepted for publication (Computer Physics Communications)

  19. Probability distribution of the order parameter

    Authors: P. H. L. Martins, J. A. Plascak

    Abstract: The probability distribution of the order parameter is exploited in order to obtain the criticality of magnetic systems. Monte Carlo simulations have been employed by using single spin flip Metropolis algorithm aided by finite-size scaling and histogram reweighting techniques. A method is proposed to obtain this probability distribution even when the transition temperature of the model is unknow…

    Submitted 9 April, 2004; originally announced April 2004.

    Comments: 5 pages, 7 figures, to appear in Braz. J. Phys. 34, June 2004

  20. arXiv:cond-mat/0404230  [pdf, ps, other]

    cond-mat.stat-mech cond-mat.dis-nn

    Percolation model for structural phase transitions in Li$_{1-x}$H$_x$IO$_3$ mixed crystals

    Authors: P. H. L. Martins, J. A. Plascak, M. A. Pimenta

    Abstract: A percolation model is proposed to explain the structural phase transitions found in Li$_{1-x}$H$_x$IO$_3$ mixed crystals as a function of the concentration parameter $x$. The percolation thresholds are obtained from Monte Carlo simulations on the specific lattices occupied by lithium atoms and hydrogen bonds. The theoretical results strongly suggest that percolating lithium vacancies and hydrog…

    Submitted 9 April, 2004; originally announced April 2004.

    Comments: 4 pages, 2 figures

    Journal ref: Phys. Rev. B 69, 092107 (2004)

  21. arXiv:cond-mat/0304024  [pdf, ps, other]

    cond-mat.stat-mech cond-mat.dis-nn

    Percolation on two- and three-dimensional lattices

    Authors: P. H. L. Martins, J. A. Plascak

    Abstract: In this work we apply a highly efficient Monte Carlo algorithm recently proposed by Newman and Ziff to treat percolation problems. The site and bond percolation are studied on a number of lattices in two and three dimensions. Quite good results for the wrapping probabilities, correlation length critical exponent and critical concentration are obtained for the square, simple cubic, HCP and hexago…

    Submitted 1 April, 2003; originally announced April 2003.

    Comments: 15 pages, 6 figures, 3 tables

    Journal ref: Phys. Rev. E 67, 046119 (2003)