Showing 1–26 of 26 results for author: Villegas, M

  1. arXiv:2307.14308  [pdf, other]

    quant-ph cs.SE

    QPLEX: Realizing the Integration of Quantum Computing into Combinatorial Optimization Software

    Authors: Juan Giraldo, José Ossorio, Norha M. Villegas, Gabriel Tamura, Ulrike Stege

    Abstract: Quantum computing has the potential to surpass the capabilities of current classical computers when solving complex problems. Combinatorial optimization has emerged as one of the key target areas for quantum computers as problems found in this field play a critical role in many different industrial application sectors (e.g., enhancing manufacturing operations or improving decision processes). Curr…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted for the IEEE International Conference on Quantum Computing and Engineering (QCE) 2023

  2. arXiv:2301.03858  [pdf, other]

    stat.AP

    Replicating and extending chain-ladder via an age-period-cohort structure on the claim development in a run-off triangle

    Authors: Gabriele Pittarello, Munir Hiabu, Andrés M. Villegas

    Abstract: This paper introduces yet another stochastic model replicating chain-ladder estimates and furthermore considers extensions that add flexibility to the modeling. In its simplest form, the proposed model replicates the chain-ladder's development factors using a GLM model with averaged hazard rates running in reversed development time as response. This is in contrast to the existing reserving literat…

    Submitted 10 November, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

  3. arXiv:2208.14562  [pdf, other]

    astro-ph.IM astro-ph.CO astro-ph.GA astro-ph.HE astro-ph.SR

    The Athena X-ray Integral Field Unit: a consolidated design for the system requirement review of the preliminary definition phase

    Authors: Didier Barret, Vincent Albouys, Jan-Willem den Herder, Luigi Piro, Massimo Cappi, Juhani Huovelin, Richard Kelley, J. Miguel Mas-Hesse, Stéphane Paltani, Gregor Rauw, Agata Rozanska, Jiri Svoboda, Joern Wilms, Noriko Yamasaki, Marc Audard, Simon Bandler, Marco Barbera, Xavier Barcons, Enrico Bozzo, Maria Teresa Ceballos, Ivan Charles, Elisa Costantini, Thomas Dauser, Anne Decourchelle, Lionel Duband, et al. (274 additional authors not shown)

    Abstract: The Athena X-ray Integral Field Unit (X-IFU) is the high resolution X-ray spectrometer, studied since 2015 for flying in the mid-30s on the Athena space X-ray Observatory, a versatile observatory designed to address the Hot and Energetic Universe science theme, selected in November 2013 by the Survey Science Committee. Based on a large format array of Transition Edge Sensors (TES), it aims to provide sp…

    Submitted 28 November, 2022; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 48 pages, 29 figures, Accepted for publication in Experimental Astronomy with minor editing

  4. Content and Style Aware Generation of Text-line Images for Handwriting Recognition

    Authors: Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas

    Abstract: Handwritten Text Recognition has achieved an impressive performance in public benchmarks. However, due to the high inter- and intra-class variability between handwriting styles, such recognizers need to be trained using huge volumes of manually labeled training data. To alleviate this labor-consuming problem, synthetic data produced with TrueType fonts has often been used in the training loop to g…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: Accepted to TPAMI

  5. arXiv:2112.07035  [pdf]

    cs.CL cs.SI

    Framework para Caracterizar Fake News en Terminos de Emociones

    Authors: Luis Rojas Rubio, Claudio Meneses Villegas

    Abstract: Social networks have become one of the main information channels for human beings due to the immediate and social interactivity they offer, allowing in some cases to publish what each user considers relevant. This has brought with it the generation of false news or Fake News, publications that only seek to generate uncertainty, misinformation or skew the opinion of readers. It has been shown that…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: in Spanish

  6. arXiv:2112.01894  [pdf, other]

    cs.CL cs.AI

    The Catalan Language CLUB

    Authors: Carlos Rodriguez-Penagos, Carme Armentano-Oller, Marta Villegas, Maite Melero, Aitor Gonzalez, Ona de Gibert Bonet, Casimiro Carrino Pio

    Abstract: The Catalan Language Understanding Benchmark (CLUB) encompasses various datasets representative of different NLU tasks that enable accurate evaluations of language models, following the General Language Understanding Evaluation (GLUE) example. It is part of AINA and PlanTL, two public funding initiatives to empower the Catalan language in the Artificial Intelligence era.

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: OpenCor Forum 2021. arXiv admin note: text overlap with arXiv:2107.07903

    MSC Class: 91F20 ACM Class: I.2.7

  7. arXiv:2110.12201  [pdf, ps, other]

    cs.CL cs.AI

    Spanish Legalese Language Model and Corpora

    Authors: Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Aitor Gonzalez-Agirre, Marta Villegas

    Abstract: There are many Language Models for the English language, in keeping with its worldwide relevance. However, for the Spanish language, even though it is widely spoken, there are very few Spanish Language Models, and those that exist turn out to be small and too general. Legal slang could be thought of as a Spanish variant on its own, as it is very complicated in vocabulary, semantics and phrase understanding. For this w…

    Submitted 23 October, 2021; originally announced October 2021.

  8. arXiv:2109.07765  [pdf, ps, other]

    cs.CL

    Spanish Biomedical Crawled Corpus: A Large, Diverse Dataset for Spanish Biomedical Language Models

    Authors: Casimiro Pio Carrino, Jordi Armengol-Estapé, Ona de Gibert Bonet, Asier Gutiérrez-Fandiño, Aitor Gonzalez-Agirre, Martin Krallinger, Marta Villegas

    Abstract: We introduce CoWeSe (the Corpus Web Salud Español), the largest Spanish biomedical corpus to date, consisting of 4.5GB (about 750M tokens) of clean plain text. CoWeSe is the result of a massive crawler on 3000 Spanish domains executed in 2020. The corpus is openly available and already preprocessed. CoWeSe is an important resource for biomedical and health NLP in Spanish and has already been emplo…

    Submitted 16 September, 2021; originally announced September 2021.

  9. arXiv:2109.03570  [pdf, other]

    cs.CL

    Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario

    Authors: Casimiro Pio Carrino, Jordi Armengol-Estapé, Asier Gutiérrez-Fandiño, Joan Llop-Palao, Marc Pàmies, Aitor Gonzalez-Agirre, Marta Villegas

    Abstract: This work presents biomedical and clinical language models for Spanish by experimenting with different pretraining choices, such as masking at word and subword level, varying the vocabulary size and testing with domain data, looking for better language representations. Interestingly, in the absence of enough clinical data to train a model from scratch, we applied mixed-domain pretraining and cross…

    Submitted 17 September, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: 9 pages

  10. arXiv:2107.07903  [pdf, other]

    cs.CL

    Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? A Comprehensive Assessment for Catalan

    Authors: Jordi Armengol-Estapé, Casimiro Pio Carrino, Carlos Rodriguez-Penagos, Ona de Gibert Bonet, Carme Armentano-Oller, Aitor Gonzalez-Agirre, Maite Melero, Marta Villegas

    Abstract: Multilingual language models have been a crucial breakthrough as they considerably reduce the need for data in under-resourced languages. Nevertheless, the superiority of language-specific models has already been proven for languages having access to large amounts of data. In this work, we focus on Catalan with the aim to explore to what extent a medium-sized monolingual language model is competit…

    Submitted 16 July, 2021; originally announced July 2021.

    Comments: Accepted into Findings of ACL-IJCNLP 2021

  11. MarIA: Spanish Language Models

    Authors: Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marc Pàmies, Joan Llop-Palao, Joaquín Silveira-Ocampo, Casimiro Pio Carrino, Aitor Gonzalez-Agirre, Carme Armentano-Oller, Carlos Rodriguez-Penagos, Marta Villegas

    Abstract: This work presents MarIA, a family of Spanish language models and associated resources made available to the industry and the research community. Currently, MarIA includes RoBERTa-base, RoBERTa-large, GPT2 and GPT2-large Spanish language models, which can arguably be presented as the largest and most proficient language models in Spanish. The models were pretrained using a massive corpus of 570GB…

    Submitted 5 April, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

    Journal ref: Procesamiento del Lenguaje Natural, v. 68, p. 39-60, mar. 2022. ISSN 1989-7553

  12. Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

    Authors: Anastasios Nentidis, Anastasia Krithara, Konstantinos Bougiatiotis, Martin Krallinger, Carlos Rodriguez-Penagos, Marta Villegas, Georgios Paliouras

    Abstract: In this paper, we present an overview of the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming at the promotion of systems and methodologies for large-scale biomedical semantic indexing and question answering. To this end, shared tasks have been organized yearly since 2012, where different te…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: 21 pages, 10 tables, 3 figures

    Journal ref: Arampatzis A. et al. (eds) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2020. Lecture Notes in Computer Science, vol 12260. Springer, Cham

  13. arXiv:2106.00012  [pdf, other]

    cs.LG cs.AI math.AT

    Persistent Homology Captures the Generalization of Neural Networks Without A Validation Set

    Authors: Asier Gutiérrez-Fandiño, David Pérez-Fernández, Jordi Armengol-Estapé, Marta Villegas

    Abstract: The training of neural networks is usually monitored with a validation (holdout) set to estimate the generalization of the model. This is done instead of measuring intrinsic properties of the model to determine whether it is learning appropriately. In this work, we suggest studying the training of neural networks with Algebraic Topology, specifically Persistent Homology (PH). Using simplicial comp…

    Submitted 31 May, 2021; originally announced June 2021.

  14. Collagenase Nanocapsules: An Approach to Fibrosis Treatment

    Authors: MR Villegas, A Baeza, A Usategui, PL Ortiz-Romero, JL Pablos, M Vallet-Regi

    Abstract: Fibrosis is a common lesion in different pathologic diseases and is defined by the excessive accumulation of collagen. Different approaches have been used to treat different conditions characterized by fibrosis. The FDA and EMA approved collagenase to treat palmar fibromatosis (Dupuytren disease). The EMA additionally approved its use in severe Peyronie disease, but it has been used off label in other condi…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: 32 pages, 5 figures

    Journal ref: Acta Biomaterialia. 74, 430-438 (2018)

  15. arXiv:2102.12843  [pdf, ps, other]

    cs.CL cs.AI

    Spanish Biomedical and Clinical Language Embeddings

    Authors: Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Casimiro Pio Carrino, Ona De Gibert, Aitor Gonzalez-Agirre, Marta Villegas

    Abstract: We computed both Word and Sub-word Embeddings using FastText. For Sub-word embeddings we selected the Byte Pair Encoding (BPE) algorithm to represent the sub-words. We evaluated the Biomedical Word Embeddings and obtained better results than previous versions, which suggests that more data yields better representations.

    Submitted 25 February, 2021; originally announced February 2021.

  16. arXiv:2101.07752  [pdf, other]

    cs.LG math.AT

    Characterizing and Measuring the Similarity of Neural Networks with Persistent Homology

    Authors: David Pérez-Fernández, Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas

    Abstract: Characterizing the structural properties of neural networks is crucial yet poorly understood, and there are no well-established similarity measures between networks. In this work, we observe that neural networks can be represented as abstract simplicial complexes and analyzed using their topological 'fingerprints' via Persistent Homology (PH). We then describe a PH-based representation proposed for…

    Submitted 31 May, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

  17. arXiv:2012.11699  [pdf, other]

    cs.CR cs.SI

    A Vulnerability Study on Academic Collaboration Networks Based on Network Dynamics

    Authors: Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Marta Villegas

    Abstract: Researchers who work for the same institution use their email as the main communication tool. Email can be one of the most fruitful attack vectors of research institutions as they also contain access to all accounts and thus to all private information. We propose an approach for analyzing research institutions' communication networks in terms of security. We first obtained institutions' communica…

    Submitted 31 March, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

  18. arXiv:2005.13044  [pdf, other]

    cs.CV

    Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition

    Authors: Lei Kang, Pau Riba, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas

    Abstract: The advent of recurrent neural networks for handwriting recognition marked an important milestone reaching impressive recognition accuracies despite the great variability that we observe across different writing styles. Sequential architectures are a perfect fit to model text lines, not only because of the inherent temporal aspect of text, but also to learn probability distributions over sequences…

    Submitted 26 May, 2020; originally announced May 2020.

  19. arXiv:2003.02567  [pdf, other]

    cs.CV

    GANwriting: Content-Conditioned Generation of Styled Handwritten Word Images

    Authors: Lei Kang, Pau Riba, Yaxing Wang, Marçal Rusiñol, Alicia Fornés, Mauricio Villegas

    Abstract: Although current image generation methods have reached impressive quality levels, they are still unable to produce plausible yet diverse images of handwritten words. On the contrary, when writing by hand, a great variability is observed across different writers, and even when analyzing words scribbled by the same individual, involuntary variations are conspicuous. In this work, we take a step clos…

    Submitted 21 July, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: Accepted to ECCV2020

  20. arXiv:1912.10308  [pdf, other]

    cs.CV cs.CL

    Candidate Fusion: Integrating Language Modelling into a Sequence-to-Sequence Handwritten Word Recognition Architecture

    Authors: Lei Kang, Pau Riba, Mauricio Villegas, Alicia Fornés, Marçal Rusiñol

    Abstract: Sequence-to-sequence models have recently become very popular for tackling handwritten word recognition problems. However, how to effectively integrate an external language model into such a recognizer is still a challenging problem. The main challenge faced when training a language model is to deal with the language model corpus which is usually different to the one used for training the handwritte…

    Submitted 21 December, 2019; originally announced December 2019.

  21. arXiv:1912.10016  [pdf, other]

    cs.CV

    A Neural Model for Text Localization, Transcription and Named Entity Recognition in Full Pages

    Authors: Manuel Carbonell, Alicia Fornés, Mauricio Villegas, Josep Lladós

    Abstract: In recent years, the consolidation of deep neural network architectures for information extraction in document images has brought big improvements in the performance of each of the tasks involved in this process, consisting of text localization, transcription, and named entity recognition. However, this process is traditionally performed with separate methods for each task. In this work we propo…

    Submitted 4 May, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

    Comments: To be published in Pattern Recognition Letters

  22. arXiv:1912.04995  [pdf]

    physics.app-ph cond-mat.mtrl-sci

    Optically Cooling Cesium Lead Tribromide Nanoparticles

    Authors: Benjamin J. Roman, Noel Mireles Villegas, Kylie Lytle, Matthew T. Sheldon

    Abstract: One photon up-conversion photoluminescence is an optical phenomenon whereby the thermal energy of a fluorescent material increases the energy of an emitted photon compared with the energy of the photon that was absorbed. When this occurs with near unity efficiency, the emitting material undergoes a net decrease in temperature, so-called optical cooling. Because the up-conversion mechanism is therm…

    Submitted 14 September, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

  23. Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition

    Authors: Lei Kang, Marçal Rusiñol, Alicia Fornés, Pau Riba, Mauricio Villegas

    Abstract: Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions…

    Submitted 26 May, 2020; v1 submitted 18 September, 2019; originally announced September 2019.

    Comments: Accepted to WACV 2020

  24. arXiv:1803.06252  [pdf, other]

    cs.CV cs.CL

    Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model

    Authors: Manuel Carbonell, Mauricio Villegas, Alicia Fornés, Josep Lladós

    Abstract: When extracting information from handwritten documents, text transcription and named entity recognition are usually treated as separate, consecutive tasks. This has the disadvantage that errors in the first module heavily affect the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognitio…

    Submitted 22 March, 2018; v1 submitted 16 March, 2018; originally announced March 2018.

    Comments: To appear in IAPR International Workshop on Document Analysis Systems 2018 (DAS 2018)

  25. Search for neutral Higgs bosons decaying into four taus at LEP2

    Authors: ALEPH Collaboration, S. Schael, R. Barate, R. Brunelière, I. De Bonis, D. Decamp, C. Goy, S. Jézéquel, J. -P. Lees, F. Martin, E. Merle, M. -N. Minard, B. Pietrzyk, B. Trocmé, S. Bravo, M. P. Casado, M. Chmeissani, J. M. Crespo, E. Fernandez, M. Fernandez-Bosman, Ll. Garrido, M. Martinez, A. Pacheco, H. Ruiz, A. Colaleo, D. Creanza, et al. (236 additional authors not shown)

    Abstract: A search for the production and non-standard decay of a Higgs boson, h, into four taus through intermediate pseudoscalars, a, is conducted on 683 pb^-1 of data collected by the ALEPH experiment at centre-of-mass energies from 183 to 209 GeV. No excess of events above background is observed, and exclusion limits are placed on the combined production cross section times branching ratio, ξ^2 = σ(e+e…

    Submitted 19 April, 2010; v1 submitted 2 March, 2010; originally announced March 2010.

    Comments: 18 pages, 16 figures

    Journal ref: JHEP 1005:049,2010

  26. arXiv:quant-ph/0307051  [pdf, ps, other]

    quant-ph

    On Discrete Quasiprobability Distributions

    Authors: C. A. Munoz Villegas, A. Chavez Chavez, S. Chumakov, Yu. Fofanov, A. B. Klimov

    Abstract: We analyze quasiprobability distributions in discrete phase space related to the discrete Heisenberg-Weyl group. In particular, we discuss the relation between the discrete Wigner and Q-functions.

    Submitted 7 July, 2003; originally announced July 2003.