
Showing 1–50 of 75 results for author: Gómez, L

  1. arXiv:2406.18809  [pdf, other]

    cs.CV

    Divide, Ensemble and Conquer: The Last Mile on Unsupervised Domain Adaptation for On-Board Semantic Segmentation

    Authors: Tao Lian, Jose L. Gómez, Antonio M. López

    Abstract: The last mile of unsupervised domain adaptation (UDA) for semantic segmentation is the challenge of solving the syn-to-real domain gap. Recent UDA methods have progressed significantly, yet they often rely on strategies customized for synthetic single-source datasets (e.g., GTA5), which limits their generalisation to multi-source datasets. Conversely, synthetic multi-source datasets hold promise f… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.14343  [pdf, other]

    cs.AI

    IWISDM: Assessing instruction following in multimodal models at scale

    Authors: Xiaoxuan Lei, Lucas Gomez, Hao Yuan Bai, Pouya Bashivan

    Abstract: The ability to perform complex tasks from detailed instructions is a key to many remarkable achievements of our species. As humans, we are not only capable of performing a wide variety of tasks but also very complex ones that may entail hundreds or thousands of steps to complete. Large language models and their more recent multimodal counterparts that integrate textual and visual inputs have achie… ▽ More

    Submitted 3 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.02380  [pdf, other]

    cs.CV

    EUFCC-340K: A Faceted Hierarchical Dataset for Metadata Annotation in GLAM Collections

    Authors: Francesc Net, Marc Folia, Pep Casals, Andrew D. Bagdanov, Lluis Gomez

    Abstract: In this paper, we address the challenges of automatic metadata annotation in the domain of Galleries, Libraries, Archives, and Museums (GLAMs) by introducing a novel dataset, EUFCC340K, collected from the Europeana portal. Comprising over 340,000 images, the EUFCC340K dataset is organized across multiple facets: Materials, Object Types, Disciplines, and Subjects, following a hierarchical structure… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 23 pages, 13 figures

    ACM Class: I.4.9

  4. arXiv:2404.19031  [pdf, other]

    cs.CV cs.AI

    Machine Unlearning for Document Classification

    Authors: Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Document understanding models have recently demonstrated remarkable performance by leveraging extensive collections of user documents. However, since documents often contain large amounts of personal data, their usage can pose a threat to user privacy and weaken the bonds of trust between humans and AI services. In response to these concerns, legislation advocating "the right to be forgotten" has…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted to ICDAR2024

  5. arXiv:2403.13103  [pdf]

    cs.CY cs.NI

    IEEE-GDL CCD Smart Buildings Introduction

    Authors: Victor Manuel Larios, José Guadalupe Robledo, Leopoldo Gómez, R. Rincón

    Abstract: As part of the activities of the IEEE-GDL CCD working group of physical infrastructure, this whitepaper is intended to be an initial guide to understand the layers, taxonomy of services and best practices for the development of smart buildings. Open standards are claimed in order to increase interoperability between layers and services. Moreover, two buildings in Guadalajara city, one new and anot…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 5 pages, 4 figures, 4 Tables

    ACM Class: H.4.m

  6. arXiv:2312.12176  [pdf, other]

    cs.CV

    All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes

    Authors: Jose L. Gómez, Manuel Silva, Antonio Seoane, Agnès Borrás, Mario Noriega, Germán Ros, Jose A. Iglesias-Guitian, Antonio M. López

    Abstract: We introduce UrbanSyn, a photorealistic dataset acquired through semi-procedurally generated synthetic urban driving scenarios. Developed using high-quality geometry and materials, UrbanSyn provides pixel-level ground truth, including depth, semantic segmentation, and instance segmentation with object bounding boxes and occlusion degree. It complements GTAV and Synscapes datasets to form what we c… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: The UrbanSyn Dataset is available at http://urbansyn.org/

  7. PAD-Phys: Exploiting Physiology for Presentation Attack Detection in Face Biometrics

    Authors: Luis F. Gomez, Julian Fierrez, Aythami Morales, Mahdi Ghafourian, Ruben Tolosana, Imanol Solano, Alejandro Garcia, Francisco Zamora-Martinez

    Abstract: Presentation Attack Detection (PAD) is a crucial stage in facial recognition systems to avoid leakage of personal information or spoofing of identity to entities. Recently, pulse detection based on remote photoplethysmography (rPPG) has been shown to be effective in face presentation attack detection. This work presents three different approaches to the presentation attack detection based on rPP… ▽ More

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Preprint of the paper presented to the Workshop on IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC, 2023)

  8. arXiv:2308.03554  [pdf, other]

    cs.CR

    TemporalFED: Detecting Cyberattacks in Industrial Time-Series Data Using Decentralized Federated Learning

    Authors: Ángel Luis Perales Gómez, Enrique Tomás Martínez Beltrán, Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán

    Abstract: Industry 4.0 has brought numerous advantages, such as increasing productivity through automation. However, it also presents major cybersecurity issues such as cyberattacks affecting industrial processes. Federated Learning (FL) combined with time-series analysis is a promising cyberattack detection mechanism proposed in the literature. However, the fact of having a single point of failure and netw… ▽ More

    Submitted 7 August, 2023; originally announced August 2023.

  9. arXiv:2306.09750  [pdf, other]

    cs.LG cs.AI cs.DC cs.NI

    Fedstellar: A Platform for Decentralized Federated Learning

    Authors: Enrique Tomás Martínez Beltrán, Ángel Luis Perales Gómez, Chao Feng, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, Alberto Huertas Celdrán

    Abstract: In 2016, Google proposed Federated Learning (FL) as a novel paradigm to train Machine Learning (ML) models across the participants of a federation while preserving data privacy. Since its birth, Centralized FL (CFL) has been the most used approach, where a central entity aggregates participants' models to create a global one. However, CFL presents limitations such as communication bottlenecks, sin… ▽ More

    Submitted 8 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

  10. arXiv:2305.16809  [pdf]

    cs.CL cs.AI cs.HC

    GenQ: Automated Question Generation to Support Caregivers While Reading Stories with Children

    Authors: Arun Balajiee Lekshmi Narayanan, Ligia E. Gomez, Martha Michelle Soto Fernandez, Tri Nguyen, Chris Blais, M. Adelaida Restrepo, Art Glenberg

    Abstract: When caregivers ask open-ended questions to motivate dialogue with children, it facilitates the child's reading comprehension skills. Although there is scope for use of technological tools, referred to here as "intelligent tutoring systems", to scaffold this process, it is currently unclear whether existing intelligent systems that generate human-language-like questions are beneficial. Additionally,…

    Submitted 25 September, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  11. arXiv:2302.03657  [pdf, other]

    cs.CV cs.CR cs.LG

    Toward Face Biometric De-identification using Adversarial Examples

    Authors: Mahdi Ghafourian, Julian Fierrez, Luis Felipe Gomez, Ruben Vera-Rodriguez, Aythami Morales, Zohra Rezgui, Raymond Veldhuis

    Abstract: The remarkable success of face recognition (FR) has endangered the privacy of internet users, particularly in social media. Recently, researchers have turned to adversarial examples as a countermeasure. In this paper, we assess the effectiveness of using two widely known adversarial methods (BIM and ILLC) for de-identifying personal images. We discovered, unlike previous claims in the literature, th…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: Accepted at the AAAI-23 workshop on Artificial Intelligence for Cyber Security (AICS)

  12. arXiv:2301.09174  [pdf, other]

    cs.CV cs.HC cs.LG

    MATT: Multimodal Attention Level Estimation for e-learning Platforms

    Authors: Roberto Daza, Luis F. Gomez, Aythami Morales, Julian Fierrez, Ruben Tolosana, Ruth Cobos, Javier Ortega-Garcia

    Abstract: This work presents a new multimodal system for remote attention level estimation based on multimodal face analysis. Our multimodal approach uses different parameters and signals obtained from the behavior and physiological processes that have been related to modeling cognitive load such as facial gestures (e.g., blink rate, facial action units) and user actions (e.g., head pose, distance to the ca…

    Submitted 22 January, 2023; originally announced January 2023.

    Comments: Preprint of the paper presented to the Workshop on Artificial Intelligence for Education (AI4EDU) of AAAI 2023

  13. arXiv:2301.02668  [pdf, other]

    cs.DC

    A Framework for Large Scale Particle Filters Validated with Data Assimilation for Weather Simulation

    Authors: Sebastian Friedemann, Kai Keller, Yen-Sen Lu, Bruno Raffin, Leonardo Bautista Gomez

    Abstract: Particle filters are a group of algorithms to solve inverse problems through statistical Bayesian methods when the model does not comply with the linear and Gaussian hypothesis. Particle filters are used in domains like data assimilation, probabilistic programming, neural network optimization, localization and navigation. Particle filters estimate the probability distribution of model state…

    Submitted 6 January, 2023; originally announced January 2023.

  14. arXiv:2211.09210  [pdf, other]

    cs.HC cs.CV

    edBB-Demo: Biometrics and Behavior Analysis for Online Educational Platforms

    Authors: Roberto Daza, Aythami Morales, Ruben Tolosana, Luis F. Gomez, Julian Fierrez, Javier Ortega-Garcia

    Abstract: We present edBB-Demo, a demonstrator of an AI-powered research platform for student monitoring in remote education. The edBB platform aims to study the challenges associated with user recognition and behavior understanding in digital platforms. This platform has been developed for data collection, acquiring signals from a variety of sensors including keyboard, mouse, webcam, microphone, smartwatch,…

    Submitted 5 December, 2022; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Accepted in "AAAI-23 Conference on Artificial Intelligence (Demonstration Program)"

  15. arXiv:2209.10474  [pdf, other]

    cs.CV

    Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia

    Authors: Khanh Nguyen, Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

    Abstract: Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over… ▽ More

    Submitted 21 September, 2022; originally announced September 2022.

  16. arXiv:2209.06730  [pdf, other]

    cs.CV

    MUST-VQA: MUltilingual Scene-text VQA

    Authors: Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas, Lluis Gomez

    Abstract: In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA) in which the question can be asked in different languages and it is not necessarily aligned to the scene text language. Thus, we first introduce a natural step towards a m… ▽ More

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: To appear in the Text In Everything Workshop at ECCV 2022

  17. arXiv:2206.10343  [pdf, other]

    cs.CL

    Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo

    Authors: Roberto Zariquiey, Claudia Alvarado, Ximena Echevarria, Luisa Gomez, Rosa Gonzales, Mariana Illescas, Sabina Oporto, Frederic Blum, Arturo Oncevay, Javier Vera

    Abstract: In this paper, we launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology implemented, which proved effective to create a treebank in the context of a Computational Linguistic course for undergraduates. Then, we describe the general details of the treebank and the language-spe… ▽ More

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted to LREC 2022

  18. Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models

    Authors: Jose L. Gómez, Gabriel Villalonga, Antonio M. López

    Abstract: Semantic image segmentation is a central and challenging task in autonomous driving, addressed by training deep models. Since this training suffers from the curse of human-based image labeling, using synthetic images with automatically generated labels together with unlabeled real-world images is a promising alternative. This implies addressing an unsupervised domain adaptation (UDA) problem. In this pa…

    Submitted 30 January, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: Code available at https://github.com/JoseLGomez/Co-training_SemSeg_UDA. Paper accepted on Sensors at https://www.mdpi.com/1424-8220/23/2/621

    Journal ref: Sensors, Special Issue Machine Learning for Autonomous Driving Perception and Prediction (2023)

  19. arXiv:2205.01858  [pdf]

    q-bio.QM cs.LG eess.SP

    DeeptDCS: Deep Learning-Based Estimation of Currents Induced During Transcranial Direct Current Stimulation

    Authors: Xiaofan Jia, Sadeed Bin Sayed, Nahian Ibn Hasan, Luis J. Gomez, Guang-Bin Huang, Abdulkadir C. Yucel

    Abstract: Objective: Transcranial direct current stimulation (tDCS) is a non-invasive brain stimulation technique used to generate conduction currents in the head and disrupt brain functions. To rapidly evaluate the tDCS-induced current density in near real-time, this paper proposes a deep learning-based emulator, named DeeptDCS. Methods: The emulator leverages Attention U-net taking the volume conductor mo… ▽ More

    Submitted 6 October, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

  20. arXiv:2204.04028  [pdf, other]

    cs.CV cs.DL cs.IR

    A Generic Image Retrieval Method for Date Estimation of Historical Document Collections

    Authors: Adrià Molina, Lluis Gomez, Oriol Ramos Terrades, Josep Lladós

    Abstract: Date estimation of historical document images is a challenging problem, with several contributions in the literature that lack the ability to generalize from one dataset to others. This paper presents a robust date estimation system based on a retrieval approach that generalizes well across heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Preprint of paper accepted at DAS2022

  21. arXiv:2203.11361  [pdf, ps, other]

    cs.DM cs.CC

    Complexity of limit cycles with block-sequential update schedules in conjunctive networks

    Authors: Julio Aracena, Florian Bridoux, Luis Gómez, Lilian Salinas

    Abstract: In this paper, we deal with the following decision problem: given a conjunctive Boolean network defined by its interaction digraph, does it have a limit cycle of a given length k? We prove that this problem is NP-complete in general if k is a parameter of the problem and in P if the interaction digraph is strongly connected. The case where k is a constant, but the interaction digraph is not strongly…

    Submitted 21 March, 2022; originally announced March 2022.

  22. arXiv:2203.04814  [pdf, other]

    cs.CV

    Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

    Authors: Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

    Abstract: In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labeled data. E… ▽ More

    Submitted 18 August, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: Preprint

  23. arXiv:2202.12985  [pdf, other]

    cs.CV cs.AI

    OCR-IDL: OCR Annotations for Industry Document Library Dataset

    Authors: Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas

    Abstract: Pretraining has proven successful in Document Intelligence tasks where a deluge of documents is used to pretrain the models, only later to be finetuned on downstream tasks. One of the problems of the pretraining approaches is the inconsistent usage of pretraining data with different OCR engines leading to incomparable results between models. In other words, it is not obvious whether the performance…

    Submitted 25 February, 2022; originally announced February 2022.

  24. arXiv:2111.02078  [pdf, ps, other]

    cs.CV cs.LG

    FaceQvec: Vector Quality Assessment for Face Biometrics based on ISO Compliance

    Authors: Javier Hernandez-Ortega, Julian Fierrez, Luis F. Gomez, Aythami Morales, Jose Luis Gonzalez-de-Suso, Francisco Zamora-Martinez

    Abstract: In this paper we develop FaceQvec, a software component for estimating the conformity of facial images with each of the points contemplated in the ISO/IEC 19794-5, a quality standard that defines general quality guidelines for face images that would make them acceptable or unacceptable for use in official documents such as passports or ID cards. This type of tool for quality assessment can help to… ▽ More

    Submitted 3 November, 2021; originally announced November 2021.

  25. A Butterfly-Accelerated Volume Integral Equation Solver for Broad Permittivity and Large-Scale Electromagnetic Analysis

    Authors: Sadeed B. Sayed, Yang Liu, Luis J. Gomez, Abdulkadir C. Yucel

    Abstract: A butterfly-accelerated volume integral equation (VIE) solver is proposed for fast and accurate electromagnetic (EM) analysis of scattering from heterogeneous objects. The proposed solver leverages the hierarchical off-diagonal butterfly (HOD-BF) scheme to construct the system matrix and obtain its approximate inverse, used as a preconditioner. Complexity analysis and numerical experiments validat… ▽ More

    Submitted 2 November, 2021; originally announced November 2021.

  26. arXiv:2110.02623  [pdf, other]

    cs.CV cs.AI

    Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

    Authors: Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

    Abstract: The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forc… ▽ More

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Accepted WACV 2022

  27. arXiv:2110.01705  [pdf, other]

    cs.CV

    Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

    Authors: Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

    Abstract: Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in state-of-the-art captioning models and is undesirable to humans. To decrease the object hallucination in captioning, we propose three simple yet efficient training augmentation methods for sentences which require no new training data or inc…

    Submitted 2 November, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: Accepted to WACV 2022

  28. Asking questions on handwritten document collections

    Authors: Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, CV Jawahar

    Abstract: This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritt… ▽ More

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: pre-print version

    Journal ref: Int. J. Document Anal. Recognit., vol. 24, no. 3, pp. 235-249, 2021

  29. arXiv:2106.05618  [pdf, other]

    cs.CV

    Date Estimation in the Wild of Scanned Historical Photos: An Image Retrieval Approach

    Authors: Adrià Molina, Pau Riba, Lluis Gomez, Oriol Ramos-Terrades, Josep Lladós

    Abstract: This paper presents a novel method for date estimation of historical photographs from archival sources. The main contribution is to formulate the date estimation as a retrieval task, where given a query, the retrieved images are ranked in terms of the estimated date similarity. The closer their embedded representations are, the closer their dates are. Contrary to the traditional models that design…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted at ICDAR 2021

  30. arXiv:2106.05144  [pdf, other]

    cs.CV cs.IR

    Learning to Rank Words: Optimizing Ranking Metrics for Word Spotting

    Authors: Pau Riba, Adrià Molina, Lluis Gomez, Oriol Ramos-Terrades, Josep Lladós

    Abstract: In this paper, we explore and evaluate the use of ranking-based objective functions for learning simultaneously a word string and a word image encoder. We consider retrieval frameworks in which the user expects a retrieval list ranked according to a defined relevance score. In the context of a word spotting problem, the relevance score has been set according to the string edit distance from the qu… ▽ More

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted at ICDAR 2021

  31. SuperVoxHenry: Tucker-Enhanced and FFT-Accelerated Inductance Extraction for Voxelized Superconducting Structures

    Authors: Mingyu Wang, Cheng Qian, Enrico Di Lorenzo, Luis J. Gomez, Vladimir Okhmatovski, Abdulkadir C. Yucel

    Abstract: This paper introduces SuperVoxHenry, an inductance extraction simulator for analyzing voxelized superconducting structures. SuperVoxHenry extends the capabilities of the inductance extractor VoxHenry for analyzing the superconducting structures by incorporating the following enhancements. 1. SuperVoxHenry utilizes a two-fluid model to account for normal currents and supercurrents. 2. SuperVoxHenry… ▽ More

    Submitted 18 August, 2021; v1 submitted 26 April, 2021; originally announced May 2021.

  32. arXiv:2105.05300  [pdf, other]

    cs.CV

    One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

    Authors: Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornés, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Lladós

    Abstract: Low resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). For example, in the case of historical ciphered manuscripts, which are usually written with invented alphabets to hide the message contents. Thus, in this paper we address this problem through a data generation technique… ▽ More

    Submitted 5 October, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

    Comments: Accepted in WACV 2022

  33. Co-training for Deep Object Detection: Comparing Single-modal and Multi-modal Approaches

    Authors: Jose L. Gómez, Gabriel Villalonga, Antonio M. López

    Abstract: Top-performing computer vision models are powered by convolutional neural networks (CNNs). Training an accurate CNN highly depends on both the raw sensor data and their associated ground truth (GT). Collecting such GT is usually done through human labeling, which is time-consuming and does not scale as we wish. This data labeling bottleneck may be intensified due to domain shifts among image senso… ▽ More

    Submitted 23 April, 2021; originally announced April 2021.

    Report number: sensors-1185064

    Journal ref: special issue of Sensors (ISSN 1424-8220) "Feature Papers in Physical Sensors Section 2020"

  34. An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks

    Authors: Albert Njoroge Kahira, Truong Thao Nguyen, Leonardo Bautista Gomez, Ryousei Takano, Rosa M Badia, Mohamed Wahib

    Abstract: Deep Neural Network (DNN) frameworks use distributed training to enable faster time to convergence and alleviate memory capacity limitations when training large models and/or using high dimension inputs. With the steady increase in datasets and model sizes, model/hybrid parallelism is deemed to have an important role in the future of distributed training of DNNs. We analyze the compute, communicat… ▽ More

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: The International ACM Symposium on High-Performance Parallel and Distributed Computing 2021 (HPDC'21)

  35. arXiv:2012.04329  [pdf, other]

    cs.CV

    StacMR: Scene-Text Aware Cross-Modal Retrieval

    Authors: Andrés Mafla, Rafael Sampaio de Rezende, Lluís Gómez, Diane Larlus, Dimosthenis Karatzas

    Abstract: Recent models for cross-modal retrieval have benefited from an increasingly rich understanding of visual scenes, afforded by scene graphs and object interactions to mention a few. This has resulted in an improved matching between the visual representation of an image and the textual representation of its caption. Yet, current visual representations overlook a key aspect: the text appearing in imag… ▽ More

    Submitted 8 December, 2020; originally announced December 2020.

  36. arXiv:2012.00825  [pdf, other]

    cs.DC

    A Study of Checkpointing in Large Scale Training of Deep Neural Networks

    Authors: Elvis Rojas, Albert Njoroge Kahira, Esteban Meneses, Leonardo Bautista Gomez, Rosa M Badia

    Abstract: Deep learning (DL) applications are increasingly being deployed on HPC systems, to leverage the massive parallelism and computing power of those systems for DL model training. While significant effort has been put to facilitate distributed training by DL frameworks, fault tolerance has been largely ignored. In this work, we evaluate checkpoint-restart, a common fault tolerance technique in HPC wor… ▽ More

    Submitted 29 March, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Journal ref: 2020 International Conference on High Performance Computing & Simulation (HPCS20)

  37. arXiv:2009.09809  [pdf, other]

    cs.CV

    Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

    Authors: Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

    Abstract: Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems. In this paper, we focus on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval. First, we obtain the text instances from images by employing a text… ▽ More

    Submitted 21 September, 2020; originally announced September 2020.

  38. arXiv:2007.03375  [pdf, other]

    cs.CV

    Location Sensitive Image Retrieval and Tagging

    Authors: Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas

    Abstract: People from different parts of the globe describe objects and concepts in distinct manners. Visual appearance can thus vary across different geographic locations, which makes location a relevant contextual information when analysing visual data. In this work, we address the task of image retrieval related to a given tag conditioned on a certain location on Earth. We present LocSens, a model that l… ▽ More

    Submitted 7 July, 2020; originally announced July 2020.

    MSC Class: 68T07 ACM Class: I.2.10

    Journal ref: ECCV 2020

  39. arXiv:2007.03098  [pdf, other]

    cs.CV

    Text Recognition -- Real World Data and Where to Find Them

    Authors: Klára Janoušková, Jiri Matas, Lluis Gomez, Dimosthenis Karatzas

    Abstract: We present a method for exploiting weakly annotated images to improve text extraction pipelines. The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions. The proposed method includes matching of imprecise transcription to weak annotations and edit distance guided neighbourhood search. It produces nearly error-f… ▽ More

    Submitted 17 July, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: 10 pages

  40. arXiv:2006.00923  [pdf, other]

    cs.CV

    Multimodal grid features and cell pointers for Scene Text Visual Question Answering

    Authors: Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas

    Abstract: This paper presents a new model for the task of scene text visual question answering, in which questions about a given image can only be answered by reading and understanding scene text that is present in it. The proposed model is based on an attention mechanism that attends to multi-modal features conditioned to the question, allowing it to reason jointly about the textual and visual modalities i… ▽ More

    Submitted 25 June, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: This paper is under consideration at Pattern Recognition Letters

  41. arXiv:2005.09496  [pdf, other]

    cs.CV

    RoadText-1K: Text Detection & Recognition Dataset for Driving Videos

    Authors: Sangeeth Reddy, Minesh Mathew, Lluis Gomez, Marcal Rusinol, Dimosthenis Karatzas, C. V. Jawahar

    Abstract: Perceiving text is crucial to understand semantics of outdoor scenes and hence is a critical requirement to build intelligent systems for driver assistance and self-driving. Most of the existing datasets for text detection and recognition comprise still images and are mostly compiled keeping text in mind. This paper introduces a new "RoadText-1K" dataset for text in driving videos. The dataset is…

    Submitted 19 May, 2020; originally announced May 2020.

    Comments: to be published in ICRA 2020

  42. arXiv:2001.04732  [pdf, other]

    cs.CV

    Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

    Authors: Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

    Abstract: Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: Winter Conference on Applications of Computer Vision (WACV 2020) Accepted paper

  43. arXiv:1912.00154  [pdf, other]

    cs.PF cs.AR

    Hardware Versus Software Fault Injection of Modern Undervolted SRAMs

    Authors: Muhammet Abdullah Soyturk, Konstantinos Parasyris, Behzad Salami, Osman Unsal, Gulay Yalcin, Leonardo Bautista Gomez

    Abstract: To improve power efficiency, researchers are experimenting with dynamically adjusting the supply voltage of systems below the nominal operating points. However, production systems are typically not allowed to function on voltage settings that are below the reliable limit. Consequently, existing software fault tolerance studies are based on fault models, which inject faults on random fault locations…

    Submitted 30 November, 2019; originally announced December 2019.

  44. arXiv:1910.03814  [pdf, other]

    cs.CV cs.CL

    Exploring Hate Speech Detection in Multimodal Publications

    Authors: Raul Gomez, Jaume Gibert, Lluis Gomez, Dimosthenis Karatzas

    Abstract: In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the c…

    Submitted 9 October, 2019; originally announced October 2019.

  45. arXiv:1909.01216  [pdf, other]

    cs.DB

    Online Analytical Processing on Graph Data

    Authors: Leticia Gómez, Bart Kuijpers, Alejandro Vaisman

    Abstract: Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube such that each cell contains one or more measures that can be aggregated along dimensions. In a Big Data scenario, traditional data warehousing and OLAP operations are clearly not sufficient to address current…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: This is a draft version of the work that will appear in Volume 24(2) of the Intelligent Data Analysis Journal, in early 2020

  46. arXiv:1907.04246  [pdf, other]

    cs.CR cs.LG

    Security for Distributed Deep Neural Networks Towards Data Confidentiality & Intellectual Property Protection

    Authors: Laurent Gomez, Marcus Wilhelm, José Márquez, Patrick Duverger

    Abstract: Current developments in Enterprise Systems observe a paradigm shift, moving the needle from the backend to the edge sectors of those; by distributing data, decentralizing applications and integrating novel components seamlessly to the central systems. Distributively deployed AI capabilities will thrust this transition. Several non-functional requirements arise along with these developments, securi…

    Submitted 9 July, 2019; originally announced July 2019.

    Journal ref: Proceedings of the 16th International Joint Conference on e-Business and Telecommunications, ICETE 2019

  47. arXiv:1907.03343  [pdf, other]

    cs.LG math.OC stat.ML

    Fast and Provable ADMM for Learning with Generative Priors

    Authors: Fabian Latorre Gómez, Armin Eftekhari, Volkan Cevher

    Abstract: In this work, we propose a (linearized) Alternating Direction Method-of-Multipliers (ADMM) algorithm for minimizing a convex function subject to a nonconvex constraint. We focus on the special case where such constraint arises from the specification that a variable should lie in the range of a neural network. This is motivated by recent successful applications of Generative Adversarial Networks (G…

    Submitted 7 July, 2019; originally announced July 2019.

  48. arXiv:1907.00490  [pdf, other]

    cs.CV

    ICDAR 2019 Competition on Scene Text Visual Question Answering

    Authors: Ali Furkan Biten, Rubèn Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Minesh Mathew, C. V. Jawahar, Ernest Valveny, Dimosthenis Karatzas

    Abstract: This paper presents final results of ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question/ans…

    Submitted 30 June, 2019; originally announced July 2019.

    Comments: 15th International Conference on Document Analysis and Recognition (ICDAR 2019)

  49. arXiv:1906.05038  [pdf, other]

    cs.DC

    Application-Level Differential Checkpointing for HPC Applications with Dynamic Datasets

    Authors: Kai Keller, Leonardo Bautista Gomez

    Abstract: High-performance computing (HPC) requires resilience techniques such as checkpointing in order to tolerate failures in supercomputers. As the number of nodes and memory in supercomputers keeps on increasing, the size of checkpoint data also increases dramatically, sometimes causing an I/O bottleneck. Differential checkpointing (dCP) aims to minimize the checkpointing overhead by only writing data…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: This project has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) and the Horizon 2020 (H2020) funding framework under grant agreement no. H2020-FETHPC-754304 (DEEP-EST); and the LEGaTO Project (legato-project.eu), grant agreement No 780681

  50. arXiv:1906.01466  [pdf, other]

    cs.CV

    Selective Style Transfer for Text

    Authors: Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas

    Abstract: This paper explores the possibilities of image style transfer applied to text maintaining the original transcriptions. Results on different text domains (scene text, machine printed text and handwritten text) and cross modal results demonstrate that this is feasible, and open different research lines. Furthermore, two architectures for selective style transfer, which means transferring style to on…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: Accepted in ICDAR 2019