
Showing 1–50 of 69 results for author: Giro-i-Nieto, X

  1. arXiv:2402.14335  [pdf, other]

    cs.LG cs.AI stat.ML

    HyperFast: Instant Classification for Tabular Data

    Authors: David Bonet, Daniel Mas Montserrat, Xavier Giró-i-Nieto, Alexander G. Ioannidis

    Abstract: Training deep learning models and performing hyperparameter tuning can be computationally demanding and time-consuming. Meanwhile, traditional machine learning methods like gradient-boosting algorithms remain the preferred choice for most tabular data applications, while neural network alternatives require extensive hyperparameter tuning or work only in toy datasets under limited settings. In this…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 21 pages, 9 figures, AAAI 2024

  2. arXiv:2312.04546  [pdf, other]

    cs.LG cs.AI stat.AP stat.ML

    Adversarial Learning for Feature Shift Detection and Correction

    Authors: Miriam Barrabes, Daniel Mas Montserrat, Margarita Geleta, Xavier Giro-i-Nieto, Alexander G. Ioannidis

    Abstract: Data shift is a phenomenon present in many real-world applications, and while there are multiple methods attempting to detect shifts, the task of localizing and correcting the features originating such shifts has not been studied in depth. Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in tabular and structured data, including b…

    Submitted 7 December, 2023; originally announced December 2023.

  3. arXiv:2304.06371  [pdf, other]

    cs.CL cs.CV

    Sign Language Translation from Instructional Videos

    Authors: Laia Tarrés, Gerard I. Gállego, Amanda Duarte, Jordi Torres, Xavier Giró-i-Nieto

    Abstract: The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation, instead o…

    Submitted 14 April, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Paper accepted at WiCV @CVPR23

  4. arXiv:2303.05007  [pdf, other]

    cs.CR cs.CV cs.MM cs.SD eess.AS

    Towards Robust Image-in-Audio Deep Steganography

    Authors: Jaume Ros, Margarita Geleta, Jordi Pons, Xavier Giro-i-Nieto

    Abstract: The field of steganography has experienced a surge of interest due to the recent advancements in AI-powered techniques, particularly in the context of multimodal setups that enable the concealment of signals within signals of a different nature. The primary objectives of all steganographic methods are to achieve perceptual transparency, robustness, and large embedding capacity - which often presen…

    Submitted 14 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: 8 pages, 5 figures, 2 tables

    MSC Class: 68T99 ACM Class: I.4.9; I.2.m

  5. arXiv:2212.01140  [pdf, other]

    cs.CL cs.CV

    Tackling Low-Resourced Sign Language Translation: UPC at WMT-SLT 22

    Authors: Laia Tarrés, Gerard I. Gàllego, Xavier Giró-i-Nieto, Jordi Torres

    Abstract: This paper describes the system developed at the Universitat Politècnica de Catalunya for the Workshop on Machine Translation 2022 Sign Language Translation Task, in particular, for the sign-to-text direction. We use a Transformer model implemented with the Fairseq modeling toolkit. We have experimented with the vocabulary size, data augmentation techniques and pretraining the model with the PHOEN…

    Submitted 2 December, 2022; originally announced December 2022.

  6. arXiv:2209.14764  [pdf, other]

    cs.LG

    Model Zoos: A Dataset of Diverse Populations of Neural Network Models

    Authors: Konstantin Schürholt, Diyar Taskiran, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth

    Abstract: In the last years, neural networks (NN) have evolved from laboratory environments to the state-of-the-art for many real-world problems. It was shown that NN models (i.e., their weights and biases) evolve on unique trajectories in weight space during training. Following, a population of such neural network models (referred to as model zoo) would form structures in weight space. We think that the ge…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks

  7. arXiv:2209.14733  [pdf, other]

    cs.LG cs.CV

    Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights

    Authors: Konstantin Schürholt, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth

    Abstract: Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications from model inspection, to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022). arXiv admin note: text overlap with arXiv:2207.10951

  8. arXiv:2209.03027  [pdf, other]

    cs.CV cs.AI eess.IV

    SIRA: Relightable Avatars from a Single Image

    Authors: Pol Caselles, Eduard Ramon, Jaime Garcia, Xavier Giro-i-Nieto, Francesc Moreno-Noguer, Gil Triginer

    Abstract: Recovering the geometry of a human head from a single image, while factorizing the materials and illumination is a severely ill-posed problem that requires prior information to be solved. Methods based on 3D Morphable Models (3DMM), and their combination with differentiable renderers, have shown promising results. However, the expressiveness of 3DMMs is limited, and they typically yield over-smoot…

    Submitted 7 September, 2022; originally announced September 2022.

  9. arXiv:2209.02402  [pdf, other]

    cs.CV cs.AI

    Topic Detection in Continuous Sign Language Videos

    Authors: Alvaro Budria, Laia Tarres, Gerard I. Gallego, Francesc Moreno-Noguer, Jordi Torres, Xavier Giro-i-Nieto

    Abstract: Significant progress has been made recently on challenging tasks in automatic sign language understanding, such as sign language recognition, translation and production. However, these works have focused on datasets with relatively few samples, short recordings and limited vocabulary and signing space. In this work, we introduce the novel task of sign language topic detection. We base our experime…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Presented as an extended abstract in the "AVA: Accessibility, Vision, and Autonomy Meet" CVPR 2022 Workshop

    Journal ref: "AVA: Accessibility, Vision, and Autonomy Meet" CVPR 2022 Workshop

  10. arXiv:2207.10951  [pdf, other]

    cs.LG

    Hyper-Representations for Pre-Training and Transfer Learning

    Authors: Konstantin Schürholt, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth

    Abstract: Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications from model inspection, to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we…

    Submitted 22 July, 2022; originally announced July 2022.

    Journal ref: First Workshop of Pre-training: Perspectives, Pitfalls, and Paths Forward at ICML 2022, Baltimore, Maryland, USA, PMLR 162, 2022

  11. arXiv:2201.02495  [pdf, other]

    cs.CV cs.AI cs.CL

    Sign Language Video Retrieval with Free-Form Textual Queries

    Authors: Amanda Duarte, Samuel Albanie, Xavier Giró-i-Nieto, Gül Varol

    Abstract: Systems that can efficiently search collections of sign language videos have been highlighted as a useful application of sign language technology. However, the problem of searching videos beyond individual keywords has received limited attention in the literature. To address this gap, in this work we introduce the task of sign language retrieval with free-form textual queries: given a written quer…

    Submitted 15 September, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  12. arXiv:2107.12512  [pdf, other]

    cs.CV cs.AI

    H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

    Authors: Eduard Ramon, Gil Triginer, Janna Escur, Albert Pumarola, Jaime Garcia, Xavier Giro-i-Nieto, Francesc Moreno-Noguer

    Abstract: Recent learning approaches that implicitly represent surface geometry using coordinate-based neural representations have shown impressive results in the problem of multi-view 3D reconstruction. The effectiveness of these techniques is, however, subject to the availability of a large number (several tens) of input views of the scene, and computationally demanding optimizations. In this paper, we ta…

    Submitted 26 July, 2021; originally announced July 2021.

  13. arXiv:2107.08398  [pdf, other]

    cs.AI cs.LG

    Unsupervised Skill-Discovery and Skill-Learning in Minecraft

    Authors: Juan José Nieto, Roger Creus, Xavier Giro-i-Nieto

    Abstract: Pre-training Reinforcement Learning agents in a task-agnostic manner has shown promising results. However, previous works still struggle in learning and discovering meaningful skills in high-dimensional state-spaces, such as pixel-spaces. We approach the problem by leveraging unsupervised skill discovery and self-supervised learning of state representations. In our work, we learn a compact latent…

    Submitted 18 July, 2021; originally announced July 2021.

    Comments: Accepted at ICML Unsupervised RL Workshop, 8 pages

  14. arXiv:2106.10950  [pdf, other]

    cs.CV

    Multiple Object Tracking with Mixture Density Networks for Trajectory Estimation

    Authors: Andreu Girbau, Xavier Giró-i-Nieto, Ignasi Rius, Ferran Marqués

    Abstract: Multiple object tracking faces several challenges that may be alleviated with trajectory information. Knowing the posterior locations of an object helps disambiguating and solving situations such as occlusions, re-identification, and identity switching. In this work, we show that trajectory estimation can become a key factor for tracking, and present TrajE, a trajectory estimator based on recurren…

    Submitted 21 June, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

    Comments: Best paper runner up on CVPR 2021 RVSU workshop

  15. arXiv:2106.09814  [pdf, other]

    cs.MM cs.SD eess.AS

    PixInWav: Residual Steganography for Hiding Pixels in Audio

    Authors: Margarita Geleta, Cristina Punti, Kevin McGuinness, Jordi Pons, Cristian Canton, Xavier Giro-i-Nieto

    Abstract: Steganography comprises the mechanics of hiding data in a host media that may be publicly available. While previous works focused on unimodal setups (e.g., hiding images in images, or hiding audio in audio), PixInWav targets the multimodal case of hiding images in audio. To this end, we propose a novel residual architecture operating on top of short-time discrete cosine transform (STDCT) audio spe…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Extended abstract presented in CVPR 2021 Women in Computer Vision Workshop

  16. arXiv:2106.04403  [pdf, other]

    cs.CV cs.CL cs.MM

    SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation

    Authors: Ioannis Kazakos, Carles Ventura, Miriam Bellver, Carina Silberer, Xavier Giro-i-Nieto

    Abstract: Recent advances in deep learning have brought significant progress in visual grounding tasks such as language-guided video object segmentation. However, collecting large datasets for these tasks is expensive in terms of annotation time, which represents a bottleneck. To this end, we propose a novel method, namely SynthRef, for generating synthetic referring expressions for target objects in an ima…

    Submitted 9 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted as poster at the NAACL 2021 Visually Grounded Interaction and Language (ViGIL) Workshop. 4 pages. Project website: https://imatge-upc.github.io/synthref/

  17. arXiv:2103.16607  [pdf, other]

    cs.CV

    Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data

    Authors: Oscar Mañas, Alexandre Lacoste, Xavier Giro-i-Nieto, David Vazquez, Pau Rodriguez

    Abstract: Remote sensing and automatic earth monitoring are key to solve global-scale challenges such as disaster prevention, land use monitoring, or tackling climate change. Although there exist vast amounts of remote sensing data, most of it remains unlabeled and thus inaccessible for supervised learning algorithms. Transfer learning approaches can reduce the data requirements of deep learning algorithms.…

    Submitted 3 May, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

  18. arXiv:2012.10941  [pdf, other]

    cs.CV cs.AI

    Can Everybody Sign Now? Exploring Sign Language Video Generation from 2D Poses

    Authors: Lucas Ventura, Amanda Duarte, Xavier Giro-i-Nieto

    Abstract: Recent work has addressed the generation of human poses represented by 2D/3D coordinates of human joints for sign language. We use the state of the art in Deep Learning for motion transfer and evaluate them on How2Sign, an American Sign Language dataset, to generate videos of signers performing sign language given a 2D pose skeleton. We evaluate the generated videos quantitatively and qualitative…

    Submitted 4 January, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

    Comments: Video here: https://youtu.be/4ve1sGzWl2g

  19. arXiv:2010.00263  [pdf, other]

    cs.CV

    RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation

    Authors: Miriam Bellver, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, Xavier Giro-i-Nieto

    Abstract: The task of video object segmentation with referring expressions (language-guided VOS) is to, given a linguistic phrase and a video, generate binary masks for the object to which the phrase refers. Our work argues that existing benchmarks used for this task are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of…

    Submitted 1 October, 2020; originally announced October 2020.

  20. arXiv:2008.11073  [pdf, other]

    cs.CV

    Mask-guided sample selection for Semi-Supervised Instance Segmentation

    Authors: Miriam Bellver, Amaia Salvador, Jordi Torres, Xavier Giro-i-Nieto

    Abstract: Image segmentation methods are usually trained with pixel-level annotations, which require significant human effort to collect. The most common solution to address this constraint is to implement weakly-supervised pipelines trained with lower forms of supervision, such as bounding boxes or scribbles. Another option are semi-supervised methods, which leverage a large amount of unlabeled data and a…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: Preprint submitted to Multimedia Tools and Applications

  21. arXiv:2008.08143  [pdf, other]

    cs.CV

    How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language

    Authors: Amanda Duarte, Shruti Palaskar, Lucas Ventura, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres, Xavier Giro-i-Nieto

    Abstract: One of the factors that have hindered progress in the areas of sign language recognition, translation, and production is the absence of large annotated datasets. Towards this end, we introduce How2Sign, a multimodal and multiview continuous American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities inclu…

    Submitted 1 April, 2021; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: Accepted at CVPR 2021. Dataset website: http://how2sign.github.io/

  22. arXiv:2008.06698  [pdf, other]

    cs.CV

    Curriculum Learning for Recurrent Video Object Segmentation

    Authors: Maria Gonzalez-i-Calabuig, Carles Ventura, Xavier Giró-i-Nieto

    Abstract: Video object segmentation can be understood as a sequence-to-sequence task that can benefit from the curriculum learning strategies for better and faster training of deep neural networks. This work explores different schedule sampling and frame skipping variations to significantly improve the performance of a recurrent architecture. Our results on the car class of the KITTI-MOTS challenge indicate…

    Submitted 15 August, 2020; originally announced August 2020.

    Comments: Extended abstract accepted at ECCV 2020 Women in Computer Vision (WiCV) & Perception for Autonomous Driving (PAD) Workshops

    ACM Class: I.4.6

  23. arXiv:2006.00785  [pdf, ps, other]

    cs.CV cs.CL cs.IR cs.MM

    Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos

    Authors: Benet Oriol, Jordi Luque, Ferran Diego, Xavier Giro-i-Nieto

    Abstract: In this work, we propose an effective approach for training unique embedding representations by combining three simultaneous modalities: image and spoken and textual narratives. The proposed methodology departs from a baseline system that spawns an embedding space trained with only spoken narratives and image cues. Our experiments on the EPIC-Kitchen and Places Audio Caption datasets show that intr…

    Submitted 1 June, 2020; originally announced June 2020.

    Comments: Accepted for presentation at EPIC@CVPR2020 workshop

  24. arXiv:2002.03647  [pdf, other]

    cs.LG cs.AI stat.ML

    Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

    Authors: Víctor Campos, Alexander Trott, Caiming Xiong, Richard Socher, Xavier Giro-i-Nieto, Jordi Torres

    Abstract: Acquiring abilities in the absence of a task-oriented reward function is at the frontier of reinforcement learning research. This problem has been studied through the lens of empowerment, which draws a connection between option discovery and information theory. Information-theoretic skill discovery methods have garnered much interest from the community, but little research has been conducted in un…

    Submitted 3 August, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: 17 pages, 11 figures. Code is publicly available at https://github.com/victorcampos7/edl

  25. arXiv:1911.02103  [pdf, other]

    cs.CV cs.CL cs.MM

    Recurrent Instance Segmentation using Sequences of Referring Expressions

    Authors: Alba Herrera-Palacio, Carles Ventura, Carina Silberer, Ionut-Teodor Sorodoc, Gemma Boleda, Xavier Giro-i-Nieto

    Abstract: The goal of this work is to segment the objects in an image that are referred to by a sequence of linguistic descriptions (referring expressions). We propose a deep neural network with recurrent layers that output a sequence of binary masks, one for each referring expression provided by the user. The recurrent layers in the architecture allow the model to condition each predicted mask on the previ…

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: 3rd NeurIPS Workshop on Visually Grounded Interaction and Language (ViGIL, 2019)

  26. arXiv:1910.11949  [pdf, other]

    cs.MM cs.CL cs.CV

    Automatic Reminiscence Therapy for Dementia

    Authors: Mariona Caros, Maite Garolera, Petia Radeva, Xavier Giro-i-Nieto

    Abstract: With people living longer than ever, the number of cases with dementia such as Alzheimer's disease increases steadily. It affects more than 46 million people worldwide, and it is estimated that in 2050 more than 100 million will be affected. While there are not effective treatments for these terminal diseases, therapies such as reminiscence, that stimulate memories from the past are recommended. C…

    Submitted 19 January, 2021; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: MSc thesis at TelecomBCN, Universitat Politecnica de Catalunya 2019

  27. arXiv:1910.02334  [pdf, other]

    cs.MM cs.CL cs.CV

    Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation

    Authors: Benet Oriol Sabat, Cristian Canton Ferrer, Xavier Giro-i-Nieto

    Abstract: This work addresses the challenge of hate speech detection in Internet memes, and attempts using visual information to automatically detect hate speech, unlike any previous work to our knowledge. Memes are pixel-based multimedia documents that contain photos or illustrations together with phrases which, when combined, usually adopt a funny meaning. However, hate memes are also used to spread hate…

    Submitted 5 October, 2019; originally announced October 2019.

    Comments: AI for Social Good Workshop at NeurIPS 2019 (short paper)

  28. arXiv:1908.09001  [pdf, other]

    cs.CV cs.AI

    Hyperparameter-Free Losses for Model-Based Monocular Reconstruction

    Authors: Eduard Ramon, Guillermo Ruiz, Thomas Batard, Xavier Giró-i-Nieto

    Abstract: This work proposes novel hyperparameter-free losses for single view 3D reconstruction with morphable models (3DMM). We dispense with the hyperparameters used in other works by exploiting geometry, so that the shape of the object and the camera pose are jointly optimized in a sole term expression. This simplification reduces the optimization time and its complexity. Moreover, we propose a novel imp…

    Submitted 16 August, 2019; originally announced August 2019.

  29. arXiv:1908.08856  [pdf, other]

    eess.IV cs.CV cs.LG

    Assessing Knee OA Severity with CNN attention-based end-to-end architectures

    Authors: Marc Górriz, Joseph Antony, Kevin McGuinness, Xavier Giró-i-Nieto, Noel E. O'Connor

    Abstract: This work proposes a novel end-to-end convolutional neural network (CNN) architecture to automatically quantify the severity of knee osteoarthritis (OA) using X-Ray images, which incorporates trainable attention modules acting as unsupervised fine-grained detectors of the region of interest (ROI). The proposed attention modules can be applied at different levels and scales across any CNN pipeline…

    Submitted 23 August, 2019; originally announced August 2019.

    Comments: Proceedings of the 2nd International Conference on Medical Imaging with Deep Learning

    Journal ref: Proceedings of The 2nd International Conference on Medical Imaging with Deep Learning, PMLR 102:197-214, 2019

  30. arXiv:1907.01869  [pdf, other]

    cs.CV cs.LG

    Simple vs complex temporal recurrences for video saliency prediction

    Authors: Panagiotis Linardos, Eva Mohedano, Juan Jose Nieto, Noel E. O'Connor, Xavier Giro-i-Nieto, Kevin McGuinness

    Abstract: This paper investigates modifying an existing neural network architecture for static saliency prediction using two types of recurrences that integrate information from the temporal domain. The first modification is the addition of a ConvLSTM within the architecture, while the second is a conceptually simple exponential moving average of an internal convolutional state. We use weights pre-trained o…

    Submitted 16 July, 2019; v1 submitted 3 July, 2019; originally announced July 2019.

    Comments: Accepted at BMVC 2019

  31. arXiv:1905.05880  [pdf, other]

    cs.CV

    Budget-aware Semi-Supervised Semantic and Instance Segmentation

    Authors: Miriam Bellver, Amaia Salvador, Jordi Torres, Xavier Giro-i-Nieto

    Abstract: Methods that move towards less supervised scenarios are key for image segmentation, as dense labels demand significant human intervention. Generally, the annotation burden is mitigated by labeling datasets with weaker forms of supervision, e.g. image-level labels or bounding boxes. Another option are semi-supervised settings, that commonly leverage a few strong annotations and a huge number of unl…

    Submitted 23 May, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: To appear in CVPR-W 2019 (DeepVision workshop)

  32. arXiv:1903.10195  [pdf, other]

    cs.MM cs.CV

    Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks

    Authors: Amanda Duarte, Francisco Roldan, Miquel Tubau, Janna Escur, Santiago Pascual, Amaia Salvador, Eva Mohedano, Kevin McGuinness, Jordi Torres, Xavier Giro-i-Nieto

    Abstract: Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) with raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the…

    Submitted 25 March, 2019; originally announced March 2019.

    Comments: ICASSP 2019. Project website at https://imatge-upc.github.io/wav2pix/

  33. arXiv:1903.05612  [pdf, other]

    cs.CV

    RVOS: End-to-End Recurrent Network for Video Object Segmentation

    Authors: Carles Ventura, Miriam Bellver, Andreu Girbau, Amaia Salvador, Ferran Marques, Xavier Giro-i-Nieto

    Abstract: Multiple object video object segmentation is a challenging task, especially for the zero-shot case, when no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence on two dif…

    Submitted 21 May, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

    Comments: CVPR 2019 camera ready. Project website: https://imatge-upc.github.io/rvos/

  34. The Liver Tumor Segmentation Benchmark (LiTS)

    Authors: Patrick Bilic, Patrick Christ, Hongwei Bran Li, Eugene Vorontsov, Avi Ben-Cohen, Georgios Kaissis, Adi Szeskin, Colin Jacobs, Gabriel Efrain Humpire Mamani, Gabriel Chartrand, Fabian Lohöfer, Julian Walter Holch, Wieland Sommer, Felix Hofmann, Alexandre Hostettler, Naama Lev-Cohain, Michal Drozdzal, Michal Marianne Amitai, Refael Vivantik, Jacob Sosna, Ivan Ezhov, Anjany Sekuboyina, Fernando Navarro, Florian Kofler, Johannes C. Paetzold , et al. (84 additional authors not shown)

    Abstract: In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with…

    Submitted 25 November, 2022; v1 submitted 13 January, 2019; originally announced January 2019.

    Comments: Patrick Bilic, Patrick Christ, Hongwei Bran Li, and Eugene Vorontsov made equal contributions to this work. Published in Medical Image Analysis

    Journal ref: Medical Image Analysis (2022) Pg. 102680

  35. arXiv:1812.06164  [pdf, other]

    cs.CV

    Inverse Cooking: Recipe Generation from Food Images

    Authors: Amaia Salvador, Michal Drozdzal, Xavier Giro-i-Nieto, Adriana Romero

    Abstract: People enjoy food photography because they appreciate food. Behind each meal there is a story described in a complex recipe and, unfortunately, by simply looking at a food image we do not have access to its preparation process. Therefore, in this paper we introduce an inverse cooking system that recreates cooking recipes given food images. Our system predicts ingredients as sets by means of a nove…

    Submitted 15 June, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

    Comments: CVPR 2019

  36. arXiv:1811.04624  [pdf, other]

    stat.ML cs.LG cs.NE

    Importance Weighted Evolution Strategies

    Authors: Víctor Campos, Xavier Giro-i-Nieto, Jordi Torres

    Abstract: Evolution Strategies (ES) emerged as a scalable alternative to popular Reinforcement Learning (RL) techniques, providing an almost perfect speedup when distributed across hundreds of CPU cores thanks to a reduced communication overhead. Despite providing large improvements in wall-clock time, ES is data inefficient when compared to competing RL methods. One of the main causes of such inefficiency…

    Submitted 12 November, 2018; originally announced November 2018.

    Comments: NIPS Deep Reinforcement Learning Workshop 2018

  37. arXiv:1809.00567  [pdf, other]

    cs.CV cs.AI

    PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks

    Authors: Marc Assens, Xavier Giro-i-Nieto, Kevin McGuinness, Noel E. O'Connor

    Abstract: We introduce PathGAN, a deep neural network for visual scanpath prediction trained on adversarial examples. A visual scanpath is defined as the sequence of fixation points over an image defined by a human observer with its gaze. PathGAN is composed of two parts, the generator and the discriminator. Both parts extract features from images using off-the-shelf networks, and train recurrent layers to…

    Submitted 3 September, 2018; originally announced September 2018.

    Comments: ECCV 2018 Workshop on Egocentric Perception, Interaction and Computing (EPIC). This work obtained the 2nd award in Prediction of Head-gaze Scan-paths for Images, and the 2nd award in Prediction of Eye-gaze Scan-paths for Images at the IEEE ICME 2018 Salient360! Challenge

  38. arXiv:1808.09559  [pdf, other]

    cs.CV

    Temporal Saliency Adaptation in Egocentric Videos

    Authors: Panagiotis Linardos, Eva Mohedano, Monica Cherto, Cathal Gurrin, Xavier Giro-i-Nieto

    Abstract: This work adapts a deep neural model for image saliency prediction to the temporal domain of egocentric video. We compute the saliency map for each video frame, firstly with an off-the-shelf model trained from static images, secondly by adding convolutional or conv-LSTM layers trained with a dataset for video saliency prediction. We study each configuration on EgoMon, a new dataset made of sev…

    Submitted 4 September, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

    Comments: Extended abstract at the ECCV 2018 Workshop on Egocentric Perception, Interaction and Computing (EPIC)

  39. arXiv:1803.08165  [pdf, other]

    cs.NE cs.LG

    Comparing Fixed and Adaptive Computation Time for Recurrent Neural Networks

    Authors: Daniel Fojo, Víctor Campos, Xavier Giro-i-Nieto

    Abstract: Adaptive Computation Time for Recurrent Neural Networks (ACT) is one of the most promising architectures for variable computation. ACT adapts to the input sequence by being able to look at each sample more than once, and learn how many times it should do it. In this paper, we compare ACT to Repeat-RNN, a novel architecture based on repeating each sample a fixed number of times. We found surprising…

    Submitted 21 March, 2018; originally announced March 2018.

    Comments: Accepted as workshop paper at ICLR 2018

  40. arXiv:1802.06822  [pdf, other

    cs.CV

    Online Detection of Action Start in Untrimmed, Streaming Videos

    Authors: Zheng Shou, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giro-i-Nieto, Shih-Fu Chang

    Abstract: We aim to tackle a novel task in action detection: Online Detection of Action Start (ODAS) in untrimmed, streaming videos. The goal of ODAS is to detect the start of an action instance, with high categorization accuracy and low detection latency. ODAS is important in many applications such as early alert generation to allow timely security or emergency response. We propose three novel methods to…

    Submitted 23 July, 2018; v1 submitted 19 February, 2018; originally announced February 2018.

    Comments: Accepted by ECCV'18

  41. arXiv:1801.02200  [pdf, other

    cs.IR cs.CV cs.SD eess.AS

    Cross-modal Embeddings for Video and Audio Retrieval

    Authors: Didac Surís, Amanda Duarte, Amaia Salvador, Jordi Torres, Xavier Giró-i-Nieto

    Abstract: The increasing number of online videos brings several opportunities for training self-supervised neural networks. The creation of large scale datasets of videos such as the YouTube-8M allows us to deal with this large amount of data in a manageable way. In this work, we find new ways of exploiting this dataset by taking advantage of the multi-modal information it provides. By means of a neural netwo…

    Submitted 7 January, 2018; originally announced January 2018.

    Comments: 6 pages, 3 figures

  42. arXiv:1712.00617  [pdf, other

    cs.CV

    Recurrent Neural Networks for Semantic Instance Segmentation

    Authors: Amaia Salvador, Miriam Bellver, Victor Campos, Manel Baradad, Ferran Marques, Jordi Torres, Xavier Giro-i-Nieto

    Abstract: We present a recurrent model for semantic instance segmentation that sequentially generates binary masks and their associated class probabilities for every object in an image. Our proposed system is trainable end-to-end from an input image to a sequence of labeled masks and, compared to methods relying on object proposals, does not require post-processing steps on its output. We study the suitabil…

    Submitted 12 April, 2019; v1 submitted 2 December, 2017; originally announced December 2017.

  43. arXiv:1711.11069  [pdf, other

    cs.CV

    Detection-aided liver lesion segmentation using deep learning

    Authors: Miriam Bellver, Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Xavier Giro-i-Nieto, Jordi Torres, Luc Van Gool

    Abstract: A fully automatic technique for segmenting the liver and localizing its unhealthy tissues is a convenient tool for diagnosing hepatic diseases and assessing the response to the corresponding treatments. In this work we propose a method to segment the liver and its lesions from Computed Tomography (CT) scans using Convolutional Neural Networks (CNNs), which have shown good results in a variety of co…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: NIPS 2017 Workshop on Machine Learning for Health (ML4H)

  44. arXiv:1711.10795  [pdf, other

    cs.CV cs.AI cs.IR

    Saliency Weighted Convolutional Features for Instance Search

    Authors: Eva Mohedano, Kevin McGuinness, Xavier Giro-i-Nieto, Noel E. O'Connor

    Abstract: This work explores attention models to weight the contribution of local convolutional representations for the instance search task. We present a retrieval framework based on bags of local convolutional features (BLCF) that benefits from saliency weighting to build an efficient image representation. The use of human visual attention models (saliency) allows significant improvements in retrieval per…

    Submitted 29 November, 2017; originally announced November 2017.

  45. arXiv:1711.09168  [pdf, other

    cs.CV

    Cost-Effective Active Learning for Melanoma Segmentation

    Authors: Marc Gorriz, Axel Carlier, Emmanuel Faure, Xavier Giro-i-Nieto

    Abstract: We propose a novel Active Learning framework capable of effectively training a convolutional neural network for semantic segmentation of medical imaging, with a limited amount of labeled training data. Our contribution is a practical Cost-Effective Active Learning approach using dropout at test time as Monte Carlo sampling to model the pixel-wise uncertainty and to analyze the image information to im…

    Submitted 28 November, 2017; v1 submitted 24 November, 2017; originally announced November 2017.

    Comments: NIPS ML4H 2017 workshop

  46. arXiv:1708.06834  [pdf, other

    cs.AI cs.CV

    Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

    Authors: Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, Shih-Fu Chang

    Abstract: Recurrent Neural Networks (RNNs) continue to show outstanding performance in sequence modeling tasks. However, training RNNs on long sequences often faces challenges like slow inference, vanishing gradients and difficulty in capturing long term dependencies. In backpropagation through time settings, these issues are tightly coupled with the large, sequential computational graph resulting from unfol…

    Submitted 5 February, 2018; v1 submitted 22 August, 2017; originally announced August 2017.

    Comments: Accepted as conference paper at ICLR 2018

  47. arXiv:1708.06039  [pdf, other

    cs.CV cs.AI cs.MM

    More cat than cute? Interpretable Prediction of Adjective-Noun Pairs

    Authors: Delia Fernandez, Alejandro Woodward, Victor Campos, Xavier Giro-i-Nieto, Brendan Jou, Shih-Fu Chang

    Abstract: The increasing availability of affect-rich multimedia resources has bolstered interest in understanding sentiment and emotions in and from visual content. Adjective-noun pairs (ANP) are a popular mid-level semantic construct for capturing affect via visually detectable concepts such as "cute dog" or "beautiful landscape". Current state-of-the-art methods approach ANP prediction by considering each…

    Submitted 20 August, 2017; originally announced August 2017.

    Comments: Oral paper at ACM Multimedia 2017 Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes (MUSA2)

  48. arXiv:1707.04092  [pdf, other

    cs.CV cs.AI cs.MM

    Disentangling Motion, Foreground and Background Features in Videos

    Authors: Xunyu Lin, Victor Campos, Xavier Giro-i-Nieto, Jordi Torres, Cristian Canton Ferrer

    Abstract: This paper introduces an unsupervised framework to extract semantically rich features for video representation. Inspired by how the human visual system groups objects based on motion cues, we propose a deep convolutional neural network that disentangles motion, foreground and background information. The proposed architecture consists of a 3D convolutional feature encoder for blocks of 16 frames, w…

    Submitted 17 July, 2017; v1 submitted 13 July, 2017; originally announced July 2017.

    Comments: Poster presented at the CVPR 2017 Workshop Brave New Ideas for Motion Representations in Videos

  49. arXiv:1707.03123  [pdf, other

    cs.CV cs.MM

    SaltiNet: Scan-path Prediction on 360 Degree Images using Saliency Volumes

    Authors: Marc Assens, Kevin McGuinness, Xavier Giro-i-Nieto, Noel E. O'Connor

    Abstract: We introduce SaltiNet, a deep neural network for scanpath prediction trained on 360-degree images. The model is based on a temporal-aware novel representation of saliency information named the saliency volume. The first part of the network consists of a model trained to generate saliency volumes, whose parameters are fit by back-propagation computed from a binary cross entropy (BCE) loss over down…

    Submitted 17 August, 2017; v1 submitted 11 July, 2017; originally announced July 2017.

    Comments: Winner of the Best Scan-path Award at the Salient360!: Visual attention modeling for 360 degrees Images Grand Challenge of ICME 2017. Presented at the ICCV 2017 Workshop on Egocentric Perception, Interaction and Computing (EPIC)

  50. arXiv:1707.02581  [pdf, other

    cs.CV cs.IR cs.MM

    Class-Weighted Convolutional Features for Visual Instance Search

    Authors: Albert Jimenez, Jose M. Alvarez, Xavier Giro-i-Nieto

    Abstract: Image retrieval in realistic scenarios targets large dynamic datasets of unlabeled images. In these cases, training or fine-tuning a model every time new images are added to the database is neither efficient nor scalable. Convolutional neural networks trained for image classification over large datasets have been proven effective feature extractors for image retrieval. The most successful approach…

    Submitted 9 July, 2017; originally announced July 2017.

    Comments: To appear in the British Machine Vision Conference (BMVC), September 2017