Search | arXiv e-print repository

arXiv:2405.19876 [pdf, other]

IReNe: Instant Recoloring of Neural Radiance Fields

Authors: Alessio Mazzucchelli, Adrian Garcia-Garcia, Elena Garces, Fernando Rivas-Manzaneque, Francesc Moreno-Noguer, Adrian Penate-Sanchez

Abstract: Advances in NERFs have allowed for 3D scene reconstructions and novel view synthesis. Yet, efficiently editing these representations while retaining photorealism is an emerging challenge. Recent methods face three primary limitations: they're slow for interactive use, lack precision at object boundaries, and struggle to ensure multi-view consistency. We introduce IReNe to address these limitations… ▽ More Advances in NERFs have allowed for 3D scene reconstructions and novel view synthesis. Yet, efficiently editing these representations while retaining photorealism is an emerging challenge. Recent methods face three primary limitations: they're slow for interactive use, lack precision at object boundaries, and struggle to ensure multi-view consistency. We introduce IReNe to address these limitations, enabling swift, near real-time color editing in NeRF. Leveraging a pre-trained NeRF model and a single training image with user-applied color edits, IReNe swiftly adjusts network parameters in seconds. This adjustment allows the model to generate new scene views, accurately representing the color changes from the training image while also controlling object boundaries and view-specific effects. Object boundary control is achieved by integrating a trainable segmentation module into the model. The process gains efficiency by retraining only the weights of the last network layer. We observed that neurons in this layer can be classified into those responsible for view-dependent appearance and those contributing to diffuse appearance. We introduce an automated classification approach to identify these neuron types and exclusively fine-tune the weights of the diffuse neurons. This further accelerates training and ensures consistent color edits across different views. A thorough validation on a new dataset, with edited object colors, shows significant quantitative and qualitative advancements over competitors, accelerating speeds by 5x to 500x. △ Less

Submitted 10 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2402.07578 [pdf, other]

doi 10.1145/3337821.3337925

LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores

Authors: Adrián García-García, Juan Carlos Sáez, Fernando Castro, Manuel Prieto-Matías

Abstract: Multicore processors constitute the main architecture choice for modern computing systems in different market segments. Despite their benefits, the contention that naturally appears when multiple applications compete for the use of shared resources among cores, such as the last-level cache (LLC), may lead to substantial performance degradation. This may have a negative impact on key system aspects… ▽ More Multicore processors constitute the main architecture choice for modern computing systems in different market segments. Despite their benefits, the contention that naturally appears when multiple applications compete for the use of shared resources among cores, such as the last-level cache (LLC), may lead to substantial performance degradation. This may have a negative impact on key system aspects such as throughput and fairness. Assigning the various applications in the workload to separate LLC partitions with possibly different sizes, has been proven effective to mitigate shared-resource contention effects. In this article we propose LFOC, a clustering-based cache partitioning scheme that strives to deliver fairness while providing acceptable system throughput. LFOC leverages the Intel Cache Allocation Technology (CAT), which enables the system software to divide the LLC into different partitions. To accomplish its goals, LFOC tries to mimic the behavior of the optimal cache-clustering solution, which we could approximate by means of a simulator in different scenarios. To this end, LFOC effectively identifies streaming aggressor programs and cache sensitive applications, which are then assigned to separate cache partitions. We implemented LFOC in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, where we compare its effectiveness to that of two state-of-the-art policies that optimize fairness and throughput, respectively. Our experimental analysis reveals that LFOC is able to bring a higher reduction in unfairness by leveraging a lightweight algorithm suitable for adoption in a real OS. △ Less

Submitted 12 February, 2024; originally announced February 2024.

Journal ref: 48th International Conference on Parallel Processing (ICPP 2019), August 5--8, 2019, Kyoto, Japan

arXiv:2104.11776 [pdf, other]

UnrealROX+: An Improved Tool for Acquiring Synthetic Data from Virtual 3D Environments

Authors: Pablo Martinez-Gonzalez, Sergiu Oprea, John Alejandro Castro-Vargas, Alberto Garcia-Garcia, Sergio Orts-Escolano, Jose Garcia-Rodriguez, Markus Vincze

Abstract: Synthetic data generation has become essential in last years for feeding data-driven algorithms, which surpassed traditional techniques performance in almost every computer vision problem. Gathering and labelling the amount of data needed for these data-hungry models in the real world may become unfeasible and error-prone, while synthetic data give us the possibility of generating huge amounts of… ▽ More Synthetic data generation has become essential in last years for feeding data-driven algorithms, which surpassed traditional techniques performance in almost every computer vision problem. Gathering and labelling the amount of data needed for these data-hungry models in the real world may become unfeasible and error-prone, while synthetic data give us the possibility of generating huge amounts of data with pixel-perfect annotations. However, most synthetic datasets lack from enough realism in their rendered images. In that context UnrealROX generation tool was presented in 2019, allowing to generate highly realistic data, at high resolutions and framerates, with an efficient pipeline based on Unreal Engine, a cutting-edge videogame engine. UnrealROX enabled robotic vision researchers to generate realistic and visually plausible data with full ground truth for a wide variety of problems such as class and instance semantic segmentation, object detection, depth estimation, visual gras**, and navigation. Nevertheless, its workflow was very tied to generate image sequences from a robotic on-board camera, making hard to generate data for other purposes. In this work, we present UnrealROX+, an improved version of UnrealROX where its decoupled and easy-to-use data acquisition system allows to quickly design and generate data in a much more flexible and customizable way. Moreover, it is packaged as an Unreal plug-in, which makes it more comfortable to use with already existing Unreal projects, and it also includes new features such as generating albedo or a Python API for interacting with the virtual environment from Deep Learning frameworks. △ Less

Submitted 23 April, 2021; originally announced April 2021.

Comments: Accepted at International Joint Conference on Neural Networks (IJCNN) 2021

arXiv:2103.15017 [pdf, other]

H-GAN: the power of GANs in your Hands

Authors: Sergiu Oprea, Giorgos Karvounas, Pablo Martinez-Gonzalez, Nikolaos Kyriazis, Sergio Orts-Escolano, Iason Oikonomidis, Alberto Garcia-Garcia, Aggeliki Tsoli, Jose Garcia-Rodriguez, Antonis Argyros

Abstract: We present HandGAN (H-GAN), a cycle-consistent adversarial learning approach implementing multi-scale perceptual discriminators. It is designed to translate synthetic images of hands to the real domain. Synthetic hands provide complete ground-truth annotations, yet they are not representative of the target distribution of real-world data. We strive to provide the perfect blend of a realistic hand… ▽ More We present HandGAN (H-GAN), a cycle-consistent adversarial learning approach implementing multi-scale perceptual discriminators. It is designed to translate synthetic images of hands to the real domain. Synthetic hands provide complete ground-truth annotations, yet they are not representative of the target distribution of real-world data. We strive to provide the perfect blend of a realistic hand appearance with synthetic annotations. Relying on image-to-image translation, we improve the appearance of synthetic hands to approximate the statistical distribution underlying a collection of real images of hands. H-GAN tackles not only the cross-domain tone map** but also structural differences in localized areas such as shading discontinuities. Results are evaluated on a qualitative and quantitative basis improving previous works. Furthermore, we relied on the hand classification task to claim our generated hands are statistically similar to the real domain of hands. △ Less

Submitted 21 April, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

Comments: Paper accepted at The International Joint Conference on Neural Networks (IJCNN) 2021

arXiv:2005.12783 [pdf, other]

CoronaSurveys: Using Surveys with Indirect Reporting to Estimate the Incidence and Evolution of Epidemics

Authors: Oluwasegun Ojo, Augusto García-Agundez, Benjamin Girault, Harold Hernández, Elisa Cabana, Amanda García-García, Payman Arabshahi, Carlos Baquero, Paolo Casari, Ednaldo José Ferreira, Davide Frey, Chryssis Georgiou, Mathieu Goessens, Anna Ishchenko, Ernesto Jiménez, Oleksiy Kebkal, Rosa Lillo, Raquel Menezes, Nicolas Nicolaou, Antonio Ortega, Paul Patras, Julian C Roberts, Efstathios Stavrakis, Yuichi Tanaka, Antonio Fernández Anta

Abstract: The world is suffering from a pandemic called COVID-19, caused by the SARS-CoV-2 virus. National governments have problems evaluating the reach of the epidemic, due to having limited resources and tests at their disposal. This problem is especially acute in low and middle-income countries (LMICs). Hence, any simple, cheap and flexible means of evaluating the incidence and evolution of the epidemic… ▽ More The world is suffering from a pandemic called COVID-19, caused by the SARS-CoV-2 virus. National governments have problems evaluating the reach of the epidemic, due to having limited resources and tests at their disposal. This problem is especially acute in low and middle-income countries (LMICs). Hence, any simple, cheap and flexible means of evaluating the incidence and evolution of the epidemic in a given country with a reasonable level of accuracy is useful. In this paper, we propose a technique based on (anonymous) surveys in which participants report on the health status of their contacts. This indirect reporting technique, known in the literature as network scale-up method, preserves the privacy of the participants and their contacts, and collects information from a larger fraction of the population (as compared to individual surveys). This technique has been deployed in the CoronaSurveys project, which has been collecting reports for the COVID-19 pandemic for more than two months. Results obtained by CoronaSurveys show the power and flexibility of the approach, suggesting that it could be an inexpensive and powerful tool for LMICs. △ Less

Submitted 26 June, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

Comments: Presented at The KDD Workshop on Humanitarian Map**, San Diego, California USA, August 24, 2020

arXiv:2004.05214 [pdf, other]

doi 10.1109/TPAMI.2020.3045007

A Review on Deep Learning Techniques for Video Prediction

Authors: Sergiu Oprea, Pablo Martinez-Gonzalez, Alberto Garcia-Garcia, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez, Antonis Argyros

Abstract: The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable framework for representation learning, as it demo… ▽ More The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable framework for representation learning, as it demonstrated potential capabilities for extracting meaningful representations of the underlying patterns in natural videos. Motivated by the increasing interest in this task, we provide a review on the deep learning methods for prediction in video sequences. We firstly define the video prediction fundamentals, as well as mandatory background concepts and the most used datasets. Next, we carefully analyze existing video prediction models organized according to a proposed taxonomy, highlighting their contributions and their significance in the field. The summary of the datasets and methods is accompanied with experimental results that facilitate the assessment of the state of the art on a quantitative basis. The paper is summarized by drawing some general conclusions, identifying open research challenges and by pointing out future research directions. △ Less

Submitted 14 April, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Submitted to TPAMI

arXiv:1903.05238 [pdf, other]

doi 10.1016/j.cag.2019.07.003

A Visually Plausible Gras** System for Object Manipulation and Interaction in Virtual Reality Environments

Authors: Sergiu Oprea, Pablo Martinez-Gonzalez, Alberto Garcia-Garcia, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez

Abstract: Interaction in virtual reality (VR) environments is essential to achieve a pleasant and immersive experience. Most of the currently existing VR applications, lack of robust object gras** and manipulation, which are the cornerstone of interactive systems. Therefore, we propose a realistic, flexible and robust gras** system that enables rich and real-time interactions in virtual environments. It… ▽ More Interaction in virtual reality (VR) environments is essential to achieve a pleasant and immersive experience. Most of the currently existing VR applications, lack of robust object gras** and manipulation, which are the cornerstone of interactive systems. Therefore, we propose a realistic, flexible and robust gras** system that enables rich and real-time interactions in virtual environments. It is visually realistic because it is completely user-controlled, flexible because it can be used for different hand configurations, and robust because it allows the manipulation of objects regardless their geometry, i.e. hand is automatically fitted to the object shape. In order to validate our proposal, an exhaustive qualitative and quantitative performance analysis has been carried out. On the one hand, qualitative evaluation was used in the assessment of the abstract aspects such as: hand movement realism, interaction realism and motor control. On the other hand, for the quantitative evaluation a novel error metric has been proposed to visually analyze the performed grips. This metric is based on the computation of the distance from the finger phalanges to the nearest contact point on the object surface. These contact points can be used with different application purposes, mainly in the field of robotics. As a conclusion, system evaluation reports a similar performance between users with previous experience in virtual reality applications and inexperienced users, referring to a steep learning curve. △ Less

Submitted 12 March, 2019; originally announced March 2019.

arXiv:1901.06514 [pdf, other]

The RobotriX: An eXtremely Photorealistic and Very-Large-Scale Indoor Dataset of Sequences with Robot Trajectories and Interactions

Authors: Alberto Garcia-Garcia, Pablo Martinez-Gonzalez, Sergiu Oprea, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez, Alvaro Jover-Alvarez

Abstract: Enter the RobotriX, an extremely photorealistic indoor dataset designed to enable the application of deep learning techniques to a wide variety of robotic vision problems. The RobotriX consists of hyperrealistic indoor scenes which are explored by robot agents which also interact with objects in a visually realistic manner in that simulated world. Photorealistic scenes and robots are rendered by U… ▽ More Enter the RobotriX, an extremely photorealistic indoor dataset designed to enable the application of deep learning techniques to a wide variety of robotic vision problems. The RobotriX consists of hyperrealistic indoor scenes which are explored by robot agents which also interact with objects in a visually realistic manner in that simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset which captures gaze so that a human operator can move the robot and use controllers for the robotic hands; scene information is dumped on a per-frame basis so that it can be reproduced offline to generate raw data and ground truth labels. By taking this approach, we were able to generate a dataset of 38 semantic classes totaling 8M stills recorded at +60 frames per second with full HD resolution. For each frame, RGB-D and 3D information is provided with full annotations in both spaces. Thanks to the high quality and quantity of both raw information and annotations, the RobotriX will serve as a new milestone for investigating 2D and 3D robotic vision tasks with large-scale data-driven techniques. △ Less

Submitted 19 January, 2019; originally announced January 2019.

arXiv:1901.06181 [pdf, other]

TactileGCN: A Graph Convolutional Network for Predicting Grasp Stability with Tactile Sensors

Authors: Alberto Garcia-Garcia, Brayan Stiven Zapata-Impata, Sergio Orts-Escolano, Pablo Gil, Jose Garcia-Rodriguez

Abstract: Tactile sensors provide useful contact data during the interaction with an object which can be used to accurately learn to determine the stability of a grasp. Most of the works in the literature represented tactile readings as plain feature vectors or matrix-like tactile images, using them to train machine learning models. In this work, we explore an alternative way of exploiting tactile informati… ▽ More Tactile sensors provide useful contact data during the interaction with an object which can be used to accurately learn to determine the stability of a grasp. Most of the works in the literature represented tactile readings as plain feature vectors or matrix-like tactile images, using them to train machine learning models. In this work, we explore an alternative way of exploiting tactile information to predict grasp stability by leveraging graph-like representations of tactile data, which preserve the actual spatial arrangement of the sensor's taxels and their locality. In experimentation, we trained a Graph Neural Network to binary classify grasps as stable or slippery ones. To train such network and prove its predictive capabilities for the problem at hand, we captured a novel dataset of approximately 5000 three-fingered grasps across 41 objects for training and 1000 grasps with 10 unknown objects for testing. Our experiments prove that this novel approach can be effectively used to predict grasp stability. △ Less

Submitted 18 January, 2019; originally announced January 2019.

arXiv:1810.06936 [pdf, other]

UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation

Authors: Pablo Martinez-Gonzalez, Sergiu Oprea, Alberto Garcia-Garcia, Alvaro Jover-Alvarez, Sergio Orts-Escolano, Jose Garcia-Rodriguez

Abstract: Data-driven algorithms have surpassed traditional techniques in almost every aspect in robotic vision problems. Such algorithms need vast amounts of quality data to be able to work properly after their training process. Gathering and annotating that sheer amount of data in the real world is a time-consuming and error-prone task. Those problems limit scale and quality. Synthetic data generation has… ▽ More Data-driven algorithms have surpassed traditional techniques in almost every aspect in robotic vision problems. Such algorithms need vast amounts of quality data to be able to work properly after their training process. Gathering and annotating that sheer amount of data in the real world is a time-consuming and error-prone task. Those problems limit scale and quality. Synthetic data generation has become increasingly popular since it is faster to generate and automatic to annotate. However, most of the current datasets and environments lack realism, interactions, and details from the real world. UnrealROX is an environment built over Unreal Engine 4 which aims to reduce that reality gap by leveraging hyperrealistic indoor scenes that are explored by robot agents which also interact with objects in a visually realistic manner in that simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset which captures gaze so that a human operator can move the robot and use controllers for the robotic hands; scene information is dumped on a per-frame basis so that it can be reproduced offline to generate raw data and ground truth annotations. This virtual reality environment enables robotic vision researchers to generate realistic and visually plausible data with full ground truth for a wide variety of problems such as class and instance semantic segmentation, object detection, depth estimation, visual gras**, and navigation. △ Less

Submitted 8 November, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

Comments: Published in Virtual Reality journal

arXiv:1704.06857 [pdf, other]

A Review on Deep Learning Techniques Applied to Semantic Segmentation

Authors: Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez

Abstract: Image semantic segmentation is more and more being of interest for computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application… ▽ More Image semantic segmentation is more and more being of interest for computer vision and machine learning researchers. Many applications on the rise need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review on deep learning methods for semantic segmentation applied to various application areas. Firstly, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are exposed to help researchers decide which are the ones that best suit their needs and their targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets in which they were evaluated, following up with a discussion of the results. At last, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques. △ Less

Submitted 22 April, 2017; originally announced April 2017.

Comments: Submitted to TPAMI on Apr. 22, 2017

Showing 1–11 of 11 results for author: Garcia-Garcia, A