
On the Dual-Phase-Lag thermal response in the Pulsed Photoacoustic effect: 1D approach

Authors: L. F. Escamilla-Herrera, J. M. Domínguez-Derramadero, J. E. Alba-Rosales, F. J. García-Rodríguez, O. M. Medina-Cázares, G. Gutiérrez-Juárez

Abstract: The Photoacoustic (PA) effect is the mechanism that converts non-ionizing electromagnetic energy into mechanical energy; its main application is PA imaging in biomedicine. In this work, in order to treat the heat flux non-heuristically, we obtained exact solutions of a 1D boundary value problem of the Dual-Phase-Lag (DPL) heat conduction equation for a three-layer system in the frequency domain. Once the thermal boundary problem was solved via the DPL model, the second derivative of the temperature solutions was used as the PA source in the corresponding 1D boundary value problem for the pressure, via the PA wave equation. Temperature and pressure solutions were explored under two assumptions. The first is that the characteristic thermal lag response time $\tau_T$ of the DPL heat conduction model is a free parameter of this effective model. The second, for the sake of simplicity, is that the whole system has the same value of $\tau_T$, i.e., all three layers relax thermally in the same way. Both assumptions are made because, to our knowledge, there are neither first-principles proposals for the value of this parameter nor experimental measurements available in the literature. By varying $\tau_T$, the theoretical pressure solutions can be adjusted to reproduce experimental results accurately; we found that under this assumption, if $\tau_T$ is close to the laser pulse duration $\tau_p$, acoustic multiple reflections are accurately reproduced in the frequency domain and, via Fast Fourier Transforms, PA pulses are also accurately reproduced in the time domain.
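The sketch below is a hedged illustration of how $\tau_T$ enters a frequency-domain solution. It assumes the standard first-order DPL law $\mathbf{q} + \tau_q\,\partial_t\mathbf{q} = -k(\nabla T + \tau_T\,\partial_t\nabla T)$, which also contains a heat-flux lag $\tau_q$ that the abstract does not mention. Combining this law with energy conservation and a time dependence $e^{i\omega t}$ turns the source-free 1D equation into $T'' = \sigma^2 T$, with

$\sigma^2 = i\omega(1+i\omega\tau_q)/[\alpha(1+i\omega\tau_T)]$,

where $\alpha$ is the thermal diffusivity. For $\tau_q = \tau_T$ this reduces to the Fourier result $\sigma^2 = i\omega/\alpha$. The function names and numerical values are hypothetical, and this is not the authors' three-layer solution.

```python
import cmath
import math


def dpl_wavenumber(omega: float, alpha: float, tau_q: float, tau_T: float) -> complex:
    """Complex thermal wave number sigma of the first-order DPL heat equation.

    Assumes a time dependence exp(i*omega*t) and no internal heat source, so the
    1D temperature amplitude satisfies T''(x) = sigma**2 * T(x).
    """
    sigma_sq = (1j * omega * (1 + 1j * omega * tau_q)) / (alpha * (1 + 1j * omega * tau_T))
    # cmath.sqrt returns the principal root (real part >= 0), so exp(-sigma*x)
    # is the branch that decays away from the heated boundary.
    return cmath.sqrt(sigma_sq)


def fourier_wavenumber(omega: float, alpha: float) -> complex:
    """Classical Fourier (parabolic) limit: sigma**2 = i*omega/alpha."""
    return cmath.sqrt(1j * omega / alpha)


# Hypothetical values: water-like diffusivity, 1 MHz modulation, 10 ns pulse.
alpha = 1.4e-7              # m^2/s
omega = 2 * math.pi * 1e6   # rad/s
tau_p = 10e-9               # s
for tau_T in (0.1 * tau_p, tau_p, 10 * tau_p):
    sigma = dpl_wavenumber(omega, alpha, tau_q=tau_p, tau_T=tau_T)
    # 1/Re(sigma) is the length over which the thermal wave amplitude decays by 1/e.
    print(f"tau_T = {tau_T:.1e} s -> decay length {1 / sigma.real:.3e} m")
```

The lags enter only through the ratio $(1+i\omega\tau_q)/(1+i\omega\tau_T)$. Departures from Fourier behaviour therefore matter only when $\omega\tau_q$ or $\omega\tau_T$ is not small.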

Submitted 19 June, 2024; originally announced June 2024.

Comments: 32 pages, 11 figures

arXiv:2406.07542 [pdf, other]

Cognitive Insights Across Languages: Enhancing Multimodal Interview Analysis

Authors: David Ortiz-Perez, Jose Garcia-Rodriguez, David Tomás

Abstract: Cognitive decline is a natural process that occurs as individuals age. Early diagnosis of anomalous decline is crucial for initiating professional treatment that can enhance the quality of life of those affected. To address this issue, we propose a multimodal model capable of predicting Mild Cognitive Impairment and cognitive scores. The evaluation is conducted on the TAUKADIAL dataset, which comprises audio recordings of clinical interviews. The proposed model can transcribe the interviews and differentiate between the languages used in them. The model then extracts audio and text features and combines them in a multimodal architecture to achieve robust and generalized results. Our approach involves an in-depth study of the various features obtained from the proposed modalities.

Submitted 11 June, 2024; originally announced June 2024.

Comments: GitHub repository: https://github.com/davidorp/taukadial

arXiv:2404.09988 [pdf, other]

in2IN: Leveraging individual Information to Generate Human INteractions

Authors: Pablo Ruiz Ponce, German Barquero, Cristina Palmero, Sergio Escalera, Jose Garcia-Rodriguez

Abstract: Generating human-human motion interactions conditioned on textual descriptions is a very useful application in many areas such as robotics, gaming, animation, and the metaverse. Alongside this utility comes great difficulty in modeling the high-dimensional inter-personal dynamics. In addition, properly capturing the intra-personal diversity of interactions poses many challenges. Current methods generate interactions with limited diversity of intra-person dynamics due to the limitations of the available datasets and conditioning strategies. To address this, we introduce in2IN, a novel diffusion model for human-human motion generation that is conditioned not only on the textual description of the overall interaction but also on individual descriptions of the actions performed by each person involved. To train this model, we use a large language model to extend the InterHuman dataset with individual descriptions. As a result, in2IN achieves state-of-the-art performance on the InterHuman dataset. Furthermore, to increase the intra-personal diversity of existing interaction datasets, we propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D. As a result, DualMDM generates motions with higher individual diversity and improves control over the intra-person dynamics while maintaining inter-personal coherence.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Project page: https://pabloruizponce.github.io/in2IN/

arXiv:2307.12172 [pdf, ps, other]

Challenges for Monocular 6D Object Pose Estimation in Robotics

Authors: Stefan Thalhammer, Dominik Bauer, Peter Hönig, Jean-Baptiste Weibel, José García-Rodríguez, Markus Vincze

Abstract: Object pose estimation is a core perception task that enables, for example, object grasping and scene understanding. Widely available, inexpensive, high-resolution RGB sensors, together with CNNs that allow fast inference on this modality, make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state of the art for varying modalities, single- and multi-view settings, and datasets and metrics that consider a multitude of applications. We argue, however, that the broad scope of those works hinders the identification of open challenges that are specific to monocular approaches and the derivation of promising future challenges for their application in robotics. By providing a unified view of recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, large object sets, novel objects, refractive materials, and uncertainty estimates are central, largely unsolved open challenges. In order to address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved.

Submitted 22 July, 2023; originally announced July 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2302.11827

arXiv:2306.00129 [pdf, ps, other]

Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects

Authors: Stefan Thalhammer, Jean-Baptiste Weibel, Markus Vincze, Jose Garcia-Rodriguez

Abstract: Object pose estimation is important for object manipulation and scene understanding. In order to improve the general applicability of pose estimators, recent research focuses on providing estimates for novel objects, that is, objects unseen during training. Such works use deep template matching strategies to retrieve the closest template connected to a query image. This template retrieval implicitly provides object class and pose. Despite the recent success and improvements of Vision Transformers over CNNs for many vision tasks, the state of the art uses CNN-based approaches for novel object pose estimation. This work evaluates and demonstrates the differences between self-supervised CNNs and Vision Transformers for deep template matching. In detail, both types of approaches are trained using contrastive learning to match training images against rendered templates of isolated objects. At test time, such templates are matched against query images of known and novel objects under challenging settings, such as clutter, occlusion and object symmetries, using masked cosine similarity. The presented results not only demonstrate that Vision Transformers improve matching accuracy over CNNs, but also that in some cases pre-trained Vision Transformers do not need fine-tuning to do so. Furthermore, we highlight the differences in optimization and network architecture when comparing these two types of network for deep template matching.
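Below is a minimal sketch of template retrieval by masked cosine similarity. It assumes dense feature maps of shape (H, W, C) for the query and for each template, plus one binary object mask per template. Similarity is averaged over masked positions, which is an assumption: the paper's exact masking and aggregation may differ. All names are hypothetical, and random arrays stand in for network features.

```python
import numpy as np


def masked_cosine_similarity(query_feat, template_feat, mask, eps=1e-8):
    """Mean cosine similarity between feature vectors at positions where mask is True.

    query_feat, template_feat: float arrays of shape (H, W, C); mask: bool array (H, W).
    """
    q = query_feat / (np.linalg.norm(query_feat, axis=-1, keepdims=True) + eps)
    t = template_feat / (np.linalg.norm(template_feat, axis=-1, keepdims=True) + eps)
    per_location = np.sum(q * t, axis=-1)   # cosine similarity at every (h, w)
    if not mask.any():
        return -1.0                          # empty mask: lowest possible score
    return float(per_location[mask].mean())


def retrieve_template(query_feat, templates, masks):
    """Return the index of the best-matching template and all scores."""
    scores = [masked_cosine_similarity(query_feat, t, m) for t, m in zip(templates, masks)]
    return int(np.argmax(scores)), scores


# Toy example: random features stand in for CNN/ViT embeddings.
rng = np.random.default_rng(0)
templates = [rng.normal(size=(8, 8, 16)) for _ in range(5)]
masks = [np.ones((8, 8), dtype=bool) for _ in range(5)]
query = templates[3] + 0.1 * rng.normal(size=(8, 8, 16))  # noisy view of template 3
best, scores = retrieve_template(query, templates, masks)
print(best, [round(s, 3) for s in scores])
```

In the retrieval setting described above, each template carries a known object class and pose. The argmax index therefore gives both at once.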

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2210.02175 [pdf, other]

Boundary-safe PINNs extension: Application to non-linear parabolic PDEs in counterparty credit risk

Authors: Joel P. Villarino, Álvaro Leitao, José A. García-Rodríguez

Abstract: The goal of this work is to develop deep learning numerical methods for solving option XVA pricing problems given by non-linear PDE models. A novel strategy for the treatment of the boundary conditions is proposed, which removes the need for a heuristic choice of weights for the different terms of the loss function used in training. It is based on defining the losses associated with the boundaries by means of the PDEs that arise from substituting the related conditions into the model equation itself. Further, automatic differentiation is employed to obtain accurate approximations of the partial derivatives.

Submitted 5 October, 2022; originally announced October 2022.

MSC Class: 68T07; 35Q91; 65M99; 91G20

arXiv:2107.03751 [pdf, other]

Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning

Authors: Luis Lucas, David Tomas, Jose Garcia-Rodriguez

Abstract: One of the main issues related to unsupervised machine learning is the cost of processing and extracting useful information from large datasets. In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture in multimodal environments (image and text) from social media. For this purpose, we used the InstaNY100K dataset and proposed a validation approach based on sampling techniques. Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part and then adding the associated texts as support. The results show that trained neural networks such as CLIP can be successfully applied to image classification with little fine-tuning, and that considering the texts associated with the images can help improve accuracy, depending on the goal. These results point to a promising research direction.
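Zero-shot classification with a CLIP-like model usually scores an image embedding against text embeddings of prompts built from the class labels. The sketch below assumes those embeddings have already been computed by the model's encoders. It shows one hypothetical way to combine an image-only prediction with a caption-based one: a weighted average of the two label distributions. The paper's actual ensemble may work differently. The logit scale of 100 and the blending weight are assumed defaults, not values from the paper.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_probs(query_emb, label_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one embedding and each label prompt."""
    logits = logit_scale * (l2_normalize(label_embs) @ l2_normalize(query_emb))
    logits = logits - logits.max()        # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


def ensemble_probs(image_emb, caption_emb, label_embs, text_weight=0.5):
    """Hypothetical ensemble: blend image-based and caption-based label distributions."""
    p_image = zero_shot_probs(image_emb, label_embs)
    if caption_emb is None:               # visual-only setting
        return p_image
    p_text = zero_shot_probs(caption_emb, label_embs)
    return (1 - text_weight) * p_image + text_weight * p_text


# Toy 3-label example; real embeddings would come from CLIP's image and text encoders.
labels = np.eye(3)
image = np.array([1.0, 1.0, 0.0])         # visually ambiguous between labels 0 and 1
caption = np.array([0.0, 1.0, 0.0])       # the post's caption points to label 1
print(ensemble_probs(image, None, labels))
print(ensemble_probs(image, caption, labels))
```

This mirrors the two experimental settings in the abstract, visual part only versus visual plus associated text. Here the caption resolves an otherwise ambiguous image.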

Submitted 8 July, 2021; originally announced July 2021.

arXiv:2104.11776 [pdf, other]

UnrealROX+: An Improved Tool for Acquiring Synthetic Data from Virtual 3D Environments

Authors: Pablo Martinez-Gonzalez, Sergiu Oprea, John Alejandro Castro-Vargas, Alberto Garcia-Garcia, Sergio Orts-Escolano, Jose Garcia-Rodriguez, Markus Vincze

Abstract: Synthetic data generation has become essential in recent years for feeding data-driven algorithms, which have surpassed the performance of traditional techniques in almost every computer vision problem. Gathering and labelling the amount of data needed by these data-hungry models in the real world may become infeasible and error-prone, while synthetic data make it possible to generate huge amounts of data with pixel-perfect annotations. However, most synthetic datasets lack sufficient realism in their rendered images. In that context, the UnrealROX generation tool was presented in 2019, allowing highly realistic data to be generated at high resolutions and frame rates with an efficient pipeline based on Unreal Engine, a cutting-edge video game engine. UnrealROX enabled robotic vision researchers to generate realistic and visually plausible data with full ground truth for a wide variety of problems such as class and instance semantic segmentation, object detection, depth estimation, visual grasping, and navigation. Nevertheless, its workflow was tightly tied to generating image sequences from a robotic on-board camera, making it hard to generate data for other purposes. In this work, we present UnrealROX+, an improved version of UnrealROX whose decoupled and easy-to-use data acquisition system allows data to be designed and generated quickly in a much more flexible and customizable way. Moreover, it is packaged as an Unreal plug-in, which makes it easier to use with existing Unreal projects, and it includes new features such as generating albedo and a Python API for interacting with the virtual environment from Deep Learning frameworks.

Submitted 23 April, 2021; originally announced April 2021.

Comments: Accepted at International Joint Conference on Neural Networks (IJCNN) 2021

arXiv:2103.15017 [pdf, other]

H-GAN: the power of GANs in your Hands

Authors: Sergiu Oprea, Giorgos Karvounas, Pablo Martinez-Gonzalez, Nikolaos Kyriazis, Sergio Orts-Escolano, Iason Oikonomidis, Alberto Garcia-Garcia, Aggeliki Tsoli, Jose Garcia-Rodriguez, Antonis Argyros

Abstract: We present HandGAN (H-GAN), a cycle-consistent adversarial learning approach implementing multi-scale perceptual discriminators. It is designed to translate synthetic images of hands to the real domain. Synthetic hands provide complete ground-truth annotations, yet they are not representative of the target distribution of real-world data. We strive to provide the perfect blend of a realistic hand appearance with synthetic annotations. Relying on image-to-image translation, we improve the appearance of synthetic hands to approximate the statistical distribution underlying a collection of real images of hands. H-GAN tackles not only the cross-domain tone mapping but also structural differences in localized areas such as shading discontinuities. Results are evaluated qualitatively and quantitatively, improving on previous works. Furthermore, we relied on the hand classification task to support the claim that our generated hands are statistically similar to the real domain of hands.
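Here is a hedged reminder of what "cycle-consistent" usually means, following the CycleGAN formulation rather than H-GAN's exact objective. The cycle-consistency term penalizes the L1 distance between an image and its round trip through both generators. This term is only one part of a full objective, which also includes adversarial (and, in H-GAN, multi-scale perceptual) losses. The "generators" below are toy callables for illustration only.

```python
import numpy as np


def cycle_consistency_loss(G, F, x_batch, y_batch):
    """L1 cycle loss: mean|F(G(x)) - x| + mean|G(F(y)) - y|.

    G maps synthetic -> real images and F maps real -> synthetic; here they are
    arbitrary callables standing in for the two generator networks.
    """
    forward = np.mean(np.abs(F(G(x_batch)) - x_batch))
    backward = np.mean(np.abs(G(F(y_batch)) - y_batch))
    return float(forward + backward)


# Toy "generators" acting on random image batches of shape (N, H, W, 3).
rng = np.random.default_rng(0)
x = rng.random((4, 16, 16, 3))   # stand-in for synthetic hand images
y = rng.random((4, 16, 16, 3))   # stand-in for real hand images


def G(img):
    return 2.0 * img + 1.0


def F_inverse(img):              # exact inverse of G, so the cycle loss is ~0
    return (img - 1.0) / 2.0


def F_biased(img):               # not an inverse of G, so the cycle loss is positive
    return img / 2.0


print(cycle_consistency_loss(G, F_inverse, x, y))
print(cycle_consistency_loss(G, F_biased, x, y))
```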

Submitted 21 April, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

Comments: Paper accepted at The International Joint Conference on Neural Networks (IJCNN) 2021

arXiv:2004.05214 [pdf, other]

doi 10.1109/TPAMI.2020.3045007

A Review on Deep Learning Techniques for Video Prediction

Authors: Sergiu Oprea, Pablo Martinez-Gonzalez, Alberto Garcia-Garcia, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez, Antonis Argyros

Abstract: The ability to predict, anticipate and reason about future outcomes is a key component of intelligent decision-making systems. In light of the success of deep learning in computer vision, deep-learning-based video prediction emerged as a promising research direction. Defined as a self-supervised learning task, video prediction represents a suitable framework for representation learning, as it has demonstrated potential capabilities for extracting meaningful representations of the underlying patterns in natural videos. Motivated by the increasing interest in this task, we provide a review of deep learning methods for prediction in video sequences. We first define the video prediction fundamentals, as well as mandatory background concepts and the most used datasets. Next, we carefully analyze existing video prediction models organized according to a proposed taxonomy, highlighting their contributions and their significance in the field. The summary of the datasets and methods is accompanied by experimental results that facilitate the assessment of the state of the art on a quantitative basis. The paper concludes by drawing some general conclusions, identifying open research challenges and pointing out future research directions.

Submitted 14 April, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Submitted to TPAMI

arXiv:1903.05238 [pdf, other]

doi 10.1016/j.cag.2019.07.003

A Visually Plausible Grasping System for Object Manipulation and Interaction in Virtual Reality Environments

Authors: Sergiu Oprea, Pablo Martinez-Gonzalez, Alberto Garcia-Garcia, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez

Abstract: Interaction in virtual reality (VR) environments is essential to achieve a pleasant and immersive experience. Most currently existing VR applications lack robust object grasping and manipulation, which are the cornerstone of interactive systems. Therefore, we propose a realistic, flexible and robust grasping system that enables rich and real-time interactions in virtual environments. It is visually realistic because it is completely user-controlled, flexible because it can be used for different hand configurations, and robust because it allows the manipulation of objects regardless of their geometry, i.e., the hand is automatically fitted to the object shape. In order to validate our proposal, an exhaustive qualitative and quantitative performance analysis has been carried out. On the one hand, qualitative evaluation was used to assess abstract aspects such as hand movement realism, interaction realism and motor control. On the other hand, for the quantitative evaluation, a novel error metric has been proposed to visually analyze the performed grips. This metric is based on the computation of the distance from the finger phalanges to the nearest contact point on the object surface. These contact points can be used for different applications, mainly in the field of robotics. In conclusion, the system evaluation reports similar performance for users with previous experience in virtual reality applications and for inexperienced users, indicating a steep learning curve.
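The error metric above is described as the distance from the finger phalanges to the nearest contact point on the object surface. A minimal sketch follows, under three assumptions: the object surface is available as a sampled point cloud, each phalanx is represented by a single 3D point, and the per-phalanx distances are averaged into one score. The averaging is a hypothetical choice, not necessarily the paper's. The nearest surface points double as candidate contact points.

```python
import numpy as np


def nearest_contacts(phalanx_pts, surface_pts):
    """Distance from each phalanx point (N, 3) to its nearest surface point (M, 3),
    together with that nearest point (a candidate contact point)."""
    diffs = phalanx_pts[:, None, :] - surface_pts[None, :, :]   # (N, M, 3)
    dists = np.linalg.norm(diffs, axis=-1)                      # (N, M)
    idx = dists.argmin(axis=1)
    return dists[np.arange(len(phalanx_pts)), idx], surface_pts[idx]


def grasp_error(phalanx_pts, surface_pts):
    """Hypothetical aggregate: mean phalanx-to-surface distance over all phalanges."""
    dists, _ = nearest_contacts(phalanx_pts, surface_pts)
    return float(dists.mean())


# Object surface sampled as a flat 21 x 21 grid at z = 0; two phalanges hovering above it.
xs = np.linspace(-1.0, 1.0, 21)
surface = np.array([[x, y, 0.0] for x in xs for y in xs])
phalanges = np.array([[0.0, 0.0, 0.1], [0.5, 0.5, 0.2]])
print(grasp_error(phalanges, surface))
```

The brute-force (N, M) distance matrix is fine for a handful of phalanges. For dense meshes, a spatial index such as a k-d tree would be the usual replacement.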

Submitted 12 March, 2019; originally announced March 2019.

arXiv:1901.06514 [pdf, other]

The RobotriX: An eXtremely Photorealistic and Very-Large-Scale Indoor Dataset of Sequences with Robot Trajectories and Interactions

Authors: Alberto Garcia-Garcia, Pablo Martinez-Gonzalez, Sergiu Oprea, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez, Alvaro Jover-Alvarez

Abstract: Enter the RobotriX, an extremely photorealistic indoor dataset designed to enable the application of deep learning techniques to a wide variety of robotic vision problems. The RobotriX consists of hyperrealistic indoor scenes which are explored by robot agents that also interact with objects in a visually realistic manner in that simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset which captures gaze so that a human operator can move the robot and use controllers for the robotic hands; scene information is dumped on a per-frame basis so that it can be reproduced offline to generate raw data and ground truth labels. By taking this approach, we were able to generate a dataset of 38 semantic classes totaling 8M stills recorded at 60+ frames per second with full HD resolution. For each frame, RGB-D and 3D information is provided with full annotations in both spaces. Thanks to the high quality and quantity of both raw information and annotations, the RobotriX will serve as a new milestone for investigating 2D and 3D robotic vision tasks with large-scale data-driven techniques.

Submitted 19 January, 2019; originally announced January 2019.

arXiv:1901.06181 [pdf, other]

TactileGCN: A Graph Convolutional Network for Predicting Grasp Stability with Tactile Sensors

Authors: Alberto Garcia-Garcia, Brayan Stiven Zapata-Impata, Sergio Orts-Escolano, Pablo Gil, Jose Garcia-Rodriguez

Abstract: Tactile sensors provide useful contact data during interaction with an object, which can be used to learn to determine the stability of a grasp accurately. Most works in the literature represent tactile readings as plain feature vectors or matrix-like tactile images and use them to train machine learning models. In this work, we explore an alternative way of exploiting tactile information to predict grasp stability by leveraging graph-like representations of tactile data, which preserve the actual spatial arrangement of the sensor's taxels and their locality. In our experiments, we trained a Graph Neural Network to classify grasps as either stable or slippery. To train this network and prove its predictive capabilities for the problem at hand, we captured a novel dataset of approximately 5000 three-fingered grasps across 41 objects for training and 1000 grasps with 10 unknown objects for testing. Our experiments show that this novel approach can be effectively used to predict grasp stability.
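The sketch below illustrates the graph idea described above: taxel readings become node features, and edges follow the sensor's physical layout. It assumes known 2D taxel coordinates, edges between taxels within a hypothetical radius, and one graph-convolution layer with the common symmetric normalization $\hat{D}^{-1/2}(A+I)\hat{D}^{-1/2}$ popularized by Kipf and Welling's GCN. A mean-pooling readout then produces a stable/slippery probability. The paper's actual graph construction, depth and readout may differ, and the weights here are random placeholders rather than trained parameters.

```python
import numpy as np


def taxel_adjacency(coords, radius):
    """Connect taxels whose Euclidean distance is <= radius (no self-loops)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (d <= radius).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weights, 0.0)


def predict_stable(hidden, w_out, b_out):
    """Hypothetical readout: mean-pool node features, then a logistic output."""
    graph_emb = hidden.mean(axis=0)
    return float(1.0 / (1.0 + np.exp(-(graph_emb @ w_out + b_out))))


# 3 x 3 grid of taxels with unit spacing; radius 1.0 links 4-neighbours only.
coords = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
adj = taxel_adjacency(coords, radius=1.0)
pressures = np.random.default_rng(0).random((9, 1))       # one reading per taxel
weights = np.random.default_rng(1).normal(size=(1, 4))    # placeholder layer weights
hidden = gcn_layer(adj, pressures, weights)
print(hidden.shape)  # (9, 4)
print(predict_stable(hidden, np.ones(4), 0.0))
```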

Submitted 18 January, 2019; originally announced January 2019.

arXiv:1810.06936 [pdf, other]

UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data Generation

Authors: Pablo Martinez-Gonzalez, Sergiu Oprea, Alberto Garcia-Garcia, Alvaro Jover-Alvarez, Sergio Orts-Escolano, Jose Garcia-Rodriguez

Abstract: Data-driven algorithms have surpassed traditional techniques in almost every aspect of robotic vision problems. Such algorithms need vast amounts of quality data to work properly after their training process. Gathering and annotating that sheer amount of data in the real world is a time-consuming and error-prone task. Those problems limit scale and quality. Synthetic data generation has become increasingly popular since it is faster to generate and automatic to annotate. However, most of the current datasets and environments lack realism, interactions, and details from the real world. UnrealROX is an environment built over Unreal Engine 4 which aims to reduce that reality gap by leveraging hyperrealistic indoor scenes that are explored by robot agents which also interact with objects in a visually realistic manner in that simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset which captures gaze so that a human operator can move the robot and use controllers for the robotic hands; scene information is dumped on a per-frame basis so that it can be reproduced offline to generate raw data and ground truth annotations. This virtual reality environment enables robotic vision researchers to generate realistic and visually plausible data with full ground truth for a wide variety of problems such as class and instance semantic segmentation, object detection, depth estimation, visual grasping, and navigation.

Submitted 8 November, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

Comments: Published in Virtual Reality journal

arXiv:1708.01143 [pdf, other]

doi 10.1016/j.asoc.2015.05.007

Three-dimensional planar model estimation using multi-constraint knowledge based on k-means and RANSAC

Authors: Marcelo Saval-Calvo, Jorge Azorin-Lopez, Andres Fuster-Guillo, Jose Garcia-Rodriguez

Abstract: Plane model extraction from three-dimensional point clouds is a necessary step in many different applications such as planar object reconstruction, indoor mapping and indoor localization. Different RANdom SAmple Consensus (RANSAC)-based methods have been proposed for this purpose in recent years. In this study, we propose a novel RANSAC-based method called Multiplane Model Estimation, which can estimate multiple plane models simultaneously from a noisy point cloud using the knowledge extracted from a scene (or an object) in order to reconstruct it accurately. This method comprises two steps: first, it clusters the data into planar faces that preserve some constraints defined by knowledge related to the object (e.g., the angles between faces); and second, the models of the planes are estimated based on these data using a novel multi-constraint RANSAC. We performed experiments in the clustering and RANSAC stages, which showed that the proposed method performed better than state-of-the-art methods.
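For reference, the sketch below is a plain single-plane RANSAC, the family of methods the proposed Multiplane Model Estimation builds on. It does not implement the paper's knowledge-based clustering step or its multi-constraint RANSAC. The inlier threshold and iteration count are hypothetical defaults, and the synthetic cloud is made up for illustration.

```python
import numpy as np


def fit_plane(p0, p1, p2):
    """Plane through three points as (unit normal n, offset d) with n.x + d = 0."""
    n = np.cross(p1 - p0, p2 - p0)
    norm = np.linalg.norm(n)
    if norm < 1e-12:          # degenerate (collinear) sample
        return None
    n = n / norm
    return n, -float(n @ p0)


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Keep the 3-point plane hypothesis with the most points within `threshold`."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        model = fit_plane(*sample)
        if model is None:
            continue
        n, d = model
        inliers = np.abs(points @ n + d) < threshold   # point-to-plane distance test
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic cloud: 200 noisy points on the plane z = 0 plus 50 uniform outliers.
rng = np.random.default_rng(1)
plane_pts = np.column_stack([rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200),
                             0.001 * rng.normal(size=200)])
outliers = rng.uniform(-1, 1, size=(50, 3))
cloud = np.vstack([plane_pts, outliers])
(normal, offset), inliers = ransac_plane(cloud, threshold=0.01, iterations=200)
print(normal, offset, int(inliers.sum()))
```

A common next step is to refit the plane by least squares on the final inliers. Extracting several planes is typically done by removing each plane's inliers and repeating. The paper's multi-constraint variant instead estimates the planes jointly under knowledge-based constraints such as the angles between faces.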

Submitted 3 August, 2017; originally announced August 2017.

Journal ref: Applied Soft Computing, Vol. 34, p. 572-586 (2015)

arXiv:1704.06857 [pdf, other]

A Review on Deep Learning Techniques Applied to Semantic Segmentation

Authors: Alberto Garcia-Garcia, Sergio Orts-Escolano, Sergiu Oprea, Victor Villena-Martinez, Jose Garcia-Rodriguez

Abstract: Image semantic segmentation is of increasing interest to computer vision and machine learning researchers. Many emerging applications need accurate and efficient segmentation mechanisms: autonomous driving, indoor navigation, and even virtual or augmented reality systems, to name a few. This demand coincides with the rise of deep learning approaches in almost every field or application target related to computer vision, including semantic segmentation or scene understanding. This paper provides a review of deep learning methods for semantic segmentation applied to various application areas. First, we describe the terminology of this field as well as mandatory background concepts. Next, the main datasets and challenges are presented to help researchers decide which ones best suit their needs and targets. Then, existing methods are reviewed, highlighting their contributions and their significance in the field. Finally, quantitative results are given for the described methods and the datasets on which they were evaluated, followed by a discussion of the results. Lastly, we point out a set of promising future works and draw our own conclusions about the state of the art of semantic segmentation using deep learning techniques.

Submitted 22 April, 2017; originally announced April 2017.

Comments: Submitted to TPAMI on Apr. 22, 2017

arXiv:1612.06178

Representation Growth

Authors: Javier García-Rodríguez

Abstract: The main results in this thesis deal with the representation growth of certain classes of groups. In chapter $1$ we present the required preliminary theory. In chapter $2$ we introduce the Congruence Subgroup Problem for an algebraic group $G$ defined over a global field $k$. In chapter $3$ we consider an arithmetic subgroup $Γ=G(\mathcal{O}_S)$ of a semisimple algebraic $k$-group, for some global field $k$ with ring of $S$-integers $\mathcal{O}_S$. If the Lie algebra of $G$ is perfect, Lubotzky and Martin showed that if $Γ$ has the weak Congruence Subgroup Property, then $Γ$ has Polynomial Representation Growth, that is, $r_n(Γ)\leq p(n)$ for some polynomial $p$. Using a different approach, we show that the same holds for any semisimple algebraic group $G$, including those with a non-perfect Lie algebra. In chapter $4$ we show that if $Γ$ has the weak Congruence Subgroup Property, then $s_n(Γ)\leq n^{D\log n}$ for some constant $D$, where $s_n(Γ)$ denotes the number of subgroups of $Γ$ of index at most $n$. In chapter $5$ we consider groups of the form $Γ=1+J$, where $J$ is a finite nilpotent associative algebra; such a group is called an algebra group. We provide counterexamples to the Fake Degree Conjecture for every prime $p$ by looking at groups of the form $Γ=1+I_{\mathbb{F}_q}$, where $I_{\mathbb{F}_q}$ is the augmentation ideal of the group algebra $\mathbb{F}_q[π]$ for some $p$-group $π$. Moreover, we show that for such groups $r_1(Γ)=q^{k(π)-1}|B_0(π)|$, where $k(π)$ is the number of conjugacy classes of $π$ and $B_0(π)$ is the Bogomolov multiplier of $π$.
Finally, in chapter $6$, we consider $Γ=\prod_{i\in I} S_i$, where the $S_i$ are nonabelian finite simple groups. We show that within this class one can obtain any rate of representation growth, i.e., for any $α>0$ there exists $Γ=\prod_{i\in I}S_i$ such that $r_n(Γ)\sim n^α$.

Submitted 19 December, 2016; originally announced December 2016.

Comments: Ph.D. Thesis of the author

arXiv:1502.03242

Units of group rings, the Bogomolov multiplier, and the fake degree conjecture

Authors: Javier Garcia-Rodriguez, Andrei Jaikin-Zapirain, Urban Jezernik

Abstract: Let $π$ be a finite $p$-group and $\mathbb{F}_q$ a finite field with $q=p^n$ elements. Denote by $\mathrm{I}_{\mathbb{F}_q}$ the augmentation ideal of the group ring $\mathbb{F}_q[π]$. We have found a surprising relation between the abelianization of $1+\mathrm{I}_{\mathbb{F}_q}$, the Bogomolov multiplier $\mathrm{B}_0(π)$ of $π$ and the number of conjugacy classes $\mathrm{k}(π)$ of $π$: \[ | (1+\mathrm{I}_{\mathbb{F}_q})_{\mathrm{ab}} |=q^{\mathrm{k}(π)-1}|\mathrm{B}_0(π)|. \] In particular, if $π$ is a finite $p$-group with a non-trivial Bogomolov multiplier, then $1+\mathrm{I}_{\mathbb{F}_q}$ is a counterexample to the fake degree conjecture proposed by M. Isaacs.

Submitted 11 February, 2015; originally announced February 2015.

Comments: 9 pages

Showing 1–18 of 18 results for author: Garcia-Rodriguez, J