-
Gradient-based Class Weighting for Unsupervised Domain Adaptation in Dense Prediction Visual Tasks
Authors:
Roberto Alcover-Couso,
Marcos Escudero-Viñolo,
Juan C. SanMiguel,
Jesus Bescós
Abstract:
In unsupervised domain adaptation (UDA), where models are trained on source data (e.g., synthetic) and adapted to target data (e.g., real-world) without target annotations, significant class imbalance remains an open issue. Despite considerable progress in bridging the domain gap, existing methods often suffer performance degradation when confronted with highly imbalanced dense prediction visual tasks such as semantic and panoptic segmentation. This degradation is especially pronounced because the source and target domains lack equivalent priors, which renders class-imbalance techniques used in other areas (e.g., image classification) ineffective in UDA scenarios. This paper proposes a class-imbalance mitigation strategy that incorporates class weights into the UDA learning losses, with the novelty of estimating these weights dynamically through the loss gradient, defining a Gradient-based class weighting (GBW) learning. GBW naturally increases the contribution of classes whose learning is hindered by heavily represented classes, and has the advantage of adapting automatically and quickly to the outcome of each training iteration, avoiding the explicit curricular learning patterns common in loss-weighting strategies. Extensive experimentation validates the effectiveness of GBW across architectures (convolutional and transformer), UDA strategies (adversarial, self-training and entropy minimization), tasks (semantic and panoptic segmentation), and datasets (GTA and Synthia). An analysis of the source of this advantage shows that GBW consistently increases the recall of under-represented classes.
Submitted 1 July, 2024;
originally announced July 2024.
-
Spacecraft Pose Estimation Based on Unsupervised Domain Adaptation and on a 3D-Guided Loss Combination
Authors:
Juan Ignacio Bravo Pérez-Villar,
Álvaro García-Martín,
Jesús Bescós
Abstract:
Spacecraft pose estimation is a key task to enable space missions in which two spacecraft must navigate around each other. Current state-of-the-art algorithms for pose estimation employ data-driven techniques. However, real training data for spacecraft imaged in space conditions is unavailable due to the costs and difficulties associated with the space environment. This has motivated the introduction of 3D data simulators, which solve the issue of data availability but introduce a large gap between the training (source) and test (target) domains. We explore a method that incorporates 3D structure into the spacecraft pose estimation pipeline to provide robustness to intensity domain shift, and we present an algorithm for unsupervised domain adaptation with robust pseudo-labelling. Our solution ranked second in the two categories of the 2021 Pose Estimation Challenge organised by the European Space Agency and Stanford University, achieving the lowest average error over the two categories.
Submitted 27 December, 2022;
originally announced December 2022.
-
Attention-based Knowledge Distillation in Multi-attention Tasks: The Impact of a DCT-driven Loss
Authors:
Alejandro López-Cifuentes,
Marcos Escudero-Viñolo,
Jesús Bescós,
Juan C. SanMiguel
Abstract:
Knowledge Distillation (KD) is a strategy for the definition of a set of transferability gangways to improve the efficiency of Convolutional Neural Networks. Feature-based Knowledge Distillation is a subfield of KD that relies on intermediate network representations, either unaltered or depth-reduced via maximum activation maps, as the source knowledge. In this paper, we propose and analyse the use of a 2D frequency transform of the activation maps before transferring them. We posit that, by using global image cues rather than pixel estimates, this strategy enhances knowledge transferability in tasks such as scene recognition, which are defined by strong spatial and contextual relationships between multiple and varied concepts. To validate the proposed method, an extensive evaluation of the state-of-the-art in scene recognition is presented. Experimental results provide strong evidence that the proposed strategy enables the student network to better focus on the relevant image areas learnt by the teacher network, hence leading to better descriptive features and higher transferred performance than every other state-of-the-art alternative. We publicly release the training and evaluation framework used throughout this paper at http://www-vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
Submitted 6 June, 2022; v1 submitted 4 May, 2022;
originally announced May 2022.
-
A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset
Authors:
Alejandro López-Cifuentes,
Marcos Escudero-Viñolo,
Jesús Bescós
Abstract:
Action recognition is currently one of the most challenging research fields in computer vision. Convolutional Neural Networks (CNNs) have significantly boosted its performance but rely on fixed-size spatio-temporal windows of analysis, which reduces the CNNs' temporal receptive fields. Among action recognition datasets, egocentric recorded sequences have become highly relevant while entailing an additional challenge: ego-motion is unavoidably transferred to these sequences. The proposed method aims to cope with this challenge by estimating the ego-motion, or camera motion. The estimate is used to temporally partition video sequences into motion-compensated temporal chunks that show the action under stable backgrounds and allow for a content-driven temporal sampling. A CNN trained in an end-to-end fashion is used to extract temporal features from each chunk, and these features are late fused. This process leads to the extraction of features from the whole temporal range of an action, increasing the temporal receptive field of the network.
Submitted 26 August, 2020;
originally announced August 2020.
-
Semantic-Aware Scene Recognition
Authors:
Alejandro López-Cifuentes,
Marcos Escudero-Viñolo,
Jesús Bescós,
Álvaro García-Martín
Abstract:
Scene recognition is currently one of the most challenging research fields in computer vision. This may be due to the ambiguity between classes: images of several scene classes may share similar objects, which causes confusion among them. The problem is aggravated when images of a particular scene class are notably different. Convolutional Neural Networks (CNNs) have significantly boosted performance in scene recognition, although it remains far below that of other recognition tasks (e.g., object or image recognition). In this paper, we describe a novel approach for scene recognition based on an end-to-end multi-modal CNN that combines image and context information by means of an attention module. Context information, in the shape of semantic segmentation, is used to gate features extracted from the RGB image by leveraging information encoded in the semantic representation: the set of scene objects and stuff, and their relative locations. This gating process reinforces the learning of indicative scene content and enhances scene disambiguation by refocusing the receptive fields of the CNN towards that content. Experimental results on four publicly available datasets show that the proposed approach outperforms every other state-of-the-art method while significantly reducing the number of network parameters. All the code and data used throughout this paper are available at https://github.com/vpulab/Semantic-Aware-Scene-Recognition
Submitted 22 January, 2020; v1 submitted 5 September, 2019;
originally announced September 2019.
-
Semantic Driven Multi-Camera Pedestrian Detection
Authors:
Alejandro López-Cifuentes,
Marcos Escudero-Viñolo,
Jesús Bescós,
Pablo Carballeira
Abstract:
In the current worldwide situation, pedestrian detection has reemerged as a pivotal tool for intelligent video-based systems aiming to solve tasks such as pedestrian tracking, social distancing monitoring or pedestrian mass counting. Pedestrian detection methods, even the top performing ones, are highly sensitive to occlusions among pedestrians, which dramatically degrade their performance in crowded scenarios. The spread of multi-camera set-ups makes it possible to better handle occlusions by combining information from different viewpoints. In this paper, we present a multi-camera approach to globally combine pedestrian detections leveraging automatically extracted scene context. Unlike most state-of-the-art methods, the proposed approach is scene-agnostic and does not require a tailored adaptation to the target scenario, e.g., via fine-tuning. As a result, no ad hoc training with labelled data is needed, which expedites the deployment of the proposed method in real-world situations. Context information, obtained via semantic segmentation, is used 1) to automatically generate a common Area of Interest for the scene and all the cameras, avoiding the usual need to define it manually; and 2) to obtain detections for each camera by solving a global optimization problem that maximizes coherence of detections both in each 2D image and in the 3D scene. This process yields tightly fitted bounding boxes that circumvent occlusions or missed detections. Experimental results on five publicly available datasets show that the proposed approach outperforms state-of-the-art multi-camera pedestrian detectors, even some specifically trained on the target scenario, demonstrating the versatility and robustness of the proposed method without requiring ad hoc annotations or human-guided configuration.
Submitted 7 April, 2022; v1 submitted 27 December, 2018;
originally announced December 2018.
-
Design and Processing of Invertible Orientation Scores of 3D Images for Enhancement of Complex Vasculature
Authors:
M. H. J. Janssen,
A. J. E. M. Janssen,
E. J. Bekkers,
J. Olivan Bescos,
R. Duits
Abstract:
The enhancement and detection of elongated structures in noisy image data are relevant for many biomedical imaging applications. To handle complex crossing structures in 2D images, 2D orientation scores $U: \mathbb{R}^2 \times S^1 \rightarrow \mathbb{C}$ were introduced, which have already proved useful in a variety of applications. Here we extend this work to 3D orientation scores $U: \mathbb{R}^3 \times S^2 \rightarrow \mathbb{C}$. First, we construct the orientation score from a given dataset, which is achieved by an invertible coherent state type of transform. For this transformation we introduce 3D versions of the 2D cake-wavelets, which are complex wavelets that can simultaneously detect oriented structures and oriented edges. We introduce two types of cake-wavelets: the first uses a discrete Fourier transform, while the second is designed in the 3D generalized Zernike basis, allowing us to calculate analytical expressions for the spatial filters. Finally, we show two applications of the orientation score transformation. In the first application we propose an extension of crossing-preserving coherence enhancing diffusion via our invertible orientation scores of 3D images, which we apply to real medical image data. In the second one we develop a new tubularity measure using 3D orientation scores and apply it to both artificial and real medical data.
Submitted 27 November, 2017; v1 submitted 7 July, 2017;
originally announced July 2017.