Search | arXiv e-print repository

Hyperbolic Metric Learning for Visual Outlier Detection

Authors: Alvaro Gonzalez-Jimenez, Simone Lionetti, Dena Bazazian, Philippe Gottfrois, Fabian Gröger, Marc Pouly, Alexander Navarini

Abstract: Out-Of-Distribution (OOD) detection is critical to deploy deep learning models in safety-critical applications. However, the inherent hierarchical concept structure of visual data, which is instrumental to OOD detection, is often poorly captured by conventional methods based on Euclidean geometry. This work proposes a metric framework that leverages the strengths of Hyperbolic geometry for OOD detection. Inspired by previous works that refine the decision boundary for OOD data with synthetic outliers, we extend this method to Hyperbolic space. Interestingly, we find that synthetic outliers do not benefit OOD detection in Hyperbolic space as they do in Euclidean space. Furthermore we explore the relationship between OOD detection performance and Hyperbolic embedding dimension, addressing practical concerns in resource-constrained environments. Extensive experiments show that our framework improves the FPR95 for OOD detection from 22% to 15% and from 49% to 28% on CIFAR-10 and CIFAR-100 respectively compared to Euclidean methods.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2304.06007

GPr-Net: Geometric Prototypical Network for Point Cloud Few-Shot Learning

Authors: Tejas Anvekar, Dena Bazazian

Abstract: In the realm of 3D-computer vision applications, point cloud few-shot learning plays a critical role. However, it poses an arduous challenge due to the sparsity, irregularity, and unordered nature of the data. Current methods rely on complex local geometric extraction techniques such as convolution, graph, and attention mechanisms, along with extensive data-driven pre-training tasks. These approaches contradict the fundamental goal of few-shot learning, which is to facilitate efficient learning. To address this issue, we propose GPr-Net (Geometric Prototypical Network), a lightweight and computationally efficient geometric prototypical network that captures the intrinsic topology of point clouds and achieves superior performance. Our proposed method, IGI++ (Intrinsic Geometry Interpreter++) employs vector-based hand-crafted intrinsic geometry interpreters and Laplace vectors to extract and evaluate point cloud morphology, resulting in improved representations for FSL (Few-Shot Learning). Additionally, Laplace vectors enable the extraction of valuable features from point clouds with fewer points. To tackle the distribution drift challenge in few-shot metric learning, we leverage hyperbolic space and demonstrate that our approach handles intra and inter-class variance better than existing point cloud few-shot learning methods. Experimental results on the ModelNet40 dataset show that GPr-Net outperforms state-of-the-art methods in few-shot learning on point clouds, achieving utmost computational efficiency that is $170\times$ better than all existing works. The code is publicly available at https://github.com/TejasAnvekar/GPr-Net.

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2204.09015

Dual-Domain Image Synthesis using Segmentation-Guided GAN

Authors: Dena Bazazian, Andrew Calway, Dima Damen

Abstract: We introduce a segmentation-guided approach to synthesise images that integrate features from two distinct domains. Images synthesised by our dual-domain model belong to one domain within the semantic mask, and to another in the rest of the image - smoothly integrated. We build on the successes of few-shot StyleGAN and single-shot semantic segmentation to minimise the amount of training required in utilising two domains. The method combines a few-shot cross-domain StyleGAN with a latent optimiser to achieve images containing features of two distinct domains. We use a segmentation-guided perceptual loss, which compares both pixel-level and activations between domain-specific and dual-domain synthetic images. Results demonstrate qualitatively and quantitatively that our model is capable of synthesising dual-domain images on a variety of objects (faces, horses, cats, cars), domains (natural, caricature, sketches) and part-based masks (eyes, nose, mouth, hair, car bonnet). The code is publicly available at: https://github.com/denabazazian/Dual-Domain-Synthesis.

Submitted 19 April, 2022; originally announced April 2022.

Comments: CVPR2022 Workshops. 14 pages, 19 figures

arXiv:2111.14762

Riemannian Functional Map Synchronization for Probabilistic Partial Correspondence in Shape Networks

Authors: Faria Huq, Adrish Dey, Sahra Yusuf, Dena Bazazian, Tolga Birdal, Nina Miolane

Abstract: We consider the problem of graph-matching on a network of 3D shapes with uncertainty quantification. We assume that the pairwise shape correspondences are efficiently represented as functional maps, that match real-valued functions defined over pairs of shapes. By modeling functional maps between nearly isometric shapes as elements of the Lie group $SO(n)$, we employ synchronization to enforce cycle consistency of the collection of functional maps over the graph, hereby enhancing the accuracy of the individual maps. We further introduce a tempered Bayesian probabilistic inference framework on $SO(n)$. Our framework enables: (i) synchronization of functional maps as maximum-a-posteriori estimation on the Riemannian manifold of functional maps, (ii) sampling the solution space in our energy based model so as to quantify uncertainty in the synchronization problem. We dub the latter Riemannian Langevin Functional Map (RLFM) Sampler. Our experiments demonstrate that constraining the synchronization on the Riemannian manifold $SO(n)$ improves the estimation of the functional maps, while our RLFM sampler provides for the first time an uncertainty quantification of the results.

Submitted 3 January, 2023; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: 16 pages

arXiv:1809.00854

Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images

Authors: Dena Bazazian, Dimosthenis Karatzas, Andrew D. Bagdanov

Abstract: Word spotting in natural scene images has many applications in scene understanding and visual assistance. In this paper we propose a technique to create and exploit an intermediate representation of images based on text attributes which are character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We call this representation the Soft-PHOC. Furthermore, we show how to use Soft-PHOC descriptors for word spotting tasks in egocentric camera streams through an efficient text line proposal algorithm. This is based on the Hough Transform over character attribute maps followed by scoring using Dynamic Time Warping (DTW). We evaluate our results on ICDAR 2015 Challenge 4 dataset of incidental scene text captured by an egocentric camera.

Submitted 11 October, 2019; v1 submitted 4 September, 2018; originally announced September 2018.

Comments: 9 pages, 10 figures, The Third International Workshop on Egocentric Perception, Interaction and Computing (EPIC) at ECCV2018

arXiv:1702.05089

Improving Text Proposals for Scene Images with Fully Convolutional Networks

Authors: Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, Andrew D. Bagdanov

Abstract: Text Proposals have emerged as a class-dependent version of object proposals - efficient approaches to reduce the search space of possible text object locations in an image. Combined with strong word classifiers, text proposals currently yield top state of the art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm of Gomez and Karatzas (2016), combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-text datasets show superior performance over current state-of-the-art.

Submitted 16 February, 2017; originally announced February 2017.

Comments: 6 pages, 8 figures, International Conference on Pattern Recognition (ICPR) - DLPR (Deep Learning for Pattern Recognition) workshop

Showing 1–6 of 6 results for author: Bazazian, D