-
Composed Image Retrieval for Remote Sensing
Authors:
Bill Psomas,
Ioannis Kakogeorgiou,
Nikos Efthymiadis,
Giorgos Tolias,
Ondrej Chum,
Yannis Avrithis,
Konstantinos Karantzalos
Abstract:
This work introduces composed image retrieval to remote sensing. It allows querying a large image archive with image examples altered by a textual description, enriching the descriptive power over unimodal queries, either visual or textual. Various attributes can be modified by the textual part, such as shape, color, or context. A novel method fusing image-to-image and text-to-image similarity is introduced. We demonstrate that a vision-language model possesses sufficient descriptive power and no further learning step or training data are necessary. We present a new evaluation benchmark focused on color, context, density, existence, quantity, and shape modifications. Our work not only sets the state of the art for this task, but also serves as a foundational step in addressing a gap in the field of remote sensing image retrieval. Code at: https://github.com/billpsomas/rscir
Submitted 29 May, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
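A minimal sketch of the idea in the abstract above: ranking database items by a fusion of image-to-image and text-to-image similarity. The min-max normalization, the convex combination, the weight lam, and all function names are assumptions made for illustration. This is not the authors' method, which is available at the repository linked in the abstract.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_scores(query, database):
    """Cosine similarity between one query embedding and every database embedding."""
    q = l2_normalize(query)
    return [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in database]


def min_max(scores):
    """Rescale scores to [0, 1] so the two modalities are comparable (assumed choice)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_and_rank(image_query, text_query, database, lam=0.5):
    """Rank database items by a convex combination of the two similarities.

    lam = 0 uses only the image query, lam = 1 only the text query.
    All embeddings are assumed to live in one shared vision-language space.
    """
    img = min_max(cosine_scores(image_query, database))
    txt = min_max(cosine_scores(text_query, database))
    fused = [(1 - lam) * i + lam * t for i, t in zip(img, txt)]
    ranking = sorted(range(len(database)), key=lambda j: fused[j], reverse=True)
    return ranking, fused


# Toy embeddings: item 2 resembles both the image query and the text query.
database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranking, fused = fuse_and_rank([1.0, 0.0], [0.0, 1.0], database, lam=0.5)
print(ranking[0])
```

In the toy example the item that agrees with both query parts is ranked first, while each unimodal setting (lam = 0 or lam = 1) prefers a different item.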
-
Label Propagation for Zero-shot Classification with Vision-Language Models
Authors:
Vladan Stojnić,
Yannis Kalantidis,
Giorgos Tolias
Abstract:
Vision-Language Models (VLMs) have demonstrated impressive performance on zero-shot classification, i.e. classification when provided merely with a list of class names. In this paper, we tackle the case of zero-shot classification in the presence of unlabeled data. We leverage the graph structure of the unlabeled data and introduce ZLaP, a method based on label propagation (LP) that utilizes geodesic distances for classification. We tailor LP to graphs containing both text and image features and further propose an efficient method for performing inductive inference based on a dual solution and a sparsification step. We perform extensive experiments to evaluate the effectiveness of our method on 14 common datasets and show that ZLaP outperforms the latest related works. Code: https://github.com/vladan-stojnic/ZLaP
Submitted 5 April, 2024;
originally announced April 2024.
-
Training Ensembles with Inliers and Outliers for Semi-supervised Active Learning
Authors:
Vladan Stojnić,
Zakaria Laskar,
Giorgos Tolias
Abstract:
Deep active learning in the presence of outlier examples poses a realistic yet challenging scenario. Acquiring unlabeled data for annotation requires a delicate balance between avoiding outliers to conserve the annotation budget and prioritizing useful inlier examples for effective training. In this work, we present an approach that leverages three highly synergistic components, which are identified as key ingredients: joint classifier training with inliers and outliers, semi-supervised learning through pseudo-labeling, and model ensembling. Our work demonstrates that ensembling significantly enhances the accuracy of pseudo-labeling and improves the quality of data acquisition. By enabling semi-supervision through the joint training process, where outliers are properly handled, we observe a substantial boost in classifier accuracy through the use of all available unlabeled examples. Notably, we reveal that the integration of joint training renders explicit outlier detection, a conventional component for acquisition in prior work, unnecessary. The three key components align seamlessly with numerous existing approaches. Through empirical evaluations, we showcase that their combined use leads to a performance increase. Remarkably, despite its simplicity, our proposed approach outperforms all other methods. Code: https://github.com/vladan-stojnic/active-outliers
Submitted 7 July, 2023;
originally announced July 2023.
-
The 2023 Video Similarity Dataset and Challenge
Authors:
Ed Pizzi,
Giorgos Kordopatis-Zilos,
Hiral Patel,
Gheorghe Postelnicu,
Sugosh Nagavara Ravindra,
Akshay Gupta,
Symeon Papadopoulos,
Giorgos Tolias,
Matthijs Douze
Abstract:
This work introduces a dataset, benchmark, and challenge for the problem of video copy detection and localization. The problem comprises two distinct but related tasks: determining whether a query video shares content with a reference video ("detection"), and additionally temporally localizing the shared content within each video ("localization"). The benchmark is designed to evaluate methods on these two tasks, and simulates a realistic needle-in-haystack setting, where the majority of both query and reference videos are "distractors" containing no copied content. We propose a metric that reflects both detection and localization accuracy. The associated challenge consists of two corresponding tracks, each with restrictions that reflect real-world settings. We provide implementation code for evaluation and baselines. We also analyze the results and methods of the top submissions to the challenge. The dataset, baseline methods, and evaluation code are publicly available and will be discussed at a dedicated CVPR'23 workshop.
Submitted 15 June, 2023;
originally announced June 2023.
-
HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer
Authors:
Shuzhe Wang,
Zakaria Laskar,
Iaroslav Melekhov,
Xiaotian Li,
Yi Zhao,
Giorgos Tolias,
Juho Kannala
Abstract:
Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments. It sets a new state of the art for single-image localization on the 7-Scenes, 12-Scenes, and Cambridge Landmarks datasets, and on the combined indoor scenes.
Submitted 5 May, 2023;
originally announced May 2023.
-
Self-Supervised Video Similarity Learning
Authors:
Giorgos Kordopatis-Zilos,
Giorgos Tolias,
Christos Tzelepis,
Ioannis Kompatsiaris,
Ioannis Patras,
Symeon Papadopoulos
Abstract:
We introduce S$^2$VS, a video similarity learning approach with self-supervision. Self-Supervised Learning (SSL) is typically used to train deep models on a proxy task so as to have strong transferability on target tasks after fine-tuning. Here, in contrast to prior work, SSL is used to perform video similarity learning and address multiple retrieval and detection tasks at once with no use of labeled data. This is achieved by learning via instance-discrimination with task-tailored augmentations and the widely used InfoNCE loss together with an additional loss operating jointly on self-similarity and hard-negative similarity. We benchmark our method on tasks where video relevance is defined with varying granularity, ranging from video copies to videos depicting the same incident or event. We learn a single universal model that achieves state-of-the-art performance on all tasks, surpassing previously proposed methods that use labeled data. The code and pretrained models are publicly available at: https://github.com/gkordo/s2vs
Submitted 16 June, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
-
Rethinking matching-based few-shot action recognition
Authors:
Juliette Bertrand,
Yannis Kalantidis,
Giorgos Tolias
Abstract:
Few-shot action recognition, i.e. recognizing new action classes given only a few examples, benefits from incorporating temporal information. Prior work either encodes such information in the representation itself and learns classifiers at test time, or obtains frame-level features and performs pairwise temporal matching. We first evaluate a number of matching-based approaches using features from spatio-temporal backbones, a comparison missing from the literature, and show that the gap in performance between simple baselines and more complicated methods is significantly reduced. Inspired by this, we propose Chamfer++, a non-temporal matching function that achieves state-of-the-art results in few-shot action recognition. We show that, when starting from temporal features, our parameter-free and interpretable approach can outperform all other matching-based and classifier methods for one-shot action recognition on three common datasets without using temporal information in the matching stage. Project page: https://jbertrand89.github.io/matching-based-fsar
Submitted 28 March, 2023;
originally announced March 2023.
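To illustrate what a non-temporal matching function between two videos can look like, here is a sketch of plain Chamfer similarity over frame-level features. It is not Chamfer++ itself, whose details are not given in the abstract. The cosine similarity, the symmetric averaging, and the function names are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two frame-level feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def chamfer_similarity(query_frames, support_frames):
    """Average, over query frames, of the best match among support frames.

    Frame order is never used, so the function is non-temporal
    and has no learnable parameters.
    """
    best = [max(cosine(q, s) for s in support_frames) for q in query_frames]
    return sum(best) / len(best)


def symmetric_chamfer(a, b):
    """Average the two matching directions (an assumed design choice)."""
    return 0.5 * (chamfer_similarity(a, b) + chamfer_similarity(b, a))


# Two toy "videos" with the same frames in a different temporal order.
video_a = [[1.0, 0.0], [0.0, 1.0]]
video_b = [[0.0, 1.0], [1.0, 0.0]]
print(symmetric_chamfer(video_a, video_b))
```

Because each frame is matched to its best counterpart regardless of position, reordering the frames of either video leaves the score unchanged.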
-
Large-to-small Image Resolution Asymmetry in Deep Metric Learning
Authors:
Pavel Suma,
Giorgos Tolias
Abstract:
Deep metric learning for vision is trained by optimizing a representation network to map (non-)matching image pairs to (non-)similar representations. During testing, which typically corresponds to image retrieval, both database and query examples are processed by the same network to obtain the representation used for similarity estimation and ranking. In this work, we explore an asymmetric setup by light-weight processing of the query at a small image resolution to enable fast representation extraction. The goal is to obtain a network for database examples that is trained to operate on large resolution images and benefits from fine-grained image details, and a second network for query examples that operates on small resolution images but preserves a representation space aligned with that of the database network. We achieve this with a distillation approach that transfers knowledge from a fixed teacher network to a student via a loss that operates per image and solely relies on coupled augmentations without the use of any labels. In contrast to prior work that explores such asymmetry from the point of view of different network architectures, this work uses the same architecture but modifies the image resolution. We conclude that resolution asymmetry is a better way to optimize the performance/efficiency trade-off than architecture asymmetry. Evaluation is performed on three standard deep metric learning benchmarks, namely CUB200, Cars196, and SOP. Code: https://github.com/pavelsuma/raml
Submitted 11 October, 2022;
originally announced October 2022.
-
Edge Augmentation for Large-Scale Sketch Recognition without Sketches
Authors:
Nikos Efthymiadis,
Giorgos Tolias,
Ondrej Chum
Abstract:
This work addresses scaling up the sketch classification task to a large number of categories. Collecting sketches for training is a slow and tedious process that has so far precluded any attempts at large-scale sketch recognition. We overcome the lack of training sketch data by exploiting labeled collections of natural images that are easier to obtain. To bridge the domain gap we present a novel augmentation technique that is tailored to the task of learning sketch recognition from a training set of natural images. Randomization is introduced in the parameters of edge detection and edge selection. Natural images are translated to a pseudo-novel domain called "randomized Binary Thin Edges" (rBTE), which is used as a training domain instead of natural images. The ability to scale up is demonstrated by training CNN-based sketch recognition on more than 2.5 times as many categories as used previously. For this purpose, a dataset of natural images from 874 categories is constructed by combining a number of popular computer vision datasets. The categories are selected to be suitable for sketch recognition. To estimate the performance, a subset of 393 categories with sketches is also collected.
Submitted 2 June, 2022; v1 submitted 26 February, 2022;
originally announced February 2022.
-
Results and findings of the 2021 Image Similarity Challenge
Authors:
Zoë Papakipos,
Giorgos Tolias,
Tomas Jenicek,
Ed Pizzi,
Shuhei Yokoo,
Wenhao Wang,
Yifan Sun,
Weipu Zhang,
Yi Yang,
Sanjay Addicam,
Sergio Manuel Papadakis,
Cristian Canton Ferrer,
Ondrej Chum,
Matthijs Douze
Abstract:
The 2021 Image Similarity Challenge introduced a dataset to serve as a new benchmark to evaluate recent image copy detection methods. There were 200 participants in the competition. This paper presents a quantitative and qualitative analysis of the top submissions. It appears that the most difficult image transformations involve either severe image crops or hiding into unrelated images, combined with local pixel perturbations. The key algorithmic elements in the winning submissions are: training on strong augmentations, self-supervised learning, score normalization, explicit overlay detection, and global descriptor matching followed by pairwise image comparison.
Submitted 8 February, 2022;
originally announced February 2022.
-
The Met Dataset: Instance-level Recognition for Artworks
Authors:
Nikolaos-Antonios Ypsilantis,
Noa Garcia,
Guangxing Han,
Sarah Ibrahimi,
Nanne Van Noord,
Giorgos Tolias
Abstract:
This work introduces a dataset for large-scale instance-level recognition in the domain of artworks. The proposed benchmark exhibits a number of different challenges such as large inter-class similarity, long tail distribution, and many classes. We rely on the open access collection of The Met museum to form a large training set of about 224k classes, where each class corresponds to a museum exhibit with photos taken under studio conditions. Testing is primarily performed on photos taken by museum guests depicting exhibits, which introduces a distribution shift between training and testing. Testing is additionally performed on a set of images not related to Met exhibits making the task resemble an out-of-distribution detection problem. The proposed benchmark follows the paradigm of other recent datasets for instance-level recognition on different domains to encourage research on domain independent approaches. A number of suitable approaches are evaluated to offer a testbed for future comparisons. Self-supervised and supervised contrastive learning are effectively combined to train the backbone which is used for non-parametric classification that is shown as a promising direction. Dataset webpage: http://cmp.felk.cvut.cz/met/
Submitted 3 February, 2022;
originally announced February 2022.
-
Recall@k Surrogate Loss with Large Batches and Similarity Mixup
Authors:
Yash Patel,
Giorgos Tolias,
Jiri Matas
Abstract:
This work focuses on learning deep visual representation models for retrieval by exploring the interplay between a new loss function, the batch size, and a new regularization approach. Direct optimization, by gradient descent, of an evaluation metric, is not possible when it is non-differentiable, which is the case for recall in retrieval. A differentiable surrogate loss for the recall is proposed in this work. Using an implementation that sidesteps the hardware constraints of the GPU memory, the method trains with a very large batch size, which is essential for metrics computed on the entire retrieval database. It is assisted by an efficient mixup regularization approach that operates on pairwise scalar similarities and virtually increases the batch size further. The suggested method achieves state-of-the-art performance in several image retrieval benchmarks when used for deep metric learning. For instance-level recognition, the method outperforms similar approaches that train using an approximation of average precision.
Submitted 25 March, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.
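The abstract above notes that recall is non-differentiable and proposes a differentiable surrogate. The sketch below shows one common way to build such a surrogate, by replacing the step functions in the rank computation with sigmoids. The temperatures, the 0.5 offset on k, and the function names are assumptions for illustration and may differ from the loss in the paper.

```python
import math


def sigmoid(x, tau):
    """Temperature-scaled sigmoid, written to avoid overflow for large |x / tau|."""
    z = x / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def exact_recall_at_k(scores, is_positive, k):
    """Fraction of positives ranked within the top k (piecewise constant in the scores)."""
    positives = [i for i, p in enumerate(is_positive) if p]
    hits = 0
    for i in positives:
        rank = 1 + sum(1 for j, s in enumerate(scores) if j != i and s > scores[i])
        hits += 1 if rank <= k else 0
    return hits / len(positives)


def smooth_recall_at_k(scores, is_positive, k, tau_rank=0.01, tau_k=0.1):
    """Sigmoid relaxation of recall@k for one query.

    The rank of each positive is approximated by summing sigmoids of score
    differences; the top-k test is approximated by a second sigmoid centred
    at k + 0.5 (an assumed choice that places the threshold between ranks).
    """
    positives = [i for i, p in enumerate(is_positive) if p]
    total = 0.0
    for i in positives:
        soft_rank = 1.0 + sum(
            sigmoid(s - scores[i], tau_rank)
            for j, s in enumerate(scores)
            if j != i
        )
        total += sigmoid(k + 0.5 - soft_rank, tau_k)
    return total / len(positives)


# Similarities of four database items to one query; items 1 and 3 are positives.
scores = [0.9, 0.8, 0.3, 0.1]
is_positive = [False, True, False, True]
print(exact_recall_at_k(scores, is_positive, k=2))
print(smooth_recall_at_k(scores, is_positive, k=2))
```

With small temperatures the smooth value stays close to the exact one while remaining differentiable with respect to the scores. A training loss would typically be one minus this quantity, averaged over queries in the batch.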
-
The 2021 Image Similarity Dataset and Challenge
Authors:
Matthijs Douze,
Giorgos Tolias,
Ed Pizzi,
Zoë Papakipos,
Lowik Chanussot,
Filip Radenovic,
Tomas Jenicek,
Maxim Maximov,
Laura Leal-Taixé,
Ismail Elezi,
Ondřej Chum,
Cristian Canton Ferrer
Abstract:
This paper introduces a new benchmark for large-scale image similarity detection. This benchmark is used for the Image Similarity Challenge at NeurIPS'21 (ISC2021). The goal is to determine whether a query image is a modified copy of any image in a reference corpus of size 1 million. The benchmark features a variety of image transformations such as automated transformations, hand-crafted image edits and machine-learning based manipulations. This mimics real-life cases appearing in social media, for example for integrity-related problems dealing with misinformation and objectionable content. The strength of the image manipulations, and therefore the difficulty of the benchmark, is calibrated according to the performance of a set of baseline approaches. Both the query and reference set contain a majority of "distractor" images that do not match, which corresponds to a real-life needle-in-haystack setting, and the evaluation metric reflects that. We expect the DISC21 benchmark to promote image copy detection as an important and challenging computer vision task and refresh the state of the art. Code and data are available at https://github.com/facebookresearch/isc2021
Submitted 21 February, 2022; v1 submitted 17 June, 2021;
originally announced June 2021.
-
An alternative approach to sharp $L^1$ estimates for the dyadic maximal operator
Authors:
Eleftherios N. Nikolidakis,
Andreas G. Tolias
Abstract:
We prove sharp $L^1$ inequalities for the dyadic maximal function $M_Tφ$ when $φ$ satisfies certain $L^1$ and $L^{\infty}$ conditions.
Submitted 8 March, 2022; v1 submitted 8 April, 2021;
originally announced April 2021.
-
Learning and aggregating deep local descriptors for instance-level recognition
Authors:
Giorgos Tolias,
Tomas Jenicek,
Ondřej Chum
Abstract:
We propose an efficient method to learn deep local descriptors for instance-level recognition. The training only requires examples of positive and negative image pairs and is performed as metric learning of sum-pooled global image descriptors. At inference, the local descriptors are provided by the activations of internal components of the network. We demonstrate why such an approach learns local descriptors that work well for image similarity estimation with classical efficient match kernel methods. The experimental validation studies the trade-off between performance and memory requirements of the state-of-the-art image search approach based on match kernels. Compared to existing local descriptors, the proposed ones perform better in two instance-level recognition tasks and keep memory requirements lower. We experimentally show that global descriptors are not effective enough at large scale and that local descriptors are essential. We achieve state-of-the-art performance, in some cases even with a backbone network as small as ResNet18.
Submitted 26 July, 2020;
originally announced July 2020.
-
Graph convolutional networks for learning with few clean and many noisy labels
Authors:
Ahmet Iscen,
Giorgos Tolias,
Yannis Avrithis,
Ondrej Chum,
Cordelia Schmid
Abstract:
In this work we consider the problem of learning a classifier from noisy labels when a few clean labeled examples are given. The structure of clean and noisy data is modeled by a graph per class and Graph Convolutional Networks (GCN) are used to predict class relevance of noisy examples. For each class, the GCN is treated as a binary classifier, which learns to discriminate clean from noisy examples using a weighted binary cross-entropy loss function. The GCN-inferred "clean" probability is then exploited as a relevance measure. Each noisy example is weighted by its relevance when learning a classifier for the end task. We evaluate our method on an extended version of a few-shot learning problem, where the few clean examples of novel classes are supplemented with additional noisy data. Experimental results show that our GCN-based cleaning process significantly improves the classification accuracy over not cleaning the noisy data, as well as standard few-shot classification where only few clean examples are used.
Submitted 24 August, 2020; v1 submitted 1 October, 2019;
originally announced October 2019.
-
Targeted Mismatch Adversarial Attack: Query with a Flower to Retrieve the Tower
Authors:
Giorgos Tolias,
Filip Radenovic,
Ondřej Chum
Abstract:
Access to online visual search engines implies sharing of private user content: the query images. We introduce the concept of targeted mismatch attack for deep-learning-based retrieval systems to generate an adversarial image to conceal the query image. The generated image looks nothing like the user-intended query, but leads to identical or very similar retrieval results. Transferring attacks to fully unseen networks is challenging. We show successful attacks on partially unknown systems, by designing various loss functions for the adversarial image construction. These include loss functions, for example, for an unknown global pooling operation or an unknown input resolution used by the retrieval system. We evaluate the attacks on standard retrieval benchmarks and compare the results retrieved with the original and adversarial image.
Submitted 24 August, 2019;
originally announced August 2019.
-
Explicit Spatial Encoding for Deep Local Descriptors
Authors:
Arun Mukundan,
Giorgos Tolias,
Ondrej Chum
Abstract:
We propose a kernelized deep local-patch descriptor based on efficient match kernels of neural network activations. The response of each receptive field is encoded together with its spatial location using explicit feature maps. Two location parametrizations, Cartesian and polar, are used to provide robustness to different types of canonical patch misalignment. Additionally, we analyze how the conventional architecture, i.e. a fully connected layer attached after the convolutional part, encodes responses in a spatially variant way. In contrast, explicit spatial encoding is used in our descriptor, whose potential applications are not limited to local patches. We evaluate the descriptor on standard benchmarks. Both versions, encoding 32x32 or 64x64 patches, consistently outperform all other methods on all benchmarks. The number of parameters of the model is independent of the input patch resolution.
Submitted 15 April, 2019;
originally announced April 2019.
-
Label Propagation for Deep Semi-supervised Learning
Authors:
Ahmet Iscen,
Giorgos Tolias,
Yannis Avrithis,
Ondrej Chum
Abstract:
Semi-supervised learning is becoming increasingly important because it can combine data carefully labeled by humans with abundant unlabeled data to train deep neural networks. Classic methods on semi-supervised learning that have focused on transductive learning have not been fully exploited in the inductive framework followed by modern deep learning. The same holds for the manifold assumption, namely that similar examples should get the same prediction. In this work, we employ a transductive label propagation method that is based on the manifold assumption to make predictions on the entire dataset and use these predictions to generate pseudo-labels for the unlabeled data and train a deep neural network. At the core of the transductive method lies a nearest neighbor graph of the dataset that we create based on the embeddings of the same network. Therefore, our learning process iterates between these two steps. We improve performance on several datasets, especially in the few-labels regime, and show that our work is complementary to the current state of the art.
Submitted 9 April, 2019;
originally announced April 2019.
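As a rough illustration of the transductive step described in the abstract above, the sketch below runs classic iterative label propagation on a small affinity graph and reads off pseudo-labels. The symmetric normalization, the update rule, the value of alpha, and all names are assumptions based on the standard formulation, not the authors' code. In the paper's setting, the graph would be a nearest neighbor graph built from network embeddings.

```python
import math


def normalize_affinity(W):
    """Symmetrically normalize a non-negative affinity matrix: D^{-1/2} W D^{-1/2}."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [
        [W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def propagate_labels(W, labels, num_classes, alpha=0.9, iters=50):
    """Diffuse the known labels over the graph and return one pseudo-label per node.

    labels[i] is a class index for labeled nodes and None for unlabeled ones.
    Each iteration applies F <- alpha * S F + (1 - alpha) * Y.
    """
    n = len(W)
    S = normalize_affinity(W)
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [
            [
                alpha * sum(S[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
                for c in range(num_classes)
            ]
            for i in range(n)
        ]
    return [max(range(num_classes), key=lambda c: F[i][c]) for i in range(n)]


# Two chains of three nodes joined by one weak edge; one labeled node per chain.
W = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
]
labels = [0, None, None, None, None, 1]
print(propagate_labels(W, labels, num_classes=2))
```

In the iterative scheme the abstract describes, such pseudo-labels would be used to train the network, whose updated embeddings would then define a new graph.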
-
Understanding and Improving Kernel Local Descriptors
Authors:
Arun Mukundan,
Giorgos Tolias,
Andrei Bursuc,
Hervé Jégou,
Ondřej Chum
Abstract:
We propose a multiple-kernel local-patch descriptor based on efficient match kernels from pixel gradients. It combines two parametrizations of gradient position and direction; each parametrization provides robustness to a different type of patch mis-registration: polar parametrization for noise in the patch dominant orientation detection, Cartesian for imprecise location of the feature point. Combined with whitening of the descriptor space, which is learned with or without supervision, the performance is significantly improved. We analyze the effect of the whitening on patch similarity and demonstrate its semantic meaning. Our unsupervised variant is the best performing descriptor constructed without the need for labeled data. Despite the simplicity of the proposed descriptor, it competes well with deep learning approaches on a number of different tasks.
Submitted 27 November, 2018;
originally announced November 2018.
-
Hybrid Diffusion: Spectral-Temporal Graph Filtering for Manifold Ranking
Authors:
Ahmet Iscen,
Yannis Avrithis,
Giorgos Tolias,
Teddy Furon,
Ondrej Chum
Abstract:
State of the art image retrieval performance is achieved with CNN features and manifold ranking using a k-NN similarity graph that is pre-computed off-line. The two most successful existing approaches are temporal filtering, where manifold ranking amounts to solving a sparse linear system online, and spectral filtering, where eigen-decomposition of the adjacency matrix is performed off-line and then manifold ranking amounts to dot-product search online. The former suffers from expensive queries and the latter from significant space overhead. Here we introduce a novel, theoretically well-founded hybrid filtering approach allowing full control of the space-time trade-off between these two extremes. Experimentally, we verify that our hybrid method delivers results on par with the state of the art, with lower memory demands compared to spectral filtering approaches and faster compared to temporal filtering.
Submitted 22 November, 2018; v1 submitted 23 July, 2018;
originally announced July 2018.
-
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking
Authors:
Filip Radenović,
Ahmet Iscen,
Giorgos Tolias,
Yannis Avrithis,
Ondřej Chum
Abstract:
In this paper we address issues with image retrieval benchmarking on standard and popular Oxford 5k and Paris 6k datasets. In particular, annotation errors, the size of the dataset, and the level of challenge are addressed: new annotation for both datasets is created with an extra attention to the reliability of the ground truth. Three new protocols of varying difficulty are introduced. The protocols allow fair comparison between different methods, including those using a dataset pre-processing stage. For each dataset, 15 new challenging queries are introduced. Finally, a new set of 1M hard, semi-automatically cleaned distractors is selected.
An extensive comparison of the state-of-the-art methods is performed on the new benchmark. Different types of methods are evaluated, ranging from local-feature-based to modern CNN based methods. The best results are achieved by taking the best of the two worlds. Most importantly, image retrieval appears far from being solved.
Submitted 29 March, 2018;
originally announced March 2018.
-
Mining on Manifolds: Metric Learning without Labels
Authors:
Ahmet Iscen,
Giorgos Tolias,
Yannis Avrithis,
Ondrej Chum
Abstract:
In this work we present a novel unsupervised framework for hard training example mining. The only input to the method is a collection of images relevant to the target application and a meaningful initial representation, provided e.g. by a pre-trained CNN. Positive examples are distant points on a single manifold, while negative examples are nearby points on different manifolds. Both types of examples are revealed by disagreements between Euclidean and manifold similarities. The discovered examples can be used in training with any discriminative loss. The method is applied to unsupervised fine-tuning of pre-trained networks for fine-grained classification and particular object retrieval. Our models are on par with or outperform prior models that are fully or partially supervised.
Submitted 29 March, 2018;
originally announced March 2018.
-
Fine-tuning CNN Image Retrieval with No Human Annotation
Authors:
Filip Radenović,
Giorgos Tolias,
Ondřej Chum
Abstract:
Image descriptors based on activations of Convolutional Neural Networks (CNNs) have become dominant in image retrieval due to their discriminative power, compactness of representation, and search efficiency. Training of CNNs, either from scratch or fine-tuning, requires a large amount of annotated data, where a high quality of annotation is often crucial. In this work, we propose to fine-tune CNNs for image retrieval on a large collection of unordered images in a fully automated manner. Reconstructed 3D models obtained by the state-of-the-art retrieval and structure-from-motion methods guide the selection of the training data. We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval. CNN descriptor whitening discriminatively learned from the same training data outperforms commonly used PCA whitening. We propose a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and average pooling and show that it boosts retrieval performance. Applying the proposed method to the VGG network achieves state-of-the-art performance on the standard benchmarks: Oxford Buildings, Paris, and Holidays datasets.
Submitted 10 July, 2018; v1 submitted 3 November, 2017;
originally announced November 2017.
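
As a rough illustration of the Generalized-Mean (GeM) pooling mentioned in the abstract above, the sketch below computes the generalized mean of each channel's activations. With exponent p = 1 it reduces to average pooling, and as p grows it approaches max pooling. The abstract describes p as trainable; here it is a fixed argument, and the function name `gem_pool`, the clamping constant `eps`, and the flat-list input layout are assumptions made for illustration.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling over each channel.

    feature_map: list of channels, each a flat list of activations
                 (assumed non-negative, e.g. taken after a ReLU).
    p:           pooling exponent; fixed here, trainable in the paper.
    eps:         hypothetical lower clamp that keeps the power well defined.
    """
    pooled = []
    for channel in feature_map:
        clamped = [max(x, eps) for x in channel]
        mean_of_powers = sum(x ** p for x in clamped) / len(clamped)
        pooled.append(mean_of_powers ** (1.0 / p))
    return pooled


activations = [[1.0, 2.0, 3.0, 4.0]]           # one channel, four spatial positions
average_like = gem_pool(activations, p=1.0)    # equals the channel average
intermediate = gem_pool(activations, p=3.0)    # lies between average and max
max_like = gem_pool(activations, p=100.0)      # close to the channel maximum
```

The pooled vector would typically be L2-normalized before being used as a global image descriptor, which is a common convention rather than something stated in the abstract.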
-
Unsupervised object discovery for instance recognition
Authors:
Oriane Siméoni,
Ahmet Iscen,
Giorgos Tolias,
Yannis Avrithis,
Ondrej Chum
Abstract:
Severe background clutter is challenging in many computer vision tasks, including large-scale image retrieval. Global descriptors, which are popular due to their memory and search efficiency, are especially prone to corruption by such clutter. Eliminating the impact of the clutter on the image descriptor increases the chance of retrieving relevant images and prevents topic drift due to actually retrieving the clutter in the case of query expansion. In this work, we propose a novel salient region detection method. It captures, in an unsupervised manner, patterns that are both discriminative and common in the dataset. Saliency is based on a centrality measure of a nearest neighbor graph constructed from regional CNN representations of dataset images. The descriptors derived from the salient regions improve particular object retrieval, most noticeably in large collections containing small objects.
Submitted 24 January, 2018; v1 submitted 14 September, 2017;
originally announced September 2017.
-
Deep Shape Matching
Authors:
Filip Radenović,
Giorgos Tolias,
Ondřej Chum
Abstract:
We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. Firstly, well established efficient methods are chosen to turn the images into edge maps. Secondly, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval or its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results in multiple benchmarks.
Submitted 25 July, 2018; v1 submitted 11 September, 2017;
originally announced September 2017.
-
Multiple-Kernel Local-Patch Descriptor
Authors:
Arun Mukundan,
Giorgos Tolias,
Ondrej Chum
Abstract:
We propose a multiple-kernel local-patch descriptor based on efficient match kernels of patch gradients. It combines two parametrizations of gradient position and direction, each parametrization provides robustness to a different type of patch mis-registration: polar parametrization for noise in the patch dominant orientation detection, Cartesian for imprecise location of the feature point. Even though handcrafted, the proposed method consistently outperforms the state-of-the-art methods on two local patch benchmarks.
Submitted 25 July, 2017;
originally announced July 2017.
-
Panorama to panorama matching for location recognition
Authors:
Ahmet Iscen,
Giorgos Tolias,
Yannis Avrithis,
Teddy Furon,
Ondrej Chum
Abstract:
Location recognition is commonly treated as visual instance retrieval on "street view" imagery. The dataset items and queries are panoramic views, i.e. groups of images taken at a single location. This work introduces a novel panorama-to-panorama matching process, either by aggregating features of individual images in a group or by explicitly constructing a larger panorama. In either case, multiple views are used as queries. We reach near perfect location recognition on a standard benchmark with only four query views.
Submitted 21 April, 2017;
originally announced April 2017.
-
Asymmetric Feature Maps with Application to Sketch Based Retrieval
Authors:
Giorgos Tolias,
Ondřej Chum
Abstract:
We propose a novel concept of asymmetric feature maps (AFM), which allows evaluating multiple kernels between a query and database entries without increasing the memory requirements. To demonstrate the advantages of the AFM method, we derive a short vector image representation that, due to asymmetric feature maps, supports efficient scale and translation invariant sketch-based image retrieval. Unlike most short-code based retrieval systems, the proposed method provides the query localization in the retrieved image. The efficiency of the search is boosted by approximating a 2D translation search via trigonometric polynomial of scores by 1D projections. The projections are a special case of AFM. An order of magnitude speed-up is achieved compared to traditional trigonometric polynomials. The results are boosted by an image-based average query expansion, significantly exceeding the state of the art on standard benchmarks.
Submitted 12 April, 2017;
originally announced April 2017.
-
Fast Spectral Ranking for Similarity Search
Authors:
Ahmet Iscen,
Yannis Avrithis,
Giorgos Tolias,
Teddy Furon,
Ondrej Chum
Abstract:
Despite the success of deep learning on representing images for particular object retrieval, recent studies show that the learned representations still lie on manifolds in a high dimensional space. This makes the Euclidean nearest neighbor search biased for this task. Exploring the manifolds online remains expensive even if a nearest neighbor graph has been computed offline. This work introduces an explicit embedding reducing manifold search to Euclidean search followed by dot product similarity search. This is equivalent to linear graph filtering of a sparse signal in the frequency domain. To speed up online search, we compute an approximate Fourier basis of the graph offline. We improve the state of the art on particular object retrieval datasets including the challenging Instre dataset containing small objects. At a scale of 10^5 images, the offline cost is only a few hours, while query time is comparable to standard similarity search.
Submitted 29 March, 2018; v1 submitted 20 March, 2017;
originally announced March 2017.
-
Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations
Authors:
Ahmet Iscen,
Giorgos Tolias,
Yannis Avrithis,
Teddy Furon,
Ondrej Chum
Abstract:
Query expansion is a popular method to improve the quality of image retrieval with both conventional and CNN representations. It has been so far limited to global image similarity. This work focuses on diffusion, a mechanism that captures the image manifold in the feature space. The diffusion is carried out on descriptors of overlapping image regions rather than on a global image descriptor like in previous approaches. An efficient off-line stage allows optional reduction in the number of stored regions. In the on-line stage, the proposed handling of unseen queries in the indexing stage removes additional computation to adjust the precomputed data. We perform diffusion through a sparse linear system solver, yielding practical query times well below one second. Experimentally, we observe a significant boost in performance of image retrieval with compact CNN descriptors on standard benchmarks, especially when the query object covers only a small part of the image. Small objects have been a common failure case of CNN-based retrieval.
Submitted 1 July, 2019; v1 submitted 15 November, 2016;
originally announced November 2016.
-
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples
Authors:
Filip Radenović,
Giorgos Tolias,
Ondřej Chum
Abstract:
Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement is preceded by extreme manual annotation in order to perform either training from scratch or fine-tuning for the target task. In this work, we propose to fine-tune CNN for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes.
Submitted 7 September, 2016; v1 submitted 8 April, 2016;
originally announced April 2016.
-
Particular object retrieval with integral max-pooling of CNN activations
Authors:
Giorgos Tolias,
Ronan Sicre,
Hervé Jégou
Abstract:
Recently, image representation built upon Convolutional Neural Network (CNN) has been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and are still outperformed, on some particular object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report for the first time results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
Submitted 24 February, 2016; v1 submitted 18 November, 2015;
originally announced November 2015.
-
A comparison of dense region detectors for image search and fine-grained classification
Authors:
Ahmet Iscen,
Giorgos Tolias,
Philippe-Henri Gosselin,
Hervé Jégou
Abstract:
We consider a pipeline for image classification or search based on coding approaches like Bag of Words or Fisher vectors. In this context, the most common approach is to extract the image patches regularly in a dense manner on several scales. This paper proposes and evaluates alternative choices to extract patches densely. Beyond simple strategies derived from regular interest region detectors, we propose approaches based on super-pixels, edges, and a bank of Zernike filters used as detectors. The different approaches are evaluated on recent image retrieval and fine-grained classification benchmarks. Our results show that the regular dense detector is outperformed by other methods in most situations, leading us to improve the state of the art in comparable setups on standard retrieval and fine-grained benchmarks. As a byproduct of our study, we show that existing methods for blob and super-pixel extraction achieve high accuracy if the patches are extracted along the edges and not around the detected regions.
Submitted 17 April, 2015; v1 submitted 29 October, 2014;
originally announced October 2014.
-
Orientation covariant aggregation of local descriptors with embeddings
Authors:
Giorgos Tolias,
Teddy Furon,
Hervé Jégou
Abstract:
Image search systems based on local descriptors typically achieve orientation invariance by aligning the patches on their dominant orientations. Albeit successful, this choice introduces too much invariance because it does not guarantee that the patches are rotated consistently. This paper introduces an aggregation strategy of local descriptors that achieves this covariance property by jointly encoding the angle in the aggregation stage in a continuous manner. It is combined with an efficient monomial embedding to provide a codebook-free method to aggregate local descriptors into a single vector representation. Our strategy is also compatible and employed with several popular encoding methods, in particular bag-of-words, VLAD and the Fisher vector. Our geometric-aware aggregation strategy is effective for image search, as shown by experiments performed on standard benchmarks for image and particular object retrieval, namely Holidays and Oxford buildings.
Submitted 25 November, 2014; v1 submitted 8 July, 2014;
originally announced July 2014.
-
Hereditarily Indecomposable Banach algebras of diagonal operators
Authors:
Spiros A. Argyros,
Irene Deliyanni,
Andreas G. Tolias
Abstract:
We provide a characterization of the Banach spaces $X$ with a Schauder basis $(e_n)_{n\in\mathbb{N}}$ which have the property that the dual space $X^*$ is naturally isomorphic to the space $\mathcal{L}_{diag}(X)$ of diagonal operators with respect to $(e_n)_{n\in\mathbb{N}}$ . We also construct a Hereditarily Indecomposable Banach space ${\mathfrak X}_D$ with a Schauder basis $(e_n)_{n\in\mathbb{N}}$ such that ${\mathfrak X}^*_D$ is isometric to $\mathcal{L}_{diag}({\mathfrak X}_D)$ with these Banach algebras being Hereditarily Indecomposable. Finally, we show that every $T\in \mathcal{L}_{diag}({\mathfrak X}_D)$ is of the form $T=λI+K$, where $K$ is a compact operator.
Submitted 10 February, 2009;
originally announced February 2009.
-
Saturated extensions, the attractors method and Hereditarily James Tree Space
Authors:
Spiros A. Argyros,
Alexander D. Arvanitakis,
Andreas G. Tolias
Abstract:
In the present work we provide a variety of examples of HI Banach spaces containing no reflexive subspace and we study the structure of their duals as well as the spaces of their linear bounded operators. Our approach is based on saturated extensions of ground sets and the method of attractors.
Submitted 15 July, 2008;
originally announced July 2008.
-
Strictly singular non-compact diagonal operators on HI spaces
Authors:
Spiros A. Argyros,
Irene Deliyanni,
Andreas G. Tolias
Abstract:
We construct a Hereditarily Indecomposable Banach space $\eqs_d$ with a Schauder basis \seq{e}{n} on which there exist strictly singular non-compact diagonal operators. Moreover, the space $\mc{L}_{\diag}(\eqs_d)$ of diagonal operators with respect to the basis \seq{e}{n} contains an isomorphic copy of $\ell_{\infty}(\N)$.
Submitted 15 July, 2008;
originally announced July 2008.