Showing 1–50 of 64 results for author: Jegou, H

  1. arXiv:2405.15613

    cs.LG cs.AI cs.CV

    Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

    Authors: Huy V. Vo, Vasil Khalidov, Timothée Darcet, Théo Moutakanni, Nikita Smetanin, Marc Szafraniec, Hugo Touvron, Camille Couprie, Maxime Oquab, Armand Joulin, Hervé Jégou, Patrick Labatut, Piotr Bojanowski

    Abstract: Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation typically require extensive human effort. This manual process has some limitations similar to those encountered in supervised learning, e.g., the crowd-sourced selection of data is costly and time-consuming, preventing scaling the datas…

    Submitted 28 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  2. arXiv:2401.08281

    cs.LG cs.CV cs.SE

    The Faiss library

    Authors: Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, Hervé Jégou

    Abstract: Vector databases manage large collections of embedding vectors. As AI applications are growing rapidly, so is the number of embeddings that need to be stored and indexed. The Faiss library is dedicated to vector similarity search, a core functionality of vector databases. Faiss is a toolkit of indexing methods and related primitives used to search, cluster, compress and transform vectors. This pa…

    Submitted 16 January, 2024; originally announced January 2024.

  3. arXiv:2306.00802

    stat.ML cs.CL cs.LG

    Birth of a Transformer: A Memory Viewpoint

    Authors: Alberto Bietti, Vivien Cabannes, Diane Bouchacourt, Herve Jegou, Leon Bottou

    Abstract: Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We stu…

    Submitted 6 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  4. arXiv:2304.07193

    cs.CV

    DINOv2: Learning Robust Visual Features without Supervision

    Authors: Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, et al. (1 additional author not shown)

    Abstract: The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pr…

    Submitted 2 February, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  5. arXiv:2303.15435

    cs.CV cs.AI

    The Stable Signature: Rooting Watermarks in Latent Diffusion Models

    Authors: Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, Teddy Furon

    Abstract: Generative image modeling enables a wide range of applications but raises ethical concerns about responsible deployment. This paper introduces an active strategy combining image watermarking and Latent Diffusion Models. The goal is for all generated images to conceal an invisible watermark allowing for future detection and/or identification. The method quickly fine-tunes the latent decoder of the…

    Submitted 26 July, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: Published at ICCV 2023. Code at https://github.com/facebookresearch/stable_signature - webpage at https://pierrefdz.github.io/publications/stablesignature

  6. arXiv:2301.11189

    eess.IV cs.AI cs.CV cs.IT

    Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models

    Authors: Matthew J. Muckley, Alaaeldin El-Nouby, Karen Ullrich, Hervé Jégou, Jakob Verbeek

    Abstract: Lossy image compression aims to represent images in as few bits as possible while maintaining fidelity to the original. Theoretical results indicate that optimizing distortion metrics such as PSNR or MS-SSIM necessarily leads to a discrepancy in the statistics of original images from those of reconstructions, in particular at low bitrates, often manifested by the blurring of the compressed images.…

    Submitted 10 August, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: Upload camera-ready to arXiv. Official version available at https://proceedings.mlr.press/v202/muckley23a.html

    Journal ref: Proceedings of the 40th International Conference on Machine Learning (2023) 25426-25443

  7. arXiv:2212.07372

    cs.CV eess.IV

    Image Compression with Product Quantized Masked Image Modeling

    Authors: Alaaeldin El-Nouby, Matthew J. Muckley, Karen Ullrich, Ivan Laptev, Jakob Verbeek, Hervé Jégou

    Abstract: Recent neural compression methods have been based on the popular hyperprior framework. It relies on Scalar Quantization and offers a very strong compression performance. This contrasts with recent advances in image generation and representation learning, where Vector Quantization is more commonly employed. In this work, we attempt to bring these lines of research closer by revisiting vector quanti…

    Submitted 6 November, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

  8. arXiv:2212.04884

    cs.CV

    Co-training $2^L$ Submodels for Visual Recognition

    Authors: Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou

    Abstract: We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, "submodels", with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other, by providing a loss that complements the reg…

    Submitted 9 December, 2022; originally announced December 2022.

  9. arXiv:2210.10620

    cs.IR cs.AI cs.CV cs.LG

    Active Image Indexing

    Authors: Pierre Fernandez, Matthijs Douze, Hervé Jégou, Teddy Furon

    Abstract: Image copy detection and retrieval from large databases leverage two components. First, a neural network maps an image to a vector representation, that is relatively robust to various transformations of the image. Second, an efficient but approximate similarity search algorithm trades scalability (size and speed) against quality of the search, thereby introducing a source of error. This paper impr…

    Submitted 5 October, 2022; originally announced October 2022.

  10. arXiv:2204.07118

    cs.CV

    DeiT III: Revenge of the ViT

    Authors: Hugo Touvron, Matthieu Cord, Hervé Jégou

    Abstract: A Vision Transformer (ViT) is a simple neural architecture amenable to serve several computer vision tasks. It has limited built-in architectural priors, in contrast to more recent architectures that incorporate priors either about the input data or of specific tasks. Recent works show that ViTs benefit from self-supervised pre-training, in particular BERT-like pre-training such as BEiT. In this pape…

    Submitted 14 April, 2022; originally announced April 2022.

  11. arXiv:2203.09795

    cs.CV

    Three things everyone should know about Vision Transformers

    Authors: Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek, Hervé Jégou

    Abstract: After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and video analysis. We offer three insights based on simple and easy to implement variants of vision transformers. (1) The residual layers of vision transformers, wh…

    Submitted 18 March, 2022; originally announced March 2022.

  12. arXiv:2112.13692

    cs.CV

    Augmenting Convolutional networks with attention-based aggregation

    Authors: Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou

    Abstract: We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning. We replace the final average pooling by an attention-based aggregation layer akin to a single transformer block, that weights how the patches are involved in the classification decision. We plug this learned aggregation layer with a simplistic patch-based convolutional network parame…

    Submitted 27 December, 2021; originally announced December 2021.

  13. arXiv:2112.10740

    cs.CV

    Are Large-scale Datasets Necessary for Self-Supervised Pre-training?

    Authors: Alaaeldin El-Nouby, Gautier Izacard, Hugo Touvron, Ivan Laptev, Hervé Jegou, Edouard Grave

    Abstract: Pre-training models on large scale datasets, like ImageNet, is a standard practice in computer vision. This paradigm is especially effective for tasks with small training sets, for which high-capacity models tend to overfit. In this work, we consider a self-supervised pre-training scenario that only leverages the target task data. We consider datasets, like Stanford Cars, Sketch or COCO, which are…

    Submitted 20 December, 2021; originally announced December 2021.

  14. arXiv:2112.09581

    cs.CV cs.LG

    Watermarking Images in Self-Supervised Latent Spaces

    Authors: Pierre Fernandez, Alexandre Sablayrolles, Teddy Furon, Hervé Jégou, Matthijs Douze

    Abstract: We revisit watermarking techniques based on pre-trained deep networks, in the light of self-supervised approaches. We present a way to embed both marks and binary messages into their latent spaces, leveraging data augmentation at marking time. Our method can operate at any resolution and creates watermarks robust to a broad range of transformations (rotations, crops, JPEG, contrast, etc). It signi…

    Submitted 23 March, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

  15. arXiv:2112.09568

    cs.CV cs.LG

    Nearest neighbor search with compact codes: A decoder perspective

    Authors: Kenza Amara, Matthijs Douze, Alexandre Sablayrolles, Hervé Jégou

    Abstract: Modern approaches for fast retrieval of similar vectors on billion-scaled datasets rely on compressed-domain approaches such as binary sketches or product quantization. These methods minimize a certain loss, typically the mean squared error or other objective functions tailored to the retrieval problem. In this paper, we re-interpret popular methods such as binary hashing or product quantizers as…

    Submitted 21 February, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

  16. arXiv:2110.00476

    cs.CV cs.LG

    ResNet strikes back: An improved training procedure in timm

    Authors: Ross Wightman, Hugo Touvron, Hervé Jégou

    Abstract: The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimi…

    Submitted 1 October, 2021; originally announced October 2021.

  17. arXiv:2106.09681

    cs.CV cs.LG

    XCiT: Cross-Covariance Image Transformers

    Authors: Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jegou

    Abstract: Following their success in natural language processing, transformers have recently shown much promise for computer vision. The self-attention operation underlying transformers yields global interactions between all tokens, i.e. words or image patches, and enables flexible modelling of image data beyond the local interactions of convolutions. This flexibility, however, comes with a quadratic comple…

    Submitted 18 June, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  18. arXiv:2105.03404

    cs.CV

    ResMLP: Feedforward networks for image classification with data-efficient training

    Authors: Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

    Abstract: We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using hea…

    Submitted 10 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  19. arXiv:2104.14294

    cs.CV

    Emerging Properties in Self-Supervised Vision Transformers

    Authors: Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin

    Abstract: In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentatio…

    Submitted 24 May, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: 21 pages

  20. arXiv:2104.13733

    cs.CL cs.AI cs.CR cs.LG

    Gradient-based Adversarial Attacks against Text Transformers

    Authors: Chuan Guo, Alexandre Sablayrolles, Hervé Jégou, Douwe Kiela

    Abstract: We propose the first general-purpose gradient-based attack against transformer models. Instead of searching for a single adversarial example, we search for a distribution of adversarial examples parameterized by a continuous-valued matrix, hence enabling gradient-based optimization. We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of nat…

    Submitted 15 April, 2021; originally announced April 2021.

  21. arXiv:2104.01136

    cs.CV

    LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

    Authors: Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze

    Abstract: We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks to apply them to transformers, in particular…

    Submitted 6 May, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  22. arXiv:2103.17239

    cs.CV

    Going deeper with Image Transformers

    Authors: Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou

    Abstract: Transformers have been recently adapted for large scale image classification, achieving high scores shaking up the long supremacy of convolutional neural networks. However the optimization of image transformers has been little studied so far. In this work, we build and optimize deeper transformer networks for image classification. In particular, we investigate the interplay of architecture and opt…

    Submitted 7 April, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

  23. arXiv:2102.05644

    cs.CV

    Training Vision Transformers for Image Retrieval

    Authors: Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou

    Abstract: Transformers have shown outstanding results for natural language understanding and, more recently, for image classification. We here extend this work and propose a transformer-based approach for image retrieval: we adopt vision transformers for generating image descriptors and train the resulting model with a metric learning objective, which combines a contrastive loss with a differential entropy…

    Submitted 10 February, 2021; originally announced February 2021.

  24. arXiv:2012.12877

    cs.CV

    Training data-efficient image transformers & distillation through attention

    Authors: Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou

    Abstract: Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on Imagenet only. We train them o…

    Submitted 15 January, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

  25. arXiv:2011.12982

    cs.CV

    Grafit: Learning fine-grained image representations with coarse labels

    Authors: Hugo Touvron, Alexandre Sablayrolles, Matthijs Douze, Matthieu Cord, Hervé Jégou

    Abstract: This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlyi…

    Submitted 25 November, 2020; originally announced November 2020.

  26. arXiv:2008.05763

    cs.CV eess.IV

    Powers of layers for image-to-image translation

    Authors: Hugo Touvron, Matthijs Douze, Matthieu Cord, Hervé Jégou

    Abstract: We propose a simple architecture to address unpaired image-to-image translation tasks: style or class transfer, denoising, deblurring, deblocking, etc. We start from an image autoencoder architecture with fixed weights. For each task we learn a residual block operating in the latent space, which is iteratively called until the target domain is reached. A specific training schedule is required to a…

    Submitted 13 August, 2020; originally announced August 2020.

  27. arXiv:2004.07320

    cs.LG stat.ML

    Training with Quantization Noise for Extreme Model Compression

    Authors: Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin

    Abstract: We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator. In this paper, we extend this approach to work beyond int8 fixed-point quantization with extreme compression…

    Submitted 28 February, 2021; v1 submitted 15 April, 2020; originally announced April 2020.

  28. arXiv:2003.08237

    cs.CV cs.LG

    Fixing the train-test resolution discrepancy: FixEfficientNet

    Authors: Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

    Abstract: This paper provides an extensive analysis of the performance of the EfficientNet image classifiers with several recent training procedures, in particular one that corrects the discrepancy between train and test images. The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters. For instance, our FixEfficientNet-B0 trained…

    Submitted 18 November, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

  29. arXiv:2002.00937

    stat.ML cs.CR cs.CV cs.LG

    Radioactive data: tracing through training

    Authors: Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

    Abstract: We want to detect whether a particular image dataset has been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to this dataset such that any model trained on it will bear an identifiable mark. The mark is robust to strong variations such as different architectures or optimization methods. Given a trained model, our technique detects the u…

    Submitted 3 February, 2020; originally announced February 2020.

  30. arXiv:1908.11229

    stat.ML cs.CR cs.LG

    White-box vs Black-box: Bayes Optimal Strategies for Membership Inference

    Authors: Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, Hervé Jégou

    Abstract: Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set. In this paper, we derive the optimal strategy for membership inference with a few assumptions on the distribution of the parameters. We show that optimal attacks only depend on the loss function, and thus black-box attacks are as good as white-box att…

    Submitted 29 August, 2019; originally announced August 2019.

  31. arXiv:1907.05686

    cs.CV

    And the Bit Goes Down: Revisiting the Quantization of Neural Networks

    Authors: Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou

    Abstract: In this paper, we address the problem of reducing the memory footprint of convolutional network architectures. We introduce a vector quantization method that aims at preserving the quality of the reconstruction of the network outputs rather than its weights. The principle of our approach is that it minimizes the loss reconstruction error for in-domain inputs. Our method only requires a set of unla…

    Submitted 9 November, 2020; v1 submitted 12 July, 2019; originally announced July 2019.

    Comments: ICLR 2020 camera-ready

  32. arXiv:1907.05242

    cs.CL cs.LG

    Large Memory Layers with Product Keys

    Authors: Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

    Abstract: This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on product keys, which enable fast and exact nearest neighbor search. The ability to increase th…

    Submitted 15 December, 2019; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: Advances in Neural Information Processing Systems, 2019

  33. arXiv:1907.01470

    cs.LG cs.CL stat.ML

    Augmenting Self-attention with Persistent Memory

    Authors: Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin

    Abstract: Transformer networks have led to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer. The latter allows the network to capture long term dependencies and is often regarded as the key ingredient in the success of Transformers. Building upon this intuition, we propose a new model that solely…

    Submitted 2 July, 2019; originally announced July 2019.

  34. arXiv:1906.06423

    cs.CV cs.LG

    Fixing the train-test resolution discrepancy

    Authors: Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou

    Abstract: Data-augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time. We t…

    Submitted 20 January, 2022; v1 submitted 14 June, 2019; originally announced June 2019.

  35. arXiv:1905.00546

    cs.CV

    Billion-scale semi-supervised learning for image classification

    Authors: I. Zeki Yalniz, Hervé Jégou, Kan Chen, Manohar Paluri, Dhruv Mahajan

    Abstract: This paper presents a study of semi-supervised learning with large convolutional networks. We propose a pipeline, based on a teacher/student paradigm, that leverages a large collection of unlabelled images (up to 1 billion). Our main goal is to improve the performance for a given target architecture, like ResNet-50 or ResNext. We provide an extensive analysis of the success factors of our approach…

    Submitted 1 May, 2019; originally announced May 2019.

  36. arXiv:1902.10416

    cs.CV cs.LG

    Equi-normalization of Neural Networks

    Authors: Pierre Stock, Benjamin Graham, Rémi Gribonval, Hervé Jégou

    Abstract: Modern neural networks are over-parametrized. In particular, each rectified linear hidden unit can be modified by a multiplicative factor by adjusting input and output weights, without changing the rest of the network. Inspired by the Sinkhorn-Knopp algorithm, we introduce a fast iterative method for minimizing the L2 norm of the weights, equivalently the weight decay regularizer. It provably conv…

    Submitted 27 February, 2019; originally announced February 2019.

    Comments: ICLR 2019 camera-ready

  37. arXiv:1902.05509

    cs.CV

    MultiGrain: a unified image embedding for classes and instances

    Authors: Maxim Berman, Hervé Jégou, Andrea Vedaldi, Iasonas Kokkinos, Matthijs Douze

    Abstract: MultiGrain is a network architecture producing compact vector representations that are suited both for image classification and particular object retrieval. It builds on a standard classification trunk. The top of the network produces an embedding containing coarse and fine-grained information, so that images can be recognized based on the object class, particular object, or if they are distorted…

    Submitted 3 April, 2019; v1 submitted 14 February, 2019; originally announced February 2019.

  38. arXiv:1811.11147

    cs.CV

    Understanding and Improving Kernel Local Descriptors

    Authors: Arun Mukundan, Giorgos Tolias, Andrei Bursuc, Hervé Jégou, Ondřej Chum

    Abstract: We propose a multiple-kernel local-patch descriptor based on efficient match kernels from pixel gradients. It combines two parametrizations of gradient position and direction; each parametrization provides robustness to a different type of patch mis-registration: polar parametrization for noise in the patch dominant orientation detection, Cartesian for imprecise location of the feature point. Comb…

    Submitted 27 November, 2018; originally announced November 2018.

  39. arXiv:1809.06396

    cs.CV

    Déjà Vu: an empirical evaluation of the memorization properties of ConvNets

    Authors: Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

    Abstract: Convolutional neural networks memorize part of their training data, which is why strategies such as data augmentation and drop-out are employed to mitigate overfitting. This paper considers the related question of "membership inference", where the goal is to determine if an image was used during training. We consider it under three complementary angles. We show how to detect which dataset was used…

    Submitted 17 September, 2018; originally announced September 2018.

  40. arXiv:1806.03198

    stat.ML cs.LG

    Spreading vectors for similarity search

    Authors: Alexandre Sablayrolles, Matthijs Douze, Cordelia Schmid, Hervé Jégou

    Abstract: Discretizing multi-dimensional data distributions is a fundamental step of modern indexing methods. State-of-the-art techniques learn parameters of quantizers on training data for optimal performance, thus adapting quantizers to the data. In this work, we propose to reverse this paradigm and adapt the data to the quantizer: we train a neural net whose last layer forms a fixed parameter-free quanti…

    Submitted 30 August, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

    Comments: Published at ICLR 2019

  41. arXiv:1804.09996

    cs.CV cs.DB cs.DS cs.IR

    Link and code: Fast indexing with graphs and compact regression codes

    Authors: Matthijs Douze, Alexandre Sablayrolles, Hervé Jégou

    Abstract: Similarity search approaches based on graph walks have recently attained outstanding speed-accuracy trade-offs, setting aside the memory requirements. In this paper, we revisit these approaches by considering, additionally, the memory constraint required to index billions of images on a single server. This leads us to propose a method based both on graph traversal and compact representations. We en…

    Submitted 27 April, 2018; v1 submitted 26 April, 2018; originally announced April 2018.

  42. arXiv:1804.07745

    cs.CL cs.LG

    Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion

    Authors: Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave

    Abstract: Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space. Existing works typically solve a least-square regression problem to learn a rotation aligning a small bilingual lexicon, and use a retrieval criterion for inference. In this paper, we propose a unified formulation that directly optimizes a retrieval crit…

    Submitted 5 September, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

  43. arXiv:1710.04087

    cs.CL

    Word Translation Without Parallel Data

    Authors: Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

    Abstract: State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a comm…

    Submitted 30 January, 2018; v1 submitted 11 October, 2017; originally announced October 2017.

    Comments: ICLR 2018

  44. arXiv:1708.02898

    cs.CV cs.MM

    An evaluation of large-scale methods for image instance and class discovery

    Authors: Matthijs Douze, Hervé Jégou, Jeff Johnson

    Abstract: This paper aims at discovering meaningful subsets of related images from large image collections without annotations. We search groups of images related at different levels of semantic, i.e., either instances or visual classes. While k-means is usually considered as the gold standard for this task, we evaluate and show the interest of diffusion methods that have been neglected by the state of the…

    Submitted 9 August, 2017; originally announced August 2017.

    Comments: Published at ACM Multimedia workshop

  45. arXiv:1706.02332

    cs.CV cs.LG stat.ML

    Low-shot learning with large-scale diffusion

    Authors: Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou

    Abstract: This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based…

    Submitted 15 June, 2018; v1 submitted 7 June, 2017; originally announced June 2017.

  46. arXiv:1702.08734

    cs.CV cs.DB cs.DS cs.IR

    Billion-scale similarity search with GPUs

    Authors: Jeff Johnson, Matthijs Douze, Hervé Jégou

    Abstract: Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less para…

    Submitted 28 February, 2017; originally announced February 2017.

  47. arXiv:1612.03651

    cs.CL cs.LG

    FastText.zip: Compressing text classification models

    Authors: Armand Joulin, Edouard Grave, Piotr Bojanowski, Matthijs Douze, Hervé Jégou, Tomas Mikolov

    Abstract: We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory. After considering different solutions inspired by the hashing literature, we propose a method built upon product quantization to store word embeddings. While the original technique leads to a loss in accuracy, we adapt this method to circumvent quantizati…

    Submitted 12 December, 2016; originally announced December 2016.

    Comments: Submitted to ICLR 2017

  48. arXiv:1611.08194

    cs.CV

    Interferences in match kernels

    Authors: Naila Murray, Hervé Jégou, Florent Perronnin, Andrew Zisserman

    Abstract: We consider the design of an image representation that embeds and aggregates a set of local descriptors into a single vector. Popular representations of this kind include the bag-of-visual-words, the Fisher vector and the VLAD. When two such image representations are compared with the dot-product, the image-to-image similarity can be interpreted as a match kernel. In match kernels, one has to deal…

    Submitted 24 November, 2016; originally announced November 2016.

    Comments: Accepted as regular paper in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  49. arXiv:1609.06753

    cs.CV

    How should we evaluate supervised hashing?

    Authors: Alexandre Sablayrolles, Matthijs Douze, Hervé Jégou, Nicolas Usunier

    Abstract: Hashing produces compact representations for documents, to perform tasks like classification or retrieval based on these short codes. When hashing is supervised, the codes are trained using labels on the training data. This paper first shows that the evaluation protocols used in the literature for supervised hashing are not satisfactory: we show that a trivial solution that encodes the output of a…

    Submitted 10 August, 2017; v1 submitted 21 September, 2016; originally announced September 2016.

  50. arXiv:1609.04309

    cs.CL cs.LG

    Efficient softmax approximation for GPUs

    Authors: Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

    Abstract: We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach further reduces the computational time by…

    Submitted 19 June, 2017; v1 submitted 14 September, 2016; originally announced September 2016.

    Comments: Accepted to ICML 2017