Search | arXiv e-print repository

arXiv:2405.20324 [pdf, other]

Don't drop your samples! Coherence-aware training benefits Conditional diffusion

Authors: Nicolas Dufour, Victor Besnier, Vicky Kalogeiton, David Picard

Abstract: Conditional diffusion models are powerful generative models that can leverage various types of conditional information, such as class labels, segmentation masks, or text captions. However, in many real-world scenarios, conditional information may be noisy or unreliable due to human annotation errors or weak alignment. In this paper, we propose the Coherence-Aware Diffusion (CAD), a novel method that integrates coherence in conditional information into diffusion models, allowing them to learn from noisy annotations without discarding data. We assume that each data point has an associated coherence score that reflects the quality of the conditional information. We then condition the diffusion model on both the conditional information and the coherence score. In this way, the model learns to ignore or discount the conditioning when the coherence is low. We show that CAD is theoretically sound and empirically effective on various conditional generation tasks. Moreover, we show that leveraging coherence generates realistic and diverse samples that respect conditional information better than models trained on cleaned datasets where samples with low coherence have been discarded.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted at CVPR 2024 as a Highlight. Project page: https://nicolas-dufour.github.io/cad.html

arXiv:2404.13040 [pdf, other]

Analysis of Classifier-Free Guidance Weight Schedulers

Authors: Xi Wang, Nicolas Dufour, Nefeli Andreou, Marie-Paule Cani, Victoria Fernandez Abrevaya, David Picard, Vicky Kalogeiton

Abstract: Classifier-Free Guidance (CFG) enhances the quality and condition adherence of text-to-image diffusion models. It operates by combining the conditional and unconditional predictions using a fixed weight. However, recent works vary the weights throughout the diffusion process, reporting superior results but without providing any rationale or analysis. By conducting comprehensive experiments, this paper provides insights into CFG weight schedulers. Our findings suggest that simple, monotonically increasing weight schedulers consistently lead to improved performances, requiring merely a single line of code. In addition, more complex parametrized schedulers can be optimized for further improvement, but do not generalize across different models and tasks.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2401.09629 [pdf, other]

Multiple Locally Linear Kernel Machines

Authors: David Picard

Abstract: In this paper we propose a new non-linear classifier based on a combination of locally linear classifiers. A well-known optimization formulation is given as we cast the problem as an $\ell_1$ Multiple Kernel Learning (MKL) problem using many locally linear kernels. Since the number of such kernels is huge, we provide a scalable generic MKL training algorithm handling streaming kernels. With respect to the inference time, the resulting classifier fills the gap between high accuracy but slow non-linear classifiers (such as classical MKL) and fast but low accuracy linear classifiers.

Submitted 17 January, 2024; originally announced January 2024.

Comments: This paper was written in 2014 and was originally submitted but rejected at ICML'15

arXiv:2310.11265 [pdf, other]

Image Compression using only Attention based Neural Networks

Authors: Natacha Luka, Romain Negrel, David Picard

Abstract: In recent research, Learned Image Compression has gained prominence for its capacity to outperform traditional handcrafted pipelines, especially at low bit-rates. While existing methods incorporate convolutional priors with occasional attention blocks to address long-range dependencies, recent advances in computer vision advocate for a transformative shift towards fully transformer-based architectures grounded in the attention mechanism. This paper investigates the feasibility of image compression exclusively using attention layers within our novel model, QPressFormer. We introduce the concept of learned image queries to aggregate patch information via cross-attention, followed by quantization and coding techniques. Through extensive evaluations, our work demonstrates competitive performance achieved by convolution-free architectures across the popular Kodak, DIV2K, and CLIC datasets.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2308.11677 [pdf, other]

An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning

Authors: Grégoire Petit, Michael Soumm, Eva Feillet, Adrian Popescu, Bertrand Delezoide, David Picard, Céline Hudelot

Abstract: Class-Incremental Learning (CIL) aims to build classification models from data streams. At each step of the CIL process, new classes must be integrated into the model. Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored, the case on which we focus here. To date, most approaches are based exclusively on the target dataset of the CIL process. However, the use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum. The initial model of the CIL process may only use the first batch of the target dataset, or also use pre-trained weights obtained on an auxiliary dataset. The choice between these two initial learning strategies can significantly influence the performance of the incremental learning model, but has not yet been studied in depth. Performance is also influenced by the choice of the CIL algorithm, the neural architecture, the nature of the target task, the distribution of classes in the stream and the number of examples available for learning. We conduct a comprehensive experimental study to assess the roles of these factors. We present a statistical analysis framework that quantifies the relative contribution of each factor to incremental performance. Our main finding is that the initial training strategy is the dominant factor influencing the average incremental accuracy, but that the choice of CIL algorithm is more important in preventing forgetting. Based on this analysis, we propose practical recommendations for choosing the right initial training strategy for a given incremental learning use case. These recommendations are intended to facilitate the practical deployment of incremental learning.

Submitted 27 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

arXiv:2306.02928 [pdf, other]

LRVS-Fashion: Extending Visual Search with Referring Instructions

Authors: Simon Lepage, Jérémie Mary, David Picard

Abstract: This paper introduces a new challenge for image similarity search in the context of fashion, addressing the inherent ambiguity in this domain stemming from complex images. We present Referred Visual Search (RVS), a task allowing users to define more precisely the desired similarity, following recent interest in the industry. We release a new large public dataset, LRVS-Fashion, consisting of 272k fashion products with 842k images extracted from fashion catalogs, designed explicitly for this task. However, unlike traditional visual search methods in the industry, we demonstrate that superior performance can be achieved by bypassing explicit object detection and adopting weakly-supervised conditional contrastive learning on image tuples. Our method is lightweight and demonstrates robustness, reaching Recall at one superior to strong detection-based baselines against 2M distractors. The dataset is available at https://huggingface.co/datasets/Slep/LAION-RVS-Fashion .

Submitted 15 May, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 29 pages, 14 figures, 5 tables

MSC Class: 68T07 (Primary) 68T45 (Secondary); ACM Class: I.2.10

arXiv:2302.00384 [pdf, other]

Alphazzle: Jigsaw Puzzle Solver with Deep Monte-Carlo Tree Search

Authors: Marie-Morgane Paumard, Hedi Tabia, David Picard

Abstract: Solving jigsaw puzzles requires grasping the visual features of a sequence of patches and efficiently exploring a solution space that grows exponentially with the sequence length. Therefore, visual deep reinforcement learning (DRL) should answer this problem more efficiently than optimization solvers coupled with neural networks. Based on this assumption, we introduce Alphazzle, a reassembly algorithm based on single-player Monte Carlo Tree Search (MCTS). A major difference with DRL algorithms lies in the unavailability of game reward for MCTS, and we show how to estimate it from the visual input with neural networks. This constraint is induced by the puzzle-solving task and dramatically adds to the task complexity (and interest!). We perform an in-depth ablation study that shows the importance of MCTS and the neural networks working together. We achieve excellent results and get exciting insights into the combination of DRL and visual feature learning.

Submitted 1 February, 2023; originally announced February 2023.

arXiv:2212.10292 [pdf, other]

Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?

Authors: Monika Wysoczańska, Tom Monnier, Tomasz Trzciński, David Picard

Abstract: Recent advances in visual representation learning have made it possible to build an abundance of powerful off-the-shelf features that are ready-to-use for numerous downstream tasks. This work aims to assess how well these features preserve information about the objects, such as their spatial location, their visual properties and their relative relationships. We propose to do so by evaluating them in the context of visual reasoning, where multiple objects with complex relationships and different attributes are at play. More specifically, we introduce a protocol to evaluate visual representations for the task of Visual Question Answering. In order to decouple visual feature extraction from reasoning, we design a specific attention-based reasoning module which is trained on the frozen visual representations to be evaluated, in a spirit similar to standard feature evaluations relying on shallow networks. We compare two types of visual representations, densely extracted local features and object-centric ones, against the performances of a perfect image representation using ground truth. Our main findings are two-fold. First, despite excellent performances on classical proxy tasks, such representations fall short for solving complex reasoning problems. Second, object-centric features better preserve the critical information necessary to perform visual reasoning. In our proposed framework we show how to methodologically approach this evaluation.

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2211.15692 [pdf, other]

H3WB: Human3.6M 3D WholeBody Dataset and Benchmark

Authors: Yue Zhu, Nermin Samet, David Picard

Abstract: We present a benchmark for 3D human whole-body pose estimation, which involves identifying accurate 3D keypoints on the entire human body, including face, hands, body, and feet. Currently, the lack of a fully annotated and accurate 3D whole-body dataset results in deep networks being trained separately on specific body parts, which are combined during inference. Or they rely on pseudo-groundtruth provided by parametric body models which are not as accurate as detection based methods. To overcome these issues, we introduce the Human3.6M 3D WholeBody (H3WB) dataset, which provides whole-body annotations for the Human3.6M dataset using the COCO Wholebody layout. H3WB comprises 133 whole-body keypoint annotations on 100K images, made possible by our new multi-view pipeline. We also propose three tasks: i) 3D whole-body pose lifting from 2D complete whole-body pose, ii) 3D whole-body pose lifting from 2D incomplete whole-body pose, and iii) 3D whole-body pose estimation from a single RGB image. Additionally, we report several baselines from popular methods for these tasks. Furthermore, we also provide automated 3D whole-body annotations of TotalCapture and experimentally show that when used with H3WB it helps to improve the performance. Code and dataset are available at https://github.com/wholebody3d/wholebody3d

Submitted 6 September, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: Accepted by ICCV 2023

arXiv:2211.13131 [pdf, other]

FeTrIL: Feature Translation for Exemplar-Free Class-Incremental Learning

Authors: Grégoire Petit, Adrian Popescu, Hugo Schindler, David Picard, Bertrand Delezoide

Abstract: Exemplar-free class-incremental learning is very challenging due to the negative effect of catastrophic forgetting. A balance between stability and plasticity of the incremental process is needed in order to obtain good accuracy for past as well as new classes. Existing exemplar-free class-incremental methods focus either on successive fine tuning of the model, thus favoring plasticity, or on using a feature extractor fixed after the initial incremental state, thus favoring stability. We introduce a method which combines a fixed feature extractor and a pseudo-features generator to improve the stability-plasticity balance. The generator uses a simple yet effective geometric translation of new class features to create representations of past classes, made of pseudo-features. The translation of features only requires the storage of the centroid representations of past classes to produce their pseudo-features. Actual features of new classes and pseudo-features of past classes are fed into a linear classifier which is trained incrementally to discriminate between all classes. The incremental process is much faster with the proposed method compared to mainstream ones which update the entire deep model. Experiments are performed with three challenging datasets, and different incremental settings. A comparison with ten existing methods shows that our method outperforms the others in most cases.

Submitted 28 November, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

arXiv:2210.04883 [pdf, other]

SCAM! Transferring humans between images with Semantic Cross Attention Modulation

Authors: Nicolas Dufour, David Picard, Vicky Kalogeiton

Abstract: A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.

Submitted 10 October, 2022; originally announced October 2022.

Comments: Accepted at ECCV 2022

arXiv:2210.02231 [pdf, other]

Decanus to Legatus: Synthetic training for 2D-3D human pose lifting

Authors: Yue Zhu, David Picard

Abstract: 3D human pose estimation is a challenging task because of the difficulty of acquiring ground-truth data outside of controlled environments. A number of further issues have been hindering progress in building a universal and robust model for this task, including domain gaps between different datasets, unseen actions between train and test datasets, various hardware settings and high cost of annotation, etc. In this paper, we propose an algorithm to generate infinite 3D synthetic human poses (Legatus) from a 3D pose distribution based on 10 initial handcrafted 3D poses (Decanus) during the training of a 2D to 3D human pose lifter neural network. Our results show that we can achieve 3D pose estimation performance comparable to methods using real data from specialized datasets but in a zero-shot setup, showing the generalization potential of our framework.

Submitted 5 October, 2022; originally announced October 2022.

Comments: Accepted by ACCV 2022

arXiv:2209.06606 [pdf, other]

PlaStIL: Plastic and Stable Memory-Free Class-Incremental Learning

Authors: Grégoire Petit, Adrian Popescu, Eden Belouadah, David Picard, Bertrand Delezoide

Abstract: Plasticity and stability are needed in class-incremental learning in order to learn from new data while preserving past knowledge. Due to catastrophic forgetting, finding a compromise between these two properties is particularly challenging when no memory buffer is available. Mainstream methods need to store two deep models since they integrate new classes using fine-tuning with knowledge distillation from the previous incremental state. We propose a method which has a similar number of parameters but distributes them differently in order to find a better balance between plasticity and stability. Following an approach already deployed by transfer-based incremental methods, we freeze the feature extractor after the initial state. Classes in the oldest incremental states are trained with this frozen extractor to ensure stability. Recent classes are predicted using partially fine-tuned models in order to introduce plasticity. Our proposed plasticity layer can be incorporated into any transfer-based method designed for exemplar-free incremental learning, and we apply it to two such methods. Evaluation is done with three large-scale datasets. Results show that performance gains are obtained in all tested configurations compared to existing methods.

Submitted 4 July, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

arXiv:2207.10541 [pdf, other]

Unveiling the Latent Space Geometry of Push-Forward Generative Models

Authors: Thibaut Issenhuth, Ugo Tanielian, Jérémie Mary, David Picard

Abstract: Many deep generative models are defined as a push-forward of a Gaussian measure by a continuous generator, such as Generative Adversarial Networks (GANs) or Variational Auto-Encoders (VAEs). This work explores the latent space of such deep generative models. A key issue with these models is their tendency to output samples outside of the support of the target distribution when learning disconnected distributions. We investigate the relationship between the performance of these models and the geometry of their latent space. Building on recent developments in geometric measure theory, we prove a sufficient condition for optimality in the case where the dimension of the latent space is larger than the number of modes. Through experiments on GANs, we demonstrate the validity of our theoretical results and gain new insights into the latent space geometry of these models. Additionally, we propose a truncation method that enforces a simplicial cluster structure in the latent space and improves the performance of GANs.

Submitted 15 May, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

arXiv:2207.08782 [pdf, other]

Instance-Aware Observer Network for Out-of-Distribution Object Segmentation

Authors: Victor Besnier, Andrei Bursuc, David Picard, Alexandre Briot

Abstract: Recent works on predictive uncertainty estimation have shown promising results on Out-Of-Distribution (OOD) detection for semantic segmentation. However, these methods struggle to precisely locate the point of interest in the image, i.e., the anomaly. This limitation is due to the difficulty of fine-grained prediction at the pixel level. To address this issue, we build upon the recent ObsNet approach by providing object instance knowledge to the observer. We extend ObsNet by harnessing an instance-wise mask prediction. We use an additional, class agnostic, object detector to filter and aggregate observer predictions. Finally, we predict a unique anomaly score for each instance in the image. We show that our proposed method accurately disentangles in-distribution objects from OOD objects on three datasets.

Submitted 29 August, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

arXiv:2202.00342 [pdf, other]

doi 10.1103/PhysRevA.105.L020802

Absolute measurements of state-to-state rotational energy transfer between CO and H2 at interstellar temperatures

Authors: H. Labiad, M. Fournier, L. A. Mertens, A. Faure, D. Carty, T. Stoecklin, P. Jankowski, K. Szalewicz, S. D. Le Picard, I. R. Sims

Abstract: Experimental measurements and theoretical calculations of state-to-state rate coefficients for rotational energy transfer of CO in collision with H$_2$ are reported at the very low temperatures prevailing in dense interstellar clouds (5 - 20 K). Detailed agreement between quantum state-selected experiments performed in cold supersonic flows using time-resolved infrared - vacuum-ultraviolet double resonance spectroscopy and close-coupling quantum scattering calculations confirms the validity of the calculations for collisions between the two most abundant molecules in the interstellar medium.

Submitted 1 February, 2022; originally announced February 2022.

Comments: 6 pages, 4 figures, accepted for publication in Phys. Rev. A. Letter (21/12/2021)

arXiv:2111.15264 [pdf, other]

EdiBERT, a generative model for image editing

Authors: Thibaut Issenhuth, Ugo Tanielian, Jérémie Mary, David Picard

Abstract: Advances in computer vision are pushing the limits of image manipulation, with generative models sampling detailed images on various tasks. However, a specialized model is often developed and trained for each specific task, even though many image editing tasks share similarities. In denoising, inpainting, or image compositing, one always aims at generating a realistic image from a low-quality one. In this paper, we aim at making a step towards a unified approach for image editing. To do so, we propose EdiBERT, a bi-directional transformer trained in the discrete latent space built by a vector-quantized auto-encoder. We argue that such a bidirectional model is suited for image manipulation since any patch can be re-sampled conditionally to the whole image. Using this unique and straightforward training objective, we show that the resulting model matches state-of-the-art performances on a wide variety of tasks: image denoising, image completion, and image composition.

Submitted 21 July, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

arXiv:2111.10248 [pdf, other]

Non asymptotic bounds in asynchronous sum-weight gossip protocols

Authors: David Picard, Jérôme Fellus, Stéphane Garnier

Abstract: This paper focuses on non-asymptotic diffusion time in asynchronous gossip protocols. Asynchronous gossip protocols are designed to perform distributed computation in a network of nodes by randomly exchanging messages on the associated graph. To achieve consensus among nodes, a minimal number of messages has to be exchanged. We provide a probabilistic bound on this number for the general case. We provide an explicit formula for fully connected graphs depending only on the number of nodes and an approximation for any graph depending on the spectrum of the graph.

Submitted 19 November, 2021; originally announced November 2021.

Comments: Unpublished work done circa 2016

arXiv:2110.09803 [pdf, other]

Latent reweighting, an almost free improvement for GANs

Authors: Thibaut Issenhuth, Ugo Tanielian, David Picard, Jeremie Mary

Abstract: Standard formulations of GANs, where a continuous function deforms a connected latent space, have been shown to be misspecified when fitting different classes of images. In particular, the generator will necessarily sample some low-quality images in between the classes. Rather than modifying the architecture, a line of works aims at improving the sampling quality from pre-trained generators at the expense of increased computational cost. Building on this, we introduce an additional network to predict latent importance weights and two associated sampling methods to avoid the poorest samples. This idea has several advantages: 1) it provides a way to inject disconnectedness into any GAN architecture, 2) since the rejection happens in the latent space, it avoids going through both the generator and the discriminator, saving computation time, 3) this importance weights formulation provides a principled way to reduce the Wasserstein distance to the target distribution. We demonstrate the effectiveness of our method on several datasets, both synthetic and high-dimensional.

Submitted 19 October, 2021; originally announced October 2021.

arXiv:2109.08203 [pdf, other]

Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision

Authors: David Picard

Abstract: In this paper I investigate the effect of random seed selection on the accuracy when using popular deep learning architectures for computer vision. I scan a large number of seeds (up to $10^4$) on CIFAR 10 and I also scan fewer seeds on ImageNet using pre-trained models to investigate large scale datasets. The conclusions are that even if the variance is not very large, it is surprisingly easy to find an outlier that performs much better or much worse than the average.

Submitted 11 May, 2023; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: fixed typos

arXiv:2108.08109 [pdf, other]

Image Collation: Matching illustrations in manuscripts

Authors: Ryad Kaoua, Xi Shen, Alexandra Durr, Stavros Lazaris, David Picard, Mathieu Aubry

Abstract: Illustrations are an essential transmission instrument. For a historian, the first step in studying their evolution in a corpus of similar manuscripts is to identify which ones correspond to each other. This image collation task is daunting for manuscripts separated by many lost copies, spreading over centuries, which might have been completely re-organized and greatly modified to adapt to novel knowledge or belief and include hundreds of illustrations. Our contributions in this paper are threefold. First, we introduce the task of illustration collation and a large annotated public dataset to evaluate solutions, including 6 manuscripts of 2 different texts with more than 2 000 illustrations and 1 200 annotated correspondences. Second, we analyze state of the art similarity measures for this task and show that they succeed in simple cases but struggle for large manuscripts when the illustrations have undergone very significant changes and are discriminated only by fine details. Finally, we show clear evidence that significant performance boosts can be expected by exploiting cycle-consistent correspondences. Our code and data are available on http://imagine.enpc.fr/~shenx/ImageCollation.

Submitted 18 August, 2021; originally announced August 2021.

Comments: accepted to ICDAR 2021

arXiv:2108.01634 [pdf, other]

Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation

Authors: Victor Besnier, Andrei Bursuc, David Picard, Alexandre Briot

Abstract: In this paper, we tackle the detection of out-of-distribution (OOD) objects in semantic segmentation. By analyzing the literature, we found that current methods are either accurate or fast but not both, which limits their usability in real world applications. To get the best of both aspects, we propose to mitigate the common shortcomings by following four design principles: decoupling the OOD detection from the segmentation task, observing the entire segmentation network instead of just its output, generating training data for the OOD detector by leveraging blind spots in the segmentation network and focusing the generated data on localized regions in the image to simulate OOD objects. Our main contribution is a new OOD detection architecture called ObsNet associated with a dedicated training scheme based on Local Adversarial Attacks (LAA). We validate the soundness of our approach across numerous ablation studies. We also show it obtains top performance both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.

Submitted 3 August, 2021; originally announced August 2021.

arXiv:2105.13688 [pdf, other]

Learning Uncertainty For Safety-Oriented Semantic Segmentation In Autonomous Driving

Authors: Victor Besnier, David Picard, Alexandre Briot

Abstract: In this paper, we show how uncertainty estimation can be leveraged to enable safety critical image segmentation in autonomous driving, by triggering a fallback behavior if a target accuracy cannot be guaranteed. We introduce a new uncertainty measure based on disagreeing predictions as measured by a dissimilarity function. We propose to estimate this dissimilarity by training a deep neural architecture in parallel to the task-specific network. It allows this observer to be dedicated to the uncertainty estimation, and lets the task-specific network make predictions. We propose to use self-supervision to train the observer, which implies that our method does not require additional training data. We show experimentally that our proposed approach is much less computationally intensive at inference time than competing methods (e.g. MCDropout), while delivering better results on safety-oriented evaluation metrics on the CamVid dataset, especially in the case of glare artifacts.

Submitted 28 May, 2021; originally announced May 2021.

arXiv:2103.11409 [pdf, other]

Deep Learning Based Detection for Spectrally Efficient FDM Systems

Authors: David Picard, Arsenia Chorti

Abstract: In this study we present how to approach the problem of building efficient detectors for spectrally efficient frequency division multiplexing (SEFDM) systems. The superiority of residual convolution neural networks (CNNs) for these types of problems is demonstrated through experimentation with many different types of architectures.

Submitted 21 March, 2021; originally announced March 2021.

arXiv:2103.02306 [pdf, ps, other]

Rate Analysis and Deep Neural Network Detectors for SEFDM FTN Systems

Authors: Arsenia Chorti, David Picard

Abstract: In this work we compare the capacity and achievable rate of uncoded faster than Nyquist (FTN) signalling in the frequency domain, also referred to as spectrally efficient FDM (SEFDM). We propose a deep residual convolutional neural network detector for SEFDM signals in additive white Gaussian noise channels, which makes it possible to approach the Mazo limit in systems with up to 60 subcarriers. Notably, the deep detectors achieve a loss less than 0.4-0.7 dB for uncoded QPSK SEFDM systems of 12 to 60 subcarriers at a 15% spectral compression.

Submitted 3 March, 2021; originally announced March 2021.

arXiv:2012.07487 [pdf, other]

Clustering high dimensional meteorological scenarios: results and performance index

Authors: Yamila Barrera, Leonardo Boechi, Matthieu Jonckheere, Vincent Lefieux, Dominique Picard, Ezequiel Smucler, Agustin Somacal, Alfredo Umfurer

Abstract: The Réseau de Transport d'Électricité (RTE) is the main French electricity network operator and dedicates a large number of resources and efforts towards understanding climate time series data. We discuss here the problem and the methodology of grouping and selecting representatives of possible climate scenarios among a large number of climate simulations provided by RTE. The data used is composed of temperature time series for 200 different possible scenarios on a grid of geographical locations in France. These should be clustered in order to detect common patterns regarding temperature curves and help to choose representative scenarios for network simulations, which in turn can be used for energy optimisation. We first show that the choice of the distance used for the clustering has a strong impact on the meaning of the results: depending on the type of distance used, either spatial or temporal patterns prevail. Then we discuss the difficulty of fine-tuning the distance choice (combined with a dimension reduction procedure) and we propose a methodology based on a carefully designed index.

Submitted 14 December, 2020; originally announced December 2020.

Comments: 19 pages, 14 figures

arXiv:2012.00374 [pdf, other]

doi 10.1063/5.0029991

A new instrument for kinetics and branching ratio studies of gas phase collisional processes at very low temperatures

Authors: Olivier Durif, Michael Capron, Joey P. Messinger, Abdessamad Benidar, Ludovic Biennier, Jérémy Bourgalais, André Canosa, Jonathan Courbe, Gustavo A. Garcia, Jean-François Gil, Laurent Nahon, Mitchio Okumura, Lucile Rutkowski, Ian R. Sims, Jonathan Thiévin, Sébastien D. Le Picard

Abstract: A new instrument dedicated to the kinetic study of low-temperature gas phase neutral-neutral reactions, including clustering processes, is presented. It combines a supersonic flow reactor with Vacuum Ultra-Violet (VUV) synchrotron photoionization time of flight mass spectrometry. A photoion-photoelectron coincidence detection scheme has been adopted to optimize the particle counting efficiency. The characteristics of the instrument are detailed along with its capabilities illustrated through a few results obtained at low temperatures (< 100 K) including a photoionization spectrum of n-butane, the detection of formic acid dimer formation as well as the observation of diacetylene molecules formed by the reaction between the C$_2$H radical and C$_2$H$_2$.

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2009.01998 [pdf, other]

SSP-Net: Scalable Sequential Pyramid Networks for Real-Time 3D Human Pose Regression

Authors: Diogo Luvizon, Hedi Tabia, David Picard

Abstract: In this paper we propose a highly scalable convolutional neural network, end-to-end trainable, for real-time 3D human pose regression from still RGB images. We call this approach the Scalable Sequential Pyramid Networks (SSP-Net) as it is trained with refined supervision at multiple scales in a sequential manner. Our network requires a single training procedure and is capable of producing its best predictions at 120 frames per second (FPS), or acceptable predictions at more than 200 FPS when cut at test time. We show that the proposed regression approach is invariant to the size of feature maps, allowing our method to perform multi-resolution intermediate supervisions and reaching results comparable to the state-of-the-art with very low resolution feature maps. We demonstrate the accuracy and the effectiveness of our method by providing extensive experiments on two of the most important publicly available datasets for 3D pose estimation, Human3.6M and MPI-INF-3DHP. Additionally, we provide relevant insights about our decisions on the network architecture and show its flexibility to meet the best precision-speed compromise.

Submitted 3 September, 2020; originally announced September 2020.

Comments: Under review at PR

arXiv:2006.06611 [pdf, other]

Improving Deep Metric Learning with Virtual Classes and Examples Mining

Authors: Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein

Abstract: In deep metric learning, the training procedure relies on sampling informative tuples. However, as the training procedure progresses, it becomes nearly impossible to sample relevant hard negative examples without proper mining strategies or generation-based methods. Recent work on hard negative generation has shown great promise in solving the mining problem. However, this generation process is difficult to tune and often leads to incorrectly labelled examples. To tackle this issue, we introduce MIRAGE, a generation-based method that relies on virtual classes entirely composed of generated examples that act as buffer areas between the training classes. We empirically show that virtual classes significantly improve the results on popular datasets (Cub-200-2011, Cars-196 and Stanford Online Products) compared to other generation methods.

Submitted 11 June, 2020; originally announced June 2020.

arXiv:2005.12548 [pdf, other]

doi 10.1109/TIP.2019.2963378

Deepzzle: Solving Visual Jigsaw Puzzles with Deep Learning and Shortest Path Optimization

Authors: Marie-Morgane Paumard, David Picard, Hedi Tabia

Abstract: We tackle the image reassembly problem with wide space between the fragments, in such a way that the continuity of patterns and colors is mostly unusable. The spacing emulates the erosion that archaeological fragments suffer from. We crop-square the fragment borders to compel our algorithm to learn from the content of the fragments. We also complicate the image reassembly by removing fragments and adding pieces from other sources. We use a two-step method to obtain the reassemblies: 1) a neural network predicts the positions of the fragments despite the gaps between them; 2) a graph that leads to the best reassemblies is made from these predictions. In this paper, we notably investigate the effect of branch-cut in the graph of reassemblies. We also provide a comparison with the literature, solve complex image reassemblies, explore at length the dataset, and propose a new metric that suits its specificities. Keywords: image reassembly, jigsaw puzzle, deep learning, graph, branch-cut, cultural heritage

Submitted 26 May, 2020; originally announced May 2020.

Journal ref: IEEE Transactions on Image Processing (2020)

arXiv:2004.14644 [pdf, other]

doi 10.1016/j.patrec.2020.03.020

DIABLO: Dictionary-based Attention Block for Deep Metric Learning

Authors: Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein

Abstract: Recent breakthroughs in representation learning of unseen classes and examples have been made in deep metric learning by training at the same time the image representations and a corresponding metric with deep networks. Recent contributions mostly address the training part (loss functions, sampling strategies, etc.), while a few works focus on improving the discriminative power of the image representation. In this paper, we propose DIABLO, a dictionary-based attention method for image embedding. DIABLO produces richer representations by aggregating only visually-related features together while being easier to train than other attention-based methods in deep metric learning. This is experimentally confirmed on four deep metric learning datasets (Cub-200-2011, Cars-196, Stanford Online Products, and In-Shop Clothes Retrieval) for which DIABLO shows state-of-the-art performances.

Submitted 30 April, 2020; originally announced April 2020.

Comments: Pre-print. Accepted for publication at Pattern Recognition Letters

arXiv:2002.02250 [pdf, other]

Uncovering differential equations from data with hidden variables

Authors: Agustín Somacal, Yamila Barrera, Leonardo Boechi, Matthieu Jonckheere, Vincent Lefieux, Dominique Picard, Ezequiel Smucler

Abstract: SINDy is a method for learning systems of differential equations from data by solving a sparse linear regression optimization problem [Brunton et al., 2016]. In this article, we propose an extension of the SINDy method that learns systems of differential equations in cases where some of the variables are not observed. Our extension is based on regressing a higher order time derivative of a target variable onto a dictionary of functions that includes lower order time derivatives of the target variable. We evaluate our method by measuring the prediction accuracy of the learned dynamical systems on synthetic data and on a real dataset of temperature time series provided by the Réseau de Transport d'Électricité (RTE). Our method provides high quality short-term forecasts and it is orders of magnitude faster than competing methods for learning differential equations with latent variables.

Submitted 23 December, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

arXiv:1912.08077 [pdf, other]

doi 10.1109/TPAMI.2020.2976014

Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition

Authors: Diogo C Luvizon, Hedi Tabia, David Picard

Abstract: Human pose estimation and action recognition are related tasks since both problems are strongly dependent on the human body representation and analysis. Nonetheless, most recent methods in the literature handle the two problems separately. In this work, we propose a multi-task framework for jointly estimating 2D or 3D human poses from monocular color images and classifying human actions from video sequences. We show that a single architecture can be used to solve both problems in an efficient way and still achieve state-of-the-art or comparable results at each task while running at more than 100 frames per second. The proposed method benefits from a high degree of parameter sharing between the two tasks by unifying still images and video clips processing in a single pipeline, allowing the model to be trained with data from different categories simultaneously and in a seamless way. Additionally, we provide important insights for end-to-end training the proposed multi-task model by decoupling key prediction parts, which consistently leads to better accuracy on both tasks. The reported results on four datasets (MPII, Human3.6M, Penn Action and NTU RGB+D) demonstrate the effectiveness of our method on the targeted tasks. Our source code and trained weights are publicly available at https://github.com/dluvizon/deephar.

Submitted 3 March, 2020; v1 submitted 14 December, 2019; originally announced December 2019.

Comments: Accepted to TPAMI. arXiv admin note: text overlap with arXiv:1802.09232

arXiv:1911.09245 [pdf, other]

Consensus-based Optimization for 3D Human Pose Estimation in Camera Coordinates

Authors: Diogo C Luvizon, Hedi Tabia, David Picard

Abstract: 3D human pose estimation is frequently seen as the task of estimating 3D poses relative to the root body joint. Alternatively, we propose a 3D human pose estimation method in camera coordinates, which allows effective combination of 2D annotated data and 3D poses and a straightforward multi-view generalization. To that end, we cast the problem as a view frustum space pose estimation, where absolute depth prediction and joint relative depth estimations are disentangled. Final 3D predictions are obtained in camera coordinates by the inverse camera projection. Based on this, we also present a consensus-based optimization algorithm for multi-view predictions from uncalibrated images, which requires a single monocular training procedure. Although our method is indirectly tied to the training camera intrinsics, it still converges for cameras with different intrinsic parameters, resulting in coherent estimations up to a scale factor. Our method improves the state of the art on well known 3D human pose datasets, reducing the prediction error by 32% in the most common benchmark. We also report our results in absolute pose position error, achieving 80 mm for monocular estimations and 51 mm for multi-view, on average.

Submitted 20 August, 2021; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: Source code is available at https://github.com/dluvizon/3d-pose-consensus

arXiv:1908.02735 [pdf, other]

Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings

Authors: Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein

Abstract: Learning an effective similarity measure between image representations is key to the success of recent advances in visual search tasks (e.g. verification or zero-shot learning). Although the metric learning part is well addressed, this metric is usually computed over the average of the extracted deep features. This representation is then trained to be discriminative. However, these deep features tend to be scattered across the feature space. Consequently, the representations are not robust to outliers, object occlusions, background variations, etc. In this paper, we tackle this scattering problem with a distribution-aware regularization named HORDE. This regularizer enforces visually-close images to have deep features with the same distribution which are well localized in the feature space. We provide a theoretical analysis supporting this regularization effect. We also show the effectiveness of our approach by obtaining state-of-the-art results on 4 well-known datasets (Cub-200-2011, Cars-196, Stanford Online Products and Inshop Clothes Retrieval).

Submitted 7 August, 2019; originally announced August 2019.

Comments: Camera-ready for our ICCV 2019 paper (poster)

arXiv:1906.01972 [pdf, ps, other]

Efficient Codebook and Factorization for Second Order Representation Learning

Authors: Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein

Abstract: Learning rich and compact representations is an open topic in many fields such as object recognition or image retrieval. Deep neural networks have made a major breakthrough during the last few years for these tasks but their representations are not necessarily as rich as needed nor as compact as expected. To build richer representations, high order statistics have been exploited and have shown excellent performances, but they produce higher dimensional features. While this drawback has been partially addressed with factorization schemes, the original compactness of first order models has never been retrieved, or at the cost of a strong performance decrease. Our method, by jointly integrating a codebook strategy into a factorization scheme, is able to produce compact representations while keeping the second order performances with few additional parameters. This formulation leads to state-of-the-art results on three image retrieval datasets.

Submitted 5 June, 2019; originally announced June 2019.

Comments: Accepted at IEEE International Conference on Image Processing (ICIP) 2019

arXiv:1809.00898 [pdf, other]

Image Reassembly Combining Deep Learning and Shortest Path Problem

Authors: M. -M. Paumard, D. Picard, H. Tabia

Abstract: This paper addresses the problem of reassembling images from disjointed fragments. More specifically, given an unordered set of fragments, we aim at reassembling one or several possibly incomplete images. The main contributions of this work are: 1) several deep neural architectures to predict the relative position of image fragments that outperform the previous state of the art; 2) casting the reassembly problem into the shortest path in a graph problem for which we provide several construction algorithms depending on available information; 3) a new dataset of images taken from the Metropolitan Museum of Art (MET) dedicated to image reassembly for which we provide a clear setup and a strong baseline.

Submitted 4 September, 2018; originally announced September 2018.

Comments: ECCV 2018

arXiv:1807.03155 [pdf, other]

Jigsaw Puzzle Solving Using Local Feature Co-Occurrences in Deep Neural Networks

Authors: Marie-Morgane Paumard, David Picard, Hedi Tabia

Abstract: Archaeologists are in dire need of automated object reconstruction methods. Fragment reassembly is close to puzzle problems, which may be solved by computer vision algorithms. As they are often beaten on most image related tasks by deep learning algorithms, we study a classification method that can solve jigsaw puzzles. In this paper, we focus on classifying the relative position: given a pair of fragments, we compute their local relation (e.g. on top). We propose several enhancements over the state of the art in this domain, which is outperformed by our method by 25%. We propose an original dataset composed of pictures from the Metropolitan Museum of Art. We propose a greedy reconstruction method based on the predicted relative positions.

Submitted 5 July, 2018; originally announced July 2018.

Comments: ICIP 2018

arXiv:1806.08991 [pdf, other]

Leveraging Implicit Spatial Information in Global Features for Image Retrieval

Authors: Pierre Jacob, David Picard, Aymeric Histace, Edouard Klein

Abstract: Most image retrieval methods use global features that aggregate local distinctive patterns into a single representation. However, the aggregation process destroys the relative spatial information by considering orderless sets of local descriptors. We propose to integrate relative spatial information into the aggregation process by taking into account co-occurrences of local patterns in a tensor framework. The resulting signature called Improved Spatial Tensor Aggregation (ISTA) is able to reach state of the art performances on well known datasets such as Holidays, Oxford5k and Paris6k.

Submitted 23 June, 2018; originally announced June 2018.

Comments: 8 pages, 2 figures and 1 table. Draft paper for conference, IEEE International Conference on Image Processing (ICIP) 2018

arXiv:1805.04682 [pdf, ps, other]

Kernel and wavelet density estimators on manifolds and more general metric spaces

Authors: G. Cleanthous, A. Georgiadis, G. Kerkyacharian, P. Petrushev, D. Picard

Abstract: We consider the problem of estimating the density of observations taking values in classical or nonclassical spaces such as manifolds and more general metric spaces. Our setting is quite general but also sufficiently rich in allowing the development of smooth functional calculus with well localized spectral kernels, Besov regularity spaces, and wavelet type systems. Kernel and both linear and nonlinear wavelet density estimators are introduced and studied. Convergence rates for these estimators are established, which are analogous to the existing results in the classical setting of real-valued variables.

Submitted 9 February, 2019; v1 submitted 12 May, 2018; originally announced May 2018.

MSC Class: Primary 62G07; 58J35; Secondary 43A85; 42B35

arXiv:1805.00900 [pdf, other]

doi 10.1109/ICDEW.2018.00035

Images & Recipes: Retrieval in the cooking context

Authors: Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Matthieu Cord

Abstract: Recent advances in the machine learning community have allowed different use cases to emerge, such as its association with domains like cooking, which gave rise to computational cuisine. In this paper, we tackle the picture-recipe alignment problem, having as target application the large-scale retrieval task (finding a recipe given a picture, and vice versa). Our approach is validated on the Recipe1M dataset, composed of one million image-recipe pairs and additional class information, for which we achieve state-of-the-art results.

Submitted 2 May, 2018; originally announced May 2018.

Comments: Published at DECOR / ICDE 2018. Extended version accepted at SIGIR 2018, available here: arXiv:1804.11146

arXiv:1804.11146 [pdf, other]

Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings

Authors: Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, Matthieu Cord

Abstract: Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing them. In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space. We describe an effective learning scheme, capable of tackling large-scale problems, and validate it on the Recipe1M dataset containing nearly 1 million picture-recipe pairs. We show the effectiveness of our approach compared with previous state-of-the-art models and present qualitative results over computational cooking use cases.

Submitted 30 April, 2018; originally announced April 2018.

Comments: accepted at the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, 2018

arXiv:1804.01852

GoSGD: Distributed Optimization for Deep Learning with Gossip Exchange

Authors: Michael Blot, David Picard, Matthieu Cord

Abstract: We address the issue of speeding up the training of convolutional neural networks by studying a distributed method adapted to stochastic gradient descent. Our parallel optimization setup uses several threads, each applying individual gradient descents on a local variable. We propose a new way of sharing information between different threads based on gossip algorithms that show good consensus convergence properties. Our method, called GoSGD, has the advantage of being fully asynchronous and decentralized.

Submitted 12 November, 2018; v1 submitted 4 April, 2018; originally announced April 2018.

Comments: Correction to do, and difficulties to change the document

arXiv:1802.09232 [pdf, other]

2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning

Authors: Diogo C. Luvizon, David Picard, Hedi Tabia

Abstract: Action recognition and human pose estimation are closely related, but both problems are generally handled as distinct tasks in the literature. In this work, we propose a multitask framework for joint 2D and 3D pose estimation from still images and human action recognition from video sequences. We show that a single architecture can be used to solve the two problems in an efficient way and still achieve state-of-the-art results. Additionally, we demonstrate that end-to-end optimization leads to significantly higher accuracy than separated learning. The proposed architecture can be trained with data from different categories simultaneously in a seamless way. The reported results on four datasets (MPII, Human3.6M, Penn Action and NTU) demonstrate the effectiveness of our method on the targeted tasks.

Submitted 21 March, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

Comments: To appear in CVPR 2018

arXiv:1802.07617 [pdf, other]

Convergence rates for smooth k-means change-point detection

Authors: Aurélie Fischer, Dominique Picard

Abstract: In this paper, we consider the estimation of a change-point for possibly high-dimensional data in a Gaussian model, using a k-means method. We prove that, up to a logarithmic term, this change-point estimator has a minimax rate of convergence. Then, considering the case of sparse data, with a Sobolev regularity, we propose a smoothing procedure based on Lepski's method and show that the resulting estimator attains the optimal rate of convergence. Our results are illustrated by some simulations. As the theoretical statement relying on Lepski's method depends on some unknown constant, practical strategies are suggested to perform an optimal smoothing.

Submitted 21 February, 2018; originally announced February 2018.

arXiv:1710.02322 [pdf, other]

Human Pose Regression by Combining Indirect Part Detection and Contextual Information

Authors: Diogo C. Luvizon, Hedi Tabia, David Picard

Abstract: In this paper, we propose an end-to-end trainable regression approach for human pose estimation from still images. We use the proposed Soft-argmax function to convert feature maps directly to joint coordinates, resulting in a fully differentiable framework. Our method is able to learn heat map representations indirectly, without additional steps of artificial ground truth generation. Consequently, contextual information can be included in the pose predictions in a seamless way. We evaluated our method on two very challenging datasets, the Leeds Sports Poses (LSP) and the MPII Human Pose datasets, reaching the best performance among all the existing regression methods and comparable results to the state-of-the-art detection-based approaches.

Submitted 6 October, 2017; originally announced October 2017.

arXiv:1701.00167 [pdf, ps, other]

Very Fast Kernel SVM under Budget Constraints

Authors: David Picard

Abstract: In this paper we propose a fast online Kernel SVM algorithm under tight budget constraints. We propose to split the input space using LVQ and train a Kernel SVM in each cluster. To allow for online training, we propose to limit the size of the support vector set of each cluster using different strategies. We show in the experiments that our algorithm is able to achieve high accuracy while processing a very high number of samples per second in both training and evaluation.

Submitted 31 December, 2016; originally announced January 2017.

arXiv:1611.09726 [pdf, other]

Gossip training for deep learning

Authors: Michael Blot, David Picard, Matthieu Cord, Nicolas Thome

Abstract: We address the issue of speeding up the training of convolutional networks. Here we study a distributed method adapted to stochastic gradient descent (SGD). The parallel optimization setup uses several threads, each applying individual gradient descents on a local variable. We propose a new way to share information between different threads inspired by gossip algorithms and showing good consensus convergence properties. Our method, called GoSGD, has the advantage of being fully asynchronous and decentralized. We compare our method to the recent EASGD \cite{elastic} on CIFAR-10 and show encouraging results.

Submitted 29 November, 2016; originally announced November 2016.

arXiv:1610.01000 [pdf, other]

doi 10.1002/we.2139

Statistical learning for wind power : a modeling and stability study towards forecasting

Authors: Aurélie Fischer, Lucie Montuelle, Mathilde Mougeot, Dominique Picard

Abstract: We focus on wind power modeling using machine learning techniques. We show on real data provided by the wind energy company Maïa Eolis that parametric models, even those closely following the physical equation relating wind production to wind speed, are outperformed by intelligent learning algorithms. In particular, the CART-Bagging algorithm gives very stable and promising results. Besides, as a step towards forecasting, we quantify the impact of using deteriorated wind measures on the performance. We also show on this application that the default methodology to select a subset of predictors provided in the standard random forest package can be refined, especially when there exists among the predictors one variable which has a major impact.

Submitted 12 January, 2018; v1 submitted 4 October, 2016; originally announced October 2016.

Journal ref: Wind Energy, Wiley, 2017, 20 (12), pp.2037 - 2047

arXiv:1603.08257 [pdf]

doi 10.1088/0004-637X/812/2/106

The C(3P) + NH3 reaction in interstellar chemistry: I. Investigation of the product formation channels

Authors: Jeremy Bourgalais, Michael Capron, Ranjith Kumar Abhinavam Kailasanathan, David L. Osborn, Kevin M. Hickson, Jean-Christophe Loison, Valentine Wakelam, Fabien Goulay, Sébastien D. Le Picard

Abstract: The product formation channels of ground state carbon atoms, C(3P), reacting with ammonia, NH3, have been investigated using two complementary experiments and electronic structure calculations. Reaction products are detected in a gas flow tube experiment (330 K, 4 Torr) using tunable VUV photoionization coupled with time of flight mass spectrometry. Temporal profiles of the species formed and photoionization spectra are used to identify primary products of the C + NH3 reaction. In addition, H-atom formation is monitored by VUV laser induced fluorescence from room temperature to 50 K in a supersonic gas flow generated by the Laval nozzle technique. Electronic structure calculations are performed to derive intermediates, transition states and complexes formed along the reaction coordinate. The combination of photoionization and laser induced fluorescence experiments supported by theoretical calculations indicates that in the temperature and pressure range investigated, the H + H2CN production channel represents 100% of the product yield for this reaction. Kinetics measurements of the title reaction down to 50 K and the effect of the new rate constants on interstellar nitrogen hydride abundances using a model of dense interstellar clouds are reported in paper II.

Submitted 27 March, 2016; originally announced March 2016.

Showing 1–50 of 74 results for author: Picard, D