Showing 1–49 of 49 results for author: Thome, N

  1. Properties of core-EP matrices and binary relationships

    Authors: Ehsan Kheirandish, Abbas Salemi, Néstor Thome

    Abstract: In this paper, various properties of core-EP matrices are investigated. We introduce the MPDMP matrix associated with $A$ and by means of it, some properties and equivalent conditions of core-EP matrices can be obtained. Also, properties of MPD, DMP, and CMP inverses are studied and we prove that in the class of core-EP matrices, DMP, MPD, and Drazin inverses are the same. Moreover, DMP and MPD bi…

    Submitted 6 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 20 pages

    MSC Class: 15A09; 15A45

  2. arXiv:2407.02217  [pdf, other]

    cs.LG cs.AI

    Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning

    Authors: Zakariae El Asri, Olivier Sigaud, Nicolas Thome

    Abstract: Applying reinforcement learning (RL) to real-world applications requires addressing a trade-off between asymptotic performance, sample efficiency, and inference time. In this work, we demonstrate how to address this triple challenge by leveraging partial physical knowledge about the system dynamics. Our approach involves learning a physics-informed model to boost sample efficiency and generating i…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2407.01400  [pdf, other]

    cs.CV

    GalLoP: Learning Global and Local Prompts for Vision-Language Models

    Authors: Marc Lafon, Elias Ramzi, Clément Rambour, Nicolas Audebert, Nicolas Thome

    Abstract: Prompt learning has been widely adopted to efficiently adapt vision-language models (VLMs), e.g. CLIP, for few-shot image classification. Despite their success, most prompt learning methods trade off between classification accuracy and robustness, e.g. in domain generalization or out-of-distribution (OOD) detection. In this work, we introduce Global-Local Prompts (GalLoP), a new prompt learning me…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: To be published at ECCV 2024

  4. arXiv:2406.02842  [pdf, other]

    cs.CV

    Zero-Shot Image Segmentation via Recursive Normalized Cut on Diffusion Features

    Authors: Paul Couairon, Mustafa Shukor, Jean-Emmanuel Haugeard, Matthieu Cord, Nicolas Thome

    Abstract: Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks. While prior works have addressed unsupervised image segmentation, they significantly lag behind supervised models. In this paper, we use a diffusion UNet encoder as a foundation vision encoder and introduce DiffCut, an unsupervised zero-shot segmentation method that solely harn…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2403.14201  [pdf, ps, other]

    math.RA

    Parametrizing $W$-weighted BT inverse to obtain the $W$-weighted $q$-BT inverse

    Authors: D. E. Ferreyra, N. Thome, C. Torigino

    Abstract: The core-EP and BT inverses for rectangular matrices were studied recently in the literature. The main aim of this paper is to unify both concepts by means of a new kind of generalized inverse called $W$-weighted $q$-BT inverse. We analyze its existence and uniqueness by considering an adequate matrix system. Basic properties and some interesting characterizations are proved for this new weighted…

    Submitted 21 March, 2024; originally announced March 2024.

    MSC Class: 15A09; 15A24

  6. arXiv:2403.10403  [pdf, other]

    cs.CV cs.AI cs.LG

    Energy Correction Model in the Feature Space for Out-of-Distribution Detection

    Authors: Marc Lafon, Clément Rambour, Nicolas Thome

    Abstract: In this work, we study the out-of-distribution (OOD) detection problem through the use of the feature space of a pre-trained deep classifier. We show that learning the density of in-distribution (ID) features with an energy-based model (EBM) leads to competitive detection results. However, we found that the non-mixing of MCMC sampling during the EBM's training undermines its detection performance…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: NeurIPS ML Safety Workshop (2022)

  7. G-Drazin inverse combined with inner inverse

    Authors: G. Maharana, J. K. Sahoo, Nestor Thome

    Abstract: This paper introduces new classes of generalized inverses for square matrices named GD1, and the dual, called 1GD inverse. In addition, we discuss a few characterizations and representations of these inverses. The explicit expressions of these inverses have been established via core-nilpotent decomposition. Further, we introduce a binary relation for GD1 inverse and 1GD inverse, along with a few d…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 16 pages, Linear and Multilinear Algebra (2024)

  8. arXiv:2401.13121  [pdf, other]

    math.OC

    Procrustes problem for the inverse eigenvalue problem of normal (skew) $J$-Hamiltonian matrices and normal $J$-symplectic matrices

    Authors: S. Gigola, L. Lebtahi, N. Thome

    Abstract: A square complex matrix $A$ is called (skew) $J$-Hamiltonian if $AJ$ is (skew) hermitian where $J$ is a real normal matrix such that $J^2=-I$, where $I$ is the identity matrix. In this paper, we solve the Procrustes problem to find normal (skew) $J$-Hamiltonian solutions for the inverse eigenvalue problem. In addition, a similar problem is investigated for normal $J$-symplectic matrices.

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 25 pages

  9. arXiv:2401.09106  [pdf, ps, other]

    math.RA

    Extending EP matrices by means of recent generalized inverses

    Authors: D. E. Ferreyra, F. E. Levis, A. N. Priori, N. Thome

    Abstract: It is well known that a square complex matrix is called EP if it commutes with its Moore-Penrose inverse. In this paper, new classes of matrices which extend this concept are characterized. For that, we consider commutative equalities given by matrices of arbitrary index and generalized inverses recently investigated in the literature. More specifically, these classes are characterized by expressi…

    Submitted 17 January, 2024; originally announced January 2024.

    MSC Class: 15A09; 15A27

  10. arXiv:2310.12646  [pdf, other]

    eess.IV cs.CV

    TRUSTED: The Paired 3D Transabdominal Ultrasound and CT Human Data for Kidney Segmentation and Registration Research

    Authors: William Ndzimbong, Cyril Fourniol, Loic Themyr, Nicolas Thome, Yvonne Keeza, Beniot Sauer, Pierre-Thierry Piechaud, Arnaud Mejean, Jacques Marescaux, Daniel George, Didier Mutter, Alexandre Hostettler, Toby Collins

    Abstract: Inter-modal image registration (IMIR) and image segmentation with abdominal Ultrasound (US) data has many important clinical applications, including image-guided surgery, automatic organ measurement and robotic navigation. However, research is severely limited by the lack of public datasets. We propose TRUSTED (the Tridimensional Renal Ultra Sound TomodEnsitometrie Dataset), comprising paired tran…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Alexandre Hostettler, and Toby Collins share last authorship

  11. arXiv:2309.08250  [pdf, other]

    cs.CV

    Optimization of Rank Losses for Image Retrieval

    Authors: Elias Ramzi, Nicolas Audebert, Clément Rambour, André Araujo, Xavier Bitot, Nicolas Thome

    Abstract: In image retrieval, standard evaluation metrics rely on score ranking, e.g. average precision (AP), recall at k (R@k), normalized discounted cumulative gain (NDCG). In this work we introduce a general framework for robust and decomposable rank losses optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomp…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: text overlap with arXiv:2207.04873

  12. arXiv:2307.06795  [pdf, other]

    cs.CV

    Leveraging Vision-Language Foundation Models for Fine-Grained Downstream Tasks

    Authors: Denis Coquenet, Clément Rambour, Emanuele Dalsasso, Nicolas Thome

    Abstract: Vision-language foundation models such as CLIP have shown impressive zero-shot performance on many tasks and datasets, especially thanks to their free-text inputs. However, they struggle to handle some downstream tasks, such as fine-grained attribute detection and localization. In this paper, we propose a multitask fine-tuning strategy based on a positive/negative prompt formulation to further lev…

    Submitted 13 July, 2023; originally announced July 2023.

  13. arXiv:2306.08707  [pdf, other]

    cs.CV

    VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing

    Authors: Paul Couairon, Clément Rambour, Jean-Emmanuel Haugeard, Nicolas Thome

    Abstract: Recently, diffusion-based generative models have achieved remarkable success for image generation and editing. However, existing diffusion-based video editing approaches lack the ability to offer precise control over generated content that maintains temporal consistency in long-term videos. On the other hand, atlas-based methods provide strong temporal consistency but are costly to edit a video an…

    Submitted 2 April, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: TMLR 2024. Project web-page at https://videdit.github.io

  14. arXiv:2305.16966  [pdf, other]

    cs.CV

    Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection

    Authors: Marc Lafon, Elias Ramzi, Clément Rambour, Nicolas Thome

    Abstract: Out-of-distribution (OOD) detection is a critical requirement for the deployment of deep neural networks. This paper introduces the HEAT model, a new post-hoc OOD detection method estimating the density of in-distribution (ID) samples using hybrid energy-based models (EBM) in the feature space of a pre-trained backbone. HEAT complements prior density estimators of the ID density, e.g. parametric m…

    Submitted 1 June, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Journal ref: International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA

  15. arXiv:2302.10803  [pdf, other]

    cs.LG cs.AI physics.flu-dyn

    Eagle: Large-Scale Learning of Turbulent Fluid Dynamics with Mesh Transformers

    Authors: Steeven Janny, Aurélien Béneteau, Madiha Nadri, Julie Digne, Nicolas Thome, Christian Wolf

    Abstract: Estimating fluid dynamics is classically done through the simulation and integration of numerical models solving the Navier-Stokes equations, which is computationally complex and time-consuming even on high-end hardware. This is a notoriously hard problem to solve, which has recently been addressed with machine learning, in particular graph neural networks (GNN) and variants trained and evaluated…

    Submitted 17 March, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Published as a conference paper at ICLR 2023

    Journal ref: International Conference on Learning Representations (ICLR) 2023

  16. arXiv:2302.03462  [pdf, other]

    cs.LG

    Diverse Probabilistic Trajectory Forecasting with Admissibility Constraints

    Authors: Laura Calem, Hedi Ben-Younes, Patrick Pérez, Nicolas Thome

    Abstract: Predicting multiple trajectories for road users is important for automated driving systems: ego-vehicle motion planning indeed requires a clear view of the possible motions of the surrounding agents. However, the generative models used for multiple-trajectory forecasting suffer from a lack of diversity in their proposals. To avoid this form of collapse, we propose a novel method for structured pre…

    Submitted 7 February, 2023; originally announced February 2023.

    Journal ref: International Conference on Pattern Recognition (ICPR) 2022

  17. arXiv:2212.07890  [pdf, other]

    cs.CV

    Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation

    Authors: Loic Themyr, Clement Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler

    Abstract: Transformers have proved to be very effective for visual recognition tasks. In particular, vision transformers construct compressed global representations through self-attention and learnable class tokens. Multi-resolution transformers have shown recent successes in semantic segmentation but can only capture local interactions in high-resolution feature maps. This paper extends the notion of globa…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: Winter Conference on Applications of Computer Vision (WACV 2023)

    MSC Class: 68T45

  18. arXiv:2212.04267  [pdf, other]

    cs.CV cs.LG

    Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval

    Authors: Mustafa Shukor, Nicolas Thome, Matthieu Cord

    Abstract: Vision-Language Pretraining (VLP) and Foundation models have been the go-to recipe for achieving SoTA performance on general benchmarks. However, leveraging these powerful techniques for more complex vision-language tasks, such as cooking applications, with more structured input data, is still little investigated. In this work, we propose to leverage these techniques for structured-text based comp…

    Submitted 15 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Code: https://github.com/mshukor/VLPCook

  19. arXiv:2210.05313  [pdf, other]

    cs.CV

    Memory transformers for full context and high-resolution 3D Medical Segmentation

    Authors: Loic Themyr, Clément Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler

    Abstract: Transformer models achieve state-of-the-art results for image segmentation. However, achieving long-range attention, necessary to capture global context, with high-resolution 3D images is a fundamental challenge. This paper introduces the Full resolutIoN mEmory (FINE) transformer to overcome this issue. The core idea behind FINE is to learn memory tokens to indirectly model full range interactions…

    Submitted 11 October, 2022; originally announced October 2022.

    MSC Class: 68T45

  20. arXiv:2208.12625  [pdf, other]

    cs.LG cs.CV

    Take One Gram of Neural Features, Get Enhanced Group Robustness

    Authors: Simon Roburin, Charles Corbière, Gilles Puy, Nicolas Thome, Matthieu Aubry, Renaud Marlet, Patrick Pérez

    Abstract: Predictive performance of machine learning models trained with empirical risk minimization (ERM) can degrade considerably under distribution shifts. The presence of spurious correlations in training datasets leads ERM-trained models to display high loss when evaluated on minority groups not presenting such correlations. Extensive attempts have been made to develop methods improving worst-group rob…

    Submitted 7 February, 2023; v1 submitted 26 August, 2022; originally announced August 2022.

    Comments: Long version (Previous version: OOD-CV Workshop @ ECCV 2022)

  21. arXiv:2207.04873  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Hierarchical Average Precision Training for Pertinent Image Retrieval

    Authors: Elias Ramzi, Nicolas Audebert, Nicolas Thome, Clément Rambour, Xavier Bitot

    Abstract: Image Retrieval is commonly evaluated with Average Precision (AP) or Recall@k. Yet those metrics are limited to binary labels and do not take into account errors' severity. This paper introduces a new hierarchical AP training method for pertinent image retrieval (HAPPIER). HAPPIER is based on a new H-AP metric, which leverages a concept hierarchy to refine AP by integrating errors' importance a…

    Submitted 22 July, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

    Journal ref: ECCV 2022, Oct 2022, Tel-Aviv, Israel

  22. arXiv:2207.03790  [pdf, other]

    cs.CV stat.ML

    Complementing Brightness Constancy with Deep Networks for Optical Flow Prediction

    Authors: Vincent Le Guen, Clément Rambour, Nicolas Thome

    Abstract: State-of-the-art methods for optical flow estimation rely on deep learning, which requires complex sequential training schemes to reach optimal performances on real-world data. In this work, we introduce the COMBO deep network that explicitly exploits the brightness constancy (BC) model used in traditional methods. Since BC is an approximate physical model violated in several situations, we propose…

    Submitted 12 July, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

  23. arXiv:2205.10158  [pdf, other]

    cs.CV cs.LG

    Swapping Semantic Contents for Mixing Images

    Authors: Rémy Sun, Clément Masson, Gilles Hénaff, Nicolas Thome, Matthieu Cord

    Abstract: Deep architectures have proven capable of solving many tasks provided a sufficient amount of labeled data. In fact, the amount of available labeled data has become the principal bottleneck in low label settings such as Semi-Supervised Learning. Mixing Data Augmentations do not typically yield new labeled samples, as indiscriminately mixing contents creates between-class samples. In this work, we in…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: Accepted at ICPR 2022, 7 pages, 4 figures, 6 tables

  24. arXiv:2205.10139  [pdf, other]

    cs.LG

    Towards efficient feature sharing in MIMO architectures

    Authors: Rémy Sun, Alexandre Ramé, Clément Masson, Nicolas Thome, Matthieu Cord

    Abstract: Multi-input multi-output architectures propose to train multiple subnetworks within one base network and then average the subnetwork predictions to benefit from ensembling for free. Despite some relative success, these architectures are wasteful in their use of parameters. Indeed, we highlight in this paper that the learned subnetworks fail to share even generic features which limits their applicab…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: 7 pages, 6 figures, 1 table

  25. arXiv:2110.01445  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Robust and Decomposable Average Precision for Image Retrieval

    Authors: Elias Ramzi, Nicolas Thome, Clément Rambour, Nicolas Audebert, Xavier Bitot

    Abstract: In image retrieval, standard evaluation metrics rely on score ranking, e.g. average precision (AP). In this paper, we introduce a method for robust and decomposable average precision (ROADMAP) addressing two major challenges for end-to-end training of deep neural networks with AP: non-differentiability and non-decomposability. Firstly, we propose a new differentiable approximation of the rank func…

    Submitted 8 December, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

    Journal ref: Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021), Dec 2021, Sydney, Australia

  26. arXiv:2104.04610  [pdf, other]

    stat.ML cs.AI cs.LG

    Deep Time Series Forecasting with Shape and Temporal Criteria

    Authors: Vincent Le Guen, Nicolas Thome

    Abstract: This paper addresses the problem of multi-step time series forecasting for non-stationary signals that can present sudden changes. Current state-of-the-art deep learning forecasting methods, often trained with variants of the MSE, lack the ability to provide sharp predictions in deterministic and probabilistic contexts. To handle these challenges, we propose to incorporate shape and temporal crite…

    Submitted 17 February, 2022; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:2010.07349

  27. arXiv:2103.06104  [pdf, other]

    eess.IV cs.CV

    U-Net Transformer: Self and Cross Attention for Medical Image Segmentation

    Authors: Olivier Petit, Nicolas Thome, Clément Rambour, Luc Soler

    Abstract: Medical image segmentation remains particularly challenging for complex and low-contrast anatomical structures. In this paper, we introduce the U-Transformer network, which combines a U-shaped architecture for image segmentation with self- and cross-attention from Transformers. U-Transformer overcomes the inability of U-Nets to model long-range contextual interactions and spatial dependencies, whi…

    Submitted 12 March, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

  28. arXiv:2012.06508  [pdf, other]

    cs.CV cs.LG stat.ML

    Confidence Estimation via Auxiliary Models

    Authors: Charles Corbière, Nicolas Thome, Antoine Saporta, Tuan-Hung Vu, Matthieu Cord, Patrick Pérez

    Abstract: Reliably quantifying the confidence of deep neural classifiers is a challenging yet fundamental requirement for deploying such models in safety-critical applications. In this paper, we introduce a novel target criterion for model confidence, namely the true class probability (TCP). We show that TCP offers better properties for confidence estimation than standard maximum class probability (MCP). Si…

    Submitted 31 May, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

    Comments: Accepted to TPAMI 2021

  29. arXiv:2010.07349  [pdf, other]

    stat.ML cs.AI cs.LG

    Probabilistic Time Series Forecasting with Structured Shape and Temporal Diversity

    Authors: Vincent Le Guen, Nicolas Thome

    Abstract: Probabilistic forecasting consists in predicting a distribution of possible future outcomes. In this paper, we address this problem for non-stationary time series, which is very challenging yet crucially important. We introduce the STRIPE model for representing structured diversity based on shape and time features, ensuring both probable predictions while being sharp and accurate. STRIPE is agnost…

    Submitted 10 April, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

  30. arXiv:2010.04456  [pdf, other]

    stat.ML cs.AI cs.CV cs.LG

    Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting

    Authors: Yuan Yin, Vincent Le Guen, Jérémie Dona, Emmanuel de Bézenac, Ibrahim Ayed, Nicolas Thome, Patrick Gallinari

    Abstract: Forecasting complex dynamical phenomena in settings where only partial knowledge of their dynamics is available is a prevalent problem across various scientific fields. While purely data-driven approaches are arguably insufficient in this context, standard physical modeling based approaches tend to be over-simplistic, inducing non-negligible errors. In this work, we introduce the APHYNITY framewor…

    Submitted 10 May, 2022; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted at ICLR 2021 (Oral)

    Journal ref: J. Stat. Mech. (2021) 124012

  31. arXiv:2003.01460  [pdf, other]

    cs.CV

    Disentangling Physical Dynamics from Unknown Factors for Unsupervised Video Prediction

    Authors: Vincent Le Guen, Nicolas Thome

    Abstract: Leveraging physical knowledge described by partial differential equations (PDEs) is an appealing way to improve unsupervised video prediction methods. Since physics is too restrictive for describing the full visual content of generic videos, we introduce PhyDNet, a two-branch deep architecture, which explicitly disentangles PDE dynamics from unknown complementary information. A second contribution…

    Submitted 16 March, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Report number: CVPR 2020

  32. arXiv:1910.04851  [pdf, other]

    cs.CV cs.LG stat.ML

    Addressing Failure Prediction by Learning Model Confidence

    Authors: Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez

    Abstract: Assessing reliably the confidence of a deep neural network and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show how using the TCP is more suited than relying on the classic Maximum Class Probability (MCP). We provide in addi…

    Submitted 26 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: NeurIPS 2019 (accepted)

  33. arXiv:1909.09020  [pdf, other]

    stat.ML cs.LG

    Shape and Time Distortion Loss for Training Deep Time Series Forecasting Models

    Authors: Vincent Le Guen, Nicolas Thome

    Abstract: This paper addresses the problem of time series forecasting for non-stationary signals and multiple future steps prediction. To handle this challenging task, we introduce DILATE (DIstortion Loss including shApe and TimE), a new objective function for training deep neural networks. DILATE aims at accurately predicting sudden changes, and explicitly incorporates two terms supporting precise shape an…

    Submitted 10 November, 2019; v1 submitted 19 September, 2019; originally announced September 2019.

  34. arXiv:1909.05397  [pdf, other]

    eess.IV

    Multitask Classification and Segmentation for Cancer Diagnosis in Mammography

    Authors: Thi-Lam-Thuy Le, Nicolas Thome, Sylvain Bernard, Vincent Bismuth, Fanny Patoureaux

    Abstract: Annotation cost is a bottleneck for collecting massive data in mammography, especially for training deep neural networks. In this paper, we study the use of heterogeneous levels of annotation granularity to improve predictive performances. More precisely, we introduce a multi-task learning scheme for training convolutional neural networks (ConvNets), which combines segmentation and classification,…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: International Conference on Medical Imaging with Deep Learning 2019. MIDL 2019 [arXiv:1907.08612]

    Report number: MIDL/2019/ExtendedAbstract/r1xDM5DGcV

  35. arXiv:1906.00804  [pdf, other]

    cs.CV

    DualDis: Dual-Branch Disentangling with Adversarial Learning

    Authors: Thomas Robert, Nicolas Thome, Matthieu Cord

    Abstract: In computer vision, disentangling techniques aim at improving latent representations of images by modeling factors of variation. In this paper, we propose DualDis, a new auto-encoder-based framework that disentangles and linearizes class and attribute information. This is achieved thanks to a two-branch architecture forcing the separation of the two kinds of information, accompanied by a decoder f…

    Submitted 3 June, 2019; originally announced June 2019.

  36. arXiv:1902.09487  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MUREL: Multimodal Relational Reasoning for Visual Question Answering

    Authors: Remi Cadene, Hedi Ben-younes, Matthieu Cord, Nicolas Thome

    Abstract: Multimodal attentional networks are currently state-of-the-art models for Visual Question Answering (VQA) tasks involving real images. Although attention allows focusing on the visual content relevant to the question, this simple mechanism is arguably insufficient to model complex reasoning features required for VQA or other high-level tasks. In this paper, we propose MuRel, a multimodal relatio…

    Submitted 25 February, 2019; originally announced February 2019.

    Comments: CVPR2019 accepted paper

  37. arXiv:1902.00038  [pdf, other]

    cs.CV

    BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection

    Authors: Hedi Ben-younes, Rémi Cadene, Nicolas Thome, Matthieu Cord

    Abstract: Multimodal representation learning is gaining more and more interest within the deep learning community. While bilinear models provide an interesting framework to find subtle combinations of modalities, their number of parameters grows quadratically with the input dimensions, making their practical implementation within classical deep learning pipelines challenging. In this paper, we introduce BLOC…

    Submitted 12 February, 2019; v1 submitted 31 January, 2019; originally announced February 2019.

  38. arXiv:1807.11407  [pdf, other]

    cs.LG stat.ML

    HybridNet: Classification and Reconstruction Cooperation for Semi-Supervised Learning

    Authors: Thomas Robert, Nicolas Thome, Matthieu Cord

    Abstract: In this paper, we introduce a new model for leveraging unlabeled data to improve generalization performances of image classifiers: a two-branch encoder-decoder architecture called HybridNet. The first branch receives supervision signal and is dedicated to the extraction of invariant class-related representations. The second branch is fully unsupervised and dedicated to model information discarded…

    Submitted 30 July, 2018; originally announced July 2018.

    Comments: Accepted at ECCV 2018

  39. arXiv:1805.05814  [pdf, other]

    stat.ML cs.LG

    SHADE: Information-Based Regularization for Deep Learning

    Authors: Michael Blot, Thomas Robert, Nicolas Thome, Matthieu Cord

    Abstract: Regularization is a big issue for training deep neural networks. In this paper, we propose a new information-theory-based regularization scheme named SHADE for SHAnnon DEcay. The originality of the approach is to define a prior based on conditional entropy, which explicitly decouples the learning of invariant representations in the regularizer and the learning of correlations between inputs and la…

    Submitted 14 May, 2018; originally announced May 2018.

    Comments: IEEE International Conference on Image Processing (ICIP) 2018. arXiv admin note: substantial text overlap with arXiv:1804.10988

  40. arXiv:1804.11146  [pdf, other]

    cs.CL cs.CV cs.IR

    Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings

    Authors: Micael Carvalho, Rémi Cadène, David Picard, Laure Soulier, Nicolas Thome, Matthieu Cord

    Abstract: Designing powerful tools that support cooking activities has rapidly gained popularity due to the massive amounts of available data, as well as recent advances in machine learning that are capable of analyzing them. In this paper, we propose a cross-modal retrieval model aligning visual and textual data (like pictures of dishes and their recipes) in a shared representation space. We describe an ef…

    Submitted 30 April, 2018; originally announced April 2018.

    Comments: accepted at the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, 2018

  41. arXiv:1804.10988  [pdf, other]

    stat.ML cs.LG

    SHADE: Information Based Regularization for Deep Learning

    Authors: Michael Blot, Thomas Robert, Nicolas Thome, Matthieu Cord

    Abstract: Regularization is a big issue for training deep neural networks. In this paper, we propose a new information-theory-based regularization scheme named SHADE for SHAnnon DEcay. The originality of the approach is to define a prior based on conditional entropy, which explicitly decouples the learning of invariant representations in the regularizer and the learning of correlations between inputs and la…

    Submitted 22 May, 2018; v1 submitted 29 April, 2018; originally announced April 2018.

  42. arXiv:1707.06175  [pdf, other]

    cs.CV cs.AI cs.LG

    Deformable Part-based Fully Convolutional Network for Object Detection

    Authors: Taylor Mordan, Nicolas Thome, Matthieu Cord, Gilles Henaff

    Abstract: Existing region-based object detectors are limited to regions with fixed box geometry to represent objects, even if those are highly non-rectangular. In this paper we introduce DP-FCN, a deep model for object detection which explicitly adapts to shapes of objects with deformable parts. Without additional annotations, it learns to focus on discriminative elements and to align them, and simultaneous…

    Submitted 19 July, 2017; originally announced July 2017.

    Comments: Accepted to BMVC 2017 (oral)

  43. arXiv:1705.06676  [pdf, other]

    cs.CV

    MUTAN: Multimodal Tucker Fusion for Visual Question Answering

    Authors: Hedi Ben-younes, Rémi Cadene, Matthieu Cord, Nicolas Thome

    Abstract: Bilinear models provide an appealing framework for mixing and merging information in Visual Question Answering (VQA) tasks. They help to learn high-level associations between question meaning and visual concepts in the image, but they suffer from huge dimensionality issues. We introduce MUTAN, a multimodal tensor-based Tucker decomposition to efficiently parametrize bilinear interactions between v…

    Submitted 18 May, 2017; originally announced May 2017.

  44. arXiv:1611.09726  [pdf, other]

    cs.CV cs.LG stat.ML

    Gossip training for deep learning

    Authors: Michael Blot, David Picard, Matthieu Cord, Nicolas Thome

    Abstract: We address the issue of speeding up the training of convolutional networks. Here we study a distributed method adapted to stochastic gradient descent (SGD). The parallel optimization setup uses several threads, each applying individual gradient descents on a local variable. We propose a new way to share information between different threads inspired by gossip algorithms and showing good consensus…

    Submitted 29 November, 2016; originally announced November 2016.

  45. arXiv:1610.07882  [pdf, other]

    cs.CV

    Maxmin convolutional neural networks for image classification

    Authors: Michael Blot, Matthieu Cord, Nicolas Thome

    Abstract: Convolutional neural networks (CNN) are widely used in computer vision, especially in image classification. However, the way in which information and invariance properties are encoded in deep CNN architectures is still an open question. In this paper, we propose to modify the standard convolutional block of CNN in order to transfer more information layer after layer while keeping some in…

    Submitted 25 October, 2016; originally announced October 2016.

  46. arXiv:1610.05567  [pdf, other]

    cs.CV

    Master's Thesis : Deep Learning for Visual Recognition

    Authors: Rémi Cadène, Nicolas Thome, Matthieu Cord

    Abstract: The goal of our research is to develop methods advancing automatic visual recognition. In order to predict the unique or multiple labels associated with an image, we study different kinds of Deep Neural Network architectures and methods for supervised feature learning. We first draw up a state-of-the-art review of the Convolutional Neural Networks aiming to understand the history behind this family…

    Submitted 18 October, 2016; originally announced October 2016.

  47. arXiv:1610.05541  [pdf, other]

    cs.CV

    M2CAI Workflow Challenge: Convolutional Neural Networks with Time Smoothing and Hidden Markov Model for Video Frames Classification

    Authors: Rémi Cadène, Thomas Robert, Nicolas Thome, Matthieu Cord

    Abstract: Our approach is among the three best to tackle the M2CAI Workflow challenge. The latter consists in recognizing the operation phase for each frame of endoscopic videos. In this technical report, we compare several classification models and temporal smoothing methods. Our submitted solution is a fine-tuned Residual Network-200 on 80% of the training set with temporal smoothing using simple tempora…

    Submitted 2 December, 2016; v1 submitted 18 October, 2016; originally announced October 2016.

  48. Deep Neural Networks Under Stress

    Authors: Micael Carvalho, Matthieu Cord, Sandra Avila, Nicolas Thome, Eduardo Valle

    Abstract: In recent years, deep architectures have been used for transfer learning with state-of-the-art performance in many datasets. The properties of their features remain, however, largely unstudied under the transfer perspective. In this work, we present an extensive analysis of the resiliency of feature vectors extracted from deep models, with special focus on the trade-off between performance and com…

    Submitted 23 May, 2016; v1 submitted 11 May, 2016; originally announced May 2016.

    Comments: This article corresponds to the accepted version at IEEE ICIP 2016. We will link the DOI as soon as it is available

  49. arXiv:1312.6594  [pdf, other]

    cs.CV cs.LG

    Sequentially Generated Instance-Dependent Image Representations for Classification

    Authors: Gabriel Dulac-Arnold, Ludovic Denoyer, Nicolas Thome, Matthieu Cord, Patrick Gallinari

    Abstract: In this paper, we investigate a new framework for image classification that adaptively generates spatial representations. Our strategy is based on a sequential process that learns to explore the different regions of any image in order to infer its category. In particular, the choice of regions is specific to each image, directed by the actual content of previously selected regions. The capacity of…

    Submitted 11 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.