-
DP-Net: Learning Discriminative Parts for image recognition
Authors:
Ronan Sicre,
Hanwei Zhang,
Julien Dejasmin,
Chiheb Daaloul,
Stéphane Ayache,
Thierry Artières
Abstract:
This paper presents Discriminative Part Network (DP-Net), a deep architecture with strong interpretation capabilities, which exploits a pretrained Convolutional Neural Network (CNN) combined with a part-based recognition module. This system learns and detects parts in the images that are discriminative among categories, without the need for fine-tuning the CNN, making it more scalable than other part-based models. While part-based approaches naturally offer interpretable representations, we propose explanations at image and category levels and introduce specific constraints on the part learning process to make them more discriminative.
Submitted 23 April, 2024;
originally announced April 2024.
-
A Learning Paradigm for Interpretable Gradients
Authors:
Felipe Torres Figueroa,
Hanwei Zhang,
Ronan Sicre,
Yannis Avrithis,
Stephane Ayache
Abstract:
This paper studies interpretability of convolutional networks by means of saliency maps. Most approaches based on Class Activation Maps (CAM) combine information from fully connected layers and gradient through variants of backpropagation. However, it is well understood that gradients are noisy and alternatives like guided backpropagation have been proposed to obtain better visualization at inference. In this work, we present a novel training approach to improve the quality of gradients for interpretability. In particular, we introduce a regularization loss such that the gradient with respect to the input image obtained by standard backpropagation is similar to the gradient obtained by guided backpropagation. We find that the resulting gradient is qualitatively less noisy and improves quantitatively the interpretability properties of different networks, using several interpretability methods.
Submitted 23 April, 2024;
originally announced April 2024.
-
CA-Stream: Attention-based pooling for interpretable image recognition
Authors:
Felipe Torres,
Hanwei Zhang,
Ronan Sicre,
Stéphane Ayache,
Yannis Avrithis
Abstract:
Explanations obtained from transformer-based architectures in the form of raw attention can be seen as a class-agnostic saliency map. Additionally, attention-based pooling serves as a form of masking in the feature space. Motivated by this observation, we design an attention-based pooling mechanism intended to replace Global Average Pooling (GAP) at inference. This mechanism, called Cross-Attention Stream (CA-Stream), comprises a stream of cross attention blocks interacting with features at different network depths. CA-Stream enhances interpretability in models, while preserving recognition performance.
Submitted 23 April, 2024;
originally announced April 2024.
-
Towards the generation of synchronized and believable non-verbal facial behaviors of a talking virtual agent
Authors:
Alice Delbosc,
Magalie Ochs,
Nicolas Sabouret,
Brian Ravenet,
Stéphane Ayache
Abstract:
This paper introduces a new model to generate rhythmically relevant non-verbal facial behaviors for virtual agents while they speak. The model demonstrates perceived performance comparable to behaviors directly extracted from the data and replayed on a virtual agent, in terms of synchronization with speech and believability. Interestingly, we found that training the model with two different sets of data, instead of one, did not necessarily improve its performance. The expressiveness of the people in the dataset and the shooting conditions are key elements. We also show that employing an adversarial model, in which fabricated fake examples are introduced during the training phase, increases the perception of synchronization with speech. A collection of videos demonstrating the results and code can be accessed at: https://github.com/aldelb/non_verbal_facial_animation.
Submitted 15 September, 2023;
originally announced November 2023.
-
Opti-CAM: Optimizing saliency maps for interpretability
Authors:
Hanwei Zhang,
Felipe Torres,
Ronan Sicre,
Yannis Avrithis,
Stephane Ayache
Abstract:
Methods based on class activation maps (CAM) provide a simple mechanism to interpret predictions of convolutional neural networks by using linear combinations of feature maps as saliency maps. By contrast, masking-based methods optimize a saliency map directly in the image space or learn it by training another network on additional data.
In this work we introduce Opti-CAM, combining ideas from CAM-based and masking-based approaches. Our saliency map is a linear combination of feature maps, where weights are optimized per image such that the logit of the masked image for a given class is maximized. We also fix a fundamental flaw in two of the most common evaluation metrics of attribution methods. On several datasets, Opti-CAM largely outperforms other CAM-based approaches according to the most relevant classification metrics. We provide empirical evidence supporting that localization and classifier interpretability are not necessarily aligned.
Submitted 5 April, 2024; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
Authors:
Mitja Nikolaus,
Emmanuelle Salin,
Stephane Ayache,
Abdellah Fourtassi,
Benoit Favre
Abstract:
Recent advances in vision-and-language modeling have seen the development of Transformer architectures that achieve remarkable performance on multimodal reasoning tasks. Yet, the exact capabilities of these black-box models are still poorly understood. While much of previous work has focused on studying their ability to learn meaning at the word-level, their ability to track syntactic dependencies between words has received less attention. We take a first step in closing this gap by creating a new multimodal task targeted at evaluating understanding of predicate-noun dependencies in a controlled setup. We evaluate a range of state-of-the-art models and find that their performance on the task varies considerably, with some models performing relatively well and others at chance level. In an effort to explain this variability, our analyses indicate that the quality (and not only sheer quantity) of pretraining data is essential. Additionally, the best performing models leverage fine-grained multimodal pretraining objectives in addition to the standard image-text matching objectives. This study highlights that targeted and controlled evaluations are a crucial step for a precise and rigorous test of the multimodal knowledge of vision-and-language models.
Submitted 21 October, 2022;
originally announced October 2022.
-
Implicit Regularization with Polynomial Growth in Deep Tensor Factorization
Authors:
Kais Hariz,
Hachem Kadri,
Stéphane Ayache,
Maher Moakher,
Thierry Artières
Abstract:
We study the implicit regularization effects of deep learning in tensor factorization. While implicit regularization in deep matrix and 'shallow' tensor factorization via linear and certain types of non-linear neural networks promotes low-rank solutions with at most quadratic growth, we show that its effect in deep tensor factorization grows polynomially with the depth of the network. This provides a remarkably faithful description of the observed experimental behaviour. Using numerical experiments, we demonstrate the benefits of this implicit regularization in yielding a more accurate estimation and better convergence properties.
Submitted 25 July, 2022; v1 submitted 18 July, 2022;
originally announced July 2022.
-
ChaLearn Looking at People: Inpainting and Denoising challenges
Authors:
Sergio Escalera,
Marti Soler,
Stephane Ayache,
Umut Guclu,
Jun Wan,
Meysam Madadi,
Xavier Baro,
Hugo Jair Escalante,
Isabelle Guyon
Abstract:
Dealing with incomplete information is a well studied problem in the context of machine learning and computational intelligence. However, in the context of computer vision, the problem has only been studied in specific scenarios (e.g., certain types of occlusions in specific types of images), although it is common to have incomplete information in visual data. This chapter describes the design of an academic competition focusing on inpainting of images and video sequences that was part of the competition program of WCCI2018 and had a satellite event collocated with ECCV2018. The ChaLearn Looking at People Inpainting Challenge aimed at advancing the state of the art on visual inpainting by promoting the development of methods for recovering missing and occluded information from images and video. Three tracks were proposed in which visual inpainting might be helpful but still challenging: human body pose estimation, text overlays removal and fingerprint denoising. This chapter describes the design of the challenge, which includes the release of three novel datasets, and the description of evaluation metrics, baselines and evaluation protocol. The results of the challenge are analyzed and discussed in detail and conclusions derived from this event are outlined.
Submitted 24 June, 2021;
originally announced June 2021.
-
Implicit Regularization in Deep Tensor Factorization
Authors:
Paolo Milanesi,
Hachem Kadri,
Stéphane Ayache,
Thierry Artières
Abstract:
Attempts to study the implicit regularization associated with gradient descent (GD) have identified matrix completion as a suitable test-bed. Recent findings suggest that this phenomenon cannot be phrased as a norm-minimization problem, implying that a paradigm shift is required and that dynamics has to be taken into account. In the present work we address the more general setup of tensor completion by leveraging two popular tensor factorizations, namely Tucker and TensorTrain (TT). We track relevant quantities such as tensor nuclear norm, effective rank and generalized singular values, and we introduce deep Tucker and TT unconstrained factorizations to deal with the completion task. Experiments on both synthetic and real data show that gradient descent promotes low-rank solutions, and validate the conjecture that the phenomenon has to be addressed from a dynamical perspective.
Submitted 4 May, 2021;
originally announced May 2021.
-
Distillation of Weighted Automata from Recurrent Neural Networks using a Spectral Approach
Authors:
Remi Eyraud,
Stephane Ayache
Abstract:
This paper is an attempt to bridge the gap between deep learning and grammatical inference. Indeed, it provides an algorithm to extract a (stochastic) formal language from any recurrent neural network trained for language modelling. In detail, the algorithm uses the already trained network as an oracle -- and thus does not require the access to the inner representation of the black-box -- and applies a spectral approach to infer a weighted automaton.
As weighted automata compute linear functions, they are computationally more efficient than neural networks and thus the nature of the approach is the one of knowledge distillation. We detail experiments on 62 data sets (both synthetic and from real-world applications) that allow an in-depth study of the abilities of the proposed algorithm. The results show the WA we extract are good approximations of the RNN, validating the approach. Moreover, we show how the process provides interesting insights toward the behavior of RNN learned on data, enlarging the scope of this work to the one of explainability of deep learning models.
Submitted 28 September, 2020;
originally announced September 2020.
-
An AI-powered blood test to detect cancer using nanoDSF
Authors:
Philipp O. Tsvetkov,
Rémi Eyraud,
Stéphane Ayache,
Anton A. Bougaev,
Soazig Malesinski,
Hamed Benazha,
Svetlana Gorokhova,
Christophe Buffat,
Caroline Dehais,
Marc Sanson,
Franck Bielle,
Dominique Figarella-Branger,
Olivier Chinot,
Emeline Tabouret,
François Devred
Abstract:
We describe a novel cancer diagnostic method based on plasma denaturation profiles obtained by a non-conventional use of Differential Scanning Fluorimetry. We show that 84 glioma patients and 63 healthy controls can be automatically classified using denaturation profiles with the help of machine learning algorithms with 92% accuracy. The proposed high-throughput workflow can be applied to any type of cancer and could become a powerful pan-cancer diagnostic and monitoring tool from a simple blood test.
Submitted 8 August, 2020;
originally announced August 2020.
-
Partial Trace Regression and Low-Rank Kraus Decomposition
Authors:
Hachem Kadri,
Stéphane Ayache,
Riikka Huusari,
Alain Rakotomamonjy,
Liva Ralaivola
Abstract:
The trace regression model, a direct extension of the well-studied linear regression model, allows one to map matrices to real-valued outputs. We here introduce an even more general model, namely the partial-trace regression model, a family of linear mappings from matrix-valued inputs to matrix-valued outputs; this model subsumes the trace regression model and thus the linear regression model. Borrowing tools from quantum information theory, where partial trace operators have been extensively studied, we propose a framework for learning partial trace regression models from data by taking advantage of the so-called low-rank Kraus representation of completely positive maps. We show the relevance of our framework with synthetic and real-world experiments conducted for both i) matrix-to-matrix regression and ii) positive semidefinite matrix completion, two tasks which can be formulated as partial trace regression problems.
Submitted 25 August, 2020; v1 submitted 2 July, 2020;
originally announced July 2020.
-
Mapping individual differences in cortical architecture using multi-view representation learning
Authors:
Akrem Sellami,
François-Xavier Dupé,
Bastien Cagna,
Hachem Kadri,
Stéphane Ayache,
Thierry Artières,
Sylvain Takerkart
Abstract:
In neuroscience, understanding inter-individual differences has recently emerged as a major challenge, for which functional magnetic resonance imaging (fMRI) has proven invaluable. For this, neuroscientists rely on basic methods such as univariate linear correlations between single brain features and a score that quantifies either the severity of a disease or the subject's performance in a cognitive task. However, to date, task-fMRI and resting-state fMRI have been exploited separately for this question, because of the lack of methods to effectively combine them. In this paper, we introduce a novel machine learning method which allows combining the activation- and connectivity-based information respectively measured through these two fMRI protocols to identify markers of individual differences in the functional organization of the brain. It combines a multi-view deep autoencoder, designed to fuse the two fMRI modalities into a joint representation space, with a predictive model trained within that space to predict a scalar score that characterizes the patient. Our experimental results demonstrate the ability of the proposed method to outperform competitive approaches and to produce interpretable and biologically plausible results.
Submitted 1 April, 2020;
originally announced April 2020.
-
Deep Networks with Adaptive Nyström Approximation
Authors:
Luc Giffon,
Stéphane Ayache,
Thierry Artières,
Hachem Kadri
Abstract:
Recent work has focused on combining kernel methods and deep learning to exploit the best of the two approaches. Here, we introduce a new architecture of neural networks in which we replace the top dense layers of standard convolutional architectures with an approximation of a kernel function by relying on the Nyström approximation. Our approach is easy and highly flexible. It is compatible with any kernel function and it allows exploiting multiple kernels. We show that our architecture has the same performance as standard architectures on datasets like SVHN and CIFAR100. One benefit of the method lies in its limited number of learnable parameters, which makes it particularly suited for small training set sizes, e.g. from 5 to 20 samples per class.
Submitted 29 November, 2019;
originally announced November 2019.
-
Explaining Black Boxes on Sequential Data using Weighted Automata
Authors:
Stephane Ayache,
Remi Eyraud,
Noe Goudian
Abstract:
Understanding how a learned black box works is of crucial interest for the future of Machine Learning. In this paper, we pioneer the question of the global interpretability of learned black box models that assign numerical values to symbolic sequential data. To tackle that task, we propose a spectral algorithm for the extraction of weighted automata (WA) from such black boxes. This algorithm does not require the access to a dataset or to the inner representation of the black box: the inferred model can be obtained solely by querying the black box, feeding it with inputs and analyzing its outputs. Experiments using Recurrent Neural Networks (RNN) trained on a wide collection of 48 synthetic datasets and 2 real datasets show that the obtained approximation is of great quality.
Submitted 12 October, 2018;
originally announced October 2018.
-
Explaining First Impressions: Modeling, Recognizing, and Explaining Apparent Personality from Videos
Authors:
Hugo Jair Escalante,
Heysem Kaya,
Albert Ali Salah,
Sergio Escalera,
Yagmur Gucluturk,
Umut Guclu,
Xavier Baro,
Isabelle Guyon,
Julio Jacques Junior,
Meysam Madadi,
Stephane Ayache,
Evelyne Viegas,
Furkan Gurpinar,
Achmadnoer Sukma Wicaksana,
Cynthia C. S. Liem,
Marcel A. J. van Gerven,
Rob van Lier
Abstract:
Explainability and interpretability are two critical aspects of decision support systems. Within computer vision, they are critical in certain tasks related to human behavior analysis such as in health care applications. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of computer vision with an emphasis on looking at people tasks. Specifically, we review and study those mechanisms in the context of first impressions analysis. To the best of our knowledge, this is the first effort in this direction. Additionally, we describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, the evaluation protocol, and summarize the results of the challenge. Finally, derived from our study, we outline research opportunities that we foresee will be decisive in the near future for the development of the explainable computer vision field.
Submitted 28 September, 2019; v1 submitted 2 February, 2018;
originally announced February 2018.
-
Majority Vote of Diverse Classifiers for Late Fusion
Authors:
Emilie Morvant,
Amaury Habrard,
Stéphane Ayache
Abstract:
In the past few years, a lot of attention has been devoted to multimedia indexing by fusing multimodal information. Two kinds of fusion schemes are generally considered: early fusion and late fusion. We focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent and elegant well-founded quadratic program named MinCq coming from the machine learning PAC-Bayesian theory. MinCq looks for the weighted combination, over a set of real-valued functions seen as voters, leading to the lowest misclassification rate, while maximizing the voters' diversity. We propose an extension of MinCq tailored to multimedia indexing. Our method is based on an order-preserving pairwise loss adapted to ranking that allows us to improve the Mean Average Precision measure while taking into account the diversity of the voters that we want to fuse. We provide evidence that this method is naturally adapted to late fusion procedures and confirm the good behavior of our approach on the challenging PASCAL VOC'07 benchmark.
Submitted 19 June, 2014; v1 submitted 30 April, 2014;
originally announced April 2014.
-
PAC-Bayesian Majority Vote for Late Classifier Fusion
Authors:
Emilie Morvant,
Amaury Habrard,
Stéphane Ayache
Abstract:
A lot of attention has been devoted to multimedia indexing over the past few years. In the literature, we often consider two kinds of fusion schemes: The early fusion and the late fusion. In this paper we focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent and elegant well-founded quadratic program named MinCq coming from the Machine Learning PAC-Bayes theory. MinCq looks for the weighted combination, over a set of real-valued functions seen as voters, leading to the lowest misclassification rate, while making use of the voters' diversity. We provide evidence that this method is naturally adapted to late fusion procedure. We propose an extension of MinCq by adding an order-preserving pairwise loss for ranking, helping to improve Mean Averaged Precision measure. We confirm the good behavior of the MinCq-based fusion approaches with experiments on a real image benchmark.
Submitted 4 July, 2012;
originally announced July 2012.