-
TransCAM: Transformer Attention-based CAM Refinement for Weakly Supervised Semantic Segmentation
Authors:
Ruiwen Li,
Zheda Mai,
Chiheb Trabelsi,
Zhibo Zhang,
Jongseong Jang,
Scott Sanner
Abstract:
Weakly supervised semantic segmentation (WSSS) with only image-level supervision is a challenging task. Most existing methods exploit Class Activation Maps (CAM) to generate pixel-level pseudo labels for supervised training. However, due to the local receptive field of Convolutional Neural Networks (CNN), CAM applied to CNNs often suffers from partial activation, highlighting the most discriminative part instead of the entire object area. To capture both local features and global representations, the Conformer was proposed to combine a visual transformer branch with a CNN branch. In this paper, we propose TransCAM, a Conformer-based solution to WSSS that explicitly leverages the attention weights from the transformer branch of the Conformer to refine the CAM generated from the CNN branch. TransCAM is motivated by our observation that attention weights from shallow transformer blocks capture low-level spatial feature similarities, while attention weights from deep transformer blocks capture high-level semantic context. Despite its simplicity, TransCAM achieves new state-of-the-art performance of 69.3% and 69.6% on the PASCAL VOC 2012 validation and test sets, respectively, showing the effectiveness of transformer attention-based refinement of CAM for WSSS.
Submitted 14 March, 2022;
originally announced March 2022.
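The refinement idea in the TransCAM abstract can be illustrated with a small sketch. It assumes the attention weights have already been averaged over heads and over transformer blocks into an N x N patch-to-patch matrix whose rows sum to 1, and that the CAM for one class has been flattened to N patch scores. The helpers `average_attention` and `refine_cam` are hypothetical names for illustration only. This is not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def average_attention(blocks: List[Matrix]) -> Matrix:
    """Element-wise mean of several N x N attention matrices
    (hypothetically one per transformer block, heads already averaged)."""
    n = len(blocks[0])
    return [
        [sum(block[i][j] for block in blocks) / len(blocks) for j in range(n)]
        for i in range(n)
    ]


def refine_cam(cam: List[float], attn: Matrix) -> List[float]:
    """Propagate CAM scores between patches: refined[i] = sum_j attn[i][j] * cam[j].

    A patch that attends strongly to highly activated patches receives a higher
    score, which spreads activation beyond the most discriminative part.
    """
    n = len(cam)
    return [sum(attn[i][j] * cam[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    # Four patches; only patch 0 is activated by the raw CAM.
    cam = [1.0, 0.0, 0.0, 0.0]
    # "Shallow" attention links patch 1 to patch 0 (low-level similarity).
    shallow = [[0.5, 0.5, 0.0, 0.0],
               [0.5, 0.5, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]]
    # "Deep" attention links patch 2 to patch 0 (semantic context).
    deep = [[0.5, 0.0, 0.5, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.5, 0.0, 0.5, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
    print(refine_cam(cam, average_attention([shallow, deep])))  # [0.5, 0.25, 0.25, 0.0]
```

In the toy example, patches 1 and 2 receive part of patch 0's activation through shallow and deep attention respectively. Patch 3 attends only to itself and stays at zero.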
-
ExCon: Explanation-driven Supervised Contrastive Learning for Image Classification
Authors:
Zhibo Zhang,
Jongseong Jang,
Chiheb Trabelsi,
Ruiwen Li,
Scott Sanner,
Yeonjeong Jeong,
Dongsub Shim
Abstract:
Contrastive learning has led to substantial improvements in the quality of learned embedding representations for tasks such as image classification. However, a key drawback of existing contrastive augmentation methods is that they may modify the image content in ways that undesirably alter its semantics, which can affect the performance of the model on downstream tasks. Hence, in this paper, we ask whether we can augment image data in contrastive learning such that the task-relevant semantic content of an image is preserved. To this end, we propose to leverage saliency-based explanation methods to create content-preserving masked augmentations for contrastive learning. Our novel explanation-driven supervised contrastive learning (ExCon) methodology serves the dual goals of encouraging nearby image embeddings to have similar content and similar explanations. To quantify the impact of ExCon, we conduct experiments on the CIFAR-100 and Tiny ImageNet datasets. We demonstrate that ExCon outperforms vanilla supervised contrastive learning in terms of classification, explanation quality, adversarial robustness, and probabilistic calibration under distributional shift.
Submitted 17 April, 2022; v1 submitted 28 November, 2021;
originally announced November 2021.
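A content-preserving masked augmentation of the kind the ExCon abstract describes can be sketched as follows. The sketch assumes a per-pixel saliency map has already been produced by some explanation method. It keeps the pixels whose saliency ranks in the top `keep_ratio` fraction and blanks the rest. The function name, the `keep_ratio` parameter, and the fill value 0.0 are hypothetical choices, not taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def explanation_mask(image: Grid, saliency: Grid, keep_ratio: float = 0.5) -> Grid:
    """Keep the pixels whose saliency ranks in the top `keep_ratio` fraction and
    set every other pixel to 0.0 (the fill value is an assumption)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    width = len(image[0])
    flat = [s for row in saliency for s in row]
    k = max(1, round(keep_ratio * len(flat)))
    # Rank pixel indices by decreasing saliency; ties keep row-major order.
    ranked = sorted(range(len(flat)), key=lambda idx: flat[idx], reverse=True)
    keep = set(ranked[:k])
    return [[image[r][c] if r * width + c in keep else 0.0 for c in range(width)]
            for r in range(len(image))]


if __name__ == "__main__":
    image = [[1.0, 2.0], [3.0, 4.0]]
    saliency = [[0.9, 0.1], [0.2, 0.8]]
    print(explanation_mask(image, saliency))  # [[1.0, 0.0], [0.0, 4.0]]
```

In a supervised contrastive setup, such a masked view would presumably be paired with its source image as a positive. How ExCon weights or combines the views in its loss is not specified in the abstract and is not modeled here.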
-
EDDA: Explanation-driven Data Augmentation to Improve Explanation Faithfulness
Authors:
Ruiwen Li,
Zhibo Zhang,
Jiani Li,
Chiheb Trabelsi,
Scott Sanner,
Jongseong Jang,
Yeonjeong Jeong,
Dongsub Shim
Abstract:
Recent years have seen the introduction of a range of methods for post-hoc explainability of image classifier predictions. However, these post-hoc explanations may not always be faithful to classifier predictions, which poses a significant challenge when attempting to debug models based on such explanations. To this end, we seek a methodology that improves the faithfulness of an explanation method with respect to model predictions without requiring ground-truth explanations. We achieve this through a novel explanation-driven data augmentation (EDDA) technique that augments the training data with occlusions inferred from model explanations. EDDA is based on a simple motivating principle: if the explainer is faithful to the model, then occluding regions salient to the model's prediction should decrease the model's confidence in that prediction, while occluding non-salient regions should not change the prediction. To verify that the proposed augmentation method has the potential to improve faithfulness, we evaluate EDDA on a variety of datasets and classification models. We demonstrate empirically that our approach leads to a significant increase in faithfulness, which can facilitate better debugging and successful deployment of image classification models in real-world applications.
Submitted 24 September, 2021; v1 submitted 28 May, 2021;
originally announced May 2021.
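The motivating principle in the EDDA abstract can be made concrete with a small sketch:

- hide the most salient pixels, and confidence should drop;
- hide the least salient pixels, and confidence should stay put.

The helpers `occlude`, `occlusion_pair`, and `agrees_with_principle` are hypothetical names. So are the occluded fraction `frac` and the zero fill value. The sketch only builds the two occluded views and checks the principle for a given confidence function. How EDDA turns such views into training targets is not described in the abstract and is not modeled.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]


def occlude(image: Grid, saliency: Grid, frac: float, salient: bool) -> Grid:
    """Zero out a fraction `frac` of pixels, chosen as the most salient ones
    (salient=True) or the least salient ones (salient=False)."""
    width = len(image[0])
    flat = [s for row in saliency for s in row]
    k = max(1, round(frac * len(flat)))
    order = sorted(range(len(flat)), key=lambda i: flat[i], reverse=salient)
    hidden = set(order[:k])
    return [[0.0 if r * width + c in hidden else image[r][c] for c in range(width)]
            for r in range(len(image))]


def occlusion_pair(image: Grid, saliency: Grid, frac: float = 0.25) -> Tuple[Grid, Grid]:
    """(salient regions hidden, non-salient regions hidden)."""
    return (occlude(image, saliency, frac, salient=True),
            occlude(image, saliency, frac, salient=False))


def agrees_with_principle(confidence: Callable[[Grid], float], image: Grid,
                          saliency: Grid, frac: float = 0.25, tol: float = 1e-6) -> bool:
    """True if hiding salient pixels lowers confidence while hiding
    non-salient pixels leaves it (approximately) unchanged."""
    base = confidence(image)
    hide_salient, hide_rest = occlusion_pair(image, saliency, frac)
    return confidence(hide_salient) < base and abs(confidence(hide_rest) - base) <= tol


if __name__ == "__main__":
    image = [[0.9, 0.2], [0.3, 0.4]]
    toy_model = lambda im: im[0][0]  # a toy "model" that only looks at the top-left pixel
    print(agrees_with_principle(toy_model, image, [[1.0, 0.0], [0.0, 0.0]]))  # True
    print(agrees_with_principle(toy_model, image, [[0.0, 1.0], [0.0, 0.0]]))  # False
```

The first saliency map points at the pixel the toy model actually uses, so it passes. The second points elsewhere and fails, which is the kind of unfaithfulness the abstract targets.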
-
Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
Authors:
Titouan Parcollet,
Ying Zhang,
Mohamed Morchid,
Chiheb Trabelsi,
Georges Linarès,
Renato De Mori,
Yoshua Bengio
Abstract:
Recently, the connectionist temporal classification (CTC) model coupled with recurrent (RNN) or convolutional neural networks (CNN) made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first- and second-order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have proven efficient at processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs.
Submitted 20 June, 2018;
originally announced June 2018.
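The quaternion algebra mentioned in the QCNN abstract rests on the Hamilton product, which is non-commutative (i ⊗ j = k but j ⊗ i = -k). The sketch below implements that product and a valid 1-D quaternion convolution, y[t] = sum_k w[k] ⊗ x[t + k], over a sequence of quaternion-valued frames.

Which four acoustic features fill the (r, i, j, k) slots of each frame is left open here. Any grouping used in the usage example is a hypothetical illustration, and the function names are ours, not the paper's.

```python
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # (r, i, j, k) components


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p ⊗ q (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def quaternion_conv1d(signal: List[Quaternion], kernel: List[Quaternion]) -> List[Quaternion]:
    """Valid 1-D quaternion convolution (cross-correlation, as in most DL libraries):
    y[t] = sum_k kernel[k] ⊗ signal[t + k]."""
    width = len(kernel)
    out = []
    for t in range(len(signal) - width + 1):
        acc = (0.0, 0.0, 0.0, 0.0)
        for k in range(width):
            prod = hamilton(kernel[k], signal[t + k])
            acc = tuple(x + y for x, y in zip(acc, prod))
        out.append(acc)
    return out


if __name__ == "__main__":
    print(hamilton((0, 1, 0, 0), (0, 0, 1, 0)))  # i ⊗ j = k  -> (0, 0, 0, 1)
    print(hamilton((0, 0, 1, 0), (0, 1, 0, 0)))  # j ⊗ i = -k -> (0, 0, 0, -1)
    # Hypothetical frames, each grouping four related features into one quaternion.
    frames = [(1.0, 0.5, 0.0, 0.25), (0.5, 1.0, 0.5, 0.0), (0.0, 0.5, 1.0, 0.5)]
    print(quaternion_conv1d(frames, [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]))
```

Each quaternion weight has 4 real parameters, yet it mixes all four input components into all four output components. A real-valued map doing the same would need a 4 x 4 matrix, which is where the parameter savings come from.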
-
Quaternion Recurrent Neural Networks
Authors:
Titouan Parcollet,
Mirco Ravanelli,
Mohamed Morchid,
Georges Linarès,
Chiheb Trabelsi,
Renato De Mori,
Yoshua Bengio
Abstract:
Recurrent neural networks (RNNs) are powerful architectures for modeling sequential data, owing to their capability to learn short- and long-term dependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or image recognition involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. We propose a novel quaternion recurrent neural network (QRNN), alongside a quaternion long short-term memory neural network (QLSTM), that take into account both the external relations and these internal structural dependencies with the quaternion algebra. Similarly to capsules, quaternions allow the QRNN to code internal dependencies by composing and processing multidimensional features as single entities, while the recurrent operation reveals correlations between the elements composing the sequence. We show that both QRNN and QLSTM achieve better performance than RNN and LSTM in a realistic application of automatic speech recognition. Finally, we show that QRNN and QLSTM reduce the number of free parameters needed by a factor of up to 3.3 compared to real-valued RNNs and LSTMs, while reaching better results, leading to a more compact representation of the relevant information.
Submitted 7 January, 2019; v1 submitted 12 June, 2018;
originally announced June 2018.
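A single recurrent step of a quaternion RNN can be sketched under stated assumptions. The update is taken to be the usual Elman form, h_t = tanh(W ⊗ x_t + R ⊗ h_{t-1} + b). Every product is a Hamilton product, and tanh is applied to each of the four components separately (a "split" activation).

The names `qdense`, `qrnn_step`, and `weight_counts` are hypothetical, and this is not the paper's code. `weight_counts` compares one quaternion weight matrix with a real-valued matrix between layers of the same real width.

```python
import math
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # (r, i, j, k)
QMatrix = List[List[Quaternion]]


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p ⊗ q."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def qdense(weights: QMatrix, x: List[Quaternion]) -> List[Quaternion]:
    """Quaternion matrix-vector product: y[i] = sum_j weights[i][j] ⊗ x[j]."""
    out = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, xj in zip(row, x):
            acc = tuple(s + t for s, t in zip(acc, hamilton(w, xj)))
        out.append(acc)
    return out


def qrnn_step(W: QMatrix, R: QMatrix, b: List[Quaternion],
              x_t: List[Quaternion], h_prev: List[Quaternion]) -> List[Quaternion]:
    """h_t = tanh(W ⊗ x_t + R ⊗ h_{t-1} + b), tanh applied component-wise."""
    pre = [tuple(u + v + c for u, v, c in zip(wx, rh, bias))
           for wx, rh, bias in zip(qdense(W, x_t), qdense(R, h_prev), b)]
    return [tuple(math.tanh(c) for c in q) for q in pre]


def weight_counts(n_in: int, n_out: int) -> Tuple[int, int]:
    """Real parameters in one weight matrix from n_in to n_out quaternion units,
    versus a real-valued matrix from 4*n_in to 4*n_out units."""
    return 4 * n_in * n_out, (4 * n_in) * (4 * n_out)


if __name__ == "__main__":
    print(weight_counts(256, 256))  # (262144, 1048576): 4x fewer for this matrix
```

The 4x ratio holds for a single weight matrix only. The abstract's "up to 3.3" refers to complete models, which also contain parts that do not shrink in the same way.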
-
Probabilistic reconstruction of genealogies for polyploid plant species
Authors:
Frédéric Proïa,
Fabien Panloup,
Chiraz Trabelsi,
Jérémy Clotault
Abstract:
A probabilistic reconstruction of genealogies in a polyploid population (from 2x to 4x) is investigated by considering genetic data analyzed as the probability of allele presence in a given genotype. Based on the likelihood of all possible crossbreeding patterns, our model enables us to infer and quantify all potential genealogies in the population. In particular, we explain how to deal with the uncertain allelic multiplicity that may occur with polyploids. We then build an ad hoc penalized likelihood to compare genealogies and to decide whether a particular individual brings sufficient information to be included in the retained genealogy. This decision criterion then allows us to propose a greedy algorithm that explores missing links and retrospectively rebuilds some connections in the genealogies. As a by-product, we also give a way to infer the individuals that may have been favored by breeders over the years. In the last part, we highlight the results given by our model and our algorithm, first on a simulated population and then on a real population of rose bushes. Most of the methodology relies on the maximum likelihood principle and on graph theory.
Submitted 28 November, 2018; v1 submitted 13 April, 2018;
originally announced April 2018.
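To show why uncertain allelic multiplicity matters when working with allele-presence probabilities, here is an illustrative calculation. It is not the paper's model. It assumes a parent of even ploidy passes ploidy/2 chromosomes to its gamete, drawn at random without replacement (no double reduction). Under that assumption, a tetraploid with one copy of an allele transmits it with probability 1/2, while one with two copies does so with probability 5/6.

When the dosage is unknown, the sketch averages over a hypothetical probability distribution on it. Function names and the distribution format are assumptions.

```python
from fractions import Fraction
from math import comb
from typing import Dict


def gamete_lacks_allele(dosage: int, ploidy: int) -> Fraction:
    """P(a gamete carries no copy of the allele), assuming it receives ploidy/2
    chromosomes drawn at random without replacement (no double reduction)."""
    if ploidy % 2 or not 0 <= dosage <= ploidy:
        raise ValueError("need an even ploidy and 0 <= dosage <= ploidy")
    half = ploidy // 2
    return Fraction(comb(ploidy - dosage, half), comb(ploidy, half))


def offspring_has_allele(mother: Dict[int, Fraction], father: Dict[int, Fraction],
                         m_ploidy: int, f_ploidy: int) -> Fraction:
    """P(the allele is present in the offspring) when each parent's dosage is
    only known as a distribution {dosage: probability}."""
    m_absent = sum((p * gamete_lacks_allele(d, m_ploidy) for d, p in mother.items()), Fraction(0))
    f_absent = sum((p * gamete_lacks_allele(d, f_ploidy) for d, p in father.items()), Fraction(0))
    return 1 - m_absent * f_absent


if __name__ == "__main__":
    # Tetraploid mother known to carry the allele, dosage 1 or 2 with equal odds;
    # tetraploid father without the allele.
    uncertain = {1: Fraction(1, 2), 2: Fraction(1, 2)}
    print(offspring_has_allele(uncertain, {0: Fraction(1)}, 4, 4))  # 2/3
```

Presence probabilities of this kind are the ingredients a likelihood over crossbreeding patterns would multiply together. How the paper actually combines them is not shown here.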
-
Deep Complex Networks
Authors:
Chiheb Trabelsi,
Olexa Bilaniuk,
Ying Zhang,
Dmitriy Serdyuk,
Sandeep Subramanian,
João Felipe Santos,
Soroush Mehri,
Negar Rostamzadeh,
Yoshua Bengio,
Christopher J Pal
Abstract:
At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggest that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions, present algorithms for complex batch normalization and complex weight initialization strategies for complex-valued neural nets, and use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset, and on speech spectrum prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.
Submitted 25 February, 2018; v1 submitted 27 May, 2017;
originally announced May 2017.
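Complex convolutions can be computed with ordinary real-valued convolutions. Writing the filter as W = A + iB and the input as h = x + iy, the product expands to W * h = (A * x - B * y) + i(B * x + A * y). The sketch below implements this identity for 1-D signals in pure Python and checks it against Python's built-in complex arithmetic. The helper names are hypothetical, and a deep learning framework would use its own real convolution primitive instead of `conv1d`.

```python
from typing import List, Tuple


def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """Valid 1-D cross-correlation of real sequences (the 'convolution' of DL libraries)."""
    n = len(kernel)
    return [sum(kernel[k] * signal[t + k] for k in range(n))
            for t in range(len(signal) - n + 1)]


def complex_conv1d(x: List[float], y: List[float],
                   a: List[float], b: List[float]) -> Tuple[List[float], List[float]]:
    """Convolve h = x + iy with W = A + iB using four real convolutions:
    W * h = (A*x - B*y) + i (B*x + A*y). Returns (real_part, imag_part)."""
    ax, by = conv1d(x, a), conv1d(y, b)
    bx, ay = conv1d(x, b), conv1d(y, a)
    real = [p - q for p, q in zip(ax, by)]
    imag = [p + q for p, q in zip(bx, ay)]
    return real, imag


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [0.0, 1.0, -1.0]   # input h = x + iy
    a, b = [1.0, -1.0], [2.0, 0.5]             # filter W = A + iB
    re, im = complex_conv1d(x, y, a, b)
    # Same result computed directly with Python complex numbers.
    h = [complex(p, q) for p, q in zip(x, y)]
    w = [complex(p, q) for p, q in zip(a, b)]
    direct = [sum(w[k] * h[t + k] for k in range(len(w))) for t in range(len(h) - len(w) + 1)]
    print([complex(r, i) for r, i in zip(re, im)], direct)
```

Keeping real and imaginary parts as separate real tensors is what lets the approach reuse existing real-valued convolution kernels.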
-
On orthogonality and learning recurrent networks with long term dependencies
Authors:
Eugene Vorontsov,
Chiheb Trabelsi,
Samuel Kadoury,
Chris Pal
Abstract:
It is well known that it is challenging to train deep neural networks and recurrent neural networks for tasks that exhibit long-term dependencies. The vanishing or exploding gradient problem is a well-known issue associated with these challenges. One approach to addressing vanishing and exploding gradients is to use either soft or hard constraints on weight matrices so as to encourage or enforce orthogonality. Orthogonal matrices preserve gradient norm during backpropagation, which may therefore make orthogonality a desirable property. This paper explores issues with optimization convergence, speed, and gradient stability when encouraging or enforcing orthogonality. To perform this analysis, we propose a weight matrix factorization and parameterization strategy through which we can bound matrix norms and thereby control the degree of expansivity induced during backpropagation. We find that hard constraints on orthogonality can negatively affect the speed of convergence and model performance.
Submitted 12 October, 2017; v1 submitted 31 January, 2017;
originally announced February 2017.
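One way to realize a factorization that bounds matrix norms, as the abstract describes, is W = U diag(s) V^T. Here U and V are orthogonal and each singular value is squashed into a band around 1 with a sigmoid, s_i = 2m(σ(p_i) - 0.5) + 1, so that s_i lies between 1 - m and 1 + m for a margin m. This is assumed here rather than taken from the paper. Setting m = 0 recovers a purely orthogonal W, which preserves vector norms exactly.

The 2 x 2 sketch below uses rotations for U and V. All function names and parameter values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def rotation(theta: float) -> Matrix:
    """2 x 2 rotation matrix; orthogonal, so it preserves vector norms."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def bounded_singular_values(params: List[float], margin: float) -> List[float]:
    """Map unconstrained parameters into [1 - margin, 1 + margin] via a sigmoid."""
    return [2.0 * margin * (1.0 / (1.0 + math.exp(-p)) - 0.5) + 1.0 for p in params]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def factorized_weight(theta_u: float, theta_v: float,
                      params: List[float], margin: float) -> Matrix:
    """W = U diag(s) V^T with orthogonal U, V and bounded singular values s."""
    s = bounded_singular_values(params, margin)
    diag = [[s[0], 0.0], [0.0, s[1]]]
    return matmul(matmul(rotation(theta_u), diag), transpose(rotation(theta_v)))


def apply(w: Matrix, x: List[float]) -> List[float]:
    return [sum(w[i][j] * x[j] for j in range(len(x))) for i in range(len(w))]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(c * c for c in v))


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = factorized_weight(0.3, -1.1, [5.0, -5.0], margin=0.1)
    print(norm(apply(w, x)) / norm(x))  # gain stays within [0.9, 1.1]
```

Because every singular value stays inside the band, no input direction can be stretched or shrunk by more than the margin. This is the sense in which the degree of expansivity is controlled.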
-
Exponential ergodicity of the jump-diffusion CIR process
Authors:
Peng **,
Barbara Rüdiger,
Chiraz Trabelsi
Abstract:
In this paper, we study the jump-diffusion CIR process (abbreviated as JCIR), which is an extension of the classical CIR model. The jumps of the JCIR are introduced with the help of a pure-jump Lévy process $(J_t, t \ge 0)$. Under suitable conditions on the Lévy measure of $(J_t, t \ge 0)$, we derive a lower bound for the transition densities of the JCIR process. We also find a sufficient condition guaranteeing the existence of a Foster-Lyapunov function for the JCIR process, which allows us to prove its exponential ergodicity.
Submitted 10 March, 2015;
originally announced March 2015.
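To get a feel for the JCIR dynamics, a path can be simulated with an Euler scheme. The sketch makes two assumptions that go beyond the abstract:

- The process follows the usual CIR form with an added jump term, dX = (a - bX) dt + σ √X dW + dJ.
- J is a compound Poisson process with exponentially distributed jumps, which is only one special case of the pure-jump Lévy processes the paper allows.

The "full truncation" device (using the positive part of X inside the coefficients) is a standard discretization choice, not something taken from the paper. The function and parameter names are hypothetical.

```python
import math
import random
from typing import List


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method; adequate for the small means lam = rate * dt used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_jcir(x0: float, a: float, b: float, sigma: float,
                  jump_rate: float, jump_mean: float,
                  dt: float, n_steps: int, seed: int = 0) -> List[float]:
    """Euler path of dX = (a - bX) dt + sigma sqrt(X) dW + dJ, where J is assumed
    to be compound Poisson with Exp(mean=jump_mean) jump sizes."""
    rng = random.Random(seed)
    x = x0
    path = [x0]
    for _ in range(n_steps):
        xp = max(x, 0.0)  # full truncation: positive part inside the coefficients
        dw = rng.gauss(0.0, math.sqrt(dt))
        n_jumps = poisson(rng, jump_rate * dt)
        jumps = sum(rng.expovariate(1.0 / jump_mean) for _ in range(n_jumps))
        x = x + (a - b * xp) * dt + sigma * math.sqrt(xp) * dw + jumps
        path.append(max(x, 0.0))  # report the positive part, as in full truncation
    return path


if __name__ == "__main__":
    path = simulate_jcir(x0=0.5, a=1.0, b=2.0, sigma=0.3,
                         jump_rate=2.0, jump_mean=0.1, dt=0.01, n_steps=50_000, seed=1)
    # Under these assumptions the long-run mean should be near (a + rate*mean)/b = 0.6.
    print(sum(path) / len(path))
```

With σ = 0 and no jumps, the scheme reduces to the deterministic recursion x ← x + (a - bx)dt, which settles at a/b. A time average over a long noisy path, as in the usage lines, is a rough empirical counterpart of the ergodic behavior studied in the paper.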
-
Positive Harris recurrence and exponential ergodicity of the basic affine jump-diffusion
Authors:
Peng **,
Barbara Rüdiger,
Chiraz Trabelsi
Abstract:
In this paper, we find the transition densities of the basic affine jump-diffusion (BAJD), which was introduced by Duffie and Garleanu [D. Duffie and N. Garleanu, Risk and valuation of collateralized debt obligations, Financial Analysts Journal 57(1) (2001), pp. 41–59] as an extension of the CIR model with jumps. We prove the positive Harris recurrence and exponential ergodicity of the BAJD. Furthermore, we prove that the unique invariant probability measure $\pi$ of the BAJD is absolutely continuous with respect to the Lebesgue measure, and we also derive a closed-form formula for the density function of $\pi$.
Submitted 16 January, 2015; v1 submitted 15 January, 2015;
originally announced January 2015.
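A short worked calculation shows where the mean of the invariant measure sits. It assumes the BAJD in the Duffie-Garleanu form dX_t = κ(θ - X_t) dt + σ √X_t dW_t + dJ_t, where J is a compound Poisson process with intensity ℓ and exponentially distributed jumps of mean μ; the symbols κ, θ, σ, ℓ, μ are our notation, not necessarily the paper's. Taking expectations (the Brownian integral has mean zero and E[J_t] = ℓμt) gives a linear ODE for the mean. This is only a first-moment sanity check, not the paper's closed-form density.

```latex
\begin{align*}
\frac{d}{dt}\,\mathbb{E}[X_t] &= \kappa\bigl(\theta - \mathbb{E}[X_t]\bigr) + \ell\mu, \\
\mathbb{E}[X_t] &= m + (x_0 - m)\,e^{-\kappa t}, \qquad m = \theta + \frac{\ell\mu}{\kappa}, \\
\lim_{t \to \infty} \mathbb{E}[X_t] &= \theta + \frac{\ell\mu}{\kappa}.
\end{align*}
```

The mean relaxes to m at the exponential rate κ. This makes m the natural candidate for the first moment of $\pi$: the jumps shift the long-run level of the plain CIR model, θ, upward by ℓμ/κ.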