Showing 1–50 of 62 results for author: Pedersoli, M

  1. arXiv:2405.17517  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

    Authors: Louis Fournier, Adel Nabli, Masih Aminbeidokhti, Marco Pedersoli, Eugene Belilovsky, Edouard Oyallon

    Abstract: The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as model…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2404.19654  [pdf, other]

    cs.CV cs.LG

    Masked Multi-Query Slot Attention for Unsupervised Object Discovery

    Authors: Rishav Pramanik, José-Fabian Villa-Vásquez, Marco Pedersoli

    Abstract: Unsupervised object discovery is becoming an essential line of research for tackling recognition problems that require decomposing an image into entities, such as semantic segmentation and object detection. Recently, object-centric methods that leverage self-supervision have gained popularity, due to their simplicity and adaptability to different settings and conditions. However, those methods do…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Paper accepted for presentation at IJCNN 2024

  3. arXiv:2404.18849  [pdf, other]

    cs.CV

    MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection

    Authors: Heitor R. Medeiros, David Latortue, Fidel Guerrero Pena, Eric Granger, Marco Pedersoli

    Abstract: In this paper, we present a different way to use two modalities, in which either one modality or the other is seen by a single model. This can be useful when adapting an unimodal model to leverage more information while respecting a limited computational budget. This would mean having a single model that is able to deal with any modalities. To describe this, we coined the term anymodal learning. A…

    Submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2404.10034  [pdf, other]

    cs.CV cs.LG

    Realistic Model Selection for Weakly Supervised Object Localization

    Authors: Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Eric Granger

    Abstract: Weakly Supervised Object Localization (WSOL) allows for training deep learning models for classification and localization, using only global class-level labels. The lack of bounding box (bbox) supervision during training represents a considerable challenge for hyper-parameter search and model selection. Earlier WSOL works implicitly observed localization performance over a test set which leads to…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 13 pages, 5 figures

  5. arXiv:2404.01492  [pdf, other]

    cs.CV cs.AI

    Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge

    Authors: Heitor Rapela Medeiros, Masih Aminbeidokhti, Fidel Guerrero Pena, David Latortue, Eric Granger, Marco Pedersoli

    Abstract: A common practice in deep learning consists of training large neural networks on massive datasets to perform accurately for different domains and tasks. While this methodology may work well in numerous application areas, it only applies across modalities due to a larger distribution shift in data captured using different sensors. This paper focuses on the problem of adapting a large object detecti…

    Submitted 11 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  6. arXiv:2403.15567  [pdf, other]

    cs.LG cs.CV

    Do not trust what you trust: Miscalibration in Semi-supervised Learning

    Authors: Shambhavi Mishra, Balamurali Murugesan, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz

    Abstract: State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples. An inherent drawback of this strategy stems from the quality of the uncertainty estimates, as pseudo-labels are filtered only based on their degree of uncertainty, regardless of the correctness of their predictions. Thus, assessing…

    Submitted 22 March, 2024; originally announced March 2024.

  7. arXiv:2403.10488  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Joint Multimodal Transformer for Emotion Recognition in the Wild

    Authors: Paul Waligora, Haseeb Aslam, Osama Zeeshan, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger

    Abstract: Multimodal emotion recognition (MMER) systems typically outperform unimodal systems by leveraging the inter- and intra-modal relationships between, e.g., visual, textual, physiological, and auditory modalities. This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention. This framework can exploit the complementary nature of dive…

    Submitted 20 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: 10 pages, 4 figures, 6 tables, CVPRw 2024

  8. arXiv:2403.09918  [pdf, other]

    cs.CV cs.LG

    Attention-based Class-Conditioned Alignment for Multi-Source Domain Adaptive Object Detection

    Authors: Atif Belal, Akhil Meethal, Francisco Perdigon Romero, Marco Pedersoli, Eric Granger

    Abstract: Domain adaptation methods for object detection (OD) strive to mitigate the impact of distribution shifts by promoting feature alignment across source and target domains. Multi-source domain adaptation (MSDA) allows leveraging multiple annotated source datasets, and unlabeled target data to improve the accuracy and robustness of the detection model. Most state-of-the-art MSDA methods for OD perform…

    Submitted 14 March, 2024; originally announced March 2024.

  9. arXiv:2402.00281  [pdf, other]

    cs.CV

    Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues

    Authors: Soufiane Belharbi, Marco Pedersoli, Alessandro Lameiras Koerich, Simon Bacon, Eric Granger

    Abstract: Although state-of-the-art classifiers for facial expression recognition (FER) can achieve a high level of accuracy, they lack interpretability, an important feature for end-users. Experts typically associate spatial action units (AUs) from a codebook to facial regions for the visual interpretation of expressions. In this paper, the same expert steps are followed. A new learning strategy is propos…

    Submitted 14 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: 15 pages, 11 figures, 3 tables, International Conference on Automatic Face and Gesture Recognition (FG 2024)

  10. arXiv:2401.15489  [pdf, other]

    cs.CV cs.AI

    Distilling Privileged Multimodal Information for Expression Recognition using Optimal Transport

    Authors: Muhammad Haseeb Aslam, Muhammad Osama Zeeshan, Soufiane Belharbi, Marco Pedersoli, Alessandro Koerich, Simon Bacon, Eric Granger

    Abstract: Deep learning models for multimodal expression recognition have reached remarkable performance in controlled laboratory environments because of their ability to learn complementary and redundant semantic information. However, these models struggle in the wild, mainly because of the unavailability and quality of modalities used for training. In practice, only a subset of the training-time modalitie…

    Submitted 28 April, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

  11. arXiv:2312.11556  [pdf, other]

    cs.CV cs.AI cs.CL

    StarVector: Generating Scalable Vector Graphics Code from Images

    Authors: Juan A. Rodriguez, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli

    Abstract: Scalable Vector Graphics (SVGs) have become integral in modern image rendering applications due to their infinite scalability in resolution, versatile usability, and editing capabilities. SVGs are particularly popular in the fields of web development and graphic design. Existing approaches for SVG modeling using deep learning often struggle with generating complex SVGs and are restricted to simple…

    Submitted 17 December, 2023; originally announced December 2023.

  12. arXiv:2312.05632  [pdf, other]

    cs.CV

    Subject-Based Domain Adaptation for Facial Expression Recognition

    Authors: Muhammad Osama Zeeshan, Muhammad Haseeb Aslam, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger

    Abstract: Adapting a deep learning model to a specific target individual is a challenging facial expression recognition (FER) task that may be achieved using unsupervised domain adaptation (UDA) methods. Although several UDA methods have been proposed to adapt deep FER models across source and target data sets, multiple subject-specific source domains are needed to accurately represent the intra- and inter-…

    Submitted 26 April, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

  13. arXiv:2311.11974  [pdf, other]

    cs.CV cs.AI cs.LG

    Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting

    Authors: David Latortue, Moetez Kdayem, Fidel A Guerrero Peña, Eric Granger, Marco Pedersoli

    Abstract: Object detection models are commonly used for people counting (and localization) in many applications but require a dataset with costly bounding box annotations for training. Given the importance of privacy in people counting, these models rely more and more on infrared images, making the task even harder. In this paper, we explore how weaker levels of supervision can affect the performance of dee…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

  14. arXiv:2310.06670  [pdf, other]

    cs.LG cs.CV

    Domain Generalization by Rejecting Extreme Augmentations

    Authors: Masih Aminbeidokhti, Fidel A. Guerrero Peña, Heitor Rapela Medeiros, Thomas Dubail, Eric Granger, Marco Pedersoli

    Abstract: Data augmentation is one of the most effective techniques for regularizing deep learning models and improving their recognition performance in a variety of tasks and domains. However, this holds for standard in-domain settings, in which the training and test data follow the same distribution. For the out-of-domain case, where the test data follow a different and unknown distribution, the best reci…

    Submitted 10 October, 2023; originally announced October 2023.

  15. DiPS: Discriminative Pseudo-Label Sampling with Self-Supervised Transformers for Weakly Supervised Object Localization

    Authors: Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf, Eric Granger

    Abstract: Self-supervised vision transformers (SSTs) have shown great potential to yield rich localization maps that highlight different objects in an image. However, these maps remain class-agnostic since the model is unsupervised. They often tend to decompose the image into multiple maps containing different objects while being unable to distinguish the object of interest from background noise objects. In…

    Submitted 18 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Journal ref: Image and Vision Computing 140C (2023) 104838

  16. arXiv:2310.04662  [pdf, other]

    cs.CV cs.AI

    HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information

    Authors: Heitor Rapela Medeiros, Fidel A. Guerrero Pena, Masih Aminbeidokhti, Thomas Dubail, Eric Granger, Marco Pedersoli

    Abstract: A powerful way to adapt a visual recognition model to a new domain is through image translation. However, common image translation approaches only focus on generating data from the same distribution as the target domain. Given a cross-modal application, such as pedestrian detection from aerial images, with a considerable shift in data distribution between infrared (IR) to visible (RGB) images, a t…

    Submitted 22 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 2024

  17. arXiv:2310.02416  [pdf, other]

    cs.LG cs.CV

    Bag of Tricks for Fully Test-Time Adaptation

    Authors: Saypraseuth Mounsaveng, Florent Chiaroni, Malik Boudiaf, Marco Pedersoli, Ismail Ben Ayed

    Abstract: Fully Test-Time Adaptation (TTA), which aims at adapting models to data drifts, has recently attracted wide interest. Numerous tricks and techniques have been proposed to ensure robust learning on arbitrary streams of unlabeled data. However, assessing the true impact of each individual technique and obtaining a fair comparison still constitutes a significant challenge. To help consolidate the com…

    Submitted 9 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at WACV 2024

  18. arXiv:2309.14950  [pdf, other]

    cs.CV cs.AI

    Multi-Source Domain Adaptation for Object Detection with Prototype-based Mean-teacher

    Authors: Atif Belal, Akhil Meethal, Francisco Perdigon Romero, Marco Pedersoli, Eric Granger

    Abstract: Adapting visual object detectors to operational target domains is a challenging task, commonly achieved using unsupervised domain adaptation (UDA) methods. Recent studies have shown that when the labeled dataset comes from multiple source domains, treating them as separate domains and performing a multi-source domain adaptation (MSDA) improves the accuracy and robustness over blending these source…

    Submitted 15 December, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  19. arXiv:2308.05032  [pdf, other]

    cs.CV cs.LG

    Density Crop-guided Semi-supervised Object Detection in Aerial Images

    Authors: Akhil Meethal, Eric Granger, Marco Pedersoli

    Abstract: One of the important bottlenecks in training modern object detectors is the need for labeled images where bounding box annotations have to be produced for each object present in the image. This bottleneck is further exacerbated in aerial images where the annotators have to label small objects often distributed in clusters on high-resolution images. In recent days, the mean-teacher approach trained…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 12 pages, 8 figures

  20. arXiv:2306.11982  [pdf, other]

    cs.CV cs.LG

    Balanced Mixture of SuperNets for Learning the CNN Pooling Architecture

    Authors: Mehraveh Javan, Matthew Toews, Marco Pedersoli

    Abstract: Downsampling layers, including pooling and strided convolutions, are crucial components of the convolutional neural network architecture that determine both the granularity/scale of image feature analysis as well as the receptive field size of a given layer. To fully understand this problem, we analyse the performance of models independently trained with each pooling configurations on CIFAR10, usi…

    Submitted 20 June, 2023; originally announced June 2023.

  21. arXiv:2306.00800  [pdf, other]

    cs.CV cs.AI

    FigGen: Text to Scientific Figure Generation

    Authors: Juan A Rodriguez, David Vazquez, Issam Laradji, Marco Pedersoli, Pau Rodriguez

    Abstract: The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. Recent techniques have shown impressive potential in creating complex visual compositions while delivering impressive realism and quality. However, state-of-the-art methods have been focusing on the narrow domain of natural images, while other distributions remain…

    Submitted 17 December, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Published at ICLR 2023 as a Tiny Paper

  22. arXiv:2303.09044  [pdf, other]

    cs.CV

    CoLo-CAM: Class Activation Mapping for Object Co-Localization in Weakly-Labeled Unconstrained Videos

    Authors: Soufiane Belharbi, Shakeeb Murtaza, Marco Pedersoli, Ismail Ben Ayed, Luke McCaffrey, Eric Granger

    Abstract: Leveraging spatiotemporal information in videos is critical for weakly supervised video object localization (WSVOL) tasks. However, state-of-the-art methods only rely on visual and motion cues, while discarding discriminative information, making them susceptible to inaccurate localizations. Recently, discriminative models have been explored for WSVOL tasks using a temporal class activation mapping…

    Submitted 28 February, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 18 pages, 6 figures

  23. arXiv:2303.08747  [pdf, other]

    cs.CV cs.AI

    Cascaded Zoom-in Detector for High Resolution Aerial Images

    Authors: Akhil Meethal, Eric Granger, Marco Pedersoli

    Abstract: Detecting objects in aerial images is challenging because they are typically composed of crowded small objects distributed non-uniformly over high-resolution images. Density cropping is a widely used method to improve this small object detection where the crowded small object regions are extracted and processed in high resolution. However, this is typically accomplished by adding other learnable c…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 12 pages, 7 figures

  24. arXiv:2212.12042  [pdf, other]

    cs.CV

    Re-basin via implicit Sinkhorn differentiation

    Authors: Fidel A. Guerrero Peña, Heitor Rapela Medeiros, Thomas Dubail, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli

    Abstract: The recent emergence of new algorithms for permuting models into functionally equivalent regions of the solution space has shed some light on the complexity of error surfaces, and some promising properties like mode connectivity. However, finding the right permutation is challenging, and current optimization techniques are not differentiable, which makes it difficult to integrate into a gradient-b…

    Submitted 22 December, 2022; originally announced December 2022.

  25. arXiv:2211.03626  [pdf, other]

    cs.CV

    Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID

    Authors: Djebril Mekhazni, Maximilien Dufau, Christian Desrosiers, Marco Pedersoli, Eric Granger

    Abstract: Systems for person re-identification (ReID) can achieve a high accuracy when trained on large fully-labeled image datasets. However, the domain shift typically associated with diverse operational capture conditions (e.g., camera viewpoints and lighting) may translate to a significant decline in performance. This paper focuses on unsupervised domain adaptation (UDA) for video-based ReID - a relevan…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

  26. arXiv:2210.11248  [pdf, other]

    cs.CV

    OCR-VQGAN: Taming Text-within-Image Generation

    Authors: Juan A. Rodriguez, David Vazquez, Issam Laradji, Marco Pedersoli, Pau Rodriguez

    Abstract: Synthetic image generation has recently experienced significant improvements in domains such as natural image or art generation. However, the problem of figure and diagram generation remains unexplored. A challenging aspect of generating figures and diagrams is effectively rendering readable texts within the images. To alleviate this problem, we present OCR-VQGAN, an image encoder, and decoder tha…

    Submitted 21 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: Paper accepted at WACV 2023

  27. arXiv:2209.11335  [pdf, other]

    cs.CV

    Privacy-Preserving Person Detection Using Low-Resolution Infrared Cameras

    Authors: Thomas Dubail, Fidel Alejandro Guerrero Peña, Heitor Rapela Medeiros, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli

    Abstract: In intelligent building management, knowing the number of people and their location in a room are important for better control of its illumination, ventilation, and heating with reduced costs and improved comfort. This is typically achieved by detecting people using compact embedded devices that are installed on the room's ceiling, and that integrate low-resolution infrared camera, which conceals…

    Submitted 22 September, 2022; originally announced September 2022.

  28. arXiv:2209.09209  [pdf, other]

    cs.CV

    Discriminative Sampling of Proposals in Self-Supervised Transformers for Weakly Supervised Object Localization

    Authors: Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf, Eric Granger

    Abstract: Drones are employed in a growing number of visual recognition applications. A recent development in cell tower inspection is drone-based asset surveillance, where the autonomous flight of a drone is guided by localizing objects of interest in successive aerial images. In this paper, we propose a method to train deep weakly-supervised object localization (WSOL) models based only on image-class labe…

    Submitted 19 November, 2022; v1 submitted 9 September, 2022; originally announced September 2022.

  29. arXiv:2209.09195  [pdf, other]

    cs.CV

    Constrained Sampling for Class-Agnostic Weakly Supervised Object Localization

    Authors: Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf, Eric Granger

    Abstract: Self-supervised vision transformers can generate accurate localization maps of the objects in an image. However, since they decompose the scene into multiple maps containing various objects, and they do not rely on any explicit supervisory signal, they cannot distinguish between the object of interest from other objects, as required in weakly-supervised object localization (WSOL). To address this…

    Submitted 9 September, 2022; originally announced September 2022.

    Comments: 3 pages, 2 figures

  30. arXiv:2204.00147  [pdf, other]

    cs.CV

    Semi-Weakly Supervised Object Detection by Sampling Pseudo Ground-Truth Boxes

    Authors: Akhil Meethal, Marco Pedersoli, Zhongwen Zhu, Francisco Perdigon Romero, Eric Granger

    Abstract: Semi- and weakly-supervised learning have recently attracted considerable attention in the object detection literature since they can alleviate the cost of annotation needed to successfully train deep learning models. State-of-art approaches for semi-supervised learning rely on student-teacher models trained using a multi-stage process, and considerable data augmentation. Custom networks have been…

    Submitted 16 June, 2022; v1 submitted 31 March, 2022; originally announced April 2022.

    Comments: Accepted at IJCNN 2022

  31. arXiv:2203.14779  [pdf, other]

    cs.CV cs.HC cs.SD eess.AS

    A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

    Authors: R. Gnana Praveen, Wheidima Carneiro de Melo, Nasib Ullah, Haseeb Aslam, Osama Zeeshan, Théo Denorme, Marco Pedersoli, Alessandro Koerich, Simon Bacon, Patrick Cardinal, Eric Granger

    Abstract: Multimodal emotion recognition has recently gained much attention since it can leverage diverse and complementary relationships over multiple modalities (e.g., audio, visual, biosignals, etc.), and can provide some robustness to noisy modalities. Most state-of-the-art methods for audio-visual (A-V) fusion rely on recurrent networks or conventional attention mechanisms that do not effectively lever…

    Submitted 6 July, 2024; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2111.05222

  32. arXiv:2202.02371  [pdf, other]

    eess.IV cs.CV

    Boundary-aware Information Maximization for Self-supervised Medical Image Segmentation

    Authors: Jizong Peng, Ping Wang, Marco Pedersoli, Christian Desrosiers

    Abstract: Unsupervised pre-training has been proven as an effective approach to boost various downstream tasks given limited labeled data. Among various methods, contrastive learning learns a discriminative representation by constructing positive and negative pairs. However, it is not trivial to build reasonable pairs for a segmentation task in an unsupervised way. In this work, we propose a novel unsupervi…

    Submitted 16 February, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  33. arXiv:2201.02445  [pdf, other]

    eess.IV cs.CV cs.LG

    Negative Evidence Matters in Interpretable Histology Image Classification

    Authors: Soufiane Belharbi, Marco Pedersoli, Ismail Ben Ayed, Luke McCaffrey, Eric Granger

    Abstract: Using only global image-class labels, weakly-supervised learning methods, such as class activation mapping, allow training CNNs to jointly classify an image, and locate regions of interest associated with the predicted class. However, without any guidance at the pixel level, such methods may yield inaccurate regions. This problem is known to be more challenging with histology images than with natu…

    Submitted 5 May, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: 9 figures

  34. arXiv:2111.08651  [pdf]

    cs.CV

    Diversified Multi-prototype Representation for Semi-supervised Segmentation

    Authors: Jizong Peng, Christian Desrosiers, Marco Pedersoli

    Abstract: This work considers semi-supervised segmentation as a dense prediction problem based on prototype vector correlation and proposes a simple way to represent each segmentation class with multiple prototypes. To avoid degenerate solutions, two regularization strategies are applied on unlabeled images. The first one leverages mutual information maximization to ensure that all prototype vectors are con…

    Submitted 16 November, 2021; originally announced November 2021.

  35. arXiv:2109.07069  [pdf, other]

    cs.CV cs.LG

    F-CAM: Full Resolution Class Activation Maps via Guided Parametric Upscaling

    Authors: Soufiane Belharbi, Aydin Sarraf, Marco Pedersoli, Ismail Ben Ayed, Luke McCaffrey, Eric Granger

    Abstract: Class Activation Mapping (CAM) methods have recently gained much attention for weakly-supervised object localization (WSOL) tasks. They allow for CNN visualization and interpretation without training on fully annotated image datasets. CAM methods are typically integrated within off-the-shelf CNN backbones, such as ResNet50. Due to convolution and pooling operations, these backbones yield low resol…

    Submitted 20 October, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: 23 pages, WACV 2022

  36. arXiv:2107.13741  [pdf, other]

    cs.CV

    Self-Paced Contrastive Learning for Semi-supervised Medical Image Segmentation with Meta-labels

    Authors: Jizong Peng, Ping Wang, Christian Desrosiers, Marco Pedersoli

    Abstract: Pre-training a recognition model with contrastive learning on a large dataset of unlabeled data has shown great potential to boost the performance of a downstream task, e.g., image classification. However, in domains such as medical imaging, collecting unlabeled data can be challenging and expensive. In this work, we propose to adapt contrastive learning to work with meta-label annotations, for im…

    Submitted 22 August, 2021; v1 submitted 29 July, 2021; originally announced July 2021.

  37. arXiv:2107.05532  [pdf, other]

    cs.CV

    Context-aware virtual adversarial training for anatomically-plausible segmentation

    Authors: Ping Wang, Jizong Peng, Marco Pedersoli, Yuanfeng Zhou, Caiming Zhang, Christian Desrosiers

    Abstract: Despite their outstanding accuracy, semi-supervised segmentation methods based on deep neural networks can still yield predictions that are considered anatomically impossible by clinicians, for instance, containing holes or disconnected regions. To solve this problem, we present a Context-aware Virtual Adversarial Training (CaVAT) method for generating anatomically plausible segmentation. Unlike a…

    Submitted 13 July, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: This paper is accepted at MICCAI2021

  38. arXiv:2104.06476  [pdf, other]

    cs.CV

    Incremental Multi-Target Domain Adaptation for Object Detection with Efficient Domain Transfer

    Authors: Le Thanh Nguyen-Meidine, Madhu Kiran, Marco Pedersoli, Jose Dolz, Louis-Antoine Blais-Morin, Eric Granger

    Abstract: Recent advances in unsupervised domain adaptation have significantly improved the recognition accuracy of CNNs by alleviating the domain shift between (labeled) source and (unlabeled) target data distributions. While the problem of single-target domain adaptation (STDA) for object detection has recently received much attention, multi-target domain adaptation (MTDA) remains largely unexplored, desp…

    Submitted 11 May, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted for Journal of Pattern Recognition. Code available at https://github.com/Natlem/M-HTCN

  39. arXiv:2103.04813  [pdf, other]

    cs.CV

    Boosting Semi-supervised Image Segmentation with Global and Local Mutual Information Regularization

    Authors: Jizong Peng, Marco Pedersoli, Christian Desrosiers

    Abstract: The scarcity of labeled data often impedes the application of deep learning to the segmentation of medical images. Semi-supervised learning seeks to overcome this limitation by exploiting unlabeled examples in the learning process. In this paper, we present a novel semi-supervised segmentation method that leverages mutual information (MI) on categorical distributions to achieve both global represe…

    Submitted 24 June, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

  40. arXiv:2011.11857  [pdf, other]

    cs.LG cs.CV

    Augmented Lagrangian Adversarial Attacks

    Authors: Jérôme Rony, Eric Granger, Marco Pedersoli, Ismail Ben Ayed

    Abstract: Adversarial attack algorithms are dominated by penalty methods, which are slow in practice, or more efficient distance-customized methods, which are heavily tailored to the properties of the distance considered. We propose a white-box attack algorithm to generate minimally perturbed adversarial examples based on Augmented Lagrangian principles. We bring several algorithmic modifications, which hav…

    Submitted 19 August, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: ICCV 2021 (Poster). Code available at: https://github.com/jeromerony/augmented_lagrangian_adversarial_attacks

  41. arXiv:2011.05227  [pdf, other]

    cs.CV cs.LG

    Temporal Stochastic Softmax for 3D CNNs: An Application in Facial Expression Recognition

    Authors: Théo Ayral, Marco Pedersoli, Simon Bacon, Eric Granger

    Abstract: Training deep learning models for accurate spatiotemporal recognition of facial expressions in videos requires significant computational resources. For practical reasons, 3D Convolutional Neural Networks (3D CNNs) are usually trained with relatively short clips randomly extracted from videos. However, such uniform sampling is generally sub-optimal because equal importance is assigned to each tempo…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: Accepted to WACV 2021

  42. Self-paced and self-consistent co-training for semi-supervised image segmentation

    Authors: ** Wang, Jizong Peng, Marco Pedersoli, Yuanfeng Zhou, Caiming Zhang, Christian Desrosiers

    Abstract: Deep co-training has recently been proposed as an effective approach for image segmentation when annotated data is scarce. In this paper, we improve existing approaches for semi-supervised segmentation with a self-paced and self-consistent co-training method. To help distill information from unlabeled images, we first design a self-paced learning strategy for co-training that lets jointly-train…

    Submitted 10 February, 2021; v1 submitted 31 October, 2020; originally announced November 2020.

    Journal ref: Medical Image Analysis, 2021

  43. arXiv:2006.14699  [pdf, other]

    cs.CV stat.ML

    Learning Data Augmentation with Online Bilevel Optimization for Image Classification

    Authors: Saypraseuth Mounsaveng, Issam Laradji, Ismail Ben Ayed, David Vazquez, Marco Pedersoli

    Abstract: Data augmentation is a key practice in machine learning for improving generalization performance. However, finding the best data augmentation hyperparameters requires domain knowledge or a computationally demanding search. We address this issue by proposing an efficient approach to automatically train a network that learns an effective distribution of transformations to improve its generalization.…

    Submitted 10 November, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

  44. arXiv:2003.08983  [pdf, other]

    cs.LG cs.CV stat.ML

    A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses

    Authors: Malik Boudiaf, Jérôme Rony, Imtiaz Masud Ziko, Eric Granger, Marco Pedersoli, Pablo Piantanida, Ismail Ben Ayed

    Abstract: Recently, substantial research efforts in Deep Metric Learning (DML) focused on designing complex pairwise-distance losses, which require convoluted schemes to ease optimization, such as sample mining or pair weighting. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning as it…

    Submitted 26 November, 2021; v1 submitted 19 March, 2020; originally announced March 2020.

    Comments: ECCV 2020 (Spotlight) - Code available at: https://github.com/jeromerony/dml_cross_entropy

  45. arXiv:2003.08462  [pdf, other]

    cs.CV

    Semi-supervised few-shot learning for medical image segmentation

    Authors: Abdur R Feyjie, Reza Azad, Marco Pedersoli, Claude Kauffman, Ismail Ben Ayed, Jose Dolz

    Abstract: Recent years have witnessed the great progress of deep neural networks on semantic segmentation, particularly in medical imaging. Nevertheless, training high-performing models requires large amounts of pixel-level ground truth masks, which can be prohibitive to obtain in the medical domain. Furthermore, training such models in a low-data regime highly increases the risk of overfitting. Recent attem…

    Submitted 8 April, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

  46. arXiv:2003.04052  [pdf, other]

    cs.CV

    On the Texture Bias for Few-Shot CNN Segmentation

    Authors: Reza Azad, Abdur R Fayjie, Claude Kauffman, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz

    Abstract: Despite the initial belief that Convolutional Neural Networks (CNNs) are driven by shapes to perform visual recognition tasks, recent evidence suggests that texture bias in CNNs provides higher performing models when learning on large labeled training datasets. This contrasts with the perceptual bias in the human visual cortex, which has a stronger preference towards shape components. Perceptual d…

    Submitted 23 December, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

    Comments: Accepted at WACV'21

  47. arXiv:1912.01522  [pdf, other]

    cs.CV cs.LG

    Convolutional STN for Weakly Supervised Object Localization

    Authors: Akhil Meethal, Marco Pedersoli, Soufiane Belharbi, Eric Granger

    Abstract: Weakly supervised object localization is a challenging task in which the object of interest should be localized while learning its appearance. State-of-the-art methods recycle the architecture of a standard CNN by using the activation maps of the last layer for localizing the object. While this approach is simple and works relatively well, object localization relies on different features than clas…

    Submitted 1 December, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: 8 pages, 9 figures

  48. arXiv:1911.04477  [pdf, other]

    cs.LG cs.NE

    A Computing Kernel for Network Binarization on PyTorch

    Authors: Xianda Xu, Marco Pedersoli

    Abstract: Deep Neural Networks have now achieved state-of-the-art results in a wide range of tasks, including image classification and object detection. However, they are both computation- and memory-intensive, making them difficult to deploy on low-power devices. Network binarization is one of the existing effective techniques for model compression and acceleration, but there is no computing…

    Submitted 11 November, 2019; originally announced November 2019.

  49. arXiv:1910.01665  [pdf, other]

    cs.CV

    Information based Deep Clustering: An experimental study

    Authors: Jizong Peng, Christian Desrosiers, Marco Pedersoli

    Abstract: Recently, two methods have shown outstanding performance for clustering images and jointly learning the feature representation. The first, called Information Maximizing Self-Augmented Training (IMSAT), maximizes the mutual information between input and clusters while using a regularization term based on virtual adversarial examples. The second, named Invariant Information Clustering (IIC), maximi…

    Submitted 10 December, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

  50. Emotion Recognition with Spatial Attention and Temporal Softmax Pooling

    Authors: Masih Aminbeidokhti, Marco Pedersoli, Patrick Cardinal, Eric Granger

    Abstract: Video-based emotion recognition is a challenging task because it requires distinguishing the small deformations of the human face that represent emotions, while being invariant to stronger visual differences due to different identities. State-of-the-art methods normally use complex deep learning models such as recurrent neural networks (RNNs, LSTMs, GRUs), convolutional neural networks (CNNs, C3D,…

    Submitted 3 October, 2019; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: 9 pages; 2 figures; 2 tables; Best paper award at ICIAR 2019