-
Smooth Pseudo-Labeling
Authors:
Nikolaos Karaliolios,
Hervé Le Borgne,
Florian Chabot
Abstract:
Semi-Supervised Learning (SSL) seeks to leverage large amounts of non-annotated data along with the smallest amount possible of annotated data in order to achieve the same level of performance as if all data were annotated. A fruitful method in SSL is Pseudo-Labeling (PL), which, however, suffers from the important drawback that the associated loss function has discontinuities in its derivatives,…
▽ More
Semi-Supervised Learning (SSL) seeks to leverage large amounts of non-annotated data along with the smallest amount possible of annotated data in order to achieve the same level of performance as if all data were annotated. A fruitful method in SSL is Pseudo-Labeling (PL), which, however, suffers from the important drawback that the associated loss function has discontinuities in its derivatives, which cause instabilities in performance when labels are very scarce. In the present work, we address this drawback with the introduction of a Smooth Pseudo-Labeling (SP L) loss function. It consists in adding a multiplicative factor in the loss function that smooths out the discontinuities in the derivative due to thresholding. In our experiments, we test our improvements on FixMatch and show that it significantly improves the performance in the regime of scarce labels, without addition of any modules, hyperparameters, or computational overhead. In the more stable regime of abundant labels, performance remains at the same level. Robustness with respect to variation of hyperparameters and training parameters is also significantly improved. Moreover, we introduce a new benchmark, where labeled images are selected randomly from the whole dataset, without imposing representation of each class proportional to its frequency in the dataset. We see that the smooth version of FixMatch does appear to perform better than the original, non-smooth implementation. However, more importantly, we notice that both implementations do not necessarily see their performance improve when labeled images are added, an important issue in the design of SSL algorithms that should be addressed so that Active Learning algorithms become more reliable and explainable.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
RING-NeRF : Rethinking Inductive Biases for Versatile and Efficient Neural Fields
Authors:
Doriand Petit,
Steve Bourgeois,
Dumitru Pavel,
Vincent Gay-Bellile,
Florian Chabot,
Loic Barthe
Abstract:
Recent advances in Neural Fields mostly rely on develo** task-specific supervision which often complicates the models. Rather than develo** hard-to-combine and specific modules, another approach generally overlooked is to directly inject generic priors on the scene representation (also called inductive biases) into the NeRF architecture. Based on this idea, we propose the RING-NeRF architectur…
▽ More
Recent advances in Neural Fields mostly rely on develo** task-specific supervision which often complicates the models. Rather than develo** hard-to-combine and specific modules, another approach generally overlooked is to directly inject generic priors on the scene representation (also called inductive biases) into the NeRF architecture. Based on this idea, we propose the RING-NeRF architecture which includes two inductive biases : a continuous multi-scale representation of the scene and an invariance of the decoder's latent space over spatial and scale domains. We also design a single reconstruction process that takes advantage of those inductive biases and experimentally demonstrates on-par performances in terms of quality with dedicated architecture on multiple tasks (anti-aliasing, few view reconstruction, SDF reconstruction without scene-specific initialization) while being more efficient. Moreover, RING-NeRF has the distinctive ability to dynamically increase the resolution of the model, opening the way to adaptive reconstruction.
△ Less
Submitted 14 March, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
MonoProb: Self-Supervised Monocular Depth Estimation with Interpretable Uncertainty
Authors:
Rémi Marsal,
Florian Chabot,
Angelique Loesch,
William Grolleau,
Hichem Sahbi
Abstract:
Self-supervised monocular depth estimation methods aim to be used in critical applications such as autonomous vehicles for environment analysis. To circumvent the potential imperfections of these approaches, a quantification of the prediction confidence is crucial to guide decision-making systems that rely on depth estimation. In this paper, we propose MonoProb, a new unsupervised monocular depth…
▽ More
Self-supervised monocular depth estimation methods aim to be used in critical applications such as autonomous vehicles for environment analysis. To circumvent the potential imperfections of these approaches, a quantification of the prediction confidence is crucial to guide decision-making systems that rely on depth estimation. In this paper, we propose MonoProb, a new unsupervised monocular depth estimation method that returns an interpretable uncertainty, which means that the uncertainty reflects the expected error of the network in its depth predictions. We rethink the stereo or the structure-from-motion paradigms used to train unsupervised monocular depth models as a probabilistic problem. Within a single forward pass inference, this model provides a depth prediction and a measure of its confidence, without increasing the inference time. We then improve the performance on depth and uncertainty with a novel self-distillation loss for which a student is supervised by a pseudo ground truth that is a probability distribution on depth output by a teacher. To quantify the performance of our models we design new metrics that, unlike traditional ones, measure the absolute performance of uncertainty predictions. Our experiments highlight enhancements achieved by our method on standard depth and uncertainty metrics as well as on our tailored metrics. https://github.com/CEA-LIST/MonoProb
△ Less
Submitted 10 November, 2023;
originally announced November 2023.
-
The Background Also Matters: Background-Aware Motion-Guided Objects Discovery
Authors:
Sandra Kara,
Hejer Ammar,
Florian Chabot,
Quoc-Cuong Pham
Abstract:
Recent works have shown that objects discovery can largely benefit from the inherent motion information in video data. However, these methods lack a proper background processing, resulting in an over-segmentation of the non-object regions into random segments. This is a critical limitation given the unsupervised setting, where object segments and noise are not distinguishable. To address this limi…
▽ More
Recent works have shown that objects discovery can largely benefit from the inherent motion information in video data. However, these methods lack a proper background processing, resulting in an over-segmentation of the non-object regions into random segments. This is a critical limitation given the unsupervised setting, where object segments and noise are not distinguishable. To address this limitation we propose BMOD, a Background-aware Motion-guided Objects Discovery method. Concretely, we leverage masks of moving objects extracted from optical flow and design a learning mechanism to extend them to the true foreground composed of both moving and static objects. The background, a complementary concept of the learned foreground class, is then isolated in the object discovery process. This enables a joint learning of the objects discovery task and the object/non-object separation. The conducted experiments on synthetic and real-world datasets show that integrating our background handling with various cutting-edge methods brings each time a considerable improvement. Specifically, we improve the objects discovery performance with a large margin, while establishing a strong baseline for object/non-object separation.
△ Less
Submitted 5 November, 2023;
originally announced November 2023.
-
Image Segmentation-based Unsupervised Multiple Objects Discovery
Authors:
Sandra Kara,
Hejer Ammar,
Florian Chabot,
Quoc-Cuong Pham
Abstract:
Unsupervised object discovery aims to localize objects in images, while removing the dependence on annotations required by most deep learning-based methods. To address this problem, we propose a fully unsupervised, bottom-up approach, for multiple objects discovery. The proposed approach is a two-stage framework. First, instances of object parts are segmented by using the intra-image similarity be…
▽ More
Unsupervised object discovery aims to localize objects in images, while removing the dependence on annotations required by most deep learning-based methods. To address this problem, we propose a fully unsupervised, bottom-up approach, for multiple objects discovery. The proposed approach is a two-stage framework. First, instances of object parts are segmented by using the intra-image similarity between self-supervised local features. The second step merges and filters the object parts to form complete object instances. The latter is performed by two CNN models that capture semantic information on objects from the entire dataset. We demonstrate that the pseudo-labels generated by our method provide a better precision-recall trade-off than existing single and multiple objects discovery methods. In particular, we provide state-of-the-art results for both unsupervised class-agnostic object detection and unsupervised image segmentation.
△ Less
Submitted 20 December, 2022;
originally announced December 2022.
-
Self-Supervised Pre-training of Vision Transformers for Dense Prediction Tasks
Authors:
Jaonary Rabarisoa,
Valentin Belissen,
Florian Chabot,
Quoc-Cuong Pham
Abstract:
We present a new self-supervised pre-training of Vision Transformers for dense prediction tasks. It is based on a contrastive loss across views that compares pixel-level representations to global image representations. This strategy produces better local features suitable for dense prediction tasks as opposed to contrastive pre-training based on global image representation only. Furthermore, our a…
▽ More
We present a new self-supervised pre-training of Vision Transformers for dense prediction tasks. It is based on a contrastive loss across views that compares pixel-level representations to global image representations. This strategy produces better local features suitable for dense prediction tasks as opposed to contrastive pre-training based on global image representation only. Furthermore, our approach does not suffer from a reduced batch size since the number of negative examples needed in the contrastive loss is in the order of the number of local features. We demonstrate the effectiveness of our pre-training strategy on two dense prediction tasks: semantic segmentation and monocular depth estimation.
△ Less
Submitted 7 June, 2022; v1 submitted 30 May, 2022;
originally announced May 2022.
-
Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations
Authors:
AI Redefined,
Sai Krishna Gottipati,
Sagar Kurandwad,
Clodéric Mars,
Gregory Szriftgiser,
François Chabot
Abstract:
Involving humans directly for the benefit of AI agents' training is getting traction thanks to several advances in reinforcement learning and human-in-the-loop learning. Humans can provide rewards to the agent, demonstrate tasks, design a curriculum, or act in the environment, but these benefits also come with architectural, functional design and engineering complexities. We present Cogment, a uni…
▽ More
Involving humans directly for the benefit of AI agents' training is getting traction thanks to several advances in reinforcement learning and human-in-the-loop learning. Humans can provide rewards to the agent, demonstrate tasks, design a curriculum, or act in the environment, but these benefits also come with architectural, functional design and engineering complexities. We present Cogment, a unifying open-source framework that introduces an actor formalism to support a variety of humans-agents collaboration typologies and training approaches. It is also scalable out of the box thanks to a distributed micro service architecture, and offers solutions to the aforementioned complexities.
△ Less
Submitted 21 June, 2021;
originally announced June 2021.
-
PandaNet : Anchor-Based Single-Shot Multi-Person 3D Pose Estimation
Authors:
Abdallah Benzine,
Florian Chabot,
Bertrand Luvison,
Quoc Cong Pham,
Cahterine Achrd
Abstract:
Recently, several deep learning models have been proposed for 3D human pose estimation. Nevertheless, most of these approaches only focus on the single-person case or estimate 3D pose of a few people at high resolution. Furthermore, many applications such as autonomous driving or crowd analysis require pose estimation of a large number of people possibly at low-resolution. In this work, we present…
▽ More
Recently, several deep learning models have been proposed for 3D human pose estimation. Nevertheless, most of these approaches only focus on the single-person case or estimate 3D pose of a few people at high resolution. Furthermore, many applications such as autonomous driving or crowd analysis require pose estimation of a large number of people possibly at low-resolution. In this work, we present PandaNet (Pose estimAtioN and Dectection Anchor-based Network), a new single-shot, anchor-based and multi-person 3D pose estimation approach. The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression into a single forward pass. It does not need any post-processing to regroup joints since the network predicts a full 3D pose for each bounding box and allows the pose estimation of a possibly large number of people at low resolution. To manage people overlap**, we introduce a Pose-Aware Anchor Selection strategy. Moreover, as imbalance exists between different people sizes in the image, and joints coordinates have different uncertainties depending on these sizes, we propose a method to automatically optimize weights associated to different people scales and joints for efficient training. PandaNet surpasses previous single-shot methods on several challenging datasets: a multi-person urban virtual but very realistic dataset (JTA Dataset), and two real world 3D multi-person datasets (CMU Panoptic and MuPoTS-3D).
△ Less
Submitted 7 January, 2021;
originally announced January 2021.
-
LapNet : Automatic Balanced Loss and Optimal Assignment for Real-Time Dense Object Detection
Authors:
Florian Chabot,
Quoc-Cuong Pham,
Mohamed Chaouch
Abstract:
Real-time single-stage object detectors based on deep learning still remain less accurate than more complex ones. The trade-off between model performance and computational speed is a major challenge. In this paper, we propose a new way to efficiently learn a single-shot detector which offers a very good compromise between these two objectives. To this end, we introduce LapNet, an anchor based dete…
▽ More
Real-time single-stage object detectors based on deep learning still remain less accurate than more complex ones. The trade-off between model performance and computational speed is a major challenge. In this paper, we propose a new way to efficiently learn a single-shot detector which offers a very good compromise between these two objectives. To this end, we introduce LapNet, an anchor based detector, trained end-to-end without any sampling strategy. Our approach aims to overcome two important problems encountered in training an anchor based detector: (1) ambiguity in the assignment of anchor to ground truth and (2) class and object size imbalance. To address the first limitation, we propose a soft positive/negative anchor assignment procedure based on a new overlap** function called "Per-Object Normalized Overlap" (PONO). This soft assignment can be self-corrected by the network itself to avoid ambiguity between close objects. To cope with the second limitation, we propose to learn additional weights, that are not used at inference, to efficiently manage sample imbalance. These two contributions make the detector learning more generic whatever the training dataset. Various experiments show the effectiveness of the proposed approach.
△ Less
Submitted 17 March, 2020; v1 submitted 4 November, 2019;
originally announced November 2019.
-
Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image
Authors:
Florian Chabot,
Mohamed Chaouch,
Jaonary Rabarisoa,
Céline Teulière,
Thierry Chateau
Abstract:
In this paper, we present a novel approach, called Deep MANTA (Deep Many-Tasks), for many-task vehicle analysis from a given image. A robust convolutional network is introduced for simultaneous vehicle detection, part localization, visibility characterization and 3D dimension estimation. Its architecture is based on a new coarse-to-fine object proposal that boosts the vehicle detection. Moreover,…
▽ More
In this paper, we present a novel approach, called Deep MANTA (Deep Many-Tasks), for many-task vehicle analysis from a given image. A robust convolutional network is introduced for simultaneous vehicle detection, part localization, visibility characterization and 3D dimension estimation. Its architecture is based on a new coarse-to-fine object proposal that boosts the vehicle detection. Moreover, the Deep MANTA network is able to localize vehicle parts even if these parts are not visible. In the inference, the network's outputs are used by a real time robust pose estimation algorithm for fine orientation estimation and 3D vehicle localization. We show in experiments that our method outperforms monocular state-of-the-art approaches on vehicle detection, orientation and 3D location tasks on the very challenging KITTI benchmark.
△ Less
Submitted 22 March, 2017;
originally announced March 2017.