Search | arXiv e-print repository

Domain Generalization for In-Orbit 6D Pose Estimation

Authors: Antoine Legrand, Renaud Detry, Christophe De Vleeschouwer

Abstract: We address the problem of estimating the relative 6D pose, i.e., position and orientation, of a target spacecraft, from a monocular image, a key capability for future autonomous Rendezvous and Proximity Operations. Due to the difficulty of acquiring large sets of real images, spacecraft pose estimation networks are exclusively trained on synthetic ones. However, because those images do not capture… ▽ More We address the problem of estimating the relative 6D pose, i.e., position and orientation, of a target spacecraft, from a monocular image, a key capability for future autonomous Rendezvous and Proximity Operations. Due to the difficulty of acquiring large sets of real images, spacecraft pose estimation networks are exclusively trained on synthetic ones. However, because those images do not capture the illumination conditions encountered in orbit, pose estimation networks face a domain gap problem, i.e., they do not generalize to real images. Our work introduces a method that bridges this domain gap. It relies on a novel, end-to-end, neural-based architecture as well as a novel learning strategy. This strategy improves the domain generalization abilities of the network through multi-task learning and aggressive data augmentation policies, thereby enforcing the network to learn domain-invariant features. We demonstrate that our method effectively closes the domain gap, achieving state-of-the-art accuracy on the widespread SPEED+ dataset. Finally, ablation studies assess the impact of key components of our method on its generalization abilities. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.12728 [pdf, other]

Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations

Authors: Antoine Legrand, Renaud Detry, Christophe De Vleeschouwer

Abstract: We address the estimation of the 6D pose of an unknown target spacecraft relative to a monocular camera, a key step towards the autonomous rendezvous and proximity operations required by future Active Debris Removal missions. We present a novel method that enables an "off-the-shelf" spacecraft pose estimator, which is supposed to known the target CAD model, to be applied on an unknown target. Our… ▽ More We address the estimation of the 6D pose of an unknown target spacecraft relative to a monocular camera, a key step towards the autonomous rendezvous and proximity operations required by future Active Debris Removal missions. We present a novel method that enables an "off-the-shelf" spacecraft pose estimator, which is supposed to known the target CAD model, to be applied on an unknown target. Our method relies on an in-the wild NeRF, i.e., a Neural Radiance Field that employs learnable appearance embeddings to represent varying illumination conditions found in natural scenes. We train the NeRF model using a sparse collection of images that depict the target, and in turn generate a large dataset that is diverse both in terms of viewpoint and illumination. This dataset is then used to train the pose estimation network. We validate our method on the Hardware-In-the-Loop images of SPEED+ that emulate lighting conditions close to those encountered on orbit. We demonstrate that our method successfully enables the training of an off-the-shelf spacecraft pose estimation network from a sparse set of images. Furthermore, we show that a network trained using our method performs similarly to a model trained on synthetic images generated using the CAD model of the target. △ Less

Submitted 11 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: Accepted at IEEE International Conference on Space Robotics 2024 (ISpaRo 2024), Workshop on Advances in Orbital Robotics: In Orbit Manipulation, Servicing, and Assembly

arXiv:2404.17930 [pdf, other]

Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments

Authors: Benoît Gérin, Anaïs Halin, Anthony Cioppa, Maxim Henry, Bernard Ghanem, Benoît Macq, Christophe De Vleeschouwer, Marc Van Droogenbroeck

Abstract: In the era of the Internet of Things (IoT), objects connect through a dynamic network, empowered by technologies like 5G, enabling real-time data sharing. However, smart objects, notably autonomous vehicles, face challenges in critical local computations due to limited resources. Lightweight AI models offer a solution but struggle with diverse data distributions. To address this limitation, we pro… ▽ More In the era of the Internet of Things (IoT), objects connect through a dynamic network, empowered by technologies like 5G, enabling real-time data sharing. However, smart objects, notably autonomous vehicles, face challenges in critical local computations due to limited resources. Lightweight AI models offer a solution but struggle with diverse data distributions. To address this limitation, we propose a novel Multi-Stream Cellular Test-Time Adaptation (MSC-TTA) setup where models adapt on the fly to a dynamic environment divided into cells. Then, we propose a real-time adaptive student-teacher method that leverages the multiple streams available in each cell to quickly adapt to changing data distributions. We validate our methodology in the context of autonomous vehicles navigating across cells defined based on location and weather conditions. To facilitate future benchmarking, we release a new multi-stream large-scale synthetic semantic segmentation dataset, called DADE, and show that our multi-stream approach outperforms a single-stream baseline. We believe that our work will open research opportunities in the IoT and 5G eras, offering solutions for real-time model adaptation. △ Less

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2309.03640 [pdf, other]

doi 10.1145/3606038.3616173

Context-Aware 3D Object Localization from Single Calibrated Images: A Study of Basketballs

Authors: Marcello Davide Caio, Gabriel Van Zandycke, Christophe De Vleeschouwer

Abstract: Accurately localizing objects in three dimensions (3D) is crucial for various computer vision applications, such as robotics, autonomous driving, and augmented reality. This task finds another important application in sports analytics and, in this work, we present a novel method for 3D basketball localization from a single calibrated image. Our approach predicts the object's height in pixels in im… ▽ More Accurately localizing objects in three dimensions (3D) is crucial for various computer vision applications, such as robotics, autonomous driving, and augmented reality. This task finds another important application in sports analytics and, in this work, we present a novel method for 3D basketball localization from a single calibrated image. Our approach predicts the object's height in pixels in image space by estimating its projection onto the ground plane within the image, leveraging the image itself and the object's location as inputs. The 3D coordinates of the ball are then reconstructed by exploiting the known projection matrix. Extensive experiments on the public DeepSport dataset, which provides ground truth annotations for 3D ball location alongside camera calibration information for each image, demonstrate the effectiveness of our method, offering substantial accuracy improvements compared to recent work. Our work opens up new possibilities for enhanced ball tracking and understanding, advancing computer vision in diverse domains. The source code of this work is made publicly available at \url{https://github.com/gabriel-vanzandycke/deepsport}. △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: 5 pages, 4 figures, MMSports'23, in proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports (MMSports '23), October 29, 2023, Ottawa, ON, Canada

ACM Class: I.4.6; I.4.8

arXiv:2307.06233 [pdf, other]

doi 10.1109/WACV56688.2023.00247

On the Importance of Denoising when Learning to Compress Images

Authors: Benoit Brummer, Christophe De Vleeschouwer

Abstract: Image noise is ubiquitous in photography. However, image noise is not compressible nor desirable, thus attempting to convey the noise in compressed image bitstreams yields sub-par results in both rate and distortion. We propose to explicitly learn the image denoising task when training a codec. Therefore, we leverage the Natural Image Noise Dataset, which offers a wide variety of scenes captured w… ▽ More Image noise is ubiquitous in photography. However, image noise is not compressible nor desirable, thus attempting to convey the noise in compressed image bitstreams yields sub-par results in both rate and distortion. We propose to explicitly learn the image denoising task when training a codec. Therefore, we leverage the Natural Image Noise Dataset, which offers a wide variety of scenes captured with various ISO numbers, leading to different noise levels, including insignificant ones. Given this training set, we supervise the codec with noisy-clean image pairs, and show that a single model trained based on a mixture of images with variable noise levels appears to yield best-in-class results with both noisy and clean images, achieving better rate-distortion than a compression-only model or even than a pair of denoising-then-compression models with almost one order of magnitude fewer GMac operations. △ Less

Submitted 12 July, 2023; originally announced July 2023.

Comments: Published in 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

MSC Class: 68T07 (Primary); 68P30 (Secondary) ACM Class: I.4.2; I.4.4

arXiv:2207.04825 [pdf, other]

Forward Error Correction applied to JPEG-XS codestreams

Authors: Antoine Legrand, Benoît Macq, Christophe De Vleeschouwer

Abstract: JPEG-XS offers low complexity image compression for applications with constrained but reasonable bit-rate, and low latency. Our paper explores the deployment of JPEG-XS on lossy packet networks. To preserve low latency, Forward Error Correction (FEC) is envisioned as the protection mechanism of interest. Despite the JPEG-XS codestream is not scalable in essence, we observe that the loss of a codes… ▽ More JPEG-XS offers low complexity image compression for applications with constrained but reasonable bit-rate, and low latency. Our paper explores the deployment of JPEG-XS on lossy packet networks. To preserve low latency, Forward Error Correction (FEC) is envisioned as the protection mechanism of interest. Despite the JPEG-XS codestream is not scalable in essence, we observe that the loss of a codestream fraction impacts the decoded image quality differently, depending on whether this codestream fraction corresponds to codestream headers, to coefficients significance information, or to low/high frequency data, respectively. Hence, we propose a rate-distortion optimal unequal error protection scheme that adapts the redundancy level of Reed-Solomon codes according to the rate of channel losses and the type of information protected by the code. Our experiments demonstrate that, at 5% loss rates, it reduces the Mean Squared Error by up to 92% and 65%, compared to a transmission without and with optimal but equal protection, respectively. △ Less

Submitted 7 October, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

arXiv:2204.00003 [pdf, other]

doi 10.1109/CVPRW56347.2022.00391

Ball 3D Localization From A Single Calibrated Image

Authors: Gabriel Van Zandycke, Christophe De Vleeschouwer

Abstract: Ball 3D localization in team sports has various applications including automatic offside detection in soccer, or shot release localization in basketball. Today, this task is either resolved by using expensive multi-views setups, or by restricting the analysis to ballistic trajectories. In this work, we propose to address the task on a single image from a calibrated monocular camera by estimating b… ▽ More Ball 3D localization in team sports has various applications including automatic offside detection in soccer, or shot release localization in basketball. Today, this task is either resolved by using expensive multi-views setups, or by restricting the analysis to ballistic trajectories. In this work, we propose to address the task on a single image from a calibrated monocular camera by estimating ball diameter in pixels and use the knowledge of real ball diameter in meters. This approach is suitable for any game situation where the ball is (even partly) visible. To achieve this, we use a small neural network trained on image patches around candidates generated by a conventional ball detector. Besides predicting ball diameter, our network outputs the confidence of having a ball in the image patch. Validations on 3 basketball datasets reveals that our model gives remarkable predictions on ball 3D localization. In addition, through its confidence output, our model improves the detection rate by filtering the candidates produced by the detector. The contributions of this work are (i) the first model to address 3D ball localization on a single image, (ii) an effective method for ball 3D annotation from single calibrated images, (iii) a high quality 3D ball evaluation dataset annotated from a single viewpoint. In addition, the code to reproduce this research is be made freely available at https://github.com/gabriel-vanzandycke/deepsport. △ Less

Submitted 19 April, 2022; v1 submitted 30 March, 2022; originally announced April 2022.

Comments: 9 pages, CVSports2022

ACM Class: I.4.6; I.4.8

arXiv:2111.09172 [pdf, other]

doi 10.1109/CVPRW53098.2021.00212

End-to-end optimized image compression with competition of prior distributions

Authors: Benoit Brummer, Christophe De Vleeschouwer

Abstract: Convolutional autoencoders are now at the forefront of image compression research. To improve their entropy coding, encoder output is typically analyzed with a second autoencoder to generate per-variable parametrized prior probability distributions. We instead propose a compression scheme that uses a single convolutional autoencoder and multiple learned prior distributions working as a competition… ▽ More Convolutional autoencoders are now at the forefront of image compression research. To improve their entropy coding, encoder output is typically analyzed with a second autoencoder to generate per-variable parametrized prior probability distributions. We instead propose a compression scheme that uses a single convolutional autoencoder and multiple learned prior distributions working as a competition of experts. Trained prior distributions are stored in a static table of cumulative distribution functions. During inference, this table is used by an entropy coder as a look-up-table to determine the best prior for each spatial location. Our method offers rate-distortion performance comparable to that obtained with a predicted parametrized prior with only a fraction of its entropy coding and decoding complexity. △ Less

Submitted 17 November, 2021; originally announced November 2021.

MSC Class: 68T07 (Primary); 68P30 (Secondary) ACM Class: I.4.2

Journal ref: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

arXiv:2008.12977 [pdf, other]

Improved anomaly detection by training an autoencoder with skip connections on images corrupted with Stain-shaped noise

Authors: Anne-Sophie Collin, Christophe De Vleeschouwer

Abstract: In industrial vision, the anomaly detection problem can be addressed with an autoencoder trained to map an arbitrary image, i.e. with or without any defect, to a clean image, i.e. without any defect. In this approach, anomaly detection relies conventionally on the reconstruction residual or, alternatively, on the reconstruction uncertainty. To improve the sharpness of the reconstruction, we consid… ▽ More In industrial vision, the anomaly detection problem can be addressed with an autoencoder trained to map an arbitrary image, i.e. with or without any defect, to a clean image, i.e. without any defect. In this approach, anomaly detection relies conventionally on the reconstruction residual or, alternatively, on the reconstruction uncertainty. To improve the sharpness of the reconstruction, we consider an autoencoder architecture with skip connections. In the common scenario where only clean images are available for training, we propose to corrupt them with a synthetic noise model to prevent the convergence of the network towards the identity map**, and introduce an original Stain noise model for that purpose. We show that this model favors the reconstruction of clean images from arbitrary real-world images, regardless of the actual defects appearance. In addition to demonstrating the relevance of our approach, our validation provides the first consistent assessment of reconstruction-based methods, by comparing their performance over the MVTec AD dataset, both for pixel- and image-wise anomaly detection. △ Less

Submitted 4 November, 2020; v1 submitted 29 August, 2020; originally announced August 2020.

Comments: 8 pages, 7 figures, accepted at ICPR2020 -- Supplementary explanation about the stain noise model in section III (point B)

arXiv:2007.11876 [pdf, other]

doi 10.1145/3347318.3355517

Real-time CNN-based Segmentation Architecture for Ball Detection in a Single View Setup

Authors: Gabriel Van Zandycke, Christophe De Vleeschouwer

Abstract: This paper considers the task of detecting the ball from a single viewpoint in the challenging but common case where the ball interacts frequently with players while being poorly contrasted with respect to the background. We propose a novel approach by formulating the problem as a segmentation task solved by an efficient CNN architecture. To take advantage of the ball dynamics, the network is fed… ▽ More This paper considers the task of detecting the ball from a single viewpoint in the challenging but common case where the ball interacts frequently with players while being poorly contrasted with respect to the background. We propose a novel approach by formulating the problem as a segmentation task solved by an efficient CNN architecture. To take advantage of the ball dynamics, the network is fed with a pair of consecutive images. Our inference model can run in real time without the delay induced by a temporal analysis. We also show that test-time data augmentation allows for a significant increase the detection accuracy. As an additional contribution, we publicly release the dataset on which this work is based. △ Less

Submitted 23 July, 2020; originally announced July 2020.

Comments: 8 pages, 10 figures

ACM Class: I.4

Journal ref: Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports (2019) 51-58

arXiv:2005.08768 [pdf, other]

doi 10.1109/CVPRW50498.2020.00090

Adapting JPEG XS gains and priorities to tasks and contents

Authors: Benoit Brummer, Christophe De Vleeschouwer

Abstract: Most current research in the domain of image compression focuses solely on achieving state of the art compression ratio, but that is not always usable in today's workflow due to the constraints on computing resources. Constant market requirements for a low-complexity image codec have led to the recent development and standardization of a lightweight image codec named JPEG XS. In this work we s… ▽ More Most current research in the domain of image compression focuses solely on achieving state of the art compression ratio, but that is not always usable in today's workflow due to the constraints on computing resources. Constant market requirements for a low-complexity image codec have led to the recent development and standardization of a lightweight image codec named JPEG XS. In this work we show that JPEG XS compression can be adapted to a specific given task and content, such as preserving visual quality on desktop content or maintaining high accuracy in neural network segmentation tasks, by optimizing its gain and priority parameters using the covariance matrix adaptation evolution strategy. △ Less

Submitted 6 January, 2021; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: CLIC at CVPR 2020

MSC Class: 68P30 (Primary); 68W50 (Secondary) ACM Class: E.4; I.2.6

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 164-165

arXiv:1906.00270 [pdf, other]

doi 10.1109/CVPRW.2019.00228

Natural Image Noise Dataset

Authors: Benoit Brummer, Christophe De Vleeschouwer

Abstract: Convolutional neural networks have been the focus of research aiming to solve image denoising problems, but their performance remains unsatisfactory for most applications. These networks are trained with synthetic noise distributions that do not accurately reflect the noise captured by image sensors. Some datasets of clean-noisy image pairs have been introduced but they are usually meant for bench… ▽ More Convolutional neural networks have been the focus of research aiming to solve image denoising problems, but their performance remains unsatisfactory for most applications. These networks are trained with synthetic noise distributions that do not accurately reflect the noise captured by image sensors. Some datasets of clean-noisy image pairs have been introduced but they are usually meant for benchmarking or specific applications. We introduce the Natural Image Noise Dataset (NIND), a dataset of DSLR-like images with varying levels of ISO noise which is large enough to train models for blind denoising over a wide range of noise. We demonstrate a denoising model trained with the NIND and show that it significantly outperforms BM3D on ISO noise from unseen images, even when generalizing to images from a different type of camera. The Natural Image Noise Dataset is published on Wikimedia Commons such that it remains open for curation and contributions. We expect that this dataset will prove useful for future image denoising applications. △ Less

Submitted 1 June, 2019; originally announced June 2019.

Comments: NTIRE at CVPR 2019

ACM Class: I.4.4

Showing 1–12 of 12 results for author: De Vleeschouwer, C