Showing 1–50 of 165 results for author: Salzmann, M

  1. arXiv:2406.08894

    cs.CV

    OpenMaterial: A Comprehensive Dataset of Complex Materials for 3D Reconstruction

    Authors: Zheng Dang, Jialu Huang, Fei Wang, Mathieu Salzmann

    Abstract: Recent advances in deep learning such as neural radiance fields and implicit neural representations have significantly propelled the field of 3D reconstruction. However, accurately reconstructing objects with complex optical properties, such as metals and glass, remains a formidable challenge due to their unique specular and light-transmission characteristics. To facilitate the development of solu…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2405.05858

    cs.CV cs.AI cs.GR cs.RO

    Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera

    Authors: Haixin Shi, Yinlin Hu, Daniel Koguciuk, Juan-Ting Lin, Mathieu Salzmann, David Ferstl

    Abstract: We propose an approach for reconstructing a free-moving object from a monocular RGB video. Most existing methods either assume scene prior, hand pose prior, object category pose prior, or rely on local optimization with multiple sequence segments. We propose a method that allows free interaction with the object in front of a moving camera without relying on any prior, and optimizes the sequence glob…

    Submitted 10 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2404.12378

    cs.CV cs.AI cs.LG

    6Img-to-3D: Few-Image Large-Scale Outdoor Driving Scene Reconstruction

    Authors: Théo Gieruc, Marius Kästingschäfer, Sebastian Bernhard, Mathieu Salzmann

    Abstract: Current 3D reconstruction techniques struggle to infer unbounded scenes from a few images faithfully. Specifically, existing methods have high computational demands, require detailed pose information, and cannot reconstruct occluded regions reliably. We introduce 6Img-to-3D, an efficient, scalable transformer-based encoder-renderer method for single-shot image to 3D reconstruction. Our method outp…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Joint first authorship. Project page: https://6Img-to-3D.GitHub.io/ Code https://github.com/continental/6Img-to-3D

  4. arXiv:2404.07504

    cs.CV cs.AI

    Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

    Authors: Yanhao Wu, Tong Zhang, Wei Ke, Congpei Qiu, Sabine Susstrunk, Mathieu Salzmann

    Abstract: In the realm of point cloud scene understanding, particularly in indoor scenes, objects are arranged following human habits, resulting in objects of certain semantics being closely positioned and displaying notable inter-object correlations. This can create a tendency for neural networks to exploit these strong dependencies, bypassing the individual object patterns. To address this challenge, we i…

    Submitted 11 April, 2024; originally announced April 2024.

  5. arXiv:2403.13683

    cs.CV cs.RO

    DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses

    Authors: Chen Zhao, Tong Zhang, Zheng Dang, Mathieu Salzmann

    Abstract: Determining the relative pose of an object between two images is pivotal to the success of generalizable object pose estimation. Existing approaches typically approximate the continuous pose representation with a large number of discrete pose hypotheses, which incurs a computationally expensive process of scoring each hypothesis at test time. By contrast, we present a Deep Voxel Matching Network (…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  6. arXiv:2403.09050

    cs.CV

    CLOAF: CoLlisiOn-Aware Human Flow

    Authors: Andrey Davydov, Martin Engilberge, Mathieu Salzmann, Pascal Fua

    Abstract: Even the best current algorithms for estimating body 3D shape and pose yield results that include body self-intersections. In this paper, we present CLOAF, which exploits the diffeomorphic nature of Ordinary Differential Equations to eliminate such self-intersections while still imposing body shape constraints. We show that, unlike earlier approaches to addressing this issue, ours completely elimi…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: CVPR 2024, 13 pages

  7. arXiv:2403.06546

    cs.CV cs.LG

    OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation

    Authors: Baran Ozaydin, Tong Zhang, Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: Unsupervised Semantic Segmentation (USS) involves segmenting images without relying on predefined labels, aiming to alleviate the burden of extensive human labeling. Existing methods utilize features generated by self-supervised models and specific priors for clustering. However, their clustering objectives are not involved in the optimization of the features during training. Additionally, due to…

    Submitted 5 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 11 pages

  8. arXiv:2402.17062

    cs.CV

    HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields

    Authors: Haozhe Qi, Chen Zhao, Mathieu Salzmann, Alexander Mathis

    Abstract: Human hands are highly articulated and versatile at handling objects. Jointly estimating the 3D poses of a hand and the object it manipulates from a monocular camera is challenging due to frequent occlusions. Thus, existing methods often rely on intermediate 3D shape representations to increase performance. These representations are typically explicit, such as 3D point clouds or meshes, and thus p…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted at CVPR 2024. 9 figures, many tables

  9. arXiv:2402.02736

    cs.CV cs.LG

    Using Motion Cues to Supervise Single-Frame Body Pose and Shape Estimation in Low Data Regimes

    Authors: Andrey Davydov, Alexey Sidnev, Artsiom Sanakoyeu, Yuhua Chen, Mathieu Salzmann, Pascal Fua

    Abstract: When enough annotated training data is available, supervised deep-learning algorithms excel at estimating human body pose and shape using a single camera. The effects of too little such data being available can be mitigated by using other information sources, such as databases of body shapes, to learn priors. Unfortunately, such sources are not always available either. We show that, in such cases,…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 21 pages; TMLR

  10. arXiv:2312.03053

    cs.CV

    DiffusionPCR: Diffusion Models for Robust Multi-Step Point Cloud Registration

    Authors: Zhi Chen, Yufan Ren, Tong Zhang, Zheng Dang, Wenbing Tao, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: Point Cloud Registration (PCR) estimates the relative rigid transformation between two point clouds. We propose formulating PCR as a denoising diffusion probabilistic process, mapping noisy transformations to the ground truth. However, using diffusion models for PCR has nontrivial challenges, such as adapting a generative model to a discriminative task and leveraging the estimated nonlinear transf…

    Submitted 5 December, 2023; originally announced December 2023.

  11. arXiv:2311.14155

    cs.CV

    GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

    Authors: Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann, Vincent Lepetit

    Abstract: We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative "templates", rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of…

    Submitted 15 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  12. arXiv:2310.18953

    cs.LG cs.CV eess.IV

    TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression

    Authors: Megh Shukla, Mathieu Salzmann, Alexandre Alahi

    Abstract: Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood. However, recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation. While the literature addresses this by proposing alternate formulations to mitigate the impact of the predicted cov…

    Submitted 31 May, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: ICML 2024. Please feel free to provide feedback!

  13. arXiv:2310.17359

    cs.CV

    SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation

    Authors: Haobo Jiang, Mathieu Salzmann, Zheng Dang, Jin Xie, Jian Yang

    Abstract: In this paper, we introduce an SE(3) diffusion model-based point cloud registration framework for 6D object pose estimation in real-world scenarios. Our approach formulates the 3D registration task as a denoising diffusion process, which progressively refines the pose of the source point cloud to obtain a precise alignment with the model point cloud. Training our framework involves two operations:…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS-2023

  14. arXiv:2310.03534

    cs.CV cs.RO

    3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation

    Authors: Chen Zhao, Tong Zhang, Mathieu Salzmann

    Abstract: Prior methods that tackle the problem of generalizable object pose estimation highly rely on having dense views of the unseen object. By contrast, we address the scenario where only a single reference view of the object is available. Our goal then is to estimate the relative object pose between this reference view and a query image that depicts the object in a different pose. In this scenario, rob…

    Submitted 5 October, 2023; originally announced October 2023.

  15. arXiv:2309.11667

    cs.CV

    Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation

    Authors: Krishna Kanth Nakka, Mathieu Salzmann

    Abstract: As 3D human pose estimation can now be achieved with very high accuracy in the supervised learning scenario, tackling the case where 3D pose annotations are not available has received increasing attention. In particular, several methods have proposed to learn image representations in a self-supervised fashion so as to disentangle the appearance information from the pose one. The methods then only…

    Submitted 20 September, 2023; originally announced September 2023.

  16. arXiv:2309.11170

    cs.CV

    AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration

    Authors: Zheng Dang, Mathieu Salzmann

    Abstract: In the current deep learning paradigm, the amount and quality of training data are as critical as the network architecture and its training details. However, collecting, processing, and annotating real data at scale is difficult, expensive, and time-consuming, particularly for tasks such as 3D object registration. While synthetic datasets can be created, they require expertise to design and includ…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: accepted by ICCV2023

  17. arXiv:2308.12372

    cs.CV cs.CL

    Vision Transformer Adapters for Generalizable Multitask Learning

    Authors: Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner, unlike existing multitasking transformers that are parametrically expensive. In contr…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  18. arXiv:2307.08071

    cs.CV cs.HC

    Dense Multitask Learning to Reconfigure Comics

    Authors: Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: In this paper, we develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels to, in turn, facilitate the transfer of comics from one publication channel to another by assisting authors in the task of reconfiguring their narratives. Our MTL method can successfully identify the semantic units as well as the embedded notion of 3D in comic panels. This is a significantly c…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: CVPR 2023 Workshop. arXiv admin note: text overlap with arXiv:2205.08303

  19. arXiv:2304.10406

    cs.CV

    LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

    Authors: Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu

    Abstract: We introduce a new task, novel view synthesis for LiDAR sensors. While traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views, they fall short of producing accurate and realistic LiDAR patterns because the renderers rely on explicit 3D reconstruction and exploit game engines, that ignore important attributes of LiDAR points. We address thi…

    Submitted 14 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: This paper introduces a new task of novel LiDAR view synthesis, and proposes a differentiable framework called LiDAR-NeRF with a structural regularization, as well as an object-centric multi-view LiDAR dataset called NeRF-MVL

  20. arXiv:2304.01514

    cs.CV

    Robust Outlier Rejection for 3D Registration with Variational Bayes

    Authors: Haobo Jiang, Zheng Dang, Zhen Wei, Jin Xie, Jian Yang, Mathieu Salzmann

    Abstract: Learning-based outlier (mismatched correspondence) rejection for robust 3D registration generally formulates the outlier removal as an inlier/outlier classification problem. The core for this to be successful is to learn the discriminative inlier/outlier feature representations. In this paper, we develop a novel variational non-local network-based outlier rejection framework for robust alignment.…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR2023

  21. arXiv:2303.16947

    cs.CV cs.LG

    De-coupling and De-positioning Dense Self-supervised Learning

    Authors: Congpei Qiu, Tong Zhang, Wei Ke, Mathieu Salzmann, Sabine Süsstrunk

    Abstract: Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. Although the dense features extracted by employing segmentation maps and bounding boxes allow networks to perform SSL for each object, we show that they suffer from coupling and positional bias, which arise from the receptive field increasing…

    Submitted 29 March, 2023; originally announced March 2023.

  22. arXiv:2303.16235

    cs.CV

    Spatiotemporal Self-supervised Learning for Point Clouds in the Wild

    Authors: Yanhao Wu, Tong Zhang, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: Self-supervised learning (SSL) has the potential to benefit many applications, particularly those where manually annotating data is cumbersome. One such situation is the semantic segmentation of point clouds. In this context, existing methods employ contrastive learning strategies and define positive pairs by performing various augmentation of point clusters in a single frame. As such, these metho…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: CVPR accepted

  23. arXiv:2303.13612

    cs.CV

    NOPE: Novel Object Pose Estimation from a Single Image

    Authors: Van Nguyen Nguyen, Thibault Groueix, Yinlin Hu, Mathieu Salzmann, Vincent Lepetit

    Abstract: The practicality of 3D object pose estimation remains limited for many applications due to the need for prior knowledge of a 3D model and a training period for new objects. To address this limitation, we propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model and without requ…

    Submitted 29 March, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: CVPR 2024

  24. arXiv:2303.12396

    cs.CV

    Rigidity-Aware Detection for 6D Object Pose Estimation

    Authors: Yang Hai, Rui Song, Jiaojiao Li, Mathieu Salzmann, Yinlin Hu

    Abstract: Most recent 6D object pose estimation methods first use object detection to obtain 2D bounding boxes before actually regressing the pose. However, the general object detection methods they use are ill-suited to handle cluttered scenes, thus producing poor initialization to the subsequent pose network. To address this, we propose a rigidity-aware detection method exploiting the fact that, in 6D pos…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  25. arXiv:2303.11516

    cs.CV

    Linear-Covariance Loss for End-to-End Learning of 6D Pose Estimation

    Authors: Fulin Liu, Yinlin Hu, Mathieu Salzmann

    Abstract: Most modern image-based 6D object pose estimation methods learn to predict 2D-3D correspondences, from which the pose can be obtained using a PnP solver. Because of the non-differentiable nature of common PnP solvers, these methods are supervised via the individual correspondences. To address this, several methods have designed differentiable PnP strategies, thus imposing supervision on the pose o…

    Submitted 8 October, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

  26. arXiv:2303.09219

    cs.CV

    MixCycle: Mixup Assisted Semi-Supervised 3D Single Object Tracking with Cycle Consistency

    Authors: Qiao Wu, Jiaqi Yang, Kun Sun, Chu'ai Zhang, Yanning Zhang, Mathieu Salzmann

    Abstract: 3D single object tracking (SOT) is an indispensable part of automated driving. Existing approaches rely heavily on large, densely labeled datasets. However, annotating point clouds is both costly and time-consuming. Inspired by the great success of cycle tracking in unsupervised 2D SOT, we introduce the first semi-supervised approach to 3D SOT. Specifically, we introduce two cycle-consistency stra…

    Submitted 16 August, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted by ICCV23

  27. arXiv:2303.06753

    cs.CV cs.LG cs.RO

    Modular Quantization-Aware Training: Increasing Accuracy by Decreasing Precision in 6D Object Pose Estimation

    Authors: Saqib Javed, Chengkun Li, Andrew Price, Yinlin Hu, Mathieu Salzmann

    Abstract: Edge applications, such as collaborative robotics and spacecraft rendezvous, demand efficient 6D object pose estimation on resource-constrained embedded platforms. Existing 6D pose estimation networks are often too large for such deployments, necessitating compression while maintaining reliable performance. To address this challenge, we introduce Modular Quantization-Aware Training (MQAT), an adap…

    Submitted 28 November, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

  28. arXiv:2301.05499

    cs.CV

    CLIP the Gap: A Single Domain Generalization Approach for Object Detection

    Authors: Vidit Vidit, Martin Engilberge, Mathieu Salzmann

    Abstract: Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to levera…

    Submitted 6 March, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

  29. arXiv:2301.05496

    cs.CV

    Learning Transformations To Reduce the Geometric Shift in Object Detection

    Authors: Vidit Vidit, Martin Engilberge, Mathieu Salzmann

    Abstract: The performance of modern object detectors drops when the test distribution differs from the training one. Most of the methods that address this focus on object appearance changes caused by, e.g., different illumination conditions, or gaps between synthetic and real images. Here, by contrast, we tackle geometric shifts emerging from variations in the image capture process, or due to the constraint…

    Submitted 13 January, 2023; originally announced January 2023.

  30. arXiv:2301.02315

    cs.CV

    TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction

    Authors: Bahar Aydemir, Ludo Hoffstetter, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

    Abstract: Deep saliency prediction algorithms complement the object recognition features; they typically rely on additional information, such as scene context, semantic relationships, gaze direction, and object dissimilarity. However, none of these models consider the temporal nature of gaze shifts during image observation. We introduce a novel saliency prediction model that learns to output saliency maps i…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: 10 pages, 7 figures

  31. arXiv:2212.14397

    cs.CV

    AttEntropy: Segmenting Unknown Objects in Complex Scenes using the Spatial Attention Entropy of Semantic Segmentation Transformers

    Authors: Krzysztof Lis, Matthias Rottmann, Sina Honari, Pascal Fua, Mathieu Salzmann

    Abstract: Vision transformers have emerged as powerful tools for many computer vision tasks. It has been shown that their features and class tokens can be used for salient object segmentation. However, the properties of segmentation transformers remain largely unstudied. In this work we conduct an in-depth study of the spatial attentions of different backbone layers of semantic segmentation transformers and…

    Submitted 29 December, 2022; originally announced December 2022.

    ACM Class: I.4.6; I.4.8; I.5.4

  32. arXiv:2212.13253

    cs.CV

    DSI2I: Dense Style for Unpaired Image-to-Image Translation

    Authors: Baran Ozaydin, Tong Zhang, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar, without ground-truth input-translation pairs. Existing UEI2I methods represent style using one vector per image or rely on semantic supervision to define one style vector per object. Here, in contrast, we propose to represent style as a den…

    Submitted 1 May, 2024; v1 submitted 26 December, 2022; originally announced December 2022.

    Comments: To appear on TMLR '24, Reviewed on OpenReview: https://openreview.net/forum?id=mrJi5kdKA4

  33. arXiv:2211.16290

    cs.CV cs.RO

    LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation

    Authors: Chen Zhao, Yinlin Hu, Mathieu Salzmann

    Abstract: Object location prior is critical for the standard 6D object pose estimation setting. The prior can be used to initialize the 3D object translation and facilitate 3D object rotation estimation. Unfortunately, the object detectors that are used for this purpose do not generalize to unseen objects. Therefore, existing 6D pose estimation methods for unseen objects either assume the ground-truth objec…

    Submitted 6 February, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted by 3DV2024

  34. arXiv:2211.12829

    cs.CV cs.LG

    Unsupervised 3D Keypoint Discovery with Multi-View Geometry

    Authors: Sina Honari, Chen Zhao, Mathieu Salzmann, Pascal Fua

    Abstract: Analyzing and training 3D body posture models depend heavily on the availability of joint labels that are commonly acquired through laborious manual annotation of body joints or via marker-based joint localization using carefully curated markers and capturing systems. However, such annotations are not always available, especially for people performing unusual activities. In this paper, we propose…

    Submitted 7 February, 2024; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted in "3DV 2024"

  35. arXiv:2211.11277

    cs.CV

    DrapeNet: Garment Generation and Self-Supervised Draping

    Authors: Luca De Luigi, Ren Li, Benoît Guillard, Mathieu Salzmann, Pascal Fua

    Abstract: Recent approaches to drape garments quickly over arbitrary human bodies leverage self-supervision to eliminate the need for large training sets. However, they are designed to train one network per clothing item, which severely limits their generalization abilities. In our work, we rely on self-supervision to train a single network to drape multiple garments. This is achieved by predicting a 3D def…

    Submitted 22 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

  36. arXiv:2210.03954

    cs.CV

    Contact-aware Human Motion Forecasting

    Authors: Wei Mao, Miaomiao Liu, Richard Hartley, Mathieu Salzmann

    Abstract: In this paper, we tackle the task of scene-aware 3D human motion forecasting, which consists of predicting future human poses given a 3D scene and a past human motion. A key challenge of this task is to ensure consistency between the human and the scene, accounting for human-scene interactions. Previous attempts to do so model such interactions only implicitly, and thus tend to produce artifacts s…

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS2022

  37. Perspective Aware Road Obstacle Detection

    Authors: Krzysztof Lis, Sina Honari, Pascal Fua, Mathieu Salzmann

    Abstract: While road obstacle detection techniques have become increasingly effective, they typically ignore the fact that, in practice, the apparent size of the obstacles decreases as their distance to the vehicle increases. In this paper, we account for this by computing a scale map encoding the apparent size of a hypothetical object at every image location. We then leverage this perspective map to (i) ge…

    Submitted 19 June, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    ACM Class: I.4.6; I.4.8; I.5.4

    Journal ref: IEEE Robotics and Automation Letters (Volume: 8, Issue: 4, April 2023, Pages: 2150-2157)

  38. arXiv:2208.03257

    cs.CV

    3D Pose Based Feedback for Physical Exercises

    Authors: Ziyi Zhao, Sena Kiciroglu, Hugues Vinzant, Yuan Cheng, Isinsu Katircioglu, Mathieu Salzmann, Pascal Fua

    Abstract: Unsupervised self-rehabilitation exercises and physical training can cause serious injuries if performed incorrectly. We introduce a learning-based framework that identifies the mistakes made by a user and proposes corrective measures for easier and safer individual training. Our framework does not rely on hard-coded, heuristic rules. Instead, it learns them from data, which facilitates its adapta…

    Submitted 5 August, 2022; originally announced August 2022.

    Comments: Video: https://youtu.be/W3kyyeHe0SI

  39. arXiv:2206.02417

    cs.LG

    Fast Adversarial Training with Adaptive Step Size

    Authors: Zhichao Huang, Yanbo Fan, Chen Liu, Weizhong Zhang, Yong Zhang, Mathieu Salzmann, Sabine Süsstrunk, Jue Wang

    Abstract: While adversarial training and its variants have been shown to be the most effective algorithms to defend against adversarial attacks, their extremely slow training process makes it hard to scale to large datasets like ImageNet. The key idea of recent works to accelerate adversarial training is to substitute multi-step attacks (e.g., PGD) with single-step attacks (e.g., FGSM). However, these single-ste…

    Submitted 6 June, 2022; originally announced June 2022.

  40. arXiv:2205.15608

    cs.CV cs.AI

    Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction

    Authors: Wei Mao, Miaomiao Liu, Mathieu Salzmann

    Abstract: We introduce the task of action-driven stochastic human motion prediction, which aims to predict multiple plausible future motions given a sequence of action labels and a short motion history. This differs from existing works, which predict motions that either do not respect any specific action category, or follow a single action label. In particular, addressing this task requires tackling two cha…

    Submitted 31 May, 2022; originally announced May 2022.

    Comments: CVPR2022 (Oral)

  41. arXiv:2205.14971

    cs.CV cs.LG

    Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions

    Authors: Shuxuan Guo, Yinlin Hu, Jose M. Alvarez, Mathieu Salzmann

    Abstract: Knowledge distillation facilitates the training of a compact student network by using a deep teacher one. While this has achieved great success in many tasks, it remains completely unstudied for image-based 6D object pose estimation. In this work, we introduce the first knowledge distillation method driven by the 6D pose estimation task. To this end, we observe that most modern 6D pose estimation…

    Submitted 28 November, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

  42. arXiv:2205.08303

    cs.CV

    MulT: An End-to-End Multitask Learning Transformer

    Authors: Deblina Bhattacharjee, Tong Zhang, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. Based on the Swin transformer model, our framework encodes the input image into a shared representation and makes predictions for e…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR 2022

  43. arXiv:2203.17234

    cs.CV

    Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions

    Authors: Van Nguyen Nguyen, Yinlin Hu, Yang Xiao, Mathieu Salzmann, Vincent Lepetit

    Abstract: We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions. Our method requires neither a training phase on these objects nor real images depicting them, only their CAD models. It relies on a small set of training objects to learn local object representations, which allow us to locally match the input image to a set of "templates", ren…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  44. arXiv:2203.17205  [pdf, other]

    cs.CV

    Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy

    Authors: Tong Zhang, Congpei Qiu, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: Self-supervised learning (SSL) methods aim to learn view-invariant representations by maximizing the similarity between the features extracted from different crops of the same image regardless of cropping size and content. In essence, this strategy ignores the fact that two crops may truly contain different image information, e.g., background and small objects, and thus tends to restrain the diver…

    Submitted 13 April, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: accepted in CVPR 2022

  45. arXiv:2203.15309  [pdf, other]

    cs.CV

    MatchNorm: Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World

    Authors: Zheng Dang, Lizhou Wang, Yu Guo, Mathieu Salzmann

    Abstract: In this work, we tackle the task of estimating the 6D pose of an object from point cloud data. While recent learning-based approaches to addressing this task have shown great success on synthetic datasets, we have observed them to fail in the presence of real-world data. We thus analyze the causes of these failures, which we trace back to the difference between the feature distributions of the sou…

    Submitted 23 August, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: ECCV2022 accepted. arXiv admin note: text overlap with arXiv:2111.10399

  46. arXiv:2203.09836  [pdf, other]

    cs.CV cs.RO

    Perspective Flow Aggregation for Data-Limited 6D Object Pose Estimation

    Authors: Yinlin Hu, Pascal Fua, Mathieu Salzmann

    Abstract: Most recent 6D object pose estimation methods, including unsupervised ones, require many real training images. Unfortunately, for some applications, such as those in space or deep under water, acquiring real images, even unannotated, is virtually impossible. In this paper, we propose a method that can be trained solely on synthetic images, or optionally using a few additional real ones. Given a ro…

    Submitted 18 July, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: ECCV 2022

  47. arXiv:2203.08472  [pdf, other]

    cs.CV cs.RO

    Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects

    Authors: Chen Zhao, Yinlin Hu, Mathieu Salzmann

    Abstract: In this paper, we tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images. This task contrasts with the one considered by most existing deep learning methods which typically assume that the testing objects have been observed during training. To handle the unseen objects, we follow a retrieval-based strategy and prevent the network from learning object-sp…

    Submitted 22 July, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted by ECCV 2022

  48. arXiv:2202.01341  [pdf, other]

    cs.LG

    Robust Binary Models by Pruning Randomly-initialized Networks

    Authors: Chen Liu, Ziqi Zhao, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: Robustness to adversarial attacks was shown to require a larger model capacity, and thus a larger memory footprint. In this paper, we introduce an approach to obtain robust yet compact models by pruning randomly-initialized binary networks. Unlike adversarial training, which learns the model parameters, we initialize the model parameters as either +1 or -1, keep them fixed, and find a subnetwork s…

    Submitted 15 October, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Accepted as NeurIPS 2022 paper

  49. arXiv:2112.07324  [pdf, other]

    cs.LG

    On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training

    Authors: Chen Liu, Zhichao Huang, Mathieu Salzmann, Tong Zhang, Sabine Süsstrunk

    Abstract: Adversarial training is a popular method to robustify models against adversarial attacks. However, it exhibits much more severe overfitting than training on clean inputs. In this work, we investigate this phenomenon from the perspective of training instances, i.e., training input-target pairs. Based on a quantitative metric measuring instances' difficulty, we analyze the model's behavior on traini…

    Submitted 14 December, 2021; originally announced December 2021.

  50. arXiv:2112.04203  [pdf, other]

    cs.CV

    Adversarial Parametric Pose Prior

    Authors: Andrey Davydov, Anastasia Remizova, Victor Constantin, Sina Honari, Mathieu Salzmann, Pascal Fua

    Abstract: The Skinned Multi-Person Linear (SMPL) model can represent a human body by mapping pose and shape parameters to body meshes. This has been shown to facilitate inferring 3D human pose and shape from images via different learning models. However, not all pose and shape parameter values yield physically-plausible or even realistic body meshes. In other words, SMPL is under-constrained and may thus le…

    Submitted 8 December, 2021; originally announced December 2021.