Showing 1–50 of 66 results for author: Kortylewski, A

  1. arXiv:2406.09613  [pdf, other]

    cs.CV

    ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

    Authors: Wufei Ma, Guanning Zeng, Guofeng Zhang, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille

    Abstract: A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images. This is a challenging task, as it involves inferring 3D information from 2D signals and most importantly, generalizing to rigid objects from unseen categori…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2406.07163  [pdf, other]

    cs.CV

    FaceGPT: Self-supervised Learning to Chat about 3D Human Faces

    Authors: Haoran Wang, Mohit Mendiratta, Christian Theobalt, Adam Kortylewski

    Abstract: We introduce FaceGPT, a self-supervised learning framework for Large Vision-Language Models (VLMs) to reason about 3D human faces from images and text. Typical 3D face reconstruction methods are specialized algorithms that lack semantic reasoning capabilities. FaceGPT overcomes this limitation by embedding the parameters of a 3D morphable face model (3DMM) into the token space of a VLM, enabling t…

    Submitted 11 June, 2024; originally announced June 2024.

  3. arXiv:2406.04322  [pdf, other]

    cs.CV

    DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

    Authors: Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, Alan Yuille

    Abstract: We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets (represented by Neural Radiance Fields) from text prompts. Unlike recent 3D generative models that rely on clean and well-aligned 3D data, limiting them to single or few-class generation, our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets, mitigating the key challenge…

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024. Code: https://github.com/qihao067/direct3d Project page: https://direct-3d.github.io/

  4. arXiv:2406.00622  [pdf, other]

    cs.CV cs.AI

    Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

    Authors: Xingrui Wang, Wufei Ma, Angtian Wang, Shuo Chen, Adam Kortylewski, Alan Yuille

    Abstract: For vision-language models (VLMs), understanding the dynamic properties of objects and their interactions within 3D scenes from video is crucial for effective reasoning. In this work, we introduce a video question answering dataset SuperCLEVR-Physics that focuses on the dynamics properties of objects. We concentrate on physical concepts -- velocity, acceleration, and collisions within 4D scenes, w…

    Submitted 2 June, 2024; originally announced June 2024.

  5. arXiv:2405.17531  [pdf, other]

    cs.CV

    Evolutive Rendering Models

    Authors: Fangneng Zhan, Hanxue Liang, Yifan Wang, Michael Niemeyer, Michael Oechsle, Adam Kortylewski, Cengiz Oztireli, Gordon Wetzstein, Christian Theobalt

    Abstract: The landscape of computer graphics has undergone significant transformations with the recent advances of differentiable rendering models. These rendering models often rely on heuristic designs that may not fully align with the final rendering objectives. We address this gap by pioneering evolutive rendering models, a methodology where rendering models possess the ability to evolve and ada…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project page: https://fnzhan.com/Evolutive-Rendering-Models/

  6. arXiv:2405.16947  [pdf, other]

    cs.CV

    Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models

    Authors: Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta, Fangneng Zhan, Adam Kortylewski, Christian Theobalt, Peter Wonka

    Abstract: We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. A growing research direction attempts to employ diffusion models to perform downstream vision tasks by exploiting their deep understanding of image semantics. Yet, the majority of these approaches have focused on image-related tasks like semantic correspondence and segmentation, w…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project webpage: https://qianwangx.github.io/VidSeg_diffusion/

  7. arXiv:2404.05626  [pdf, other]

    cs.CV

    Learning a Category-level Object Pose Estimator without Pose Annotations

    Authors: Fengrui Tian, Yaoyao Liu, Adam Kortylewski, Yueqi Duan, Shaoyi Du, Alan Yuille, Angtian Wang

    Abstract: 3D object pose estimation is a challenging task. Previous works always require thousands of object images with annotated poses for learning the 3D pose correspondence, which is laborious and time-consuming for labeling. In this paper, we propose to learn a category-level 3D object pose estimator without pose annotations. Instead of using manually annotated images, we leverage diffusion models (e.g…

    Submitted 8 April, 2024; originally announced April 2024.

  8. arXiv:2403.07277  [pdf, other]

    cs.CV cs.AI

    A Bayesian Approach to OOD Robustness in Image Classification

    Authors: Prakhar Kaushik, Adam Kortylewski, Alan Yuille

    Abstract: An important and unsolved problem in computer vision is to ensure that the algorithms are robust to changes in image domains. We address this problem in the scenario where we have access to images from the target domains but no annotations. Motivated by the challenges of the OOD-CV benchmark where we encounter real world Out-of-Domain (OOD) nuisances and occlusion, we introduce a novel Bayesian ap…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  9. arXiv:2401.10848  [pdf, other]

    cs.CV cs.AI

    Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation

    Authors: Prakhar Kaushik, Aayush Mishra, Adam Kortylewski, Alan Yuille

    Abstract: We consider the problem of source-free unsupervised category-level pose estimation from only RGB images to a target domain without any access to source domain data or 3D annotations during adaptation. Collecting and annotating real-world 3D data and corresponding images is laborious, expensive, yet unavoidable process, since even 3D pose domain adaptation methods require 3D data in the target doma…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 36 pages, 9 figures, 50 tables; ICLR 2024 (Poster)

  10. arXiv:2312.11587  [pdf, other]

    cs.CV

    Relightable Neural Actor with Intrinsic Decomposition and Pose Control

    Authors: Diogo Luvizon, Vladislav Golyanik, Adam Kortylewski, Marc Habermann, Christian Theobalt

    Abstract: Creating a digital human avatar that is relightable, drivable, and photorealistic is a challenging and important problem in Vision and Graphics. Humans are highly articulated creating pose-dependent appearance effects like self-shadows and wrinkles, and skin as well as clothing require complex and space-varying BRDF models. While recent human relighting approaches can recover plausible material-li…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Project page: https://people.mpi-inf.mpg.de/~dluvizon/relightable-neural-actor/

  11. arXiv:2312.05941  [pdf, other]

    cs.CV

    ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering

    Authors: Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann

    Abstract: Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photore…

    Submitted 15 April, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: For project page, see https://vcai.mpi-inf.mpg.de/projects/ash/

  12. arXiv:2311.18266  [pdf, other]

    cs.CV

    Prompt-Based Exemplar Super-Compression and Regeneration for Class-Incremental Learning

    Authors: Ruxiao Duan, Yaoyao Liu, Jieneng Chen, Adam Kortylewski, Alan Yuille

    Abstract: Replay-based methods in class-incremental learning (CIL) have attained remarkable success, as replaying the exemplars of old classes can significantly mitigate catastrophic forgetting. Despite their effectiveness, the inherent memory restrictions of CIL result in saving a limited number of exemplars with poor diversity, leading to data imbalance and overfitting issues. In this paper, we introduce…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: Code: https://github.com/KerryDRX/ESCORT

  13. arXiv:2311.12063  [pdf, other]

    cs.CV

    DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields

    Authors: Yu Chi, Fangneng Zhan, Sibo Wu, Christian Theobalt, Adam Kortylewski

    Abstract: Progress in 3D computer vision tasks demands a huge amount of data, yet annotating multi-view images with 3D-consistent annotations, or point clouds with part segmentation is both time-consuming and challenging. This paper introduces DatasetNeRF, a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations, while utilizing minima…

    Submitted 18 November, 2023; originally announced November 2023.

  14. arXiv:2310.17914  [pdf, other]

    cs.CV cs.CL

    3D-Aware Visual Question Answering about Parts, Poses and Occlusions

    Authors: Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan Yuille

    Abstract: Despite rapid progress in Visual question answering (VQA), existing datasets and models mainly focus on testing reasoning in 2D. However, it is important that VQA models also understand the 3D structure of visual scenes, for example to support tasks like navigation or manipulation. This includes an understanding of the 3D object pose, their parts and occlusions. In this work, we introduce the task…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS2023

  15. arXiv:2308.11737  [pdf, other]

    cs.CV cs.LG

    Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape

    Authors: Jiacong Xu, Yi Zhang, Jiawei Peng, Wufei Ma, Artur Jesslen, Pengliang Ji, Qixin Hu, Jiehua Zhang, Qihao Liu, Jiahao Wang, Wei Ji, Chen Wang, Xiaoding Yuan, Prakhar Kaushik, Guofeng Zhang, Jie Liu, Yushan Xie, Yawen Cui, Alan Yuille, Adam Kortylewski

    Abstract: Accurately estimating the 3D pose and shape is an essential step towards understanding animal behavior, and can potentially benefit many downstream applications, such as wildlife conservation. However, research in this area is held back by the lack of a comprehensive and diverse dataset with high-quality 3D pose and shape annotations. In this paper, we propose Animal3D, the first comprehensive dat…

    Submitted 20 January, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures, link to the dataset: https://xujiacong.github.io/Animal3D/

  16. arXiv:2308.10123  [pdf, other]

    cs.CV cs.AI

    3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation

    Authors: Yi Zhang, Pengliang Ji, Angtian Wang, Jieru Mei, Adam Kortylewski, Alan Yuille

    Abstract: Regression-based methods for 3D human pose estimation directly predict the 3D pose parameters from a 2D image using deep networks. While achieving state-of-the-art performance on standard benchmarks, their performance degrades under occlusion. In contrast, optimization-based methods fit a parametric body model to 2D features in an iterative manner. The localized reconstruction loss can potentially…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: ICCV 2023, project page: https://3dnbf.github.io/

  17. arXiv:2306.08103  [pdf, other]

    cs.CV

    Generating Images with 3D Annotations Using Diffusion Models

    Authors: Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille

    Abstract: Diffusion models have emerged as a powerful generative method, capable of producing stunning photo-realistic images from natural language descriptions. However, these models lack explicit control over the 3D structure in the generated images. Consequently, this hinders our ability to obtain detailed 3D annotations for the generated images or to craft instances with specific poses and distances. In…

    Submitted 3 April, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: ICLR 2024 Spotlight. Code: https://ccvl.jhu.edu/3D-DST/

  18. arXiv:2306.00974  [pdf, other]

    cs.CV

    Discovering Failure Modes of Text-guided Diffusion Models via Adversarial Search

    Authors: Qihao Liu, Adam Kortylewski, Yutong Bai, Song Bai, Alan Yuille

    Abstract: Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly. Common failures include: (i) natural-looking text prompts generating images with the wrong content, or (ii) different random samples of the latent variables that generate vastly different, and even unrelated, outputs despite being conditioned on the same text prompt. In this work, we aim to study and understand the f…

    Submitted 29 November, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Project page: https://sage-diffusion.github.io/

  19. arXiv:2306.00547  [pdf, other]

    cs.CV cs.GR

    AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars

    Authors: Mohit Mendiratta, Xingang Pan, Mohamed Elgharib, Kartik Teotia, Mallikarjun B R, Ayush Tewari, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt

    Abstract: Capturing and editing full head performances enables the creation of virtual characters with various applications such as extended reality and media production. The past few years witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs and others. While these data modalities provide…

    Submitted 2 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 17 pages, 17 figures. Project page: https://vcai.mpi-inf.mpg.de/projects/AvatarStudio/

  20. arXiv:2306.00118  [pdf, other]

    cs.CV

    Neural Textured Deformable Meshes for Robust Analysis-by-Synthesis

    Authors: Angtian Wang, Wufei Ma, Alan Yuille, Adam Kortylewski

    Abstract: Human vision demonstrates higher robustness than current AI algorithms under out-of-distribution scenarios. It has been conjectured such robustness benefits from performing analysis-by-synthesis. Our paper formulates triple vision tasks in a consistent manner using approximate analysis-by-synthesis by render-and-compare algorithms on neural features. In this work, we introduce Neural Textured Defo…

    Submitted 31 May, 2023; originally announced June 2023.

  21. arXiv:2305.16124  [pdf, other]

    cs.CV

    Robust Category-Level 3D Pose Estimation from Synthetic Data

    Authors: Jiahao Yang, Wufei Ma, Angtian Wang, Xiaoding Yuan, Alan Yuille, Adam Kortylewski

    Abstract: Obtaining accurate 3D object poses is vital for numerous computer vision applications, such as 3D reconstruction and scene understanding. However, annotating real-world objects is time-consuming and challenging. While synthetically generated training data is a viable alternative, the domain shift between real and synthetic data is a significant challenge. In this work, we aim to narrow the perform…

    Submitted 25 May, 2023; originally announced May 2023.

  22. arXiv:2305.14668  [pdf, other]

    cs.CV

    Robust 3D-aware Object Classification via Discriminative Render-and-Compare

    Authors: Artur Jesslen, Guofeng Zhang, Angtian Wang, Alan Yuille, Adam Kortylewski

    Abstract: In real-world applications, it is essential to jointly estimate the 3D object pose and class label of objects, i.e., to perform 3D-aware classification. While current approaches for either image classification or pose estimation can be extended to 3D-aware classification, we observe that they are inherently limited: 1) Their performance is much lower compared to the respective single-task models, a…

    Submitted 5 June, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  23. arXiv:2305.03462  [pdf, other]

    cs.CV cs.GR

    General Neural Gauge Fields

    Authors: Fangneng Zhan, Lingjie Liu, Adam Kortylewski, Christian Theobalt

    Abstract: The recent advance of neural fields, such as neural radiance fields, has significantly pushed the boundary of scene representation learning. Aiming to boost the computation efficiency and rendering quality of 3D scenes, a popular line of research maps the 3D coordinate system to another measuring system, e.g., 2D manifolds and hash tables, for modeling neural fields. The conversion of coordinate s…

    Submitted 7 February, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: ICLR 2023

  24. arXiv:2304.10266  [pdf, other]

    cs.CV

    OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

    Authors: Bingchen Zhao, Jiahao Wang, Wufei Ma, Artur Jesslen, Siwei Yang, Shaozuo Yu, Oliver Zendel, Christian Theobalt, Alan Yuille, Adam Kortylewski

    Abstract: Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV-v2, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and th…

    Submitted 26 July, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2111.14341

  25. arXiv:2303.07337  [pdf, other]

    cs.CV

    PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

    Authors: Qihao Liu, Adam Kortylewski, Alan Yuille

    Abstract: Human pose and shape (HPS) estimation methods achieve remarkable results. However, current HPS benchmarks are mostly designed to test models in scenarios that are similar to the training data. This can lead to critical situations in real-world applications when the observed data differs significantly from the training data and hence is out-of-distribution (OOD). It is therefore important to test a…

    Submitted 30 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023; Code: https://github.com/qihao067/PoseExaminer

  26. arXiv:2301.05175  [pdf, other]

    cs.CV

    Scene-Aware 3D Multi-Human Motion Capture from a Single Camera

    Authors: Diogo Luvizon, Marc Habermann, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt

    Abstract: In this work, we consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera. In contrast to expensive marker-based or multi-view systems, our lightweight setup is ideal for private users as it enables an affordable 3D motion capture that is easy to install and does not require e…

    Submitted 27 March, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Accepted to Eurographics 2023. See also github: https://github.com/dluvizon/scene-aware-3d-multi-human project page: https://vcai.mpi-inf.mpg.de/projects/scene-aware-3d-multi-human/

  27. arXiv:2212.00259  [pdf, other]

    cs.CV cs.CL

    Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning

    Authors: Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille

    Abstract: Visual Question Answering (VQA) models often perform poorly on out-of-distribution data and struggle on domain generalization. Due to the multi-modal nature of this task, multiple factors of variation are intertwined, making generalization difficult to analyze. This motivates us to introduce a virtual benchmark, Super-CLEVR, where different factors in VQA domain shifts can be isolated in order tha…

    Submitted 31 May, 2023; v1 submitted 30 November, 2022; originally announced December 2022.

    Comments: Published in CVPR 2023 as Highlight. Data and code are released at https://github.com/Lizw14/Super-CLEVR

  28. arXiv:2210.15664  [pdf, other]

    cs.CV cs.GR

    State of the Art in Dense Monocular Non-Rigid 3D Reconstruction

    Authors: Edith Tretschk, Navami Kairanda, Mallikarjun B R, Rishabh Dabral, Adam Kortylewski, Bernhard Egger, Marc Habermann, Pascal Fua, Christian Theobalt, Vladislav Golyanik

    Abstract: 3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since -- without additional prior assumptions -- it permits infinitely many solutions leading to accurate projection to the input 2D images. Non-rigid reconstruction is a foundational…

    Submitted 24 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: 36 pages, 18 figures, 3 tables; State-of-the-Art Report at EUROGRAPHICS 2023

    Journal ref: Computer Graphics Forum, 2023

  29. arXiv:2210.01692  [pdf, other]

    cs.CV

    HandFlow: Quantifying View-Dependent 3D Ambiguity in Two-Hand Reconstruction with Normalizing Flow

    Authors: Jiayi Wang, Diogo Luvizon, Franziska Mueller, Florian Bernard, Adam Kortylewski, Dan Casas, Christian Theobalt

    Abstract: Reconstructing two-hand interactions from a single image is a challenging problem due to ambiguities that stem from projective geometry and heavy occlusions. Existing methods are designed to estimate only a single pose, despite the fact that there exist other valid reconstructions that fit the image evidence equally well. In this paper we propose to address this issue by explicitly modeling the di…

    Submitted 4 October, 2022; originally announced October 2022.

    Comments: VMV 2022 - Symposium on Vision, Modeling, and Visualization

  30. arXiv:2209.05624  [pdf, other]

    cs.CV

    Robust Category-Level 6D Pose Estimation with Coarse-to-Fine Rendering of Neural Features

    Authors: Wufei Ma, Angtian Wang, Alan Yuille, Adam Kortylewski

    Abstract: We consider the problem of category-level 6D pose estimation from a single RGB image. Our approach represents an object category as a cuboid mesh and learns a generative model of the neural feature activations at each mesh vertex to perform pose estimation through differentiable rendering. A common problem of rendering-based approaches is that they rely on bounding box proposals, which do not conv…

    Submitted 12 September, 2022; originally announced September 2022.

    Journal ref: ECCV 2022

  31. arXiv:2205.15401  [pdf, other]

    cs.GR cs.CV

    VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis

    Authors: Angtian Wang, Peng Wang, Jian Sun, Adam Kortylewski, Alan Yuille

    Abstract: The Gaussian reconstruction kernels have been proposed by Westover (1990) and studied by the computer graphics community back in the 90s, which gives an alternative representation of object 3D geometry from meshes and point clouds. On the other hand, current state-of-the-art (SoTA) differentiable renderers, Liu et al. (2019), use rasterization to collect triangles or points on each image pixel and…

    Submitted 28 January, 2024; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: Accepted by ICLR2023

  32. arXiv:2204.02285  [pdf, other]

    cs.CV cs.CL cs.LG

    SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering

    Authors: Vipul Gupta, Zhuowan Li, Adam Kortylewski, Chenyu Zhang, Yingwei Li, Alan Yuille

    Abstract: While Visual Question Answering (VQA) has progressed rapidly, previous works raise concerns about robustness of current VQA models. In this work, we study the robustness of VQA models from a novel perspective: visual context. We suggest that the models over-rely on the visual context, i.e., irrelevant objects in the image, to make predictions. To diagnose the model's reliance on visual context and…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 11 pages, Computer Vision and Pattern Recognition 2022

  33. arXiv:2112.13592  [pdf, other]

    cs.CV

    Multimodal Image Synthesis and Editing: The Generative AI Era

    Authors: Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu, Lingjie Liu, Adam Kortylewski, Christian Theobalt, Eric Xing

    Abstract: As information exists in various modalities in real world, effective interaction and fusion among multimodal information plays a key role for the creation and perception of multimodal data in computer vision and deep learning research. With superb power in modeling the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years.…

    Submitted 24 August, 2023; v1 submitted 27 December, 2021; originally announced December 2021.

    Comments: TPAMI 2023

  34. arXiv:2112.00933  [pdf, other]

    cs.CV

    PartImageNet: A Large, High-Quality Dataset of Parts

    Authors: Ju He, Shuo Yang, Shaokang Yang, Adam Kortylewski, Xiaoding Yuan, Jie-Neng Chen, Shuai Liu, Cheng Yang, Qihang Yu, Alan Yuille

    Abstract: It is natural to represent objects in terms of their parts. This has the potential to improve the performance of algorithms for object recognition and segmentation but can also help for downstream tasks like activity recognition. Research on part-based models, however, is hindered by the lack of datasets with per-pixel part annotations. This is partly due to the difficulty and high cost of annotat…

    Submitted 16 December, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

  35. arXiv:2111.14341  [pdf, other]

    cs.CV cs.AI

    OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

    Authors: Bingchen Zhao, Shaozuo Yu, Wufei Ma, Mingxin Yu, Shenxiao Mei, Angtian Wang, Ju He, Alan Yuille, Adam Kortylewski

    Abstract: Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and the w…

    Submitted 6 October, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Project webpage: http://bzhao.me/OOD-CV/, this work is accepted as Oral at ECCV 2022

  36. arXiv:2110.14213  [pdf, other]

    cs.CV

    Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose

    Authors: Angtian Wang, Shenxiao Mei, Alan Yuille, Adam Kortylewski

    Abstract: We study the problem of learning to estimate the 3D object pose from a few labelled examples and a collection of unlabelled data. Our main contribution is a learning framework, neural view synthesis and matching, that can transfer the 3D pose annotation from the labelled to unlabelled images reliably, despite unseen 3D views and nuisance variations such as the object shape, texture, illumination o…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021; Code is available under https://github.com/Angtian/NeuralVS

  37. arXiv:2110.13846  [pdf, other]

    cs.CV

    A Light-weight Interpretable Compositional Model for Nuclei Detection and Weakly-Supervised Segmentation

    Authors: Yixiao Zhang, Adam Kortylewski, Qing Liu, Seyoun Park, Benjamin Green, Elizabeth Engle, Guillermo Almodovar, Ryan Walk, Sigfredo Soto-Diaz, Janis Taube, Alex Szalay, Alan Yuille

    Abstract: The field of computational pathology has witnessed great advancements since deep neural networks have been widely applied. These networks usually require large numbers of annotated data to train vast parameters. However, it takes significant effort to annotate a large histopathology dataset. We introduce a light-weight and interpretable model for nuclei detection and weakly-supervised segmentation…

    Submitted 9 August, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  38. arXiv:2107.05637  [pdf, other]

    cs.CV

    Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms

    Authors: Chenglin Yang, Siyuan Qiao, Adam Kortylewski, Alan Yuille

    Abstract: Self-Attention has become prevalent in computer vision models. Inspired by fully connected Conditional Random Fields (CRFs), we decompose self-attention into local and context terms. They correspond to the unary and binary terms in CRF and are implemented by attention mechanisms with projection matrices. We observe that the unary terms only make small contributions to the outputs, and meanwhile st…

    Submitted 28 November, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

  39. arXiv:2106.09614  [pdf, other]

    cs.CV

    Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation

    Authors: Chunlu Li, Andreas Morel-Forster, Thomas Vetter, Bernhard Egger, Adam Kortylewski

    Abstract: In this work, we aim to enhance model-based face reconstruction by avoiding fitting the model to outliers, i.e. regions that cannot be well-expressed by the model such as occluders or make-up. The core challenge for localizing outliers is that they are highly variable and difficult to annotate. To overcome this challenging problem, we introduce a joint Face-autoencoder and outlier segmentation app…

    Submitted 21 March, 2023; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: 20 pages, CVPR2023

  40. arXiv:2106.04569  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Simulated Adversarial Testing of Face Recognition Models

    Authors: Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff

    Abstract: Most machine learning models are validated and tested on fixed datasets. This can give an incomplete picture of the capabilities and weaknesses of the model. Such weaknesses can be revealed at test time in the real world. The risks involved in such failures can be loss of profits, loss of time or even loss of life in certain critical applications. In order to alleviate this issue, simulators can b…

    Submitted 31 May, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Published at IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  41. arXiv:2106.00209  [pdf, other]

    cs.CV

    Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning

    Authors: Ju He, Adam Kortylewski, Shaokang Yang, Shuai Liu, Cheng Yang, Changhu Wang, Alan Yuille

    Abstract: Semi-Supervised Learning (SSL) has shown its strong ability in utilizing unlabeled data when labeled data is scarce. However, most SSL algorithms work under the assumption that the class distributions are balanced in both training and test sets. In this work, we consider the problem of SSL on class-imbalanced data, which better reflects real-world situations. In particular, we decouple the trainin…

    Submitted 10 December, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

  42. arXiv:2104.07645  [pdf, other]

    cs.CV

    A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation

    Authors: Jiteng Mu, Weichao Qiu, Adam Kortylewski, Alan Yuille, Nuno Vasconcelos, Xiaolong Wang

    Abstract: Recent work has made significant progress on using implicit functions, as a continuous representation for 3D rigid object shape reconstruction. However, much less effort has been devoted to modeling general articulated objects. Compared to rigid objects, articulated objects have higher degrees of freedom, which makes it hard to generalize to unseen shapes. To deal with the large shape variance, we…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Our project page is available at: https://jitengmu.github.io/A-SDF/

  43. arXiv:2103.14098  [pdf, other]

    cs.CV

    Learning Part Segmentation through Unsupervised Domain Adaptation from Synthetic Vehicles

    Authors: Qing Liu, Adam Kortylewski, Zhishuai Zhang, Zizhang Li, Mengqi Guo, Qihao Liu, Xiaoding Yuan, Jiteng Mu, Weichao Qiu, Alan Yuille

    Abstract: Part segmentations provide a rich and detailed part-level description of objects. However, their annotation requires an enormous amount of work, which makes it difficult to apply standard deep learning methods. In this paper, we propose the idea of learning part segmentation through unsupervised domain adaptation (UDA) from synthetic data. We first introduce UDA-Part, a comprehensive part segmenta…

    Submitted 3 April, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: CVPR 2022 (Oral)

  44. arXiv:2103.07976  [pdf, other]

    cs.CV

    TransFG: A Transformer Architecture for Fine-grained Recognition

    Authors: Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang

    Abstract: Fine-grained visual classification (FGVC) which aims at recognizing objects from subcategories is a very challenging task due to the inherently subtle inter-class differences. Most existing works mainly tackle this problem by reusing the backbone network to extract features of detected discriminative regions. However, this strategy inevitably complicates the pipeline and pushes the proposed region…

    Submitted 1 December, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

  45. arXiv:2102.11343  [pdf, other]

    cs.LG cs.CV

    Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping

    Authors: Prakhar Kaushik, Alex Gain, Adam Kortylewski, Alan Yuille

    Abstract: Catastrophic forgetting in neural networks is a significant problem for continual learning. A majority of the current methods replay previous data during training, which violates the constraints of an ideal continual learning system. Additionally, current approaches that deal with forgetting ignore the problem of catastrophic remembering, i.e. the worsening ability to discriminate between data fro…

    Submitted 22 February, 2021; originally announced February 2021.

  46. arXiv:2101.12378  [pdf, other]

    cs.CV

    NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation

    Authors: Angtian Wang, Adam Kortylewski, Alan Yuille

    Abstract: 3D pose estimation is a challenging but important task in computer vision. In this work, we show that standard deep learning approaches to 3D pose estimation are not robust when objects are partially occluded or viewed from a previously unseen pose. Inspired by the robustness of generative vision models to partial occlusion, we propose to integrate deep neural networks with 3D generative represent…

    Submitted 4 February, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

    Comments: Accepted by ICLR 2021. Code is publicly available

  47. arXiv:2101.11878  [pdf, other]

    cs.CV

    CORL: Compositional Representation Learning for Few-Shot Classification

    Authors: Ju He, Adam Kortylewski, Alan Yuille

    Abstract: Few-shot image classification consists of two consecutive learning processes: 1) In the meta-learning stage, the model acquires a knowledge base from a set of training classes. 2) During meta-testing, the acquired knowledge is used to recognize unseen classes from very few examples. Inspired by the compositional representation of objects in humans, we train a neural network architecture that expli…

    Submitted 16 December, 2022; v1 submitted 28 January, 2021; originally announced January 2021.

  48. arXiv:2012.02107  [pdf, other]

    cs.CV

    Robust Instance Segmentation through Reasoning about Multi-Object Occlusion

    Authors: Xiaoding Yuan, Adam Kortylewski, Yihong Sun, Alan Yuille

    Abstract: Analyzing complex scenes with Deep Neural Networks is a challenging task, particularly when images contain multiple objects that partially occlude each other. Existing approaches to image analysis mostly process objects independently and do not take into account the relative occlusion of nearby objects. In this paper, we propose a deep network for multi-object instance segmentation that is robust…

    Submitted 1 April, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: Accepted by CVPR 2021

  49. arXiv:2012.00558  [pdf, other]

    cs.CV

    Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks

    Authors: Christian Cosgrove, Adam Kortylewski, Chenglin Yang, Alan Yuille

    Abstract: Patch-based adversarial attacks introduce a perceptible but localized change to the input that induces misclassification. While progress has been made in defending against imperceptible attacks, it remains unclear how patch-based attacks can be resisted. In this work, we study two different approaches for defending against black-box patch attacks. First, we show that adversarial training, which is…

    Submitted 1 December, 2020; originally announced December 2020.

  50. arXiv:2012.00313  [pdf, other]

    cs.CV

    Unsupervised Part Discovery via Feature Alignment

    Authors: Mengqi Guo, Yutong Bai, Zhishuai Zhang, Adam Kortylewski, Alan Yuille

    Abstract: Understanding objects in terms of their individual parts is important, because it enables a precise understanding of the objects' geometrical structure, and enhances object recognition when the object is seen in a novel pose or under partial occlusion. However, the manual annotation of parts in large scale datasets is time consuming and expensive. In this paper, we aim at discovering object parts…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: 10 pages, 9 figures, submitted to CVPR 2021