Search | arXiv e-print repository

ZeroShape: Regression-based Zero-shot Shape Reconstruction

Authors: Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James M. Rehg

Abstract: We study the problem of single-image zero-shot 3D shape reconstruction. Recent works learn zero-shot shape reconstruction through generative modeling of 3D assets, but these models are computationally expensive at train and inference time. In contrast, the traditional approach to this problem is regression-based, where deterministic models are trained to directly regress the object shape. Such reg… ▽ More We study the problem of single-image zero-shot 3D shape reconstruction. Recent works learn zero-shot shape reconstruction through generative modeling of 3D assets, but these models are computationally expensive at train and inference time. In contrast, the traditional approach to this problem is regression-based, where deterministic models are trained to directly regress the object shape. Such regression methods possess much higher computational efficiency than generative methods. This raises a natural question: is generative modeling necessary for high performance, or conversely, are regression-based approaches still competitive? To answer this, we design a strong regression-based model, called ZeroShape, based on the converging findings in this field and a novel insight. We also curate a large real-world evaluation benchmark, with objects from three different real-world 3D datasets. This evaluation benchmark is more diverse and an order of magnitude larger than what prior works use to quantitatively evaluate their models, aiming at reducing the evaluation variance in our field. We show that ZeroShape not only achieves superior performance over state-of-the-art methods, but also demonstrates significantly higher computational and data efficiency. △ Less

Submitted 16 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: Project page: https://zixuanh.com/projects/zeroshape.html

arXiv:2312.03533 [pdf, other]

Low-shot Object Learning with Mutual Exclusivity Bias

Authors: Anh Thai, Ahmad Humayun, Stefan Stojanov, Zixuan Huang, Bikram Boote, James M. Rehg

Abstract: This paper introduces Low-shot Object Learning with Mutual Exclusivity Bias (LSME), the first computational framing of mutual exclusivity bias, a phenomenon commonly observed in infants during word learning. We provide a novel dataset, comprehensive baselines, and a state-of-the-art method to enable the ML community to tackle this challenging learning task. The goal of LSME is to analyze an RGB im… ▽ More This paper introduces Low-shot Object Learning with Mutual Exclusivity Bias (LSME), the first computational framing of mutual exclusivity bias, a phenomenon commonly observed in infants during word learning. We provide a novel dataset, comprehensive baselines, and a state-of-the-art method to enable the ML community to tackle this challenging learning task. The goal of LSME is to analyze an RGB image of a scene containing multiple objects and correctly associate a previously-unknown object instance with a provided category label. This association is then used to perform low-shot learning to test category generalization. We provide a data generation pipeline for the LSME problem and conduct a thorough analysis of the factors that contribute to its difficulty. Additionally, we evaluate the performance of multiple baselines, including state-of-the-art foundation models. Finally, we present a baseline approach that outperforms state-of-the-art models in terms of low-shot accuracy. △ Less

Submitted 6 December, 2023; originally announced December 2023.

Comments: Accepted at NeurIPS 2023, Datasets and Benchmarks Track. Project website https://ngailapdi.github.io/projects/lsme/

arXiv:2304.06247 [pdf, other]

ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency

Authors: Zixuan Huang, Varun Jampani, Anh Thai, Yuanzhen Li, Stefan Stojanov, James M. Rehg

Abstract: We present ShapeClipper, a novel method that reconstructs 3D object shapes from real-world single-view RGB images. Instead of relying on laborious 3D, multi-view or camera pose annotation, ShapeClipper learns shape reconstruction from a set of single-view segmented images. The key idea is to facilitate shape learning via CLIP-based shape consistency, where we encourage objects with similar CLIP en… ▽ More We present ShapeClipper, a novel method that reconstructs 3D object shapes from real-world single-view RGB images. Instead of relying on laborious 3D, multi-view or camera pose annotation, ShapeClipper learns shape reconstruction from a set of single-view segmented images. The key idea is to facilitate shape learning via CLIP-based shape consistency, where we encourage objects with similar CLIP encodings to share similar shapes. We also leverage off-the-shelf normals as an additional geometric constraint so the model can learn better bottom-up reasoning of detailed surface geometry. These two novel consistency constraints, when used to regularize our model, improve its ability to learn both global shape structure and local geometric details. We evaluate our method over three challenging real-world datasets, Pix3D, Pascal3D+, and OpenImages, where we achieve superior performance over state-of-the-art methods. △ Less

Submitted 12 April, 2023; originally announced April 2023.

Comments: Accepted to CVPR 2023, project website at https://zixuanh.com/projects/shapeclipper.html

arXiv:2211.15059 [pdf, other]

Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization

Authors: Stefan Stojanov, Anh Thai, Zixuan Huang, James M. Rehg

Abstract: A hallmark of the deep learning era for computer vision is the successful use of large-scale labeled datasets to train feature representations for tasks ranging from object recognition and semantic segmentation to optical flow estimation and novel view synthesis of 3D scenes. In this work, we aim to learn dense discriminative object representations for low-shot category recognition without requiri… ▽ More A hallmark of the deep learning era for computer vision is the successful use of large-scale labeled datasets to train feature representations for tasks ranging from object recognition and semantic segmentation to optical flow estimation and novel view synthesis of 3D scenes. In this work, we aim to learn dense discriminative object representations for low-shot category recognition without requiring any category labels. To this end, we propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels. To train DOPE, we assume access to sparse depths, foreground masks and known cameras, to obtain pixel-level correspondences between views of an object, and use this to formulate a self-supervised learning task to learn discriminative object patches. We find that DOPE can directly be used for low-shot classification of novel categories using local-part matching, and is competitive with and outperforms supervised and self-supervised learning baselines. Code and data available at https://github.com/rehg-lab/dope_selfsup. △ Less

Submitted 27 November, 2022; originally announced November 2022.

Comments: Accepted at NeurIPS 2022. Code and data available at https://github.com/rehg-lab/dope_selfsup

arXiv:2204.10235 [pdf, other]

Planes vs. Chairs: Category-guided 3D shape learning without any 3D cues

Authors: Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James M. Rehg

Abstract: We present a novel 3D shape reconstruction method which learns to predict an implicit 3D shape representation from a single RGB image. Our approach uses a set of single-view images of multiple object categories without viewpoint annotation, forcing the model to learn across multiple object categories without 3D supervision. To facilitate learning with such minimal supervision, we use category labe… ▽ More We present a novel 3D shape reconstruction method which learns to predict an implicit 3D shape representation from a single RGB image. Our approach uses a set of single-view images of multiple object categories without viewpoint annotation, forcing the model to learn across multiple object categories without 3D supervision. To facilitate learning with such minimal supervision, we use category labels to guide shape learning with a novel categorical metric learning approach. We also utilize adversarial and viewpoint regularization techniques to further disentangle the effects of viewpoint and shape. We obtain the first results for large-scale (more than 50 categories) single-viewpoint shape prediction using a single model without any 3D cues. We are also the first to examine and quantify the benefit of class information in single-view supervised 3D shape reconstruction. Our method achieves superior performance over state-of-the-art methods on ShapeNet-13, ShapeNet-55 and Pascal3D+. △ Less

Submitted 21 April, 2022; originally announced April 2022.

Comments: Project page: https://zixuanh.com/multiclass3D

arXiv:2101.07296 [pdf, other]

Using Shape to Categorize: Low-Shot Learning with an Explicit Shape Bias

Authors: Stefan Stojanov, Anh Thai, James M. Rehg

Abstract: It is widely accepted that reasoning about object shape is important for object recognition. However, the most powerful object recognition methods today do not explicitly make use of object shape during learning. In this work, motivated by recent developments in low-shot learning, findings in developmental psychology, and the increased use of synthetic data in computer vision research, we investig… ▽ More It is widely accepted that reasoning about object shape is important for object recognition. However, the most powerful object recognition methods today do not explicitly make use of object shape during learning. In this work, motivated by recent developments in low-shot learning, findings in developmental psychology, and the increased use of synthetic data in computer vision research, we investigate how reasoning about 3D shape can be used to improve low-shot learning methods' generalization performance. We propose a new way to improve existing low-shot learning approaches by learning a discriminative embedding space using 3D object shape, and using this embedding by learning how to map images into it. Our new approach improves the performance of image-only low-shot learning approaches on multiple datasets. We also introduce Toys4K, a 3D object dataset with the largest number of object categories currently available, which supports low-shot learning. △ Less

Submitted 20 June, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: Accepted at CVPR2021. Project page, code and data available at https://rehg-lab.github.io/publication-pages/lowshot-shapebias/

arXiv:2101.07295 [pdf, other]

The Surprising Positive Knowledge Transfer in Continual 3D Object Shape Reconstruction

Authors: Anh Thai, Stefan Stojanov, Zixuan Huang, Isaac Rehg, James M. Rehg

Abstract: Continual learning has been extensively studied for classification tasks with methods developed to primarily avoid catastrophic forgetting, a phenomenon where earlier learned concepts are forgotten at the expense of more recent samples. In this work, we present a set of continual 3D object shape reconstruction tasks, including complete 3D shape reconstruction from different input modalities, as we… ▽ More Continual learning has been extensively studied for classification tasks with methods developed to primarily avoid catastrophic forgetting, a phenomenon where earlier learned concepts are forgotten at the expense of more recent samples. In this work, we present a set of continual 3D object shape reconstruction tasks, including complete 3D shape reconstruction from different input modalities, as well as visible surface (2.5D) reconstruction which, surprisingly demonstrate positive knowledge (backward and forward) transfer when training with solely standard SGD and without additional heuristics. We provide evidence that continuously updated representation learning of single-view 3D shape reconstruction improves the performance on learned and novel categories over time. We provide a novel analysis of knowledge transfer ability by looking at the output distribution shift across sequential learning tasks. Finally, we show that the robustness of these tasks leads to the potential of having a proxy representation learning task for continual classification. The codebase, dataset and pre-trained models released with this article can be found at https://github.com/rehg-lab/CLRec △ Less

Submitted 8 September, 2022; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: Accepted to 3DV 2022

arXiv:2006.07752 [pdf, other]

3D Reconstruction of Novel Object Shapes from Single Images

Authors: Anh Thai, Stefan Stojanov, Vijay Upadhya, James M. Rehg

Abstract: Accurately predicting the 3D shape of any arbitrary object in any pose from a single image is a key goal of computer vision research. This is challenging as it requires a model to learn a representation that can infer both the visible and occluded portions of any object using a limited training set. A training set that covers all possible object shapes is inherently infeasible. Such learning-based… ▽ More Accurately predicting the 3D shape of any arbitrary object in any pose from a single image is a key goal of computer vision research. This is challenging as it requires a model to learn a representation that can infer both the visible and occluded portions of any object using a limited training set. A training set that covers all possible object shapes is inherently infeasible. Such learning-based approaches are inherently vulnerable to overfitting, and successfully implementing them is a function of both the architecture design and the training approach. We present an extensive investigation of factors specific to architecture design, training, experiment design, and evaluation that influence reconstruction performance and measurement. We show that our proposed SDFNet achieves state-of-the-art performance on seen and unseen shapes relative to existing methods GenRe and OccNet. We provide the first large-scale evaluation of single image shape reconstruction to unseen objects. The source code, data and trained models can be found on https://github.com/rehg-lab/3DShapeGen. △ Less

Submitted 1 September, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

Comments: First two authors contributed equally

arXiv:1909.04518 [pdf]

doi 10.1117/1.JBO.25.9.096009

Virtual organelle self-coding for fluorescence imaging via adversarial learning

Authors: Thanh Nguyen, Vy Bui, Anh Thai, Van Lam, Christopher B. Raub, Lin-Ching Chang, George Nehmetallah

Abstract: Fluorescence microscopy plays a vital role in understanding the subcellular structures of living cells. However, it requires considerable effort in sample preparation related to chemical fixation, staining, cost, and time. To reduce those factors, we present a virtual fluorescence staining method based on deep neural networks (VirFluoNet) to transform fluorescence images of molecular labels into o… ▽ More Fluorescence microscopy plays a vital role in understanding the subcellular structures of living cells. However, it requires considerable effort in sample preparation related to chemical fixation, staining, cost, and time. To reduce those factors, we present a virtual fluorescence staining method based on deep neural networks (VirFluoNet) to transform fluorescence images of molecular labels into other molecular fluorescence labels in the same field-of-view. To achieve this goal, we develop and train a conditional generative adversarial network (cGAN) to perform digital fluorescence imaging demonstrated on human osteosarcoma U2OS cell fluorescence images captured under Cell Painting staining protocol. A detailed comparative analysis is also conducted on the performance of the cGAN network between predicting fluorescence channels based on phase contrast or based on another fluorescence channel using human breast cancer MDA-MB-231 cell line as a test case. In addition, we implement a deep learning model to perform autofocusing on another human U2OS fluorescence dataset as a preprocessing step to defocus an out-focus channel in U2OS dataset. A quantitative index of image prediction error is introduced based on signal pixel-wise spatial and intensity differences with ground truth to evaluate the performance of prediction to high-complex and throughput fluorescence. This index provides a rational way to perform image segmentation on error signals and to understand the likelihood of mis-interpreting biology from the predicted image. In total, these findings contribute to the utility of deep learning image regression for fluorescence microscopy datasets of biological cells, balanced against savings of cost, time, and experimental effort. Furthermore, the approach introduced here holds promise for modeling the internal relationships between organelles and biomolecules within living cells. △ Less

Submitted 10 September, 2019; originally announced September 2019.

Comments: 20 pages, 9 figures

MSC Class: 92B20

Showing 1–9 of 9 results for author: Thai, A