Showing 1–31 of 31 results for author: Neumann, U

Searching in archive cs.
  1. arXiv:2403.12365  [pdf, other]

    cs.CV

    GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation

    Authors: Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich Neumann

    Abstract: Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians…

    Submitted 13 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  2. arXiv:2309.13516  [pdf, other]

    cs.CV cs.RO

    InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation

    Authors: Cho-Ying Wu, Quankai Gao, Chin-Cheng Hsu, Te-Lin Wu, Jing-Wen Chen, Ulrich Neumann

    Abstract: Indoor monocular depth estimation has attracted increasing research interest. Most previous works have been focusing on methodology, primarily experimenting with NYU-Depth-V2 (NYUv2) Dataset, and only concentrated on the overall performance over the test set. However, little is known regarding robustness and generalization when it comes to applying monocular depth estimation methods to real-world…

    Submitted 30 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Add Depth-Anything

  3. arXiv:2308.16154  [pdf, other]

    cs.CV

    MMVP: Motion-Matrix-based Video Prediction

    Authors: Yiqi Zhong, Luming Liang, Ilya Zharkov, Ulrich Neumann

    Abstract: A central challenge of video prediction lies where the system has to reason the objects' future motions from image frames while simultaneously maintaining the consistency of their appearances across frames. This work introduces an end-to-end trainable two-stream video prediction framework, Motion-Matrix-based Video Prediction (MMVP), to tackle this challenge. Unlike previous methods that usually h…

    Submitted 30 August, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 (Oral)

  4. arXiv:2307.13226  [pdf, other]

    cs.CV

    Strivec: Sparse Tri-Vector Radiance Fields

    Authors: Quankai Gao, Qiangeng Xu, Hao Su, Ulrich Neumann, Zexiang Xu

    Abstract: We propose Strivec, a novel neural representation that models a 3D scene as a radiance field with sparsely distributed and compactly factorized local tensor feature grids. Our approach leverages tensor decomposition, following the recent work TensoRF, to model the tensor grids. In contrast to TensoRF which uses a global tensor and focuses on their vector-matrix decomposition, we propose to utilize…

    Submitted 24 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

  5. arXiv:2305.07269  [pdf, other]

    cs.CV

    Meta-Optimization for Higher Model Generalizability in Single-Image Depth Prediction

    Authors: Cho-Ying Wu, Yiqi Zhong, Junying Wang, Ulrich Neumann

    Abstract: Model generalizability to unseen datasets, concerned with in-the-wild robustness, is less studied for indoor single-image depth prediction. We leverage gradient-based meta-learning for higher generalizability on zero-shot cross-dataset inference. Unlike the most-studied image classification in meta-learning, depth is pixel-level continuous range values, and mappings from each image to depth vary w…

    Submitted 30 January, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: long version; short version accepted to CVPR 2023 Workshop on Adversarial Machine Learning on Computer Vision and CVPR 2023 Workshop on Computer Vision for Mixed Reality

  6. arXiv:2207.09646  [pdf, other]

    cs.CV

    Aware of the History: Trajectory Forecasting with the Local Behavior Data

    Authors: Yiqi Zhong, Zhenyang Ni, Siheng Chen, Ulrich Neumann

    Abstract: The historical trajectories previously passing through a location may help infer the future trajectory of an agent currently at this location. Despite great improvements in trajectory forecasting with the guidance of high-definition maps, only a few works have explored such local historical information. In this work, we re-introduce this information as a new type of input data for trajectory forec…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: This paper has been accepted by ECCV 2022

  7. arXiv:2207.05195  [pdf, other]

    cs.CV stat.ML

    Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting

    Authors: Bohan Tang, Yiqi Zhong, Chenxin Xu, Wei-Tao Wu, Ulrich Neumann, Yanfeng Wang, Ya Zhang, Siheng Chen

    Abstract: In multi-modal multi-agent trajectory forecasting, two major challenges have not been fully tackled: 1) how to measure the uncertainty brought by the interaction module that causes correlations among the predicted trajectories of multiple agents; 2) how to rank the multiple predictions and select the optimal predicted trajectory. In order to handle these challenges, this work first proposes a nove…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: text overlap with arXiv:2110.13947

  8. arXiv:2203.09824  [pdf, other]

    cs.CV cs.LG eess.AS

    Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?

    Authors: Cho-Ying Wu, Chin-Cheng Hsu, Ulrich Neumann

    Abstract: This work digs into a root question in human perception: can face geometry be gleaned from one's voices? Previous works that study this question only adopt developments in image synthesis and convert voices into face images to show correlations, but working on the image domain unavoidably involves predicting attributes that voices cannot hint, including facial textures, hairstyles, and backgrounds…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022. Project page: https://choyingw.github.io/works/Voice2Mesh/index.html. This version supersedes arXiv:2104.10299

  9. arXiv:2201.08845  [pdf, other]

    cs.CV

    Point-NeRF: Point-based Neural Radiance Fields

    Authors: Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, Ulrich Neumann

    Abstract: Volumetric neural rendering methods like NeRF generate high-quality view synthesis results but are optimized per-scene leading to prohibitive reconstruction time. On the other hand, deep multi-view stereo methods can quickly reconstruct scene geometry via direct network inference. Point-NeRF combines the advantages of these two approaches by using neural 3D point clouds, with associated neural fea…

    Submitted 15 March, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

    Comments: Accepted to CVPR 2022 (Oral)

    Journal ref: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5438-5448) (2022)

  10. arXiv:2112.02306  [pdf, other]

    cs.CV

    Toward Practical Monocular Indoor Depth Estimation

    Authors: Cho-Ying Wu, Jialiang Wang, Michael Hall, Ulrich Neumann, Shuochen Su

    Abstract: The majority of prior monocular depth estimation methods without groundtruth depth guidance focus on driving scenarios. We show that such methods generalize poorly to unseen complex indoor scenes, where objects are cluttered and arbitrarily arranged in the near field. To obtain more robustness, we propose a structure distillation approach to learn knacks from an off-the-shelf relative depth estima…

    Submitted 28 March, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

    Comments: Accepted to CVPR 2022

  11. arXiv:2112.02205  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Behind the Curtain: Learning Occluded Shapes for 3D Object Detection

    Authors: Qiangeng Xu, Yiqi Zhong, Ulrich Neumann

    Abstract: Advances in LiDAR sensors provide rich 3D data that supports 3D scene understanding. However, due to occlusion and signal miss, LiDAR point clouds are in practice 2.5D as they cover only partial underlying shapes, which poses a fundamental challenge to 3D perception. To tackle the challenge, we present a novel LiDAR-based 3D object detection model, dubbed Behind the Curtain Detector (BtcDet), whic…

    Submitted 3 December, 2021; originally announced December 2021.

    Journal ref: AAAI2022

  12. arXiv:2110.13947  [pdf, other]

    cs.CV cs.LG

    Collaborative Uncertainty in Multi-Agent Trajectory Forecasting

    Authors: Bohan Tang, Yiqi Zhong, Ulrich Neumann, Gang Wang, Ya Zhang, Siheng Chen

    Abstract: Uncertainty modeling is critical in trajectory forecasting systems for both interpretation and safety reasons. To better predict the future trajectories of multiple agents, recent works have introduced interaction modules to capture interactions among agents. This approach leads to correlations among the predicted trajectories. However, the uncertainty brought by such correlations is neglected. To…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: This paper has been accepted by NeurIPS 2021

  13. arXiv:2110.09772  [pdf, other]

    cs.CV cs.GR

    Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry

    Authors: Cho-Ying Wu, Qiangeng Xu, Ulrich Neumann

    Abstract: This work studies learning from a synergy process of 3D Morphable Models (3DMM) and 3D facial landmarks to predict complete 3D facial geometry, including 3D alignment, face orientation, and 3D face modeling. Our synergy process leverages a representation cycle for 3DMM parameters and 3D landmarks. 3D landmarks can be extracted and refined from face meshes built by 3DMM parameters. We next reverse…

    Submitted 17 January, 2024; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted at 3DV 2021. This conference version supersedes arXiv:2104.08403

  14. arXiv:2104.10299  [pdf, other]

    cs.GR cs.CV cs.LG cs.SD eess.AS

    Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices

    Authors: Cho-Ying Wu, Ke Xu, Chin-Cheng Hsu, Ulrich Neumann

    Abstract: This work focuses on the analysis that whether 3D face models can be learned from only the speech inputs of speakers. Previous works for cross-modal face synthesis study image generation from voices. However, image synthesis includes variations such as hairstyles, backgrounds, and facial textures, that are arguably irrelevant to voice or without direct studies to show correlations. We instead inve…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: Project page: https://choyingw.github.io/works/Voice2Mesh/index.html

  15. arXiv:2104.08403  [pdf, other]

    cs.CV

    Accurate 3D Facial Geometry Prediction by Multi-Task, Multi-Modal, and Multi-Representation Landmark Refinement Network

    Authors: Cho-Ying Wu, Qiangeng Xu, Ulrich Neumann

    Abstract: This work focuses on complete 3D facial geometry prediction, including 3D facial alignment via 3D face modeling and face orientation estimation using the proposed multi-task, multi-modal, and multi-representation landmark refinement network (M$^3$-LRN). Our focus is on the important facial attributes, 3D landmarks, and we fully utilize their embedded information to guide 3D facial geometry learnin…

    Submitted 16 April, 2021; originally announced April 2021.

    Comments: Project page: https://choyingw.github.io/works/M3-LRN/index.html

  16. arXiv:2006.07802  [pdf, other]

    cs.CV

    Geometry-Aware Instance Segmentation with Disparity Maps

    Authors: Cho-Ying Wu, Xiaoyan Hu, Michael Happold, Qiangeng Xu, Ulrich Neumann

    Abstract: Most previous works of outdoor instance segmentation for images only use color information. We explore a novel direction of sensor fusion to exploit stereo cameras. Geometric information from disparities helps separate overlapping objects of the same or different classes. Moreover, geometric information penalizes region proposals with unlikely 3D shapes thus suppressing false positive detections.…

    Submitted 17 January, 2024; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: CVPR 2020 Workshop of Scalability in Autonomous Driving (WSAD). Please refer to WSAD site for details; fix typos

  17. arXiv:2003.06945  [pdf, other]

    cs.CV cs.RO

    Scene Completeness-Aware Lidar Depth Completion for Driving Scenario

    Authors: Cho-Ying Wu, Ulrich Neumann

    Abstract: This paper introduces Scene Completeness-Aware Depth Completion (SCADC) to complete raw lidar scans into dense depth maps with fine and complete scene structures. Recent sparse depth completion for lidars only focuses on the lower scenes and produces irregular estimations on the upper because existing datasets, such as KITTI, do not provide groundtruth for upper areas. These areas are considered l…

    Submitted 17 January, 2024; v1 submitted 15 March, 2020; originally announced March 2020.

    Comments: Present at ICASSP 2021; fix typos

  18. arXiv:1912.02984  [pdf, other]

    cs.CV cs.LG

    Grid-GCN for Fast and Scalable Point Cloud Learning

    Authors: Qiangeng Xu, Xudong Sun, Cho-Ying Wu, Panqu Wang, Ulrich Neumann

    Abstract: Due to the sparsity and irregularity of the point cloud data, methods that directly consume points have become popular. Among all point-based models, graph convolutional networks (GCN) lead to notable performance by fully preserving the data granularity and exploiting point interrelation. However, point-based networks spend a significant amount of time on data structuring (e.g., Farthest Point Sam…

    Submitted 12 April, 2021; v1 submitted 6 December, 2019; originally announced December 2019.

    Journal ref: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020)

  19. arXiv:1906.08967  [pdf, other]

    cs.CV

    Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion

    Authors: Yiqi Zhong, Cho-Ying Wu, Suya You, Ulrich Neumann

    Abstract: In this paper, we propose our Correlation For Completion Network (CFCNet), an end-to-end deep learning model that uses the correlation between two data sources to perform sparse depth completion. CFCNet learns to capture, to the largest extent, the semantically correlated features between RGB and depth information. Through pairs of image pixels and the visible measurements in a sparse depth map, C…

    Submitted 15 March, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019. Code link https://github.com/choyingw/CFCNet

  20. arXiv:1906.02426  [pdf, other]

    eess.IV cs.CV cs.GR

    Salient Building Outline Enhancement and Extraction Using Iterative L0 Smoothing and Line Enhancing

    Authors: Cho-Ying Wu, Ulrich Neumann

    Abstract: In this paper, our goal is salient building outline enhancement and extraction from images taken from consumer cameras using L0 smoothing. We address weak outlines and over-smoothing problem. Weak outlines are often undetected by edge extractors or easily smoothed out. We propose an iterative method, including the smoothing cell and sharpening cell. In the smoothing cell, we iteratively enlarge th…

    Submitted 6 June, 2019; originally announced June 2019.

    Comments: Accepted to ICIP 2019

  21. arXiv:1905.10711  [pdf, other]

    cs.CV

    DISN: Deep Implicit Surface Network for High-quality Single-view 3D Reconstruction

    Authors: Qiangeng Xu, Weiyue Wang, Duygu Ceylan, Radomir Mech, Ulrich Neumann

    Abstract: Reconstructing 3D shapes from single-view images has been a long-standing research problem. In this paper, we present DISN, a Deep Implicit Surface Network which can generate a high-quality detail-rich 3D mesh from a 2D image by predicting the underlying signed distance fields. In addition to utilizing global image features, DISN predicts the projected location for each 3D point on the 2D image,…

    Submitted 25 March, 2024; v1 submitted 25 May, 2019; originally announced May 2019.

    Comments: This project was in part supported by the gift funding to the University of Southern California from Adobe Research

    Journal ref: 33rd Annual Conference on Neural Information Processing Systems (NeurIPS 2019)

  22. arXiv:1903.03322  [pdf, other]

    cs.CV

    3DN: 3D Deformation Network

    Authors: Weiyue Wang, Duygu Ceylan, Radomir Mech, Ulrich Neumann

    Abstract: Applications in virtual and augmented reality create a demand for rapid creation and easy access to large sets of 3D models. An effective way to address this demand is to edit or deform existing 3D models based on a reference, e.g., a 2D image which is very easy to acquire. Given such a source 3D model and a target which can be a 2D image, 3D model, or a point cloud acquired as a depth scan, we in…

    Submitted 8 March, 2019; originally announced March 2019.

  23. arXiv:1903.00719  [pdf, other]

    cs.LG cs.IR stat.ML

    FRI -- Feature Relevance Intervals for Interpretable and Interactive Data Exploration

    Authors: Lukas Pfannschmidt, Christina Göpfert, Ursula Neumann, Dominik Heider, Barbara Hammer

    Abstract: Most existing feature selection methods are insufficient for analytic purposes as soon as high dimensional data or redundant sensor signals are dealt with since features can be selected due to spurious effects or correlations rather than causal effects. To support the finding of causal features in biomedical experiments, we hereby present FRI, an open source Python library that can be used to iden…

    Submitted 21 June, 2019; v1 submitted 2 March, 2019; originally announced March 2019.

    Comments: Addition of IEEE copyright notice. Accepted for CIBCB 2019 (https://cibcb2019.icas.xyz/)

  24. arXiv:1811.00274  [pdf, other]

    cs.CV cs.LG

    Efficient Multi-Domain Dictionary Learning with GANs

    Authors: Cho Ying Wu, Ulrich Neumann

    Abstract: In this paper, we propose the multi-domain dictionary learning (MDDL) to make dictionary learning-based classification more robust to data representing in different domains. We use adversarial neural networks to generate data in different styles, and collect all the generated data into a miscellaneous dictionary. To tackle the dictionary learning with many samples, we compute the weighting matrix…

    Submitted 1 November, 2018; originally announced November 2018.

  25. arXiv:1809.00263  [pdf, other]

    cs.CV cs.AI cs.LG

    Stochastic Dynamics for Video Infilling

    Authors: Qiangeng Xu, Hanwang Zhang, Weiyue Wang, Peter N. Belhumeur, Ulrich Neumann

    Abstract: In this paper, we introduce a stochastic dynamics video infilling (SDVI) framework to generate frames between long intervals in a video. Our task differs from video interpolation which aims to produce transitional frames for a short interval between every two frames and increase the temporal resolution. Our task, namely video infilling, however, aims to infill long intervals with plausible frame s…

    Submitted 7 June, 2019; v1 submitted 1 September, 2018; originally announced September 2018.

    Comments: Winter Conference on Applications of Computer Vision (WACV 2020)

  26. arXiv:1803.06791  [pdf, other]

    cs.CV

    Depth-aware CNN for RGB-D Segmentation

    Authors: Weiyue Wang, Ulrich Neumann

    Abstract: Convolutional neural networks (CNN) are limited by the lack of capability to handle geometric information due to the fixed grid kernel structure. The availability of depth data enables progress in RGB-D semantic segmentation with CNNs. State-of-the-art methods either use depth as additional images or process spatial information in 3D volumes or point clouds. These methods suffer from high computat…

    Submitted 18 March, 2018; originally announced March 2018.

  27. arXiv:1802.04402  [pdf, other]

    cs.CV

    Recurrent Slice Networks for 3D Segmentation of Point Clouds

    Authors: Qiangui Huang, Weiyue Wang, Ulrich Neumann

    Abstract: Point clouds are an efficient data format for 3D data. However, existing 3D segmentation methods for point clouds either do not model local dependencies \cite{pointnet} or require added computations \cite{kd-net,pointnet2}. This work presents a novel 3D segmentation framework, RSNet\footnote{Codes are released here https://github.com/qianguih/RSNet}, to efficiently model local structures in point…

    Submitted 29 March, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: camera ready version for cvpr 2018 spotlight. codes are available here https://github.com/qianguih/RSNet

  28. arXiv:1801.07365  [pdf, other]

    cs.CV

    Learning to Prune Filters in Convolutional Neural Networks

    Authors: Qiangui Huang, Kevin Zhou, Suya You, Ulrich Neumann

    Abstract: Many state-of-the-art computer vision algorithms use large scale convolutional neural networks (CNNs) as basic building blocks. These CNNs are known for their huge number of parameters, high redundancy in weights, and tremendous computing resource consumptions. This paper presents a learning algorithm to simplify and speed up these CNNs. Specifically, we introduce a "try-and-learn" algorithm to tr…

    Submitted 22 January, 2018; originally announced January 2018.

  29. arXiv:1711.08588  [pdf, other]

    cs.CV

    SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation

    Authors: Weiyue Wang, Ronald Yu, Qiangui Huang, Ulrich Neumann

    Abstract: We introduce Similarity Group Proposal Network (SGPN), a simple and intuitive deep learning framework for 3D object instance segmentation on point clouds. SGPN uses a single network to predict point grouping proposals and a corresponding semantic class for each proposal, from which we can directly extract instance segmentation results. Important to the effectiveness of SGPN is its novel representa…

    Submitted 30 May, 2019; v1 submitted 23 November, 2017; originally announced November 2017.

  30. arXiv:1711.06375  [pdf, other]

    cs.CV

    Shape Inpainting using 3D Generative Adversarial Network and Recurrent Convolutional Networks

    Authors: Weiyue Wang, Qiangui Huang, Suya You, Chao Yang, Ulrich Neumann

    Abstract: Recent advances in convolutional neural networks have shown promising results in 3D shape completion. But due to GPU memory limitations, these methods can only produce low-resolution outputs. To inpaint 3D models with semantic plausibility and contextual details, we introduce a hybrid framework that combines a 3D Encoder-Decoder Generative Adversarial Network (3D-ED-GAN) and a Long-term Recurrent…

    Submitted 16 November, 2017; originally announced November 2017.

  31. arXiv:1611.07485  [pdf, other]

    cs.CV

    Scene Labeling using Gated Recurrent Units with Explicit Long Range Conditioning

    Authors: Qiangui Huang, Weiyue Wang, Kevin Zhou, Suya You, Ulrich Neumann

    Abstract: Recurrent neural network (RNN), as a powerful contextual dependency modeling framework, has been widely applied to scene labeling problems. However, this work shows that directly applying traditional RNN architectures, which unfolds a 2D lattice grid into a sequence, is not sufficient to model structure dependencies in images due to the "impact vanishing" problem. First, we give an empirical analy…

    Submitted 28 March, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

    Comments: updated version 2