Showing 1–50 of 218 results for author: Cremers, D

Searching in archive cs.
  1. arXiv:2406.10079  [pdf, other]

    cs.CV cs.AI

    Localizing Events in Videos with Multimodal Queries

    Authors: Gengyuan Zhang, Mang Ling Ada Fok, Yan Xia, Yansong Tang, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu

    Abstract: Video understanding is a pivotal task in the digital era, yet the dynamic and multievent nature of videos makes them labor-intensive and computationally demanding to process. Thus, localizing a specific event given a semantic query has gained importance in both user-oriented applications like video search and academic research into video foundation models. A significant limitation in current resea…

    Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages; fix some typos

  2. arXiv:2406.07550  [pdf, other]

    cs.CV

    An Image is Worth 32 Tokens for Reconstruction and Generation

    Authors: Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

    Abstract: Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands compared to directly processing pixels and enhances the effectiveness and efficiency of the generation process. Prior methods, such as VQGAN, typically…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: A compact 1D Image Tokenization method, leading to SOTA generation performance while being substantially faster. Project page at https://yucornetto.github.io/projects/titok.html

  3. arXiv:2405.05079  [pdf, other]

    cs.CV

    Power Variable Projection for Initialization-Free Large-Scale Bundle Adjustment

    Authors: Simon Weber, Je Hyeong Hong, Daniel Cremers

    Abstract: Initialization-free bundle adjustment (BA) remains largely uncharted. While Levenberg-Marquardt algorithm is the golden method to solve the BA problem, it generally relies on a good initialization. In contrast, the under-explored Variable Projection algorithm (VarPro) exhibits a wide convergence basin even without initialization. Coupled with object space error formulation, recent works have shown…

    Submitted 9 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2404.12330  [pdf, other]

    cs.CV cs.MM

    A Perspective on Deep Vision Performance with Standard Image and Video Codecs

    Authors: Christoph Reich, Oliver Hahn, Daniel Cremers, Stefan Roth, Biplob Debnath

    Abstract: Resource-constrained hardware, such as edge devices or cell phones, often rely on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and requir…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024 Workshop on AI for Streaming (AIS)

  5. arXiv:2404.12209  [pdf, other]

    cs.CV

    Partial-to-Partial Shape Matching with Geometric Consistency

    Authors: Viktoria Ehm, Maolin Gao, Paul Roetzer, Marvin Eisenberger, Daniel Cremers, Florian Bernard

    Abstract: Finding correspondences between 3D shapes is an important and long-standing problem in computer vision, graphics and beyond. A prominent challenge are partial-to-partial shape matching settings, which occur when the shapes to match are only observed incompletely (e.g. from 3D scanning). Although partial-to-partial matching is a highly relevant setting in practice, it is rarely explored. Our work b…

    Submitted 10 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  6. arXiv:2404.10960  [pdf, other]

    cs.CL cs.AI

    Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations

    Authors: Christian Tomani, Kamalika Chaudhuri, Ivan Evtimov, Daniel Cremers, Mark Ibrahim

    Abstract: A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability. Three situations where this is particularly apparent are correctness, hallucinations when given unanswerable questions, and safety. In all three cases, models should ideally abstain from responding, much like humans, whose ability to understand uncertainty makes us refrain from answering…

    Submitted 16 April, 2024; originally announced April 2024.

  7. arXiv:2404.07933  [pdf, other]

    cs.CV

    Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation

    Authors: Keonhee Han, Dominik Muhle, Felix Wimbauer, Daniel Cremers

    Abstract: Inferring scene geometry from images via Structure from Motion is a long-standing and fundamental problem in computer vision. While classical approaches and, more recently, depth map predictions only focus on the visible parts of a scene, the task of scene completion aims to reason about geometry even in occluded regions. With the popularity of neural radiance fields (NeRFs), implicit representati…

    Submitted 11 April, 2024; originally announced April 2024.

  8. arXiv:2404.03999  [pdf, other]

    cs.CV

    Finsler-Laplace-Beltrami Operators with Application to Shape Analysis

    Authors: Simon Weber, Thomas Dagès, Maolin Gao, Daniel Cremers

    Abstract: The Laplace-Beltrami operator (LBO) emerges from studying manifolds equipped with a Riemannian metric. It is often called the Swiss army knife of geometry processing as it allows to capture intrinsic shape information and gives rise to heat diffusion, geodesic distances, and a multitude of shape descriptors. It also plays a central role in geometric deep learning. In this work, we explore Finsler…

    Submitted 5 April, 2024; originally announced April 2024.

  9. arXiv:2404.03778  [pdf, other]

    cs.CV

    Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball

    Authors: Simon Weber, Barış Zöngür, Nikita Araslanov, Daniel Cremers

    Abstract: Hierarchy is a natural representation of semantic taxonomies, including the ones routinely used in image segmentation. Indeed, recent work on semantic segmentation reports improved accuracy from supervised training leveraging hierarchical label structures. Encouraged by these results, we revisit the fundamental assumptions behind that work. We postulate and then empirically verify that the reasons…

    Submitted 15 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  10. arXiv:2404.00098  [pdf, other]

    cs.CV

    Sparse Views, Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric Stereo

    Authors: Mohammed Brahimi, Bjoern Haefner, Zhenzhang Ye, Bastian Goldluecke, Daniel Cremers

    Abstract: Neural approaches have shown a significant progress on camera-based reconstruction. But they require either a fairly dense sampling of the viewing sphere, or pre-training on an existing dataset, thereby limiting their generalizability. In contrast, photometric stereo (PS) approaches have shown great potential for achieving high-quality reconstruction under sparse viewpoints. Yet, they are impracti…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: Accepted in CVPR 2024

  11. arXiv:2403.16605  [pdf, other]

    cs.CV

    SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation

    Authors: Aysim Toker, Marvin Eisenberger, Daniel Cremers, Laura Leal-Taixé

    Abstract: In recent years, semantic segmentation has become a pivotal tool in processing and interpreting satellite imagery. Yet, a prevalent limitation of supervised learning techniques remains the need for extensive manual annotations by experts. In this work, we explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks. The main idea is to le…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024

  12. arXiv:2403.14594  [pdf, other]

    cs.CV cs.RO

    VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition

    Authors: Yun-Jin Li, Mariia Gladkova, Yan Xia, Rui Wang, Daniel Cremers

    Abstract: Recent works on the global place recognition treat the task as a retrieval problem, where an off-the-shelf global descriptor is commonly designed in image-based and LiDAR-based modalities. However, it is non-trivial to perform accurate image-LiDAR global place recognition since extracting consistent and robust global descriptors from different domains (2D images and 3D point clouds) is challenging…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page https://yunjinli.github.io/projects-vxp/

  13. arXiv:2403.08498  [pdf, other]

    cs.CV

    Gaussian Splatting in Style

    Authors: Abhishek Saroha, Mariia Gladkova, Cecilia Curreli, Tarun Yenamandra, Daniel Cremers

    Abstract: Scene stylization extends the work of neural style transfer to three spatial dimensions. A vital challenge in this problem is to maintain the uniformity of the stylized appearance across a multi-view setting. A vast majority of the previous works achieve this by optimizing the scene with a specific style image. In contrast, we propose a novel architecture trained on a collection of style images, t…

    Submitted 13 March, 2024; originally announced March 2024.

  14. arXiv:2402.18920  [pdf, other]

    cs.CV cs.AI cs.CG

    Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation

    Authors: Dongliang Cao, Marvin Eisenberger, Nafie El Amrani, Daniel Cremers, Florian Bernard

    Abstract: Although 3D shape matching and interpolation are highly interrelated, they are often studied separately and applied sequentially to relate different 3D shapes, thus resulting in sub-optimal performance. In this work we present a unified framework to predict both point-wise correspondences and shape interpolation between 3D shapes. To this end, we combine the deep functional map framework with clas…

    Submitted 27 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: accepted by CVPR2024

  15. arXiv:2402.17641  [pdf, other]

    cs.LG cs.AI cs.CL math.OC stat.ML

    Variational Learning is Effective for Large Deep Networks

    Authors: Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff

    Abstract: We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertaint…

    Submitted 6 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Published at International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available here: https://github.com/team-approx-bayes/ivon

  16. arXiv:2402.16748  [pdf, other]

    cs.LG

    Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization

    Authors: Zhenzhang Ye, Gabriel Peyré, Daniel Cremers, Pierre Ablin

    Abstract: Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem. It is routinely used in Machine Learning, notably for hyperparameter tuning. The conventional method to compute the so-called hypergradient of the outer problem is to use the Implicit Function Theorem (IFT). As a function of the error of the inner problem resolution, we…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted in AISTATS 2024

  17. arXiv:2401.14325  [pdf, other]

    cs.CV

    Unlocking Past Information: Temporal Embeddings in Cooperative Bird's Eye View Prediction

    Authors: Dominik Rößle, Jeremias Gerner, Klaus Bogenberger, Daniel Cremers, Stefanie Schmidtner, Torsten Schön

    Abstract: Accurate and comprehensive semantic segmentation of Bird's Eye View (BEV) is essential for ensuring safe and proactive navigation in autonomous driving. Although cooperative perception has exceeded the detection capabilities of single-agent systems, prevalent camera-based algorithms in cooperative perception neglect valuable information derived from historical observations. This limitation becomes…

    Submitted 25 January, 2024; originally announced January 2024.

  18. arXiv:2312.09800  [pdf, other]

    cs.CV cs.RO

    Deep Event Visual Odometry

    Authors: Simon Klenk, Marvin Motzet, Lukas Koestler, Daniel Cremers

    Abstract: Event cameras offer the exciting possibility of tracking the camera's pose during high-speed motion and in adverse lighting conditions. Despite this promise, existing event-based monocular visual odometry (VO) approaches demonstrate limited performance on recent benchmarks. To address this limitation, some methods resort to additional sensors such as IMUs, stereo event cameras, or frame-based came…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by 3DV 2024

  19. arXiv:2312.03209  [pdf, other]

    cs.CV

    Cache Me if You Can: Accelerating Diffusion Models through Block Caching

    Authors: Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang

    Abstract: Diffusion models have recently revolutionized the field of image synthesis due to their ability to generate photorealistic images. However, one of the major drawbacks of diffusion models is that the image generation process is costly. A large image-to-image network has to be applied many times to iteratively refine an image from random noise. While many recent works propose techniques to reduce th…

    Submitted 12 January, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://fwmb.github.io/blockcaching/

  20. arXiv:2311.17810  [pdf, other]

    cs.CV

    Coloring the Past: Neural Historical Buildings Reconstruction from Archival Photography

    Authors: David Komorowicz, Lu Sang, Ferdinand Maiwald, Daniel Cremers

    Abstract: Historical buildings are a treasure and milestone of human cultural heritage. Reconstructing the 3D models of these building hold significant value. The rapid development of neural rendering methods makes it possible to recover the 3D shape only based on archival photographs. However, this task presents considerable challenges due to the limitations of such datasets. Historical photographs are oft…

    Submitted 29 November, 2023; originally announced November 2023.

  21. arXiv:2311.17634  [pdf, other]

    cs.CV

    Erasing the Ephemeral: Joint Camera Refinement and Transient Object Removal for Street View Synthesis

    Authors: Mreenav Shyam Deka, Lu Sang, Daniel Cremers

    Abstract: Synthesizing novel views for urban environments is crucial for tasks like autonomous driving and virtual tours. Compared to object-level or indoor situations, outdoor settings present unique challenges, such as inconsistency across frames due to moving vehicles and camera pose drift over lengthy sequences. In this paper, we introduce a method that tackles these challenges on view synthesis for out…

    Submitted 29 November, 2023; originally announced November 2023.

  22. arXiv:2311.15977  [pdf, other]

    cs.CV

    Text2Loc: 3D Point Cloud Localization from Natural Language

    Authors: Yan Xia, Letian Shi, Zifeng Ding, João F. Henriques, Daniel Cremers

    Abstract: We tackle the problem of 3D point cloud localization based on a few natural linguistic descriptions and introduce a novel neural network, Text2Loc, that fully interprets the semantic relationship between points and text. Text2Loc follows a coarse-to-fine localization pipeline: text-submap global place recognition, followed by fine localization. In global place recognition, relational dynamics amon…

    Submitted 28 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024

  23. arXiv:2311.03964  [pdf, other]

    cs.CV

    Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining

    Authors: Ugur Sahin, Hang Li, Qadeer Khan, Daniel Cremers, Volker Tresp

    Abstract: Contemporary large-scale visual language models (VLMs) exhibit strong representation capacities, making them ubiquitous for enhancing image and text understanding tasks. They are often trained in a contrastive manner on a large and diverse corpus of images and corresponding text captions scraped from the internet. Despite this, VLMs often struggle with compositional reasoning tasks which require a…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted to WACV

  24. arXiv:2310.07522  [pdf, other]

    cs.CV

    S4C: Self-Supervised Semantic Scene Completion with Neural Fields

    Authors: Adrian Hayler, Felix Wimbauer, Dominik Muhle, Christian Rupprecht, Daniel Cremers

    Abstract: 3D semantic scene understanding is a fundamental challenge in computer vision. It enables mobile agents to autonomously plan and navigate arbitrary environments. SSC formalizes this challenge as jointly estimating dense geometry and semantic information from sparse observations of a scene. Current methods for SSC are generally trained on 3D ground truth based on aggregated LiDAR scans. This proces…

    Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

  25. arXiv:2310.06707  [pdf, other]

    cs.CL cs.AI

    Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model

    Authors: Christian Tomani, David Vilar, Markus Freitag, Colin Cherry, Subhajit Naskar, Mara Finkelstein, Xavier Garcia, Daniel Cremers

    Abstract: Maximum-a-posteriori (MAP) decoding is the most widely used decoding strategy for neural machine translation (NMT) models. The underlying assumption is that model probability correlates well with human judgment, with better translations getting assigned a higher score by the model. However, research has shown that this assumption does not always hold, and generation quality can be improved by deco…

    Submitted 25 March, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  26. arXiv:2310.02232  [pdf, other]

    cs.LG cs.SI

    HoloNets: Spectral Convolutions do extend to Directed Graphs

    Authors: Christian Koke, Daniel Cremers

    Abstract: Within the graph learning community, conventional wisdom dictates that spectral convolutional networks may only be deployed on undirected graphs: Only there could the existence of a well-defined graph Fourier transform be guaranteed, so that information may be translated between spatial- and spectral domains. Here we show this traditional reliance on the graph Fourier transform to be superfluous a…

    Submitted 10 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: text overlap with arXiv:2310.00431

  27. arXiv:2310.00431  [pdf, other]

    cs.LG

    ResolvNet: A Graph Convolutional Network with multi-scale Consistency

    Authors: Christian Koke, Abhishek Saroha, Yuesong Shen, Marvin Eisenberger, Daniel Cremers

    Abstract: It is by now a well known fact in the graph learning community that the presence of bottlenecks severely limits the ability of graph neural networks to propagate information over long distances. What so far has not been appreciated is that, counter-intuitively, also the presence of strongly connected sub-graphs may severely restrict information flow in common architectures. Motivated by this obser…

    Submitted 30 October, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

  28. arXiv:2309.06199  [pdf, other]

    cs.CV cs.AI

    SCP: Scene Completion Pre-training for 3D Object Detection

    Authors: Yiming Shan, Yan Xia, Yuhong Chen, Daniel Cremers

    Abstract: 3D object detection using LiDAR point clouds is a fundamental task in the fields of computer vision, robotics, and autonomous driving. However, existing 3D detectors heavily rely on annotated datasets, which are both time-consuming and prone to errors during the process of labeling 3D bounding boxes. In this paper, we propose a Scene Completion Pre-training (SCP) method to enhance the performance…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Wins the best paper award at ISPRS Geospatial Week 2023

  29. arXiv:2309.05013  [pdf, other]

    cs.CV

    Geometrically Consistent Partial Shape Matching

    Authors: Viktoria Ehm, Paul Roetzer, Marvin Eisenberger, Maolin Gao, Florian Bernard, Daniel Cremers

    Abstract: Finding correspondences between 3D shapes is a crucial problem in computer vision and graphics, which is for example relevant for tasks like shape interpolation, pose transfer, or texture transfer. An often neglected but essential property of matchings is geometric consistency, which means that neighboring triangles in one shape are consistently matched to neighboring triangles in the other shape.…

    Submitted 10 September, 2023; originally announced September 2023.

  30. arXiv:2308.16215  [pdf, other]

    eess.IV cs.CV cs.LG cs.MM

    Deep Video Codec Control for Vision Models

    Authors: Christoph Reich, Biplob Debnath, Deep Patel, Tim Prangemeier, Daniel Cremers, Srimat Chakradhar

    Abstract: Standardized lossy video coding is at the core of almost all real-world video processing pipelines. Rate control is used to enable standard codecs to adapt to different network bandwidth conditions or storage constraints. However, standard video codecs (e.g., H.264) and their rate control modules aim to minimize video distortion w.r.t. human quality assessment. We demonstrate empirically that stan…

    Submitted 16 April, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted at CVPR 2024 Workshop on AI for Streaming (AIS)

  31. arXiv:2308.08393  [pdf, other]

    cs.CV

    SIGMA: Scale-Invariant Global Sparse Shape Matching

    Authors: Maolin Gao, Paul Roetzer, Marvin Eisenberger, Zorah Lähner, Michael Moeller, Daniel Cremers, Florian Bernard

    Abstract: We propose a novel mixed-integer programming (MIP) formulation for generating precise sparse correspondences for highly non-rigid shapes. To this end, we introduce a projected Laplace-Beltrami operator (PLBO) which combines intrinsic and extrinsic geometric information to measure the deformation quality induced by predicted correspondences. We integrate the PLBO, together with an orientation-aware…

    Submitted 3 April, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 14 pages

  32. arXiv:2308.08380  [pdf, other]

    cs.RO cs.CV

    Robust Autonomous Vehicle Pursuit without Expert Steering Labels

    Authors: Jiaxin Pan, Changyao Zhou, Mariia Gladkova, Qadeer Khan, Daniel Cremers

    Abstract: In this work, we present a learning method for lateral and longitudinal motion control of an ego-vehicle for vehicle pursuit. The car being controlled does not have a pre-defined route, rather it reactively adapts to follow a target vehicle while maintaining a safety distance. To train our model, we do not rely on steering labels recorded from an expert driver but effectively leverage a classical…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 9 pages, 4 figures, 3 tables

  33. arXiv:2308.01766  [pdf, other]

    cs.CV math.AP

    Neural Poisson Surface Reconstruction: Resolution-Agnostic Shape Reconstruction from Point Clouds

    Authors: Hector Andrade-Loarca, Julius Hege, Daniel Cremers, Gitta Kutyniok

    Abstract: We introduce Neural Poisson Surface Reconstruction (nPSR), an architecture for shape reconstruction that addresses the challenge of recovering 3D shapes from points. Traditional deep neural networks face challenges with common 3D shape discretization techniques due to their computational complexity at higher resolutions. To overcome this, we leverage Fourier Neural Operators to solve the Poisson e…

    Submitted 28 November, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

  34. arXiv:2308.01424  [pdf, other]

    cs.CV

    LiDAR View Synthesis for Robust Vehicle Navigation Without Expert Labels

    Authors: Jonathan Schmidt, Qadeer Khan, Daniel Cremers

    Abstract: Deep learning models for self-driving cars require a diverse training dataset to manage critical driving scenarios on public roads safely. This includes having data from divergent trajectories, such as the oncoming traffic lane or sidewalks. Such data would be too dangerous to collect in the real world. Data augmentation approaches have been proposed to tackle this issue using RGB images. However,…

    Submitted 5 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  35. Multi Agent Navigation in Unconstrained Environments using a Centralized Attention based Graphical Neural Network Controller

    Authors: Yining Ma, Qadeer Khan, Daniel Cremers

    Abstract: In this work, we propose a learning based neural model that provides both the longitudinal and lateral control commands to simultaneously navigate multiple vehicles. The goal is to ensure that each vehicle reaches a desired target state without colliding with any other vehicle or obstacle in an unconstrained environment. The model utilizes an attention based Graphical Neural Network paradigm that…

    Submitted 10 August, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

    Journal ref: 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 2023, pp. 2893-2900

  36. arXiv:2307.15063  [pdf, other]

    cs.CV

    To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation

    Authors: Marc Botet Colomer, Pier Luigi Dovesi, Theodoros Panagiotakopoulos, Joao Frederico Carvalho, Linus Härenstam-Nielsen, Hossein Azizpour, Hedvig Kjellström, Daniel Cremers, Matteo Poggi

    Abstract: The goal of Online Domain Adaptation for semantic segmentation is to handle unforeseeable domain changes that occur during deployment, like sudden weather events. However, the high computational costs associated with brute-force adaptation make this paradigm unfeasible for real-world applications. In this paper we propose HAMLET, a Hardware-Aware Modular Least Expensive Training framework for real…

    Submitted 7 August, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: ICCV 2023. The first two authors contributed equally. Project page: https://marcbotet.github.io/hamlet-web/

  37. arXiv:2307.07753  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks

    Authors: Dominik Schnaus, Jongseok Lee, Daniel Cremers, Rudolph Triebel

    Abstract: In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained mod…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted to ICML 2023

  38. arXiv:2306.02099  [pdf, other]

    cs.CV

    Enhancing Surface Neural Implicits with Curvature-Guided Sampling and Uncertainty-Augmented Representations

    Authors: Lu Sang, Abhishek Saroha, Maolin Gao, Daniel Cremers

    Abstract: Neural implicits have become popular for representing surfaces because they offer an adaptive resolution and support arbitrary topologies. While previous works rely on ground truth point clouds, they often ignore the effect of input quality and sampling methods during reconstructing process. In this paper, we introduce a sampling method with an uncertainty-augmented surface implicit representation…

    Submitted 12 December, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: 8 pages

  39. arXiv:2305.09602  [pdf, other]

    cs.CV

    Urban-StyleGAN: Learning to Generate and Manipulate Images of Urban Scenes

    Authors: George Eskandar, Youssef Farag, Tarun Yenamandra, Daniel Cremers, Karim Guirguis, Bin Yang

    Abstract: A promise of Generative Adversarial Networks (GANs) is to provide cheap photorealistic data for training and validating AI models in autonomous driving. Despite their huge success, their performance on complex images featuring multiple objects is understudied. While some frameworks produce high-quality street scenes with little to no control over the image content, others offer more control at the…

    Submitted 16 May, 2023; originally announced May 2023.

  40. arXiv:2305.09527  [pdf, other]

    cs.CV

    Learning Correspondence Uncertainty via Differentiable Nonlinear Least Squares

    Authors: Dominik Muhle, Lukas Koestler, Krishna Murthy Jatavallabhula, Daniel Cremers

    Abstract: We propose a differentiable nonlinear least squares framework to account for uncertainty in relative pose estimation from feature correspondences. Specifically, we introduce a symmetric version of the probabilistic normal epipolar constraint, and an approach to estimate the covariance of feature positions by differentiating through the camera pose estimation procedure. We evaluate our approach on…

    Submitted 18 May, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

  41. arXiv:2305.08628  [pdf, other]

    cs.CV

    Non-Separable Multi-Dimensional Network Flows for Visual Computing

    Authors: Viktoria Ehm, Daniel Cremers, Florian Bernard

    Abstract: Flows in networks (or graphs) play a significant role in numerous computer vision tasks. The scalar-valued edges in these graphs often lead to a loss of information and thereby to limitations in terms of expressiveness. For example, oftentimes high-dimensional data (e.g. feature descriptors) are mapped to a single scalar value (e.g. the similarity between two feature descriptors). To overcome this…

    Submitted 15 May, 2023; originally announced May 2023.

  42. arXiv:2305.07524  [pdf]

    physics.med-ph cs.AI

    Joint MR sequence optimization beats pure neural network approaches for spin-echo MRI super-resolution

    Authors: Hoai Nam Dang, Vladimir Golkov, Thomas Wimmer, Daniel Cremers, Andreas Maier, Moritz Zaiss

    Abstract: Current MRI super-resolution (SR) methods only use existing contrasts acquired from typical clinical sequences as input for the neural network (NN). In turbo spin echo sequences (TSE) the sequence parameters can have a strong influence on the actual resolution of the acquired image and have consequently a considerable impact on the performance of the NN. We propose a known-operator learning appro…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: 13 pages, 4 figures, 3 tables, submitted to MICCAI 2023 for review

  43. arXiv:2305.06314  [pdf, other]

    cs.CV cs.AI cs.LG

    Scan2LoD3: Reconstructing semantic 3D building models at LoD3 using ray casting and Bayesian networks

    Authors: Olaf Wysocki, Yan Xia, Magdalena Wysocki, Eleonora Grilli, Ludwig Hoegner, Daniel Cremers, Uwe Stilla

    Abstract: Reconstructing semantic 3D building models at the level of detail (LoD) 3 is a long-standing challenge. Unlike mesh-based models, they require watertight geometry and object-wise semantics at the façade level. The principal challenge of such demanding semantic 3D reconstruction is reliable façade-level semantic segmentation of 3D input data. We present a novel method, called Scan2LoD3, that accura… ▽ More

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted for Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2023

  44. arXiv:2304.05864  [pdf, other]

    cs.CV cs.LG

    Scale-Equivariant Deep Learning for 3D Data

    Authors: Thomas Wimmer, Vladimir Golkov, Hoai Nam Dang, Moritz Zaiss, Andreas Maier, Daniel Cremers

    Abstract: The ability of convolutional neural networks (CNNs) to recognize objects regardless of their position in the image is due to the translation-equivariance of the convolutional operation. Group-equivariant CNNs transfer this equivariance to other transformations of the input. Dealing appropriately with objects and object parts of different scale is challenging, and scale can vary for multiple reason… ▽ More

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 12 pages, 4 figures

  45. arXiv:2302.05118  [pdf, other]

    cs.LG cs.AI

    Beyond In-Domain Scenarios: Robust Density-Aware Calibration

    Authors: Christian Tomani, Futa Waseda, Yuesong Shen, Daniel Cremers

    Abstract: Calibrating deep learning models to yield uncertainty-aware predictions is crucial as deep neural networks get increasingly deployed in safety-critical applications. While existing post-hoc calibration methods achieve impressive results on in-domain test datasets, they are limited by their inability to yield reliable uncertainty estimates in domain-shift and out-of-domain (OOD) scenarios. We aim t… ▽ More

    Submitted 4 July, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

    Comments: In Proceedings of the International Conference on Machine Learning (ICML), 2023. Code available at https://github.com/futakw/DensityAwareCalibration

  46. arXiv:2301.11431  [pdf, other]

    cs.CV math.OC

    Semidefinite Relaxations for Robust Multiview Triangulation

    Authors: Linus Härenstam-Nielsen, Niclas Zeller, Daniel Cremers

    Abstract: We propose an approach based on convex relaxations for certifiably optimal robust multiview triangulation. To this end, we extend existing relaxation approaches to non-robust multiview triangulation by incorporating a truncated least squares cost function. We propose two formulations, one based on epipolar constraints and one based on fractional reprojection constraints. The first is lower dimensi… ▽ More

    Submitted 5 April, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

  47. arXiv:2301.07668  [pdf, other]

    cs.CV

    Behind the Scenes: Density Fields for Single View Reconstruction

    Authors: Felix Wimbauer, Nan Yang, Christian Rupprecht, Daniel Cremers

    Abstract: Inferring a meaningful geometric scene representation from a single image is a fundamental problem in computer vision. Approaches based on traditional depth map prediction can only reason about areas that are visible in the image. Currently, neural radiance fields (NeRFs) can capture true 3D including color, but are too complex to be generated from a single image. As an alternative, we propose to… ▽ More

    Submitted 19 April, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: Project Page: https://fwmb.github.io/bts/

  48. arXiv:2301.02561  [pdf, other]

    cs.RO cs.AI

    Multi-Vehicle Trajectory Prediction at Intersections using State and Intention Information

    Authors: Dekai Zhu, Qadeer Khan, Daniel Cremers

    Abstract: Traditional approaches to prediction of future trajectory of road agents rely on knowing information about their past trajectory. This work rather relies only on having knowledge of the current state and intended direction to make predictions for multiple vehicles at intersections. Furthermore, message passing of this information between the vehicles provides each one of them a more holistic overv… ▽ More

    Submitted 6 January, 2023; originally announced January 2023.

  49. arXiv:2301.01147  [pdf, other]

    cs.CV

    4Seasons: Benchmarking Visual SLAM and Long-Term Localization for Autonomous Driving in Challenging Conditions

    Authors: Patrick Wenzel, Nan Yang, Rui Wang, Niclas Zeller, Daniel Cremers

    Abstract: In this paper, we present a novel visual SLAM and long-term localization benchmark for autonomous driving in challenging conditions based on the large-scale 4Seasons dataset. The proposed benchmark provides drastic appearance variations caused by seasonal changes and diverse weather and illumination conditions. While significant progress has been made in advancing visual SLAM on small-scale datase… ▽ More

    Submitted 31 December, 2022; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.06364

  50. arXiv:2212.10368  [pdf, other]

    cs.CV

    Masked Event Modeling: Self-Supervised Pretraining for Event Cameras

    Authors: Simon Klenk, David Bonello, Lukas Koestler, Nikita Araslanov, Daniel Cremers

    Abstract: Event cameras asynchronously capture brightness changes with low latency, high temporal resolution, and high dynamic range. However, annotation of event data is a costly and laborious process, which limits the use of deep learning methods for classification and other semantic tasks with the event modality. To reduce the dependency on labeled event data, we introduce Masked Event Modeling (MEM), a… ▽ More

    Submitted 23 December, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: To appear at WACV 2024. Code: https://github.com/tum-vision/mem