
Showing 1–50 of 137 results for author: Geiger, A

Searching in archive cs.
  1. arXiv:2406.15349  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking

    Authors: Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, Kashyap Chitta

    Abstract: Benchmarking vision-based driving policies is challenging. On one hand, open-loop evaluation with real data is easy, but these results do not reflect closed-loop performance. On the other, closed-loop evaluation is possible in simulation, but is hard to scale due to its significant computational demands. Further, the simulators available today exhibit a large domain gap to real data. This has resu…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.09458  [pdf, other]

    cs.CV cs.AI cs.CL

    Updating CLIP to Prefer Descriptions Over Captions

    Authors: Amir Zur, Elisa Kreiss, Karel D'Oosterlinck, Christopher Potts, Atticus Geiger

    Abstract: Although CLIPScore is a powerful generic metric that captures the similarity between a text and an image, it fails to distinguish between a caption that is meant to complement the information in an image and a description that is meant to replace an image entirely, e.g., for accessibility. We address this shortcoming by updating the CLIP model with the Concadia dataset to assign higher scores to d…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2405.17398  [pdf, other]

    cs.CV cs.AI

    Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

    Authors: Shenyuan Gao, Jiazhi Yang, Li Chen, Kashyap Chitta, Yihang Qiu, Andreas Geiger, Jun Zhang, Hongyang Li

    Abstract: World models can foresee the outcomes of different actions, which is of paramount importance for autonomous driving. Nevertheless, existing driving world models still have limitations in generalization to unseen environments, prediction fidelity of critical details, and action controllability for flexible application. In this paper, we present Vista, a generalizable driving world model with high f…

    Submitted 6 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Code and model: https://github.com/OpenDriveLab/Vista, video demos: https://vista-demo.github.io

  4. arXiv:2405.06336  [pdf, other]

    cs.RO

    Efficient End-to-End Detection of 6-DoF Grasps for Robotic Bin Picking

    Authors: Yushi Liu, Alexander Qualmann, Zehao Yu, Miroslav Gabriel, Philipp Schillinger, Markus Spies, Ngo Anh Vien, Andreas Geiger

    Abstract: Bin picking is an important building block for many robotic systems, in logistics, production or in household use-cases. In recent years, machine learning methods for the prediction of 6-DoF grasps on diverse and unknown objects have shown promising progress. However, existing approaches only consider a single ground truth grasp orientation at a grasp location during training and therefore can onl…

    Submitted 10 May, 2024; originally announced May 2024.

  5. arXiv:2405.01126  [pdf, other]

    cs.CV

    Detecting and clustering swallow events in esophageal long-term high-resolution manometry

    Authors: Alexander Geiger, Lars Wagner, Daniel Rueckert, Dirk Wilhelm, Alissa Jell

    Abstract: High-resolution manometry (HRM) is the gold standard in diagnosing esophageal motility disorders. As HRM is typically conducted under short-term laboratory settings, intermittently occurring disorders are likely to be missed. Therefore, long-term (up to 24h) HRM (LTHRM) is used to gain detailed insights into the swallowing behavior. However, analyzing the extensive data from LTHRM is challenging a…

    Submitted 2 May, 2024; originally announced May 2024.

  6. arXiv:2404.10772  [pdf, other]

    cs.CV

    Gaussian Opacity Fields: Efficient and Compact Surface Reconstruction in Unbounded Scenes

    Authors: Zehao Yu, Torsten Sattler, Andreas Geiger

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, while allowing the rendering of high-resolution images in real-time. However, leveraging 3D Gaussians for surface reconstruction poses significant challenges due to the explicit and disconnected nature of 3D Gaussians. In this work, we present Gaussian Opacity Fields (GOF), a novel approach for efficie…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Project page: https://niujinshuchong.github.io/gaussian-opacity-fields

  7. arXiv:2404.03592  [pdf, other]

    cs.CL cs.AI cs.LG

    ReFT: Representation Finetuning for Language Models

    Authors: Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts

    Abstract: Parameter-efficient finetuning (PEFT) methods seek to adapt large neural models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods.…

    Submitted 22 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: preprint

  8. arXiv:2403.17933  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    SLEDGE: Synthesizing Simulation Environments for Driving Agents with Generative Models

    Authors: Kashyap Chitta, Daniel Dauner, Andreas Geiger

    Abstract: SLEDGE is the first generative simulator for vehicle motion planning trained on real-world driving logs. Its core component is a learned model that is able to generate agent bounding boxes and lane graphs. The model's outputs serve as an initial state for traffic simulation. The unique properties of the entities to be generated for SLEDGE, such as their connectivity and variable count per scene, r…

    Submitted 26 March, 2024; originally announced March 2024.

  9. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields

    Authors: Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao

    Abstract: 3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance…

    Submitted 9 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 13 pages, 12 figures

  10. arXiv:2403.14627  [pdf, other]

    cs.CV

    MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

    Authors: Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai

    Abstract: We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian prim…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page: https://donydchen.github.io/mvsplat Code: https://github.com/donydchen/mvsplat

  11. arXiv:2403.12722  [pdf, other]

    cs.CV

    HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

    Authors: Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao

    Abstract: Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Our project page is at https://xdimlab.github.io/hugs_website

  12. arXiv:2403.09630  [pdf, other]

    cs.CV

    Generalized Predictive Model for Autonomous Driving

    Authors: Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li

    Abstract: In this paper, we introduce the first large-scale video prediction model in the autonomous driving discipline. To eliminate the restriction of high-cost data collection and empower the generalization ability of our model, we acquire massive data from the web and pair it with diverse and high-quality text descriptions. The resultant dataset accumulates over 2000 hours of driving videos, spanning ar…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  13. arXiv:2403.09593  [pdf, other]

    cs.CV

    Renovating Names in Open-Vocabulary Segmentation Benchmarks

    Authors: Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger

    Abstract: Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, the precision of these names is often overlooked in existing datasets. In this paper, we address this underexplored problem by presenting a framework for "renovating" names in open-vocabulary segmentation ben…

    Submitted 24 May, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  14. arXiv:2403.07809  [pdf, other]

    cs.LG cs.CL

    pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

    Authors: Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

    Abstract: Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce pyvene, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. pyvene supports complex intervention schemes with an intuiti…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  15. arXiv:2403.07071  [pdf, other]

    cs.CV

    LISO: Lidar-only Self-Supervised 3D Object Detection

    Authors: Stefan Baur, Frank Moosmann, Andreas Geiger

    Abstract: 3D object detection is one of the most important components in any Self-Driving stack, but current state-of-the-art (SOTA) lidar object detectors require costly & slow manual annotation of 3D bounding boxes to perform well. Recently, several methods emerged to generate pseudo ground truth without human supervision, however, all of these methods have various drawbacks: Some methods require sensor r…

    Submitted 11 March, 2024; originally announced March 2024.

  16. arXiv:2402.17700  [pdf, other]

    cs.CL cs.LG

    RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

    Authors: Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger

    Abstract: Individual neurons participate in the representation of multiple high-level concepts. To what extent can different interpretability methods successfully disentangle these roles? To help address this question, we introduce RAVEL (Resolving Attribute-Value Entanglements in Language Models), a dataset that enables tightly controlled, quantitative comparisons between a variety of existing interpretabi…

    Submitted 27 February, 2024; originally announced February 2024.

  17. arXiv:2402.12377  [pdf, other]

    cs.CV

    Binary Opacity Grids: Capturing Fine Geometric Detail for Mesh-Based View Synthesis

    Authors: Christian Reiser, Stephan Garbin, Pratul P. Srinivasan, Dor Verbin, Richard Szeliski, Ben Mildenhall, Jonathan T. Barron, Peter Hedman, Andreas Geiger

    Abstract: While surface-based view synthesis algorithms are appealing due to their low computational requirements, they often struggle to reproduce thin structures. In contrast, more expensive methods that model the scene's geometry as a volumetric density field (e.g. NeRF) excel at reconstructing fine geometric detail. However, density fields often represent geometry in a "fuzzy" manner, which hinders exac…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Project page at https://binary-opacity-grid.github.io

  18. arXiv:2401.12631  [pdf, other]

    cs.LG cs.AI cs.CL

    A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

    Authors: Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

    Abstract: We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions". We first review Makelov et al. (2023)'s technical notion of what an "interpretability illusion" is, and then we show that even intuitive and desirabl…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 20 pages, 14 figures

  19. arXiv:2312.14150  [pdf, other]

    cs.CV

    DriveLM: Driving with Graph Visual Question Answering

    Authors: Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Ping Luo, Andreas Geiger, Hongyang Li

    Abstract: We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems to boost generalization and enable interactivity with human users. While recent approaches adapt VLMs to driving via single-round visual question answering (VQA), human drivers reason about decisions in multiple steps. Starting from the localization of key objects, humans estimate…

    Submitted 21 December, 2023; originally announced December 2023.

  20. arXiv:2312.13328  [pdf, other]

    cs.CV

    NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis

    Authors: Zinuo You, Andreas Geiger, Anpei Chen

    Abstract: We present NeLF-Pro, a novel representation to model and reconstruct light fields in diverse natural scenes that vary in extent and spatial granularity. In contrast to previous fast reconstruction methods that represent the 3D scene globally, we model the light field of a scene as a set of local light field feature probes, parameterized with position and multi-channel 2D feature maps. Our central…

    Submitted 22 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Conference Paper, Camera Ready Version

  21. arXiv:2312.09228  [pdf, other]

    cs.CV

    3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

    Authors: Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang

    Abstract: We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of c…

    Submitted 4 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Project page: https://neuralbodies.github.io/3DGS-Avatar

  22. arXiv:2312.08365  [pdf, other]

    cs.LG cs.AI

    An Invitation to Deep Reinforcement Learning

    Authors: Bernhard Jaeger, Andreas Geiger

    Abstract: Training a deep neural network to maximize a target objective has become the standard recipe for successful machine learning over the last decade. These networks can be optimized with supervised learning, if the target objective is differentiable. For many interesting problems, this is however not the case. Common objectives like intersection over union (IoU), bilingual evaluation understudy (BLEU…

    Submitted 13 December, 2023; originally announced December 2023.

  23. arXiv:2312.05210  [pdf, other]

    cs.CV

    IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing

    Authors: Shaofei Wang, Božidar Antić, Andreas Geiger, Siyu Tang

    Abstract: We present IntrinsicAvatar, a novel approach to recovering the intrinsic properties of clothed human avatars including geometry, albedo, material, and environment lighting from only monocular videos. Recent advancements in human-based neural rendering have enabled high-quality geometry and appearance reconstruction of clothed humans from just monocular videos. However, these methods bake intrinsic…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 24 pages, 10 figures. Project page: https://neuralbodies.github.io/IntrinsicAvatar

  24. arXiv:2312.04565  [pdf, other]

    cs.CV

    MuRF: Multi-Baseline Radiance Fields

    Authors: Haofei Xu, Anpei Chen, Yuedong Chen, Christos Sakaridis, Yulun Zhang, Marc Pollefeys, Andreas Geiger, Fisher Yu

    Abstract: We present Multi-Baseline Radiance Fields (MuRF), a general feed-forward approach to solving sparse view synthesis under multiple different baseline settings (small and large baselines, and different number of input views). To render a target novel view, we discretize the 3D space into planes parallel to the target image plane, and accordingly construct a target view frustum volume. Such a target…

    Submitted 9 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: CVPR 2024, Project Page: https://haofeixu.github.io/murf/, Code: https://github.com/autonomousvision/murf

  25. arXiv:2312.00093  [pdf, other]

    cs.CV cs.GR cs.LG

    GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

    Authors: Gege Gao, Weiyang Liu, Anpei Chen, Andreas Geiger, Bernhard Schölkopf

    Abstract: As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have been made to distill knowledge from these text-to-image pretrained models for optimizing a text-guided 3D model. Most of the existing methods generate a holistic 3D model from a plain text input. This can be problematic when the text describes a complex scene with multiple objects, because the vectorized…

    Submitted 10 June, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: CVPR 2024 (18 pages, 11 figures, https://graphdreamer.github.io/)

  26. arXiv:2311.16493  [pdf, other]

    cs.CV

    Mip-Splatting: Alias-free 3D Gaussian Splatting

    Authors: Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, Andreas Geiger

    Abstract: Recently, 3D Gaussian Splatting has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To ad…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Project page: https://niujinshuchong.github.io/mip-splatting/

  27. arXiv:2311.13570  [pdf, other]

    cs.CV

    WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space

    Authors: Katja Schwarz, Seung Wook Kim, Jun Gao, Sanja Fidler, Andreas Geiger, Karsten Kreis

    Abstract: Modern learning-based approaches to 3D-aware image synthesis achieve high photorealism and 3D-consistent viewpoint changes for the generated images. Existing approaches represent instances in a shared canonical space. However, for in-the-wild datasets a shared canonical system can be difficult to define or might not even exist. In this work, we instead model instances in view space, alleviating th…

    Submitted 12 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

  28. arXiv:2310.19813  [pdf, ps, other]

    cs.SE cs.AI cs.LG cs.NE

    Enhancing Genetic Improvement Mutations Using Large Language Models

    Authors: Alexander E. I. Brownlee, James Callan, Karine Even-Mendoza, Alina Geiger, Carol Hanna, Justyna Petke, Federica Sarro, Dominik Sobania

    Abstract: Large language models (LLMs) have been successfully applied to software engineering tasks, including program repair. However, their application in search-based techniques such as Genetic Improvement (GI) is still largely unexplored. In this paper, we evaluate the use of LLMs as mutation operators for GI to improve the search process. We expand the Gin Java GI toolkit to call OpenAI's API to genera…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted for publication at the Symposium on Search-Based Software Engineering (SSBSE) 2023

    Journal ref: Arcaini, P., Yue, T., Fredericks, E.M. (eds) Search-Based Software Engineering. SSBSE 2023. Lecture Notes in Computer Science, vol 14415. Springer, Cham

  29. arXiv:2310.15154  [pdf, other]

    cs.LG cs.AI cs.CL

    Linear Representations of Sentiment in Large Language Models

    Authors: Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda

    Abstract: Sentiment is a pervasive feature in natural language text, yet it is an open question how sentiment is represented within Large Language Models (LLMs). In this study, we reveal that across a range of models, sentiment is represented linearly: a single direction in activation space mostly captures the feature across a range of tasks with one extreme for positive and the other for negative. Through…

    Submitted 23 October, 2023; originally announced October 2023.

  30. arXiv:2310.10375  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers

    Authors: Takeru Miyato, Bernhard Jaeger, Max Welling, Andreas Geiger

    Abstract: As transformers are equivariant to the permutation of input tokens, encoding the positional information of tokens is necessary for many tasks. However, since existing positional encoding schemes have been initially designed for NLP tasks, their suitability for vision tasks, which typically exhibit different structural properties in their data, is questionable. We argue that existing positional enc…

    Submitted 7 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Published as a conference paper at ICLR 2024

  31. arXiv:2309.10815  [pdf, other]

    cs.CV

    PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes

    Authors: Xiao Fu, Shangzhan Zhang, Tianrun Chen, Yichong Lu, Xiaowei Zhou, Andreas Geiger, Yiyi Liao

    Abstract: Training perception systems for self-driving cars requires substantial annotations. However, manual labeling in 2D images is highly labor-intensive. While existing datasets provide rich annotations for pre-recorded sequences, they fall short in labeling rarely encountered viewpoints, potentially hampering the generalization ability for perception models. In this paper, we present PanopticNeRF-360,…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Project page: http://fuxiao0719.github.io/projects/panopticnerf360/. arXiv admin note: text overlap with arXiv:2203.15224

  32. arXiv:2309.10312  [pdf, other]

    cs.CL

    Rigorously Assessing Natural Language Explanations of Neurons

    Authors: Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts

    Abstract: Natural language is an appealing medium for explaining how large language models process and store information, but evaluating the faithfulness of such explanations is challenging. To help address this, we develop two modes of evaluation for natural language explanations that claim individual neurons represent a concept in a text input. In the observational mode, we evaluate claims that a neuron…

    Submitted 19 September, 2023; originally announced September 2023.

  33. arXiv:2308.12779  [pdf, other]

    cs.CV cs.RO

    On Offline Evaluation of 3D Object Detection for Autonomous Driving

    Authors: Tim Schreier, Katrin Renz, Andreas Geiger, Kashyap Chitta

    Abstract: Prior work in 3D object detection evaluates models using offline metrics like average precision since closed-loop online evaluation on the downstream driving task is costly. However, it is unclear how indicative offline results are of driving performance. In this work, we perform the first empirical evaluation measuring how predictive different detection metrics are of driving performance when det…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Appears in: IEEE International Conference on Computer Vision (ICCV'23) Workshops

  34. arXiv:2306.16927  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    End-to-end Autonomous Driving: Challenges and Frontiers

    Authors: Li Chen, Penghao Wu, Kashyap Chitta, Bernhard Jaeger, Andreas Geiger, Hongyang Li

    Abstract: The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This…

    Submitted 21 April, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

  35. arXiv:2306.07962  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Parting with Misconceptions about Learning-based Vehicle Motion Planning

    Authors: Daniel Dauner, Marcel Hallgarten, Andreas Geiger, Kashyap Chitta

    Abstract: The release of nuPlan marks a new era in vehicle motion planning research, offering the first large-scale real-world dataset and evaluation schemes requiring both precise short-term planning and long-horizon ego-forecasting. Existing systems struggle to simultaneously meet both requirements. Indeed, we find that these tasks are fundamentally misaligned and should be addressed independently. We fur…

    Submitted 2 November, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: CoRL 2023

  36. arXiv:2306.07957  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Hidden Biases of End-to-End Driving Models

    Authors: Bernhard Jaeger, Kashyap Chitta, Andreas Geiger

    Abstract: End-to-end driving systems have recently made rapid progress, in particular on CARLA. Independent of their major contribution, they introduce changes to minor system components. Consequently, the source of improvements is unclear. We identify two biases that recur in nearly all state-of-the-art methods and are critical for the observed progress on CARLA: (1) lateral recovery via a strong inductive…

    Submitted 17 August, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted at ICCV 2023. Camera ready version

  37. arXiv:2306.03747  [pdf, other]

    cs.CV

    Towards Scalable Multi-View Reconstruction of Geometry and Materials

    Authors: Carolin Schmitt, Božidar Antić, Andrei Neculai, Joo Ho Lee, Andreas Geiger

    Abstract: In this paper, we propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes that exceed object-scale and hence cannot be captured with stationary light stages. The input are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination.…

    Submitted 6 June, 2023; originally announced June 2023.

  38. ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

    Authors: Jingyuan Selena She, Christopher Potts, Samuel R. Bowman, Atticus Geiger

    Abstract: A number of recent benchmarks seek to assess how well models handle natural language negation. However, these benchmarks lack the controlled example paradigms that would allow us to infer whether a model had learned how negation morphemes semantically scope. To fill these analytical gaps, we present the Scoped Negation NLI (ScoNe-NLI) benchmark, which contains contrast sets of six examples with up…

    Submitted 30 May, 2023; originally announced May 2023.

  39. arXiv:2305.08809  [pdf, other]

    cs.CL

    Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

    Authors: Zhengxuan Wu, Atticus Geiger, Thomas Icard, Christopher Potts, Noah D. Goodman

    Abstract: Obtaining human-interpretable explanations of large, general-purpose language models is an urgent goal for AI safety. However, it is just as important that our interpretability methods are faithful to the causal dynamics underlying model behavior and able to robustly generalize to unseen inputs. Distributed Alignment Search (DAS) is a powerful gradient descent method grounded in a theory of causal…

    Submitted 6 February, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 with Author Corrections

  40. arXiv:2305.02312  [pdf, other]

    cs.CV

    AG3D: Learning to Generate 3D Avatars from 2D Image Collections

    Authors: Zijian Dong, Xu Chen, Jinlong Yang, Michael J. Black, Otmar Hilliges, Andreas Geiger

    Abstract: While progress in 2D generative models of human appearance has been rapid, many applications require 3D avatars that can be animated and rendered. Unfortunately, most existing methods for learning generative models of 3D humans with diverse shape and appearance require 3D training data, which is limited and expensive to acquire. The key to progress is hence to learn generative models of 3D avatars…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: Project Page: https://zj-dong.github.io/AG3D/

  41. arXiv:2303.02536  [pdf, other]

    cs.AI

    Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

    Authors: Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman

    Abstract: Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing causal abstraction methods have two major limitations: they require a brute-force search over alignments between the high-level model and the low-level one, and…

    Submitted 21 February, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

  42. arXiv:2302.12249  [pdf, other]

    cs.CV cs.GR

    MERF: Memory-Efficient Radiance Fields for Real-time View Synthesis in Unbounded Scenes

    Authors: Christian Reiser, Richard Szeliski, Dor Verbin, Pratul P. Srinivasan, Ben Mildenhall, Andreas Geiger, Jonathan T. Barron, Peter Hedman

    Abstract: Neural radiance fields enable state-of-the-art photorealistic view synthesis. However, existing radiance field representations are either too compute-intensive for real-time rendering or require too much memory to scale to large scenes. We present a Memory-Efficient Radiance Field (MERF) representation that achieves real-time rendering of large-scale scenes in a browser. MERF reduces the memory co…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Video and interactive web demo available at https://merf42.github.io

  43. arXiv:2302.04301  [pdf, other]

    cs.NE

    Down-Sampled Epsilon-Lexicase Selection for Real-World Symbolic Regression Problems

    Authors: Alina Geiger, Dominik Sobania, Franz Rothlauf

    Abstract: Epsilon-lexicase selection is a parent selection method in genetic programming that has been successfully applied to symbolic regression problems. Recently, the combination of random subsampling with lexicase selection significantly improved performance in other genetic programming domains such as program synthesis. However, the influence of subsampling on the solution quality of real-world symbol…

    Submitted 8 February, 2023; originally announced February 2023.

  44. arXiv:2302.03594  [pdf, other]

    cs.CV

    NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM

    Authors: Zihan Zhu, Songyou Peng, Viktor Larsson, Zhaopeng Cui, Martin R. Oswald, Andreas Geiger, Marc Pollefeys

    Abstract: Neural implicit representations have recently become popular in simultaneous localization and mapping (SLAM), especially in dense visual SLAM. However, previous works in this direction either rely on RGB-D sensors, or require a separate monocular SLAM approach for camera tracking and do not produce high-fidelity dense 3D scene reconstruction. In this paper, we present NICER-SLAM, a dense RGB SLAM…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: Video: https://youtu.be/tUXzqEZWg2w

  45. arXiv:2302.01226  [pdf, other]

    cs.CV cs.GR cs.LG

    Factor Fields: A Unified Framework for Neural Fields and Beyond

    Authors: Anpei Chen, Zexiang Xu, Xinyue Wei, Siyu Tang, Hao Su, Andreas Geiger

    Abstract: We present Factor Fields, a novel framework for modeling and representing signals. Factor Fields decomposes a signal into a product of factors, each represented by a classical or neural field representation which operates on transformed input coordinates. This decomposition results in a unified framework that accommodates several recent signal representations including NeRF, Plenoxels, EG3D, Insta…

    Submitted 27 July, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 13 pages, 7 figures; Project Page: https://apchenstu.github.io/FactorFields/

  46. arXiv:2301.09515  [pdf, other]

    cs.LG cs.CV

    StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

    Authors: Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, Timo Aila

    Abstract: Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) only need a single forward…

    Submitted 23 January, 2023; originally announced January 2023.

    Comments: Project page: https://sites.google.com/view/stylegan-t/

  47. arXiv:2301.04709  [pdf, ps, other]

    cs.AI

    Causal Abstraction for Faithful Model Interpretation

    Authors: Atticus Geiger, Chris Potts, Thomas Icard

    Abstract: A faithful and interpretable explanation of an AI model's behavior and internal structure is a high-level explanation that is human-intelligible but also consistent with the known, but often opaque low-level causal details of the model. We argue that the theory of causal abstraction provides the mathematical foundations for the desired kinds of model explanations. In causal abstraction analysis, w…

    Submitted 11 January, 2023; originally announced January 2023.

  48. arXiv:2212.11720  [pdf, other]

    cs.CV cs.LG

    GOOD: Exploring Geometric Cues for Detecting Objects in an Open World

    Authors: Haiwen Huang, Andreas Geiger, Dan Zhang

    Abstract: We address the task of open-world class-agnostic object detection, i.e., detecting every object in an image by learning from a limited number of base object classes. State-of-the-art RGB-based models suffer from overfitting the training classes and often fail at detecting novel-looking objects. This is because RGB-based models primarily rely on appearance similarity to detect novel objects and are…

    Submitted 3 February, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: Published as a conference paper at ICLR 2023

  49. arXiv:2211.15601  [pdf, other]

    cs.CV

    Fast-SNARF: A Fast Deformer for Articulated Neural Fields

    Authors: Xu Chen, Tianjian Jiang, Jie Song, Max Rietmann, Andreas Geiger, Michael J. Black, Otmar Hilliges

    Abstract: Neural fields have revolutionized the area of 3D reconstruction and novel view synthesis of rigid scenes. A key challenge in making such methods applicable to articulated objects, such as the human body, is to model the deformation of 3D locations between the rest pose (a canonical space) and the deformed space. We propose a new articulation module for neural fields, Fast-SNARF, which finds accura…

    Submitted 1 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: github page: https://github.com/xuchen-ethz/fast-snarf

  50. arXiv:2211.12270  [pdf, other]

    cs.AI

    Causal Abstraction with Soft Interventions

    Authors: Riccardo Massidda, Atticus Geiger, Thomas Icard, Davide Bacciu

    Abstract: Causal abstraction provides a theory describing how several causal models can represent the same system at different levels of detail. Existing theoretical proposals limit the analysis of abstract models to "hard" interventions fixing causal variables to be constant values. In this work, we extend causal abstraction to "soft" interventions, which assign possibly non-constant functions to variables…

    Submitted 22 November, 2022; originally announced November 2022.