
Showing 1–50 of 107 results for author: Koltun, V

  1. arXiv:2405.07515  [pdf, other]

    cs.RO cs.AI cs.LG

    OpenBot-Fleet: A System for Collective Learning with Real Robots

    Authors: Matthias Müller, Samarth Brahmbhatt, Ankur Deka, Quentin Leboutet, David Hafner, Vladlen Koltun

    Abstract: We introduce OpenBot-Fleet, a comprehensive open-source cloud robotics system for navigation. OpenBot-Fleet uses smartphones for sensing, local compute and communication, Google Firebase for secure cloud storage and off-board compute, and a robust yet low-cost wheeled robot to act in real-world environments. The robots collect task data and upload it to the cloud where navigation policies can be le…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted at ICRA'24

  2. Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning

    Authors: Yunlong Song, Angel Romero, Matthias Mueller, Vladlen Koltun, Davide Scaramuzza

    Abstract: A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contri…

    Submitted 18 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Journal ref: Science Robotics, 2023

  3. arXiv:2308.15856  [pdf, other]

    cs.LG stat.ML

    Domain Generalization without Excess Empirical Risk

    Authors: Ozan Sener, Vladlen Koltun

    Abstract: Given data from diverse sets of distinct distributions, domain generalization aims to learn models that generalize to unseen distributions. A common approach is designing a data-driven surrogate penalty to capture generalization and minimize the empirical risk jointly with the penalty. We argue that a significant failure mode of this recipe is an excess risk due to an erroneous penalty or hardness…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Published at NeurIPS 2022

  4. arXiv:2305.09253  [pdf, other]

    cs.CV cs.LG

    Online Continual Learning Without the Storage Constraint

    Authors: Ameya Prabhu, Zhipeng Cai, Puneet Dokania, Philip Torr, Vladlen Koltun, Ozan Sener

    Abstract: Traditional online continual learning (OCL) research has primarily focused on mitigating catastrophic forgetting with fixed and limited storage allocation throughout an agent's lifetime. However, a broad range of real-world applications are primarily constrained by computational costs rather than storage limitations. In this paper, we target such applications, investigating the online continual le…

    Submitted 2 November, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: Tech Report [Additional Experiments and Improved ACM]

  5. arXiv:2303.12134  [pdf, other]

    cs.CV cs.RO

    Monocular Visual-Inertial Depth Estimation

    Authors: Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun

    Abstract: We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry to produce dense depth estimates with metric scale. Our approach performs global scale and shift alignment against sparse metric depth, followed by learning-based dense alignment. We evaluate on the TartanAir and VOID datasets, observing up to 30% reduction in inverse RMSE…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at ICRA'23

  6. arXiv:2210.06401  [pdf, other]

    cs.CV

    Improving information retention in large scale online continual learning

    Authors: Zhipeng Cai, Vladlen Koltun, Ozan Sener

    Abstract: Given a stream of data sampled from non-stationary distributions, online continual learning (OCL) aims to adapt efficiently to new data while retaining existing knowledge. The typical approach to address information retention (the ability to retain previous knowledge) is keeping a replay buffer of a fixed size and computing gradients using a mixture of new data and the replay buffer. Surprisingly,…

    Submitted 12 October, 2022; originally announced October 2022.

  7. arXiv:2210.06036  [pdf, other]

    cs.LG cs.GR

    Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics

    Authors: Lukas Prantl, Benjamin Ummenhofer, Vladlen Koltun, Nils Thuerey

    Abstract: We present a novel method for guaranteeing linear momentum in learned physics simulations. Unlike existing methods, we enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers. We combine these strict constraints with a hierarchical network architecture, a carefully constructed resampling scheme, and a training approach for tempo…

    Submitted 2 November, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

  8. arXiv:2205.01758  [pdf, other]

    cs.LG cs.GR cs.RO

    Differentiable Simulation of Soft Multi-body Systems

    Authors: Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, Ming C. Lin

    Abstract: We present a method for differentiable simulation of soft articulated bodies. Our work enables the integration of differentiable physical dynamics into gradient-based pipelines. We develop a top-down matrix assembly algorithm within Projective Dynamics and derive a generalized dry friction model for soft continuum using a new matrix splitting strategy. We derive a differentiable control framework…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2021

  9. arXiv:2204.08399  [pdf, other]

    cs.CV

    Unsupervised Contrastive Domain Adaptation for Semantic Segmentation

    Authors: Feihu Zhang, Vladlen Koltun, Philip Torr, René Ranftl, Stephan R. Richter

    Abstract: Semantic segmentation models struggle to generalize in the presence of domain shift. In this paper, we introduce contrastive learning for feature alignment in cross-domain adaptation. We assemble both in-domain contrastive pairs and cross-domain contrastive pairs to learn discriminative features that align across domains. Based on the resulting well-aligned feature representations we introduce a l…

    Submitted 18 April, 2022; originally announced April 2022.

  10. arXiv:2204.04210  [pdf, other]

    cs.CV eess.IV

    Dancing under the stars: video denoising in starlight

    Authors: Kristina Monakhova, Stephan R. Richter, Laura Waller, Vladlen Koltun

    Abstract: Imaging in low light is extremely challenging due to low photon counts. Using sensitive CMOS cameras, it is currently possible to take videos at night under moonlight (0.05-0.3 lux illumination). In this paper, we demonstrate photorealistic video under starlight (no moon present, $<$0.001 lux) for the first time. To enable this, we develop a GAN-tuned physics-based noise model to more accurately r…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: CVPR 2022. Project page: https://kristinamonakhova.com/starlight_denoising/

  11. arXiv:2203.13250  [pdf, other]

    cs.CV

    Global Tracking Transformers

    Authors: Xingyi Zhou, Tianwei Yin, Vladlen Koltun, Philipp Krähenbühl

    Abstract: We present a novel transformer-based architecture for global multi-object tracking. Our network takes a short sequence of frames as input and produces global trajectories for all objects. The core component is a global tracking transformer that operates on objects from all frames in the sequence. The transformer encodes object features from all frames, and uses trajectory queries to group them int…

    Submitted 25 April, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: CVPR 2022. Code is available at https://github.com/xingyizhou/GTR

  12. Learning robust perceptive locomotion for quadrupedal robots in the wild

    Authors: Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, Marco Hutter

    Abstract: Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into under-explored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, utilizing…

    Submitted 20 January, 2022; originally announced January 2022.

    Journal ref: Science Robotics, 19 Jan 2022, Vol 7, Issue 62

  13. arXiv:2201.03546  [pdf, other]

    cs.CV cs.CL cs.LG

    Language-driven Semantic Segmentation

    Authors: Boyi Li, Kilian Q. Weinberger, Serge Belongie, Vladlen Koltun, René Ranftl

    Abstract: We present LSeg, a novel model for language-driven semantic image segmentation. LSeg uses a text encoder to compute embeddings of descriptive input labels (e.g., "grass" or "building") together with a transformer-based image encoder that computes dense per-pixel embeddings of the input image. The image encoder is trained with a contrastive objective to align pixel embeddings to the text embedding…

    Submitted 2 April, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: ICLR 2022

  14. arXiv:2112.13762  [pdf, other]

    cs.CV

    MSeg: A Composite Dataset for Multi-domain Semantic Segmentation

    Authors: John Lambert, Zhuang Liu, Ozan Sener, James Hays, Vladlen Koltun

    Abstract: We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. A naive merge of the constituent datasets yields poor performance due to inconsistent taxonomies and annotation practices. We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images, requiring more tha…

    Submitted 27 December, 2021; originally announced December 2021.

  15. arXiv:2112.11377  [pdf, other]

    cs.CV

    Shape from Polarization for Complex Scenes in the Wild

    Authors: Chenyang Lei, Chenyang Qi, Jiaxin Xie, Na Fan, Vladlen Koltun, Qifeng Chen

    Abstract: We present a new data-driven approach with physics-based priors to scene-level normal estimation from a single polarization image. Existing shape from polarization (SfP) works mainly focus on estimating the normal of a single object rather than complex scenes in the wild. A key barrier to high-quality scene-level SfP is the lack of real-world SfP data in complex scenes. Hence, we contribute the fi…

    Submitted 20 April, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

    Comments: Accepted to CVPR 2022; GitHub link: https://github.com/ChenyangLEI/sfp-wild; Project website: https://chenyanglei.github.io/sfpwild/index.html

  16. arXiv:2110.07641  [pdf, other]

    cs.CV cs.AI

    Non-deep Networks

    Authors: Ankit Goyal, Alexey Bochkovskiy, Jia Deng, Vladlen Koltun

    Abstract: Depth is the hallmark of deep neural networks. But more depth means more sequential computation and higher latency. This begs the question -- is it possible to build high-performing "non-deep" neural networks? We show that it is. To do so, we use parallel subnetworks instead of stacking one layer after another. This helps effectively reduce depth while maintaining high performance. By utilizing pa…

    Submitted 14 October, 2021; originally announced October 2021.

  17. arXiv:2110.05113  [pdf, other]

    cs.RO cs.LG eess.SY

    Learning High-Speed Flight in the Wild

    Authors: Antonio Loquercio, Elia Kaufmann, René Ranftl, Matthias Müller, Vladlen Koltun, Davide Scaramuzza

    Abstract: Quadrotors are agile. Unlike most other machines, they can traverse extremely complex environments at high speeds. To date, only expert human pilots have been able to fully exploit their capabilities. Autonomous operation with on-board sensing and computation has been limited to low speeds. State-of-the-art methods generally separate the navigation problem into subtasks: sensing, mapping, and plan…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: 16 pages (+7 supplementary)

    Journal ref: Science Robotics 2021 Vol. 6, Issue 59, abg5810

  18. arXiv:2110.00511  [pdf, other]

    cs.CV cs.DC cs.GR cs.RO

    ASH: A Modern Framework for Parallel Spatial Hashing in 3D Perception

    Authors: Wei Dong, Yixing Lao, Michael Kaess, Vladlen Koltun

    Abstract: We present ASH, a modern and high-performance framework for parallel spatial hashing on GPU. Compared to existing GPU hash map implementations, ASH achieves higher performance, supports richer functionality, and requires fewer lines of code (LoC) when used for implementing spatially varying operations from volumetric geometry reconstruction to differentiable appearance reconstruction. Unlike exist…

    Submitted 29 January, 2023; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: 18 pages, 19 figures

  19. arXiv:2109.15048  [pdf, other]

    cs.LG physics.comp-ph

    Scale-invariant Learning by Physics Inversion

    Authors: Philipp Holl, Vladlen Koltun, Nils Thuerey

    Abstract: Solving inverse problems, such as parameter estimation and optimal control, is a vital part of science. Many experiments repeatedly collect data and rely on machine learning algorithms to quickly infer solutions to the associated inverse problems. We find that state-of-the-art training techniques are not well-suited to many problems that involve physical processes. The highly nonlinear behavior, c…

    Submitted 13 October, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: NeurIPS 2022 version, appendix included

  20. arXiv:2109.07719  [pdf, other]

    cs.LG cs.GR cs.RO

    Efficient Differentiable Simulation of Articulated Bodies

    Authors: Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, Ming C. Lin

    Abstract: We present a method for efficient differentiable simulation of articulated bodies. This enables integration of articulated body dynamics into deep learning frameworks, and gradient-based optimization of neural networks that operate on articulated bodies. We derive the gradients of the forward dynamics using spatial algebra and the adjoint method. Our approach is an order of magnitude faster than a…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: ICML 2021

  21. arXiv:2108.09020  [pdf, other]

    cs.LG cs.CV

    Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data

    Authors: Zhipeng Cai, Ozan Sener, Vladlen Koltun

    Abstract: Continual learning is the problem of learning and retaining knowledge through time over multiple tasks and environments. Research has primarily focused on the incremental classification setting, where new tasks/classes are added at discrete time intervals. Such an "offline" setting does not evaluate the ability of agents to learn effectively and efficiently, since an agent can perform multiple lea…

    Submitted 22 September, 2021; v1 submitted 20 August, 2021; originally announced August 2021.

    Comments: Accepted to ICCV 2021

  22. arXiv:2107.08170  [pdf, other]

    cs.LG cs.AI

    Megaverse: Simulating Embodied Agents at One Million Experiences per Second

    Authors: Aleksei Petrenko, Erik Wijmans, Brennan Shacklett, Vladlen Koltun

    Abstract: We present Megaverse, a new 3D simulation platform for reinforcement learning and embodied AI research. The efficient design of our engine enables physics-based simulation with high-dimensional egocentric observations at more than 1,000,000 actions per second on a single 8-GPU node. Megaverse is up to 70x faster than DeepMind Lab in fully-shaded 3D scenes with interactive objects. We achieve this…

    Submitted 20 July, 2021; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: Paper published in ICML 2021

  23. arXiv:2106.14405  [pdf, other]

    cs.LG cs.RO

    Habitat 2.0: Training Home Assistants to Rearrange their Habitat

    Authors: Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra

    Abstract: We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack - data, simulation, and benchmark tasks. Specifically, we present: (i) ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments (matching real spa…

    Submitted 1 July, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

  24. arXiv:2106.14342  [pdf, other]

    cs.LG stat.ML

    Stabilizing Equilibrium Models by Jacobian Regularization

    Authors: Shaojie Bai, Vladlen Koltun, J. Zico Kolter

    Abstract: Deep equilibrium networks (DEQs) are a new class of models that eschews traditional depth in favor of finding the fixed point of a single nonlinear layer. These models have been shown to achieve performance competitive with the state-of-the-art deep networks while using significantly less memory. Yet they are also slower, brittle to architectural choices, and introduce potential instability to the…

    Submitted 27 June, 2021; originally announced June 2021.

    Comments: ICML 2021 Short Oral

  25. arXiv:2106.07476  [pdf, other]

    cs.LG cs.AI cs.SI

    Training Graph Neural Networks with 1000 Layers

    Authors: Guohao Li, Matthias Müller, Bernard Ghanem, Vladlen Koltun

    Abstract: Deep graph neural networks (GNNs) have achieved excellent results on various tasks on increasingly large graph datasets with millions of nodes and edges. However, memory complexity has become a major obstacle when training deep GNNs for practical applications due to the immense number of nodes, edges, and intermediate activations. To improve the scalability of GNNs, prior works propose smart graph…

    Submitted 11 April, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: Accepted at ICML'2021. Code available at https://www.deepgcns.org/arch/gnn1000. Work done during Guohao Li's internship at Intel Intelligent Systems Lab. Revised reference in v3

  26. arXiv:2105.08089  [pdf, other]

    cs.DL cs.AI

    A Measure of Research Taste

    Authors: Vladlen Koltun, David Hafner

    Abstract: Researchers are often evaluated by citation-based metrics. Such metrics can inform hiring, promotion, and funding decisions. Concerns have been expressed that popular citation-based metrics incentivize researchers to maximize the production of publications. Such incentives may not be optimal for scientific progress. Here we present a citation-based measure that rewards both productivity and taste:…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Results can be explored at https://cap-measure.org/

  27. arXiv:2105.04619  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Enhancing Photorealism Enhancement

    Authors: Stephan R. Richter, Hassan Abu AlHaija, Vladlen Koltun

    Abstract: We present an approach to enhancing the realism of synthetic images. The images are enhanced by a convolutional network that leverages intermediate representations produced by conventional rendering pipelines. The network is trained via a novel adversarial objective, which provides strong supervision at multiple perceptual levels. We analyze scene layout distributions in commonly used datasets and…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: Code and data available at https://github.com/intel-isl/PhotorealismEnhancement; video available at https://youtu.be/P1IcaBn3ej0

    ACM Class: I.4.8

  28. arXiv:2105.00636  [pdf, other]

    cs.RO cs.CV cs.LG

    Learning to drive from a world on rails

    Authors: Dian Chen, Vladlen Koltun, Philipp Krähenbühl

    Abstract: We learn an interactive vision-based driving policy from pre-recorded driving logs via a model-based approach. A forward model of the world supervises a driving policy that predicts the outcome of any potential driving trajectory. To support learning from pre-recorded logs, we assume that the world is on rails, meaning neither the agent nor its actions influence the environment. This assumption gr…

    Submitted 2 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: Paper published in ICCV 2021 (Oral); Code and data available at: https://dotchen.github.io/world_on_rails/

  29. arXiv:2103.13413  [pdf, other]

    cs.CV

    Vision Transformers for Dense Prediction

    Authors: René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

    Abstract: We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks. We assemble tokens from various stages of the vision transformer into image-like representations at various resolutions and progressively combine them into full-resolution predictions using a convolutional decoder. The transformer b…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: 15 pages

  30. arXiv:2103.07461  [pdf, other]

    cs.CV

    Probabilistic two-stage detection

    Authors: Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl

    Abstract: We develop a probabilistic interpretation of two-stage object detection. We show that this probabilistic interpretation motivates a number of common empirical training practices. It also suggests changes to two-stage detection pipelines. Specifically, the first stage should infer proper object-vs-background likelihoods, which should then inform the overall score of the detector. A standard region…

    Submitted 12 March, 2021; originally announced March 2021.

    Comments: Code is available at https://github.com/xingyizhou/CenterNet2

  31. arXiv:2103.07013  [pdf, other]

    cs.LG cs.AI cs.CV cs.GR

    Large Batch Simulation for Deep Reinforcement Learning

    Authors: Brennan Shacklett, Erik Wijmans, Aleksei Petrenko, Manolis Savva, Dhruv Batra, Vladlen Koltun, Kayvon Fatahalian

    Abstract: We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. The key idea of our approach is to design a 3D renderer and embodied navigation simulator around…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: Published as a conference paper at ICLR 2021

  32. arXiv:2103.03114  [pdf, other]

    cs.CV cs.LG cs.RO

    Self-supervised Geometric Perception

    Authors: Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun

    Abstract: We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations). Our first contribution is to formulate geometric perception as an optimization problem that jointly optimizes the feature descriptor and the geometric models given a…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: CVPR 2021, Oral presentation. 8 pages main results, 19 pages in total, including references and supplementary

  33. arXiv:2102.13086  [pdf, other]

    cs.CV

    Simple multi-dataset detection

    Authors: Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl

    Abstract: How do we build a general and broad object detection system? We use all labels of all concepts ever annotated. These labels span diverse datasets with potentially inconsistent taxonomies. In this paper, we present a simple method for training a unified detector on multiple large-scale datasets. We use dataset-specific training protocols and losses, but share a common detection architecture with da…

    Submitted 25 April, 2022; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: Code is available at https://github.com/xingyizhou/UniDet

  34. The h-index is no longer an effective correlate of scientific reputation

    Authors: Vladlen Koltun, David Hafner

    Abstract: The impact of individual scientists is commonly quantified using citation-based measures. The most common such measure is the h-index. A scientist's h-index affects hiring, promotion, and funding decisions, and thus shapes the progress of science. Here we report a large-scale study of scientometric measures, analyzing millions of articles and hundreds of millions of citations across four scientifi…

    Submitted 5 February, 2021; originally announced February 2021.

    Comments: An interactive visualization of our work can be found at https://h-frac.org

  35. arXiv:2012.09164  [pdf, other]

    cs.CV

    Point Transformer

    Authors: Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun

    Abstract: Self-attention networks have revolutionized natural language processing and are making impressive strides in image analysis tasks such as image classification and object detection. Inspired by this success, we investigate the application of self-attention networks to 3D point cloud processing. We design self-attention layers for point clouds and use these to construct self-attention networks for t…

    Submitted 26 September, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

  36. arXiv:2011.07233  [pdf, other]

    cs.CV

    Stable View Synthesis

    Authors: Gernot Riegler, Vladlen Koltun

    Abstract: We present Stable View Synthesis (SVS). Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene. The method operates on a geometric scaffold computed via structure-from-motion and multi-view stereo. Each point on this 3D scaffold is associated with view rays and corresponding feature vectors that encode the appearance of this point…

    Submitted 2 May, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

    Comments: Published at CVPR 2021, https://youtu.be/gqgXIY09htI

  37. arXiv:2011.01975  [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    Rearrangement: A Challenge for Embodied AI

    Authors: Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su

    Abstract: We describe a framework for research and evaluation in Embodied AI. Our proposal is based on a canonical task: Rearrangement. A standard task can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings. In the rearrangement task, the goal is to bring a given physical environment into a specified state. The goal state can be specifie…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Authors are listed in alphabetical order

  38. arXiv:2010.11251  [pdf, other]

    cs.RO cs.LG eess.SY

    Learning Quadrupedal Locomotion over Challenging Terrain

    Authors: Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, Marco Hutter

    Abstract: Some of the most challenging environments on our planet are accessible to quadrupedal animals but remain out of reach for autonomous machines. Legged locomotion can dramatically expand the operational domains of robotics. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These desig…

    Submitted 21 October, 2020; originally announced October 2020.

    Journal ref: Science Robotics 2020 Vol. 5, Issue 47, eabc5986

  39. arXiv:2010.07492  [pdf, other]

    cs.CV

    NeRF++: Analyzing and Improving Neural Radiance Fields

    Authors: Kai Zhang, Gernot Riegler, Noah Snavely, Vladlen Koltun

    Abstract: Neural Radiance Fields (NeRF) achieve impressive view synthesis results for a variety of capture settings, including 360 capture of bounded scenes and forward-facing capture of bounded and unbounded scenes. NeRF fits multi-layer perceptrons (MLPs) representing view-invariant opacity and view-dependent color volumes to a set of training images, and samples novel views based on volume rendering tech…

    Submitted 21 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: Code is available at https://github.com/Kai-46/nerfplusplus; fix a minor formatting issue in Fig. 4

  40. arXiv:2008.10631  [pdf, other]

    cs.RO cs.CV cs.LG

    OpenBot: Turning Smartphones into Robots

    Authors: Matthias Müller, Vladlen Koltun

    Abstract: Current robots are either expensive or make significant compromises on sensory richness, computational power, and communication capabilities. We propose to leverage smartphones to equip robots with extensive sensor suites, powerful computational abilities, state-of-the-art communication channels, and access to a thriving software ecosystem. We design a small electric vehicle that costs $50 and ser…

    Submitted 10 March, 2021; v1 submitted 24 August, 2020; originally announced August 2020.

    Comments: Accepted at ICRA'21. Documentation and code are available at www.openbot.org

  41. arXiv:2008.05511  [pdf, other]

    cs.CV

    Free View Synthesis

    Authors: Gernot Riegler, Vladlen Koltun

    Abstract: We present a method for novel view synthesis from input images that are freely distributed around a scene. Our method does not rely on a regular arrangement of input views, can synthesize images for free camera movement through the scene, and works for general scenes with unconstrained geometric layouts. We calibrate the input images via SfM and erect a coarse geometric scaffold via MVS. This scaf…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: published at ECCV 2020, https://youtu.be/JDJPn3ZtfZs

  42. arXiv:2007.09335  [pdf, other]

    cs.LG cs.CL stat.ML

    Drinking from a Firehose: Continual Learning with Web-scale Natural Language

    Authors: Hexiang Hu, Ozan Sener, Fei Sha, Vladlen Koltun

    Abstract: Continual learning systems will interact with humans, with each other, and with the physical world through time -- and continue to learn and adapt as they do. An important open problem for continual learning is a large-scale benchmark that enables realistic evaluation of algorithms. In this paper, we study a natural setting for continual learning on a massive scale. We introduce the problem of per…

    Submitted 1 November, 2020; v1 submitted 18 July, 2020; originally announced July 2020.

    Comments: Dataset Downloader: https://github.com/firehose-dataset/downloader; Source Code: https://github.com/firehose-dataset/congrad

  43. arXiv:2007.08614  [pdf, other]

    eess.IV cs.CV physics.optics

    Dynamic Low-light Imaging with Quanta Image Sensors

    Authors: Yiheng Chi, Abhiram Gnanasambandam, Vladlen Koltun, Stanley H. Chan

    Abstract: Imaging in low light is difficult because the number of photons arriving at the sensor is low. Imaging dynamic scenes in low-light environments is even more difficult because as the scene moves, pixels in adjacent frames need to be aligned before they can be denoised. Conventional CMOS image sensors (CIS) are at a particular disadvantage in dynamic low-light settings because the exposure cannot be…

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: Published in the 16th European Conference on Computer Vision (ECCV) 2020

  44. arXiv:2007.02701  [pdf, other]

    cs.LG cs.AI stat.ML

    Scaling Imitation Learning in Minecraft

    Authors: Artemij Amiranashvili, Nicolai Dorka, Wolfram Burgard, Vladlen Koltun, Thomas Brox

    Abstract: Imitation learning is a powerful family of techniques for learning sensorimotor coordination in immersive environments. We apply imitation learning to attain state-of-the-art performance on hard exploration problems in the Minecraft environment. We report experiments that highlight the influence of network architecture, loss function, and data augmentation. An early version of our approach reached…

    Submitted 6 July, 2020; originally announced July 2020.

  45. arXiv:2007.02168  [pdf, other]

    cs.LG cs.GR stat.ML

    Scalable Differentiable Physics for Learning and Control

    Authors: Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, Ming C. Lin

    Abstract: Differentiable physics is a powerful approach to learning and control problems that involve physical objects and environments. While notable progress has been made, the capabilities of differentiable physics solvers remain limited. We develop a scalable framework for differentiable physics that can support a large number of objects and their interactions. To accommodate objects with arbitrary geom…

    Submitted 4 July, 2020; originally announced July 2020.

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, ICML 2020

  46. arXiv:2006.11751  [pdf, other]

    cs.LG cs.AI stat.ML

    Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning

    Authors: Aleksei Petrenko, Zhehui Huang, Tushar Kumar, Gaurav Sukhatme, Vladlen Koltun

    Abstract: Increasing the scale of reinforcement learning experiments has allowed researchers to achieve unprecedented results in both training sophisticated agents for video games, and in sim-to-real transfer for robotics. Typically such experiments rely on large distributed systems and require expensive hardware setups, limiting wider access to this exciting area of research. In this work we aim to solve t…

    Submitted 22 June, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

    Comments: Paper published in ICML2020. Visualizations of trained policies can be found at https://sites.google.com/view/sample-factory

  47. arXiv:2006.08656  [pdf, other]

    cs.LG cs.CV stat.ML

    Multiscale Deep Equilibrium Models

    Authors: Shaojie Bai, Vladlen Koltun, J. Zico Kolter

    Abstract: We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ), suited to large-scale and highly hierarchical pattern recognition domains. An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously, using implicit differentiation to avoid storing intermediate states (and thus requiring only $O(1)$ memory c…

    Submitted 24 November, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 Oral

  48. arXiv:2006.05768  [pdf, other]

    cs.RO

    Deep Drone Acrobatics

    Authors: Elia Kaufmann, Antonio Loquercio, René Ranftl, Matthias Müller, Vladlen Koltun, Davide Scaramuzza

    Abstract: Performing acrobatic maneuvers with quadrotors is extremely challenging. Acrobatic flight requires high thrust and extreme angular accelerations that push the platform to its physical limits. Professional drone pilots often measure their level of mastery by flying such maneuvers in competitions. In this paper, we propose to learn a sensorimotor policy that enables an autonomous quadrotor to fly ex…

    Submitted 11 June, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: 8 pages + 2 pages references. Video: https://youtu.be/2N_wKXQ6MXA. Code: https://github.com/uzh-rpg/deep_drone_acrobatics

    Journal ref: Robotics: Science and Systems (RSS), 2020

  49. arXiv:2006.02879  [pdf, other]

    cs.LG stat.ML

    Auto-decoding Graphs

    Authors: Sohil Atul Shah, Vladlen Koltun

    Abstract: We present an approach to synthesizing new graph structures from empirically specified distributions. The generative model is an auto-decoder that learns to synthesize graphs from latent codes. The graph synthesis model is learned jointly with an empirical distribution over the latent codes. Graphs are synthesized using self-attention modules that are trained to identify likely connectivity patter…

    Submitted 4 June, 2020; originally announced June 2020.

  50. arXiv:2005.08144  [pdf, other]

    cs.CV cs.LG stat.ML

    High-dimensional Convolutional Networks for Geometric Pattern Recognition

    Authors: Christopher Choy, Junha Lee, Rene Ranftl, Jaesik Park, Vladlen Koltun

    Abstract: Many problems in science and engineering can be formulated in terms of geometric patterns in high-dimensional spaces. We present high-dimensional convolutional networks (ConvNets) for pattern recognition problems that arise in the context of geometric registration. We first study the effectiveness of convolutional networks in detecting linear subspaces in high-dimensional spaces with up to 32 dime…

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: Accepted for CVPR 2020 oral presentation