Showing 1–50 of 98 results for author: Daniilidis, K

Searching in archive cs.
  1. arXiv:2405.17421

    cs.CV cs.GR

    MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds

    Authors: Jiahui Lei, Yijia Weng, Adam Harley, Leonidas Guibas, Kostas Daniilidis

    Abstract: We introduce 4D Motion Scaffolds (MoSca), a neural information processing system designed to reconstruct and synthesize novel views of dynamic scenes from monocular videos captured casually in the wild. To address such a challenging and ill-posed inverse problem, we leverage prior knowledge from foundational vision models, lift the video data to a novel Motion Scaffold (MoSca) representation, whic…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: project page: https://www.cis.upenn.edu/~leijh/projects/mosca

  2. arXiv:2403.18222

    cs.RO cs.LG

    Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies

    Authors: Bo Wu, Bruce D. Lee, Kostas Daniilidis, Bernadette Bucher, Nikolai Matni

    Abstract: Large-scale robotic policies trained on data from diverse tasks and robotic platforms hold great promise for enabling general-purpose robots; however, reliable generalization to new environment conditions remains a major challenge. Toward addressing this challenge, we propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents. Specifical…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures

  3. arXiv:2403.17931

    cs.CV

    Track Everything Everywhere Fast and Robustly

    Authors: Yunzhou Song, Jiahui Lei, Ziyun Wang, Lingjie Liu, Kostas Daniilidis

    Abstract: We propose a novel test-time optimization approach for efficiently and robustly tracking any pixel at any time in a video. The latest state-of-the-art optimization-based tracking technique, OmniMotion, requires a prohibitively long optimization time, rendering it impractical for downstream applications. OmniMotion is sensitive to the choice of random seeds, leading to unstable convergence. To impr…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: project page: https://timsong412.github.io/FastOmniTrack/

  4. arXiv:2403.17346

    cs.CV

    TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

    Authors: Yufu Wang, Ziyun Wang, Lingjie Liu, Kostas Daniilidis

    Abstract: We propose TRAM, a two-stage method to reconstruct a human's global trajectory and motion from in-the-wild videos. TRAM robustifies SLAM to recover the camera motion in the presence of dynamic humans and uses the scene background to derive the motion scale. Using the recovered camera as a metric-scale reference frame, we introduce a video transformer model (VIMO) to regress the kinematic body moti…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: The project website: https://yufu-wang.github.io/tram4d/

  5. arXiv:2403.12279

    cs.RO

    Scalable Networked Feature Selection with Randomized Algorithm for Robot Navigation

    Authors: Vivek Pandey, Arash Amini, Guangyi Liu, Ufuk Topcu, Qiyu Sun, Kostas Daniilidis, Nader Motee

    Abstract: We address the problem of sparse selection of visual features for localizing a team of robots navigating an unknown environment, where robots can exchange relative position measurements with neighbors. We select a set of the most informative features by anticipating their importance in robots localization by simulating trajectories of robots over a prediction horizon. Through theoretical proofs, w…

    Submitted 18 March, 2024; originally announced March 2024.

  6. arXiv:2403.11396

    cs.RO

    Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF

    Authors: Guangyi Liu, Wen Jiang, Boshu Lei, Vivek Pandey, Kostas Daniilidis, Nader Motee

    Abstract: This work proposes a novel approach to bolster both the robot's risk assessment and safety measures while deepening its understanding of 3D scenes, which is achieved by leveraging Radiance Field (RF) models and 3D Gaussian Splatting. To further enhance these capabilities, we incorporate additional sampled views from the environment with the RF model. One of our key contributions is the introductio…

    Submitted 17 March, 2024; originally announced March 2024.

  7. arXiv:2312.00114

    cs.CV

    Un-EvMoSeg: Unsupervised Event-based Independent Motion Segmentation

    Authors: Ziyun Wang, Jinyuan Guo, Kostas Daniilidis

    Abstract: Event cameras are a novel type of biologically inspired vision sensor known for their high temporal resolution, high dynamic range, and low power consumption. Because of these properties, they are well-suited for processing fast motions that require rapid reactions. Although event cameras have recently shown competitive performance in unsupervised optical flow estimation, performance in detecting…

    Submitted 30 November, 2023; originally announced December 2023.

  8. arXiv:2312.00113

    cs.CV

    Event-based Continuous Color Video Decompression from Single Frames

    Authors: Ziyun Wang, Friedhelm Hamann, Kenneth Chaney, Wen Jiang, Guillermo Gallego, Kostas Daniilidis

    Abstract: We present ContinuityCam, a novel approach to generate a continuous video from a single static RGB image, using an event camera. Conventional cameras struggle with high-speed motion capture due to bandwidth and dynamic range limitations. Event cameras are ideal sensors to solve this problem because they encode compressed change information at high temporal resolution. In this work, we propose a no…

    Submitted 30 November, 2023; originally announced December 2023.

  9. arXiv:2312.00112

    cs.CV cs.GR

    DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting

    Authors: Agelos Kratimenos, Jiahui Lei, Kostas Daniilidis

    Abstract: Accurately and efficiently modeling dynamic scenes and motions is considered so challenging a task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit o…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: Project page: https://agelosk.github.io/dynmf/

  10. arXiv:2311.17874

    cs.CV

    FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information

    Authors: Wen Jiang, Boshu Lei, Kostas Daniilidis

    Abstract: This study addresses the challenging problem of active view selection and uncertainty quantification within the domain of Radiance Fields. Neural Radiance Fields (NeRF) have greatly advanced image rendering and reconstruction, but the limited availability of 2D images poses uncertainties stemming from occlusions, depth ambiguities, and imaging errors. Efficiently selecting informative views become…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Project page: https://jiangwenpl.github.io/FisherRF/

  11. arXiv:2311.16099

    cs.CV cs.GR

    GART: Gaussian Articulated Template Models

    Authors: Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis

    Abstract: We introduce Gaussian Articulated Template Model GART, an explicit, efficient, and expressive representation for non-rigid articulated subject capturing and rendering from monocular videos. GART utilizes a mixture of moving 3D Gaussians to explicitly approximate a deformable subject's geometry and appearance. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnabl…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 13 pages, code available at https://www.cis.upenn.edu/~leijh/projects/gart/

  12. arXiv:2310.02437

    cs.CV

    EvDNeRF: Reconstructing Event Data with Dynamic Neural Radiance Fields

    Authors: Anish Bhattacharya, Ratnesh Madaan, Fernando Cladera, Sai Vemprala, Rogerio Bonatti, Kostas Daniilidis, Ashish Kapoor, Vijay Kumar, Nikolai Matni, Jayesh K. Gupta

    Abstract: We present EvDNeRF, a pipeline for generating event data and training an event-based dynamic NeRF, for the purpose of faithfully reconstructing eventstreams on scenes with rigid and non-rigid deformations that may be too fast to capture with a standard camera. Event cameras register asynchronous per-pixel brightness changes at MHz rates with high dynamic range, making them ideal for observing fast…

    Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: 16 pages, 20 figures, 2 tables

  13. arXiv:2308.11184

    cs.CV

    ReFit: Recurrent Fitting Network for 3D Human Recovery

    Authors: Yufu Wang, Kostas Daniilidis

    Abstract: We present Recurrent Fitting (ReFit), a neural network architecture for single-image, parametric 3D human reconstruction. ReFit learns a feedback-update loop that mirrors the strategy of solving an inverse problem through optimization. At each iterative step, it reprojects keypoints from the human model to feature maps to query feedback, and uses a recurrent-based updater to adjust the model to fi…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  14. arXiv:2306.04781

    cs.RO cs.LG cs.MA

    Learning to Navigate in Turbulent Flows with Aerial Robot Swarms: A Cooperative Deep Reinforcement Learning Approach

    Authors: Diego Patiño, Siddharth Mayya, Juan Calderon, Kostas Daniilidis, David Saldaña

    Abstract: Aerial operation in turbulent environments is a challenging problem due to the chaotic behavior of the flow. This problem is made even more complex when a team of aerial robots is trying to achieve coordinated motion in turbulent wind conditions. In this paper, we present a novel multi-robot controller to navigate in turbulent flows, decoupling the trajectory-tracking control from the turbulence c…

    Submitted 7 June, 2023; originally announced June 2023.

  15. arXiv:2305.16315

    cs.CV

    NAP: Neural 3D Articulation Prior

    Authors: Jiahui Lei, Congyue Deng, Bokui Shen, Leonidas Guibas, Kostas Daniilidis

    Abstract: We propose Neural 3D Articulation Prior (NAP), the first 3D deep generative model to synthesize 3D articulated object models. Despite the extensive research on generating 3D objects, compositions, or scenes, there remains a lack of focus on capturing the distribution of articulated objects, a common object category for human and robot interaction. To generate articulated objects, we first design a…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: project page: https://www.cis.upenn.edu/~leijh/projects/nap

  16. arXiv:2305.16314

    cs.CV

    Banana: Banach Fixed-Point Network for Pointcloud Segmentation with Inter-Part Equivariance

    Authors: Congyue Deng, Jiahui Lei, Bokui Shen, Kostas Daniilidis, Leonidas Guibas

    Abstract: Equivariance has gained strong interest as a desirable network property that inherently ensures robust generalization. However, when dealing with complex systems such as articulated objects or multi-object scenes, effectively capturing inter-part transformations poses a challenge, as it becomes entangled with the overall structure and local transformations. The interdependence of part assignment a…

    Submitted 26 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

  17. EV-Catcher: High-Speed Object Catching Using Low-latency Event-based Neural Networks

    Authors: Ziyun Wang, Fernando Cladera Ojeda, Anthony Bisulco, Daewon Lee, Camillo J. Taylor, Kostas Daniilidis, M. Ani Hsieh, Daniel D. Lee, Volkan Isler

    Abstract: Event-based sensors have recently drawn increasing interest in robotic perception due to their lower latency, higher dynamic range, and lower bandwidth requirements compared to standard CMOS-based imagers. These properties make them ideal tools for real-time perception tasks in highly dynamic environments. In this work, we demonstrate an application where event cameras excel: accurately estimating…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: 8 pages, 6 figures, IEEE Robotics and Automation Letters ( Volume: 7, Issue: 4, October 2022)

  18. EvAC3D: From Event-based Apparent Contours to 3D Models via Continuous Visual Hulls

    Authors: Ziyun Wang, Kenneth Chaney, Kostas Daniilidis

    Abstract: 3D reconstruction from multiple views is a successful computer vision field with multiple deployments in applications. State of the art is based on traditional RGB frames that enable optimization of photo-consistency cross views. In this paper, we study the problem of 3D reconstruction from event-cameras, motivated by the advantages of event-based cameras in terms of low power and latency as well…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: 16 pages, 8 figures, European Conference on Computer Vision (ECCV) 2022

  19. arXiv:2303.15440

    cs.CV

    EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision

    Authors: Jiahui Lei, Congyue Deng, Karl Schmeckpeper, Leonidas Guibas, Kostas Daniilidis

    Abstract: We introduce Equivariant Neural Field Expectation Maximization (EFEM), a simple, effective, and robust geometric algorithm that can segment objects in 3D scenes without annotations or training on scenes. We achieve such unsupervised segmentation by exploiting single object shape priors. We make two novel steps in that direction. First, we introduce equivariant shape representations to this problem…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR2023, project page https://www.cis.upenn.edu/~leijh/projects/efem

  20. arXiv:2212.14871

    cs.CV

    Equivariant Light Field Convolution and Transformer

    Authors: Yinshuang Xu, Jiahui Lei, Kostas Daniilidis

    Abstract: 3D reconstruction and novel view rendering can greatly benefit from geometric priors when the input views are not sufficient in terms of coverage and inter-view baselines. Deep learning of geometric priors from 2D images often requires each image to be represented in a $2D$ canonical frame and the prior to be learned in a given or learned $3D$ canonical frame. In this paper, given only the relativ…

    Submitted 7 June, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

    Comments: 46 pages

  21. arXiv:2212.05231

    cs.CV cs.GR

    NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction

    Authors: Yiming Wang, Qin Han, Marc Habermann, Kostas Daniilidis, Christian Theobalt, Lingjie Liu

    Abstract: Recent methods for neural surface representation and rendering, for example NeuS, have demonstrated the remarkably high-quality reconstruction of static scenes. However, the training of NeuS takes an extremely long time (8 hours), which makes it almost impossible to apply them to dynamic scenes with thousands of frames. We propose a fast neural surface reconstruction approach, called NeuS2, which…

    Submitted 16 November, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

    Comments: ICCV 2023

  22. arXiv:2212.03370

    cs.CV

    Probabilistic Shape Completion by Estimating Canonical Factors with Hierarchical VAE

    Authors: Wen Jiang, Kostas Daniilidis

    Abstract: We propose a novel method for 3D shape completion from a partial observation of a point cloud. Existing methods either operate on a global latent code, which limits the expressiveness of their model, or autoregressively estimate the local features, which is highly computationally extensive. Instead, our method estimates the entire local feature field by a single feedforward network by formulating…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: 10 pages, 5 figures

  23. arXiv:2212.00266

    cs.CV

    Multi-view Tracking, Re-ID, and Social Network Analysis of a Flock of Visually Similar Birds in an Outdoor Aviary

    Authors: Shiting Xiao, Yufu Wang, Ammon Perkes, Bernd Pfrommer, Marc Schmidt, Kostas Daniilidis, Marc Badger

    Abstract: The ability to capture detailed interactions among individuals in a social group is foundational to our study of animal behavior and neuroscience. Recent advances in deep learning and computer vision are driving rapid progress in methods that can record the actions and interactions of multiple individuals simultaneously. Many social species, such as birds, however, live deeply embedded in a three-…

    Submitted 30 November, 2022; originally announced December 2022.

  24. arXiv:2206.08362

    cs.CV

    Unified Fourier-based Kernel and Nonlinearity Design for Equivariant Networks on Homogeneous Spaces

    Authors: Yinshuang Xu, Jiahui Lei, Edgar Dobriban, Kostas Daniilidis

    Abstract: We introduce a unified framework for group equivariant networks on homogeneous spaces derived from a Fourier perspective. We consider tensor-valued feature fields, before and after a convolutional layer. We present a unified derivation of kernels via the Fourier domain by leveraging the sparsity of Fourier coefficients of the lifted feature fields. The sparsity emerges when the stabilizer subgroup…

    Submitted 25 August, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Accepted at ICML2022 Thirty-ninth International Conference on Machine Learning

  25. Semantic keypoint-based pose estimation from single RGB frames

    Authors: Karl Schmeckpeper, Philip R. Osteen, Yufu Wang, Georgios Pavlakos, Kenneth Chaney, Wyatt Jordan, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: This paper presents an approach to estimating the continuous 6-DoF pose of an object from a single RGB image. The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model. Unlike prior investigators, we are agnostic to whether the object is textured or textureless, as the convnet learns the optimal representation from the available training-…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: https://sites.google.com/view/rcta-object-keypoints-dataset/home. arXiv admin note: substantial text overlap with arXiv:1703.04670

    Journal ref: Field Robotics, 2, 147-171, 2022

  26. arXiv:2204.02394

    cs.CV cs.LG

    SE(3)-Equivariant Attention Networks for Shape Reconstruction in Function Space

    Authors: Evangelos Chatzipantazis, Stefanos Pertigkiozoglou, Edgar Dobriban, Kostas Daniilidis

    Abstract: We propose a method for 3D shape reconstruction from unoriented point clouds. Our method consists of a novel SE(3)-equivariant coordinate-based network (TF-ONet), that parametrizes the occupancy field of the shape and respects the inherent symmetries of the problem. In contrast to previous shape reconstruction methods that align the input to a regular grid, we operate directly on the irregular poi…

    Submitted 9 February, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

  27. arXiv:2203.16529

    cs.CV

    CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism

    Authors: Jiahui Lei, Kostas Daniilidis

    Abstract: While neural representations for static 3D shapes are widely studied, representations for deformable surfaces are limited to be template-dependent or lack efficiency. We introduce Canonical Deformation Coordinate Space (CaDeX), a unified representation of both shape and nonrigid motion. Our key insight is the factorization of the deformation between frames by continuous bijective canonical maps (h…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: CVPR2022 webpage https://www.cis.upenn.edu/~leijh/projects/cadex/

  28. arXiv:2203.05137

    cs.CV cs.RO

    Cross-modal Map Learning for Vision and Language Navigation

    Authors: Georgios Georgakis, Karl Schmeckpeper, Karan Wanchoo, Soham Dan, Eleni Miltsakaki, Dan Roth, Kostas Daniilidis

    Abstract: We consider the problem of Vision-and-Language Navigation (VLN). The majority of current methods for VLN are trained end-to-end using either unstructured memory such as LSTM, or using cross-modal attention over the egocentric observations of the agent. In contrast to other works, our key insight is that the association between language and vision is stronger when it occurs in explicit spatial repr…

    Submitted 21 March, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

  29. arXiv:2202.11907

    cs.RO cs.CV

    Uncertainty-driven Planner for Exploration and Navigation

    Authors: Georgios Georgakis, Bernadette Bucher, Anton Arapin, Karl Schmeckpeper, Nikolai Matni, Kostas Daniilidis

    Abstract: We consider the problems of exploration and point-goal navigation in previously unseen environments, where the spatial complexity of indoor scenes and partial observability constitute these tasks challenging. We argue that learning occupancy priors over indoor maps provides significant advantages towards addressing these problems. To this end, we present a novel planning framework that first learn…

    Submitted 24 February, 2022; originally announced February 2022.

  30. arXiv:2111.08190

    cs.LG stat.ML

    Learning Augmentation Distributions using Transformed Risk Minimization

    Authors: Evangelos Chatzipantazis, Stefanos Pertigkiozoglou, Kostas Daniilidis, Edgar Dobriban

    Abstract: We propose a new \emph{Transformed Risk Minimization} (TRM) framework as an extension of classical risk minimization. In TRM, we optimize not only over predictive models, but also over data transformations; specifically over distributions thereof. As a key application, we focus on learning augmentations; for instance appropriate rotations of images, to improve classification performance with a giv…

    Submitted 5 October, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

  31. arXiv:2110.09514

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Discovering and Achieving Goals via World Models

    Authors: Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak

    Abstract: How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to these that learns a world model from image inputs and uses it to train an explorer and an achiever…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021. First two authors contributed equally. Website at https://orybkin.github.io/lexa/

  32. arXiv:2109.13396

    cs.RO cs.AI

    Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets

    Authors: Frederik Ebert, Yanlai Yang, Karl Schmeckpeper, Bernadette Bucher, Georgios Georgakis, Kostas Daniilidis, Chelsea Finn, Sergey Levine

    Abstract: Robot learning holds the promise of learning policies that generalize broadly. However, such generalization requires sufficiently diverse datasets of the task of interest, which can be prohibitively expensive to collect. In other fields, such as computer vision, it is common to utilize shared, reusable datasets, such as ImageNet, to overcome this challenge, but this has proven difficult in robotic…

    Submitted 27 September, 2021; originally announced September 2021.

  33. Single-Camera 3D Head Fitting for Mixed Reality Clinical Applications

    Authors: Tejas Mane, Aylar Bayramova, Kostas Daniilidis, Philippos Mordohai, Elena Bernardis

    Abstract: We address the problem of estimating the shape of a person's head, defined as the geometry of the complete head surface, from a video taken with a single moving camera, and determining the alignment of the fitted 3D head for all video frames, irrespective of the person's pose. 3D head reconstructions commonly tend to focus on perfecting the face reconstruction, leaving the scalp to a statistical a…

    Submitted 7 March, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

  34. arXiv:2108.11944

    cs.CV

    Probabilistic Modeling for Human Mesh Recovery

    Authors: Nikos Kolotouros, Georgios Pavlakos, Dinesh Jayaraman, Kostas Daniilidis

    Abstract: This paper focuses on the problem of 3D human reconstruction from 2D evidence. Although this is an inherently ambiguous problem, the majority of recent works avoid the uncertainty modeling and typically regress a single estimate for a given input. In contrast to that, in this work, we propose to embrace the reconstruction ambiguity and we recast the problem as learning a mapping from the input to…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: ICCV 2021. Project page: https://www.seas.upenn.edu/~nkolot/projects/prohmr

  35. arXiv:2106.15648

    cs.CV cs.RO

    Learning to Map for Active Semantic Goal Navigation

    Authors: Georgios Georgakis, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Kostas Daniilidis

    Abstract: We consider the problem of object goal navigation in unseen environments. Solving this problem requires learning of contextual semantic priors, a challenging endeavour given the spatial and semantic variability of indoor environments. Current methods learn to implicitly encode these priors through goal-oriented navigation policy functions operating on spatial representations that are limited to th…

    Submitted 8 March, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

  36. arXiv:2106.13229

    cs.LG cs.AI cs.RO

    Model-Based Reinforcement Learning via Latent-Space Collocation

    Authors: Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine

    Abstract: The ability to plan into the future while utilizing only raw high-dimensional observations, such as images, can provide autonomous agents with broad capabilities. Visual model-based reinforcement learning (RL) methods that plan future actions directly have shown impressive results on tasks that require only short-horizon reasoning, however, these methods struggle on temporally extended tasks. We a…

    Submitted 7 August, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: International Conference on Machine Learning (ICML), 2021. Videos and code at https://orybkin.github.io/latco/

  37. arXiv:2105.09396

    cs.CV

    Birds of a Feather: Capturing Avian Shape Models from Images

    Authors: Yufu Wang, Nikos Kolotouros, Kostas Daniilidis, Marc Badger

    Abstract: Animals are diverse in shape, but building a deformable shape model for a new species is not always possible due to the lack of 3D data. We present a method to capture new species using an articulated template and images of that species. In this work, we focus mainly on birds. Although birds represent almost twice the number of species as mammals, no accurate shape model is available. To capture a…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: CVPR 2021. Project website: https://yufu-wang.github.io/aves/

  38. arXiv:2105.02799

    cs.CV cs.RO

    Object-centric Video Prediction without Annotation

    Authors: Karl Schmeckpeper, Georgios Georgakis, Kostas Daniilidis

    Abstract: In order to interact with the world, agents must be able to predict the results of the world's dynamics. A natural approach to learn about these dynamics is through video prediction, as cameras are ubiquitous and powerful sensors. Direct pixel-to-pixel video prediction is difficult, does not take advantage of known priors, and does not provide an easy interface to utilize the learned dynamics. Obj…

    Submitted 6 May, 2021; originally announced May 2021.

  39. arXiv:2103.14184

    cs.CV

    Deformable Linear Object Prediction Using Locally Linear Latent Dynamics

    Authors: Wenbo Zhang, Karl Schmeckpeper, Pratik Chaudhari, Kostas Daniilidis

    Abstract: We propose a framework for deformable linear object prediction. Prediction of deformable objects (e.g., rope) is challenging due to their non-linear dynamics and infinite-dimensional configuration spaces. By mapping the dynamics from a non-linear space to a linear space, we can use the good properties of linear dynamics for easier learning and more efficient prediction. We learn a locally linear,…

    Submitted 25 March, 2021; originally announced March 2021.

  40. arXiv:2012.04153

    cs.CV

    Learning Portrait Style Representations

    Authors: Sadat Shaik, Bernadette Bucher, Nephele Agrafiotis, Stephen Phillips, Kostas Daniilidis, William Schmenner

    Abstract: Style analysis of artwork in computer vision predominantly focuses on achieving results in target image generation through optimizing understanding of low level style characteristics such as brush strokes. However, fundamentally different techniques are required to computationally understand and control qualities of art which incorporate higher level style characteristics. We study style represent…

    Submitted 7 December, 2020; originally announced December 2020.

    Comments: Sadat Shaik and Bernadette Bucher contributed equally

  41. arXiv:2012.02903

    cs.AI

    Joint Estimation of Image Representations and their Lie Invariants

    Authors: Christine Allen-Blanchette, Kostas Daniilidis

    Abstract: Images encode both the state of the world and its content. The former is useful for tasks such as planning and control, and the latter for classification. The automatic extraction of this information is challenging because of the high-dimensionality and entangled encoding inherent to the image representation. This article introduces two theoretical approaches aimed at the resolution of these chall…

    Submitted 8 December, 2020; v1 submitted 4 December, 2020; originally announced December 2020.

    Comments: Resolves typographical errors

  42. arXiv:2011.06507

    cs.LG cs.AI cs.CV cs.RO

    Reinforcement Learning with Videos: Combining Offline Observations with Interaction

    Authors: Karl Schmeckpeper, Oleh Rybkin, Kostas Daniilidis, Sergey Levine, Chelsea Finn

    Abstract: Reinforcement learning is a powerful framework for robots to acquire skills from experience, but often requires a substantial amount of online data collection. As a result, it is difficult to collect sufficiently diverse experiences that are needed for robots to generalize broadly. Videos of humans, on the other hand, are a readily available source of broad and interesting experiences. In this pap…

    Submitted 4 November, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

    Journal ref: Conference on Robot Learning (2020)

  43. 3D Bird Reconstruction: a Dataset, Model, and Shape Recovery from a Single View

    Authors: Marc Badger, Yufu Wang, Adarsh Modh, Ammon Perkes, Nikos Kolotouros, Bernd G. Pfrommer, Marc F. Schmidt, Kostas Daniilidis

    Abstract: Automated capture of animal pose is transforming how we study neuroscience and social behavior. Movements carry important social cues, but current methods are not able to robustly estimate pose and shape of animals, particularly for social animals such as birds, which are often occluded by each other and objects in the environment. To address this problem, we first introduce a model and multi-view…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: In ECCV 2020

    ACM Class: I.4.8

    Journal ref: ECCV 2020, vol 12363, pp 1-17

  44. arXiv:2007.01867  [pdf, other]

    cs.RO cs.CV cs.LG eess.SP

    TLIO: Tight Learned Inertial Odometry

    Authors: Wenxin Liu, David Caruso, Eddy Ilg, Jing Dong, Anastasios I. Mourikis, Kostas Daniilidis, Vijay Kumar, Jakob Engel

    Abstract: In this work we propose a tightly-coupled Extended Kalman Filter framework for IMU-only state estimation. Strap-down IMU measurements provide relative state estimates based on IMU kinematic motion model. However the integration of measurements is sensitive to sensor bias and noise, causing significant drift within seconds. Recent research by Yan et al. (RoNIN) and Chen et al. (IONet) showed the ca…

    Submitted 10 July, 2020; v1 submitted 5 July, 2020; originally announced July 2020.

    Comments: Correcting graph and bibliography. Adding journal reference information and DOI, in IEEE Robotics and Automation Letters

  45. arXiv:2006.13202  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    Simple and Effective VAE Training with Calibrated Decoders

    Authors: Oleh Rybkin, Kostas Daniilidis, Sergey Levine

    Abstract: Variational autoencoders (VAEs) provide an effective and simple method for modeling complex distributions. However, training VAEs often requires considerable hyperparameter tuning to determine the optimal amount of information retained by the latent variable. We study the impact of calibrated decoders, which learn the uncertainty of the decoding distribution and can determine this amount of inform…

    Submitted 12 July, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: International Conference on Machine Learning (ICML), 2021. Project website is at https://orybkin.github.io/sigma-vae/

  46. arXiv:2006.10731  [pdf, other]

    cs.CV cs.LG

    Spin-Weighted Spherical CNNs

    Authors: Carlos Esteves, Ameesh Makadia, Kostas Daniilidis

    Abstract: Learning equivariant representations is a promising way to reduce sample and model complexity and improve the generalization performance of deep neural networks. The spherical CNNs are successful examples, producing SO(3)-equivariant representations of spherical inputs. There are two main types of spherical CNNs. The first type lifts the inputs to functions on the rotation group SO(3) and applies…

    Submitted 26 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS'20

  47. arXiv:2006.08586  [pdf, other]

    cs.CV

    Coherent Reconstruction of Multiple Humans from a Single Image

    Authors: Wen Jiang, Nikos Kolotouros, Georgios Pavlakos, Xiaowei Zhou, Kostas Daniilidis

    Abstract: In this work, we address the problem of multi-person 3D pose estimation from a single image. A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently. However, this type of prediction suffers from incoherent results, e.g., interpenetration and inconsistent depth ordering between the people in the scene.…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: CVPR 2020. Project Page: https://jiangwenpl.github.io/multiperson/

  48. arXiv:2005.05960  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE cs.RO stat.ML

    Planning to Explore via Self-Supervised World Models

    Authors: Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

    Abstract: Reinforcement learning allows solving complex tasks, however, the learning tends to be task-specific and the sample efficiency remains a challenge. We present Plan2Explore, a self-supervised reinforcement learning agent that tackles both these challenges through a new approach to self-supervised exploration and fast adaptation to new tasks, which need not be known during exploration. During explor…

    Submitted 30 June, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

    Comments: Accepted at ICML 2020. Videos and code at https://ramanans1.github.io/plan2explore/

  49. arXiv:2003.06696  [pdf, other]

    cs.NE

    Spike-FlowNet: Event-based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks

    Authors: Chankyu Lee, Adarsh Kumar Kosta, Alex Zihao Zhu, Kenneth Chaney, Kostas Daniilidis, Kaushik Roy

    Abstract: Event-based cameras display great potential for a variety of tasks such as high-speed motion detection and navigation in low-light environments where conventional frame-based cameras suffer critically. This is attributed to their high temporal resolution, high dynamic range, and low-power consumption. However, conventional computer vision methods as well as deep Analog Neural Networks (ANNs) are n…

    Submitted 14 September, 2020; v1 submitted 14 March, 2020; originally announced March 2020.

    Comments: European Conference on Computer Vision (ECCV) 2020

  50. arXiv:2003.06082  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    An Adversarial Objective for Scalable Exploration

    Authors: Bernadette Bucher, Karl Schmeckpeper, Nikolai Matni, Kostas Daniilidis

    Abstract: Model-based curiosity combines active learning approaches to optimal sampling with the information gain based incentives for exploration presented in the curiosity literature. Existing model-based curiosity methods look to approximate prediction uncertainty with approaches which struggle to scale to many prediction-planning pipelines used in robotics tasks. We address these scalability issues with…

    Submitted 11 November, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: Additional visualizations of our results are available on our website at https://sites.google.com/view/action-for-better-prediction . Bernadette Bucher and Karl Schmeckpeper contributed equally