
Showing 101–150 of 173 results for author: Savarese, S

  1. arXiv:1908.06769  [pdf, other]

    cs.AI cs.LG cs.RO

    Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

    Authors: De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles

    Abstract: We address one-shot imitation learning, where the goal is to execute a previously unseen task based on a single demonstration. While there has been exciting progress in this direction, most of the approaches still require a few hundred tasks for meta-training, which limits the scalability of the approaches. Our main contribution is to formulate one-shot imitation learning as a symbolic planning pr…

    Submitted 4 November, 2019; v1 submitted 16 August, 2019; originally announced August 2019.

    Comments: IROS 2019

  2. arXiv:1907.13098  [pdf, other]

    cs.RO cs.LG

    Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

    Authors: Michelle A. Lee, Yuke Zhu, Peter Zachares, Matthew Tan, Krishnan Srinivasan, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

    Abstract: Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is non-trivial to manually design a robot controller that combines these modalities which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy…

    Submitted 27 July, 2019; originally announced July 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1810.10191

  3. Improving Social Awareness Through DANTE: A Deep Affinity Network for Clustering Conversational Interactants

    Authors: Mason Swofford, John Charles Peruzzi, Nathan Tsoi, Sydney Thompson, Roberto Martín-Martín, Silvio Savarese, Marynel Vázquez

    Abstract: We propose a data-driven approach to detect conversational groups by identifying spatial arrangements typical of these focused social encounters. Our approach uses a novel Deep Affinity Network (DANTE) to predict the likelihood that two individuals in a scene are part of the same conversational group, considering their social context. The predicted pair-wise affinities are then used in a graph clu…

    Submitted 15 January, 2020; v1 submitted 24 July, 2019; originally announced July 2019.

  4. arXiv:1907.03395  [pdf, other]

    cs.CV cs.LG

    Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks

    Authors: Vineet Kosaraju, Amir Sadeghian, Roberto Martín-Martín, Ian Reid, S. Hamid Rezatofighi, Silvio Savarese

    Abstract: Predicting the future trajectories of multiple interacting agents in a scene has become an increasingly important problem for many different applications ranging from control of autonomous vehicles and social robots to security and surveillance. This problem is compounded by the presence of social interactions between humans and their physical interactions with the scene. While the existing litera…

    Submitted 16 July, 2019; v1 submitted 4 July, 2019; originally announced July 2019.

  5. arXiv:1906.10746  [pdf, other]

    eess.SP cs.IT cs.LG eess.IV

    Time-Varying Interaction Estimation Using Ensemble Methods

    Authors: Brandon Oselio, Amir Sadeghian, Silvio Savarese, Alfred Hero

    Abstract: Directed information (DI) is a useful tool to explore time-directed interactions in multivariate data. However, as originally formulated DI is not well suited to interactions that change over time. In previous work, adaptive directed information was introduced to accommodate non-stationarity, while still preserving the utility of DI to discover complex dependencies between entities. There are many…

    Submitted 25 June, 2019; originally announced June 2019.

    Comments: 2019 IEEE Data Science Workshop

  6. arXiv:1906.08880  [pdf, other]

    cs.RO cs.AI cs.LG

    Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks

    Authors: Roberto Martín-Martín, Michelle A. Lee, Rachel Gardner, Silvio Savarese, Jeannette Bohg, Animesh Garg

    Abstract: Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or reward model, few efforts focused on the choice of action space (e.g. joint or end-effector space, position, velocity, etc.). However, studies in robot motion control indicate that choosing an action space that conforms to…

    Submitted 2 August, 2019; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: IROS19

  7. arXiv:1905.07553  [pdf, other]

    cs.CV

    Which Tasks Should Be Learned Together in Multi-task Learning?

    Authors: Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

    Abstract: Many computer vision applications require solving multiple tasks in real-time. A neural network can be trained to solve multiple tasks simultaneously using multi-task learning. This can save computation at inference time as only a single network needs to be evaluated. Unfortunately, this often leads to inferior overall performance as task objectives can compete, which consequently poses the questi…

    Submitted 2 September, 2020; v1 submitted 18 May, 2019; originally announced May 2019.

    Comments: Presented to ICML 2020. See project website at http://taskgrouping.stanford.edu/

  8. Deep Local Trajectory Replanning and Control for Robot Navigation

    Authors: Ashwini Pokle, Roberto Martín-Martín, Patrick Goebel, Vincent Chow, Hans M. Ewald, Junwei Yang, Zhenkai Wang, Amir Sadeghian, Dorsa Sadigh, Silvio Savarese, Marynel Vázquez

    Abstract: We present a navigation system that combines ideas from hierarchical planning and machine learning. The system uses a traditional global planner to compute optimal paths towards a goal, and a deep local trajectory planner and velocity controller to compute motion commands. The latter components of the system adjust the behavior of the robot through attention mechanisms such that it moves towards t…

    Submitted 13 May, 2019; originally announced May 2019.

    Report number: 18904288

    Journal ref: 2019 International Conference on Robotics and Automation (ICRA)

  9. arXiv:1904.08755  [pdf, other]

    cs.CV cs.AI

    4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

    Authors: Christopher Choy, JunYoung Gwak, Silvio Savarese

    Abstract: In many robotics and VR/AR applications, 3D-videos are readily-available sources of input (a continuous sequence of depth images, or LIDAR scans). However, those 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos us…

    Submitted 13 June, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: CVPR'19

  10. arXiv:1904.08500  [pdf, other]

    cs.CV cs.LG eess.IV

    Machine Vision for Natural Gas Methane Emissions Detection Using an Infrared Camera

    Authors: Jingfan Wang, Lyne P. Tchapmi, Arvind P. Ravikumara, Mike McGuire, Clay S. Bell, Daniel Zimmerle, Silvio Savarese, Adam R. Brandt

    Abstract: It is crucial to reduce natural gas methane emissions, which can potentially offset the climate benefits of replacing coal with gas. Optical gas imaging (OGI) is a widely-used method to detect methane leaks, but is labor-intensive and cannot provide leak detection results without operators' judgment. In this paper, we develop a computer vision approach to OGI-based leak detection using convolution…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: This paper was submitted to Applied Energy

  11. arXiv:1903.03878  [pdf, other]

    cs.LG cs.CV cs.RO stat.ML

    Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks

    Authors: Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese

    Abstract: Many robotic applications require the agent to perform long-horizon tasks in partially observable environments. In such applications, decision making at any step can depend on observations received far in the past. Hence, being able to properly memorize and utilize the long-term history is crucial. In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). The prop…

    Submitted 9 March, 2019; originally announced March 2019.

    Comments: CVPR 2019 paper with supplementary material

  12. arXiv:1903.02749  [pdf, other]

    cs.RO

    Deep Visual MPC-Policy Learning for Navigation

    Authors: Noriaki Hirose, Fei Xia, Roberto Martin-Martin, Amir Sadeghian, Silvio Savarese

    Abstract: Humans can routinely follow a trajectory defined by a list of images/landmarks. However, traditional robot navigation methods require accurate mapping of the environment, localization, and planning. Moreover, these methods are sensitive to subtle changes in the environment. In this paper, we propose a Deep Visual MPC-policy learning method that can perform visual navigation while avoiding collisio…

    Submitted 29 May, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: 11 pages, 11 figures, 5 tables

  13. Mechanical Search: Multi-Step Retrieval of a Target Object Occluded by Clutter

    Authors: Michael Danielczuk, Andrey Kurenkov, Ashwin Balakrishna, Matthew Matl, David Wang, Roberto Martín-Martín, Animesh Garg, Silvio Savarese, Ken Goldberg

    Abstract: When operating in unstructured environments such as warehouses, homes, and retail centers, robots are frequently required to interactively search for and retrieve specific objects from cluttered bins, shelves, or tables. Mechanical Search describes the class of tasks where the goal is to locate and extract a known target object. In this paper, we formalize Mechanical Search and study a version whe…

    Submitted 4 March, 2019; originally announced March 2019.

    Comments: To appear in IEEE International Conference on Robotics and Automation (ICRA), 2019. 9 pages with 4 figures

  14. arXiv:1903.00445  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    A Behavioral Approach to Visual Navigation with Graph Localization Networks

    Authors: Kevin Chen, Juan Pablo de Vicente, Gabriel Sepulveda, Fei Xia, Alvaro Soto, Marynel Vazquez, Silvio Savarese

    Abstract: Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps. Our goal is to enable a robot to navigate from one location to another, relying only on its visual input and the topological map of the environment. We propose using graph neural networks for localizing the agent in the map, and decompose the action space into primitive behaviors im…

    Submitted 1 March, 2019; originally announced March 2019.

    Comments: Video: https://youtu.be/nN3B1F90CFM

  15. arXiv:1902.09630  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

    Authors: Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese

    Abstract: Intersection over Union (IoU) is the most popular evaluation metric used in the object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axis-aligned 2D bounding boxes, it can be shown that $IoU$ c…

    Submitted 14 April, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

    Comments: accepted in CVPR 2019

  16. arXiv:1901.04780  [pdf, other]

    cs.CV cs.RO

    DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

    Authors: Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martín-Martín, Cewu Lu, Li Fei-Fei, Silvio Savarese

    Abstract: A key technical challenge in performing 6D object pose estimation from RGB-D image is to fully leverage the two complementary data sources. Prior works either extract information from the RGB image and depth separately or use costly post-processing steps, limiting their performances in highly cluttered scenes and real-time applications. In this work, we present DenseFusion, a generic framework for…

    Submitted 15 January, 2019; originally announced January 2019.

  17. arXiv:1812.11971  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE cs.RO

    Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

    Authors: Alexander Sax, Bradley Emi, Amir R. Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik

    Abstract: How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. delivering a package)? We study this question by integrating a generic perceptual skill set (e.g. a distance estimator, an edge detector, etc.) within a reinforcement learning framework--see Figure 1. This skill set (hereafter mid-level perception) prov…

    Submitted 22 April, 2019; v1 submitted 31 December, 2018; originally announced December 2018.

    Comments: See project website, demos, and code at http://perceptual.actor

  18. arXiv:1812.10071  [pdf, other]

    cs.CV cs.LG

    Coupled Recurrent Network (CRN)

    Authors: Lin Sun, Kui Jia, Yuejia Shen, Silvio Savarese, Dit Yan Yeung, Bertram E. Shi

    Abstract: Many semantic video analysis tasks can benefit from multiple, heterogeneous signals. For example, in addition to the original RGB input sequences, sequences of optical flow are usually used to boost the performance of human action recognition in videos. To learn from these heterogeneous input sources, existing methods rely on two-stream architectural designs that contain independent, parallel strea…

    Submitted 25 March, 2019; v1 submitted 25 December, 2018; originally announced December 2018.

  19. arXiv:1811.02790  [pdf, other]

    cs.RO cs.AI cs.LG

    RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation

    Authors: Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, Silvio Savarese, Li Fei-Fei

    Abstract: Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification. However, research in this area has been limited to modest-sized datasets due to the difficulty of collecting large quantities of task demonstrations through existing mechanisms. This work introduces RoboTurk to ad…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: Published at the Conference on Robot Learning (CoRL) 2018

  20. arXiv:1810.10191  [pdf, other]

    cs.RO cs.AI cs.LG

    Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

    Authors: Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg

    Abstract: Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is non-trivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on…

    Submitted 7 March, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

    Comments: ICRA 2019

  21. arXiv:1810.00663  [pdf, other]

    cs.CL cs.AI

    Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation

    Authors: Xiaoxue Zang, Ashwini Pokle, Marynel Vázquez, Kevin Chen, Juan Carlos Niebles, Alvaro Soto, Silvio Savarese

    Abstract: We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. We use attention models to connect information from both the user instructions and a topological representation of the environment. We evaluate our model's performance on a new dataset containing 10,050 pairs of navigation instructions. Our mode…

    Submitted 24 September, 2018; originally announced October 2018.

  22. arXiv:1808.10654  [pdf, other]

    cs.AI cs.CV cs.GR cs.LG cs.RO

    Gibson Env: Real-World Perception for Embodied Agents

    Authors: Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese

    Abstract: Developing visual perception models for active agents and sensorimotor control are cumbersome to be done in the physical world, as existing algorithms are too slow to efficiently learn in real-time and robots are fragile and costly. This has given rise to learning-in-simulation which consequently casts a question on whether the results transfer to real-world. In this paper, we are concerned with t…

    Submitted 31 August, 2018; originally announced August 2018.

    Comments: Access the code, dataset, and project website at http://gibsonenv.vision/ . CVPR 2018

    Journal ref: CVPR 2018

  23. arXiv:1807.03480  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration

    Authors: De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles

    Abstract: Our goal is to generate a policy to complete an unseen task given just a single video demonstration of the task in a given domain. We hypothesize that to successfully generalize to unseen complex tasks from a single video demonstration, it is necessary to explicitly incorporate the compositional structure of the tasks into the model. To this end, we propose Neural Task Graph (NTG) Networks, which…

    Submitted 6 March, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: CVPR 2019

  24. arXiv:1806.09266  [pdf, other]

    cs.RO cs.CV cs.LG stat.ML

    Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision

    Authors: Kuan Fang, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Viraj Mehta, Li Fei-Fei, Silvio Savarese

    Abstract: Tool manipulation is vital for facilitating robots to complete challenging task goals. It requires reasoning about the desired effect of the task and thus properly grasping and manipulating the tool to achieve the task. Task-agnostic grasping optimizes for grasp robustness while ignoring crucial task-specific constraints. In this paper, we propose the Task-Oriented Grasping Network (TOG-Net) to jo…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: RSS 2018

  25. arXiv:1806.08864  [pdf, other]

    cs.CV cs.RO

    VUNet: Dynamic Scene View Synthesis for Traversability Estimation using an RGB Camera

    Authors: Noriaki Hirose, Amir Sadeghian, Fei Xia, Roberto Martin-Martin, Silvio Savarese

    Abstract: We present VUNet, a novel view (VU) synthesis method for mobile robots in dynamic environments, and its application to the estimation of future traversability. Our method predicts future images for given virtual robot velocity commands using only RGB images at previous and current time steps. The future images result from applying two types of image changes to the previous and current images: 1) ch…

    Submitted 10 January, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

    Comments: website: http://svl.stanford.edu/projects/vunet/

    Journal ref: IEEE Robotics and Automation Letters 2019

  26. arXiv:1806.01482  [pdf, other]

    cs.CV

    SoPhie: An Attentive GAN for Predicting Paths Compliant to Social and Physical Constraints

    Authors: Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, S. Hamid Rezatofighi, Silvio Savarese

    Abstract: This paper addresses the problem of path prediction for multiple interacting agents in a scene, which is a crucial step for many autonomous platforms such as self-driving cars and social robots. We present SoPhie, an interpretable framework based on Generative Adversarial Network (GAN), which leverages two sources of information, the path history of all the agents in a scene, and the scen…

    Submitted 20 September, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

  27. arXiv:1805.12018  [pdf, other]

    cs.CV

    Generalizing to Unseen Domains via Adversarial Data Augmentation

    Authors: Riccardo Volpi, Hongseok Namkoong, Ozan Sener, John Duchi, Vittorio Murino, Silvio Savarese

    Abstract: We are concerned with learning models that generalize well to different unseen domains. We consider a worst-case formulation over data distributions that are near the source domain in the feature space. Only using training data from a single source distribution, we propose an iterative procedure that augments the dataset with examples from a fictitious target domain that is "hard" under the…

    Submitted 6 November, 2018; v1 submitted 30 May, 2018; originally announced May 2018.

    Comments: Accepted to NIPS 2018 (camera ready)

  28. arXiv:1805.11614  [pdf, other]

    cs.LG stat.ML

    Deep Learning under Privileged Information Using Heteroscedastic Dropout

    Authors: John Lambert, Ozan Sener, Silvio Savarese

    Abstract: Unlike machines, humans learn through rapid, abstract model-building. The role of a teacher is not simply to hammer home right or wrong answers, but rather to provide intuitive comments, comparisons, and explanations to a pupil. This is what the Learning Under Privileged Information (LUPI) paradigm endeavors to model by utilizing extra knowledge only available during training. We propose a new LUP…

    Submitted 29 May, 2018; originally announced May 2018.

    Comments: CVPR 2018

  29. arXiv:1804.08328  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE cs.RO

    Taskonomy: Disentangling Task Transfer Learning

    Authors: Amir Zamir, Alexander Sax, William Shen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

    Abstract: Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies acros…

    Submitted 23 April, 2018; originally announced April 2018.

    Comments: CVPR 2018 (Oral). See project website and live demos at http://taskonomy.vision/

  30. arXiv:1803.10892  [pdf, other]

    cs.CV

    Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks

    Authors: Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, Alexandre Alahi

    Abstract: Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments. This is challenging because human motion is inherently multimodal: given a history of human motion paths, there are many socially plausible ways that people could move in the future. We tackle this problem by combining tools…

    Submitted 28 March, 2018; originally announced March 2018.

  31. arXiv:1803.08495  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings

    Authors: Kevin Chen, Christopher B. Choy, Manolis Savva, Angel X. Chang, Thomas Funkhouser, Silvio Savarese

    Abstract: We present a method for generating colored 3D shapes from natural language. To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes. Our model combines and extends learning by association and metric learning approaches to learn implicit cross-modal connections, and produces a joint representation that captures the many-to-many relations between language and…

    Submitted 22 March, 2018; originally announced March 2018.

  32. arXiv:1803.03254  [pdf, other]

    cs.RO cs.CV cs.LG

    GONet: A Semi-Supervised Deep Learning Approach For Traversability Estimation

    Authors: Noriaki Hirose, Amir Sadeghian, Marynel Vázquez, Patrick Goebel, Silvio Savarese

    Abstract: We present semi-supervised deep learning approaches for traversability estimation from fisheye images. Our method, GONet, and the proposed extensions leverage Generative Adversarial Networks (GANs) to effectively predict whether the area seen in the input image(s) is safe for a robot to traverse. These methods are trained with many positive images of traversable places, but just a small set of neg…

    Submitted 8 March, 2018; originally announced March 2018.

    Comments: 8 pages, 7 figures, 3 tables

  33. arXiv:1712.04569  [pdf, other]

    cs.CV

    Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View

    Authors: Shuran Song, Andy Zeng, Angel X. Chang, Manolis Savva, Silvio Savarese, Thomas Funkhouser

    Abstract: We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360 panoramic view of an indoor scene when given only a partial observation (<= 50%) in the form of an RGB-D image. To make this possible, Im2Pano3D leverages strong contextual priors learned from large-scale synthetic and real-world in…

    Submitted 12 December, 2017; originally announced December 2017.

    Comments: Video summary: https://youtu.be/Au3GmktK-So

  34. arXiv:1711.10061  [pdf, other]

    cs.CV

    CAR-Net: Clairvoyant Attentive Recurrent Network

    Authors: Amir Sadeghian, Ferdinand Legros, Maxime Voisin, Ricky Vesel, Alexandre Alahi, Silvio Savarese

    Abstract: We present an interpretable framework for path prediction that leverages dependencies between agents' behaviors and their spatial navigation environment. We exploit two sources of information: the past motion trajectory of the agent of interest and a wide top-view image of the navigation scene. We propose a Clairvoyant Attentive Recurrent Network (CAR-Net) that learns where to look in a large imag…

    Submitted 31 July, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

    Comments: The 2nd and 3rd authors contributed equally

    Journal ref: ECCV 2018

  35. arXiv:1711.08561  [pdf, other]

    cs.CV

    Adversarial Feature Augmentation for Unsupervised Domain Adaptation

    Authors: Riccardo Volpi, Pietro Morerio, Silvio Savarese, Vittorio Murino

    Abstract: Recent works showed that Generative Adversarial Networks (GANs) can be successfully applied in unsupervised domain adaptation, where, given a labeled source dataset and an unlabeled target dataset, the goal is to train powerful classifiers for the target samples. In particular, it was shown that a GAN objective function can be used to learn target features indistinguishable from the source ones. I…

    Submitted 4 May, 2018; v1 submitted 22 November, 2017; originally announced November 2017.

    Comments: Accepted to CVPR 2018

  36. arXiv:1711.02741  [pdf, other]

    cs.CV cs.AI cs.LG

    Recurrent Autoregressive Networks for Online Multi-Object Tracking

    Authors: Kuan Fang, Yu Xiang, Xiaocheng Li, Silvio Savarese

    Abstract: The main challenge of online multi-object tracking is to reliably associate object trajectories with detections in each video frame based on their tracking history. In this work, we propose the Recurrent Autoregressive Network (RAN), a temporal generative modeling framework to characterize the appearance and motion dynamics of multiple objects over time. The RAN couples an external memory and an i…

    Submitted 3 March, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

    Comments: 10 pages, 3 figures, 6 tables

  37. arXiv:1710.08247  [pdf, other]

    cs.CV cs.LG cs.NE cs.RO

    Generic 3D Representation via Pose Estimation and Matching

    Authors: Amir R. Zamir, Tilman Wekel, Pulkit Argrawal, Colin Weil, Jitendra Malik, Silvio Savarese

    Abstract: Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D have been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the prem…

    Submitted 23 October, 2017; originally announced October 2017.

    Comments: Published in ECCV16. See the project website http://3drepresentation.stanford.edu/ and dataset website https://github.com/amir32002/3D_Street_View

    Journal ref: ECCV 2016 535-553

  38. arXiv:1710.07563  [pdf, other]

    cs.CV

    SEGCloud: Semantic Segmentation of 3D Point Clouds

    Authors: Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, Silvio Savarese

    Abstract: 3D semantic scene labeling is fundamental to agents operating in the real world. In particular, labeling raw 3D point sets from sensors provides fine-grained semantics. Recent works leverage the capabilities of Neural Networks (NNs), but are limited to coarse voxel predictions and do not explicitly enforce global consistency. We present SEGCloud, an end-to-end framework to obtain 3D point-level se…

    Submitted 20 October, 2017; originally announced October 2017.

    Comments: Accepted as a spotlight at the International Conference of 3D Vision (3DV 2017)

  39. arXiv:1710.06422  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation

    Authors: Kuan Fang, Yunfei Bai, Stefan Hinterstoisser, Silvio Savarese, Mrinal Kalakrishnan

    Abstract: Learning-based approaches to robotic manipulation are limited by the scalability of data collection and accessibility of labels. In this paper, we present a multi-task domain adaptation framework for instance grasping in cluttered scenes by utilizing simulated robot experiments. Our neural network takes monocular RGB images and the instance segmentation mask of a specified target object as inputs,…

    Submitted 3 March, 2018; v1 submitted 17 October, 2017; originally announced October 2017.

    Comments: ICRA 2018

  40. arXiv:1710.06104  [pdf, other]

    cs.CV

    Large-Scale 3D Shape Reconstruction and Segmentation from ShapeNet Core55

    Authors: Li Yi, Lin Shao, Manolis Savva, Haibin Huang, Yang Zhou, Qirui Wang, Benjamin Graham, Martin Engelcke, Roman Klokov, Victor Lempitsky, Yuan Gan, Pengyu Wang, Kun Liu, Fenggen Yu, Panpan Shui, Bingyang Hu, Yan Zhang, Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Minki Jeong, Jaehoon Choi, Changick Kim, Angom Geetchandra, et al. (25 additional authors not shown)

    Abstract: We introduce a large-scale 3D shape understanding benchmark using data and annotation from ShapeNet 3D object database. The benchmark consists of two tasks: part-level segmentation of 3D shapes and 3D reconstruction from single view images. Ten teams have participated in the challenge and the best performing teams have outperformed state-of-the-art approaches on both tasks. A few novel deep learni…

    Submitted 27 October, 2017; v1 submitted 17 October, 2017; originally announced October 2017.

  41. arXiv:1710.01813  [pdf, other

    cs.AI cs.LG cs.RO

    Neural Task Programming: Learning to Generalize Across Hierarchical Tasks

    Authors: Danfei Xu, Suraj Nair, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese

    Abstract: In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., video demonstration of a task) and recursively decomposes it into finer sub-task specifications. These specifications are fed to a hierarchical neural program, wher…

    Submitted 14 March, 2018; v1 submitted 4 October, 2017; originally announced October 2017.

    Comments: ICRA 2018

  42. arXiv:1709.05439  [pdf, other

    cs.CV cs.RO

    To Go or Not To Go? A Near Unsupervised Learning Approach For Robot Navigation

    Authors: Noriaki Hirose, Amir Sadeghian, Patrick Goebel, Silvio Savarese

    Abstract: It is important for robots to be able to decide whether they can go through a space or not, as they navigate through a dynamic environment. This capability can help them avoid injury or serious damage, e.g., as a result of running into people and obstacles, getting stuck, or falling off an edge. To this end, we propose an unsupervised and a near-unsupervised method based on Generative Adversarial…

    Submitted 15 September, 2017; originally announced September 2017.

    Comments: Noriaki Hirose and Amir Sadeghian contributed equally

  43. arXiv:1708.04672  [pdf, other

    cs.CV cs.GR

    DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image

    Authors: Andrey Kurenkov, Jingwei Ji, Animesh Garg, Viraj Mehta, JunYoung Gwak, Christopher Choy, Silvio Savarese

    Abstract: 3D reconstruction from a single image is a key problem in multiple applications ranging from robotic manipulation to augmented reality. Prior methods have tackled this problem through generative models which predict 3D reconstructions as voxels or point clouds. However, these methods can be computationally expensive and miss fine details. We introduce a new differentiable layer for 3D data deforma…

    Submitted 10 August, 2017; originally announced August 2017.

    Comments: 11 pages, 9 figures, NIPS

  44. arXiv:1708.03958  [pdf, other

    cs.CV

    Lattice Long Short-Term Memory for Human Action Recognition

    Authors: Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese

    Abstract: Human actions captured in video sequences are three-dimensional signals characterizing visual appearance and motion dynamics. To learn action patterns, existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and RNNs). CNN based methods are effective in learning spatial appearances, but are limited in modeling long-term motion dynamics. RNNs, especially Long Short-Term Memory (…

    Submitted 13 August, 2017; originally announced August 2017.

    Comments: ICCV2017

  45. arXiv:1708.00489  [pdf, other

    stat.ML cs.CV cs.LG

    Active Learning for Convolutional Neural Networks: A Core-Set Approach

    Authors: Ozan Sener, Silvio Savarese

    Abstract: Convolutional neural networks (CNNs) have been successfully applied to many recognition and learning tasks using a universal recipe; training a deep model on a very large dataset of supervised examples. However, this approach is rather restrictive in practice since collecting a large set of labeled images is very expensive. One way to ease this problem is coming up with smart ways for choosing ima…

    Submitted 1 June, 2018; v1 submitted 1 August, 2017; originally announced August 2017.

    Comments: ICLR 2018 Paper

  46. arXiv:1707.04674  [pdf, other

    cs.RO

    ADAPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems

    Authors: James Harrison, Animesh Garg, Boris Ivanovic, Yuke Zhu, Silvio Savarese, Li Fei-Fei, Marco Pavone

    Abstract: Model-free policy learning has enabled robust performance of complex tasks with relatively simple algorithms. However, this simplicity comes at the cost of requiring an Oracle and arguably very poor sample complexity. This renders such methods unsuitable for physical systems. Variants of model-based methods address this problem through the use of simulators, however, this gives rise to the problem…

    Submitted 8 November, 2017; v1 submitted 14 July, 2017; originally announced July 2017.

    Comments: International Symposium on Robotics Research (ISRR), 2017

  47. arXiv:1705.10904  [pdf, other

    cs.CV

    Weakly supervised 3D Reconstruction with Adversarial Constraint

    Authors: JunYoung Gwak, Christopher B. Choy, Animesh Garg, Manmohan Chandraker, Silvio Savarese

    Abstract: Supervised 3D reconstruction has witnessed a significant progress through the use of deep neural networks. However, this increase in performance requires large scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D supervision as an alternative for expensive 3D CAD annotation. Specifically, we use foreground masks as weak supervision through a raytrace pooling layer that enables…

    Submitted 4 October, 2017; v1 submitted 30 May, 2017; originally announced May 2017.

  48. arXiv:1703.02168  [pdf, other

    cs.CV

    Deep View Morphing

    Authors: Dinghuang Ji, Junghyun Kwon, Max McFarland, Silvio Savarese

    Abstract: Recently, convolutional neural networks (CNN) have been successfully applied to view synthesis problems. However, such CNN-based methods can suffer from lack of texture details, shape distortions, or high computational complexity. In this paper, we propose a novel CNN architecture for view synthesis called "Deep View Morphing" that does not suffer from these issues. To synthesize a middle view of…

    Submitted 6 March, 2017; originally announced March 2017.

    Comments: Accepted to CVPR 2017

  49. arXiv:1702.01105  [pdf, other

    cs.CV cs.RO

    Joint 2D-3D-Semantic Data for Indoor Scene Understanding

    Authors: Iro Armeni, Sasha Sax, Amir R. Zamir, Silvio Savarese

    Abstract: We present a dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. The dataset covers over 6,000 m² and contains over 70,000 RGB images, along with the corresponding depths, surface normals, semantic annotations, global XYZ images (all in forms of both regular and 360° equi…

    Submitted 5 April, 2017; v1 submitted 3 February, 2017; originally announced February 2017.

    Comments: The dataset is available at http://3Dsemantics.stanford.edu/

  50. arXiv:1701.01909  [pdf, other

    cs.CV

    Tracking The Untrackable: Learning To Track Multiple Cues with Long-Term Dependencies

    Authors: Amir Sadeghian, Alexandre Alahi, Silvio Savarese

    Abstract: The majority of existing solutions to the Multi-Target Tracking (MTT) problem do not combine cues in a coherent end-to-end fashion over a long period of time. However, we present an online method that encodes long-term temporal dependencies across multiple cues. One key challenge of tracking methods is to accurately track occluded targets or those which share similar appearance properties with sur…

    Submitted 3 April, 2017; v1 submitted 7 January, 2017; originally announced January 2017.