Showing 1–16 of 16 results for author: Stent, S

  1. arXiv:2305.03052  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Tracking through Containers and Occluders in the Wild

    Authors: Basile Van Hoorick, Pavel Tokmakov, Simon Stent, Jie Li, Carl Vondrick

    Abstract: Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce $\textbf{TCOW}$, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is to, given a video sequence, segment both the projected extent of the target object, as well as the sur…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted at CVPR 2023. Project webpage is available at: https://tcow.cs.columbia.edu/

  2. arXiv:2210.02174  [pdf, other]

    cs.LG cs.RO

    CW-ERM: Improving Autonomous Driving Planning with Closed-loop Weighted Empirical Risk Minimization

    Authors: Eesha Kumar, Yiming Zhang, Stefano Pini, Simon Stent, Ana Ferreira, Sergey Zagoruyko, Christian S. Perone

    Abstract: The imitation learning of self-driving vehicle policies through behavioral cloning is often carried out in an open-loop fashion, ignoring the effect of actions to future states. Training such policies purely with Empirical Risk Minimization (ERM) can be detrimental to real-world performance, as it biases policy networks towards matching only open-loop behavior, showing poor results when evaluated…

    Submitted 11 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: v2: minor update in dataset and results (no changes in improvements or conclusions)

  3. arXiv:2208.03826  [pdf, other]

    cs.CV

    Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications

    Authors: Lingzhi Zhang, Shenghao Zhou, Simon Stent, Jianbo Shi

    Abstract: Egocentric videos offer fine-grained information for high-fidelity modeling of human behaviors. Hands and interacting objects are one crucial aspect of understanding a viewer's behaviors and intentions. We provide a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and objects being interacted with during a diverse array of daily activities. Our dat…

    Submitted 7 August, 2022; originally announced August 2022.

    Comments: 25 pages, 17 figures, ECCV 2022

  4. arXiv:2207.09619  [pdf, other]

    cs.HC cs.AI cs.RO

    Learning Latent Traits for Simulated Cooperative Driving Tasks

    Authors: Jonathan A. DeCastro, Deepak Gopinath, Guy Rosman, Emily Sumner, Shabnam Hakimi, Simon Stent

    Abstract: To construct effective teaming strategies between humans and AI systems in complex, risky situations requires an understanding of individual preferences and behaviors of humans. Previously this problem has been treated in case-specific or data-agnostic ways. In this paper, we build a framework capable of capturing a compact latent representation of the human in terms of their behavior and preferen…

    Submitted 19 July, 2022; originally announced July 2022.

  5. arXiv:2206.08990  [pdf, other]

    cs.CV cs.GR

    Shadows Shed Light on 3D Objects

    Authors: Ruoshi Liu, Sachit Menon, Chengzhi Mao, Dennis Park, Simon Stent, Carl Vondrick

    Abstract: 3D reconstruction is a fundamental problem in computer vision, and the task is especially challenging when the object to reconstruct is partially or fully occluded. We introduce a method that uses the shadows cast by an unobserved object in order to infer the possible 3D volumes behind the occlusion. We create a differentiable image formation model that allows us to jointly infer the 3D shape of a…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: 19 pages, 10 figures

  6. arXiv:2204.10916  [pdf, other]

    cs.CV cs.LG

    Revealing Occlusions with 4D Neural Fields

    Authors: Basile Van Hoorick, Purva Tendulkar, Didac Suris, Dennis Park, Simon Stent, Carl Vondrick

    Abstract: For computer vision systems to operate in dynamic situations, they need to be able to represent and reason about object permanence. We introduce a framework for learning to estimate 4D visual representations from monocular RGB-D, which is able to persist objects, even once they become obstructed by occlusions. Unlike traditional video representations, we encode point clouds into a continuous repre…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: CVPR 2022 (Oral)

  7. arXiv:2111.09748  [pdf, other]

    cs.CV cs.HC

    The Way to my Heart is through Contrastive Learning: Remote Photoplethysmography from Unlabelled Video

    Authors: John Gideon, Simon Stent

    Abstract: The ability to reliably estimate physiological signals from video is a powerful tool in low-cost, pre-clinical health monitoring. In this work we propose a new approach to remote photoplethysmography (rPPG) - the measurement of blood volume changes from observations of a person's face or skin. Similar to current state-of-the-art methods for rPPG, we apply neural networks to learn deep representati…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: Code available at https://github.com/ToyotaResearchInstitute/RemotePPG

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 3995-4004

  8. arXiv:2110.08610  [pdf, other]

    cs.HC cs.CV cs.LG cs.RO

    MAAD: A Model and Dataset for "Attended Awareness" in Driving

    Authors: Deepak Gopinath, Guy Rosman, Simon Stent, Katsuya Terahata, Luke Fletcher, Brenna Argall, John Leonard

    Abstract: We propose a computational model to estimate a person's attended awareness of their environment. We define attended awareness to be those parts of a potentially dynamic scene which a person has attended to in recent history and which they are still likely to be physically aware of. Our model takes as input scene information in the form of a video and noisy gaze estimates, and outputs visual salien…

    Submitted 16 October, 2021; originally announced October 2021.

    Comments: 25 pages, 13 figures, 14 tables, Accepted at EPIC@ICCV 2021 Workshop. Main paper + Supplementary Material

  9. arXiv:2108.11950  [pdf, other]

    cs.CV cs.CL

    LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

    Authors: Zhijian Liu, Simon Stent, Jie Li, John Gideon, Song Han

    Abstract: Computer vision tasks such as object detection and semantic/instance segmentation rely on the painstaking annotation of large training datasets. In this paper, we propose LocTex that takes advantage of the low-cost localized textual annotations (i.e., captions and synchronized mouse-over gestures) to reduce the annotation effort. We introduce a contrastive pre-training framework between images and…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: ICCV 2021. Project page: https://loctex.mit.edu/

  10. arXiv:1910.10088  [pdf, other]

    cs.CV

    Gaze360: Physically Unconstrained Gaze Estimation in the Wild

    Authors: Petr Kellnhofer, Adria Recasens, Simon Stent, Wojciech Matusik, Antonio Torralba

    Abstract: Understanding where people are looking is an informative social cue. In this work, we present Gaze360, a large-scale gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images. Our dataset consists of 238 subjects in indoor and outdoor environments with labelled 3D gaze across a wide range of head poses and distances. It is the largest publicly available dataset of its…

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: International Conference on Computer Vision, 2019

  11. arXiv:1809.03355  [pdf, other]

    cs.CV

    Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks

    Authors: Adrià Recasens, Petr Kellnhofer, Simon Stent, Wojciech Matusik, Antonio Torralba

    Abstract: We introduce a saliency-based distortion layer for convolutional neural networks that helps to improve the spatial sampling of input data for a given task. Our differentiable layer can be added as a preprocessing block to existing task networks and trained altogether in an end-to-end fashion. The effect of the layer is to efficiently estimate how to sample from the original data in order to boost…

    Submitted 10 September, 2018; originally announced September 2018.

    Comments: European Conference on Computer Vision, 2018, 14 pages, 6 figures

  12. arXiv:1802.08936  [pdf, other]

    cs.CV

    A Dataset To Evaluate The Representations Learned By Video Prediction Models

    Authors: Ryan Szeto, Simon Stent, German Ros, Jason J. Corso

    Abstract: We present a parameterized synthetic dataset called Moving Symbols to support the objective study of video prediction networks. Using several instantiations of the dataset in which variation is explicitly controlled, we highlight issues in an existing state-of-the-art approach and propose the use of a performance metric with greater semantic meaning to improve experimental interpretability. Our da…

    Submitted 21 March, 2018; v1 submitted 24 February, 2018; originally announced February 2018.

    Comments: Accepted to ICLR 2018 Workshop Track. Fixed Figure 2

  13. arXiv:1607.07405  [pdf, other]

    cs.CV cs.LG

    gvnn: Neural Network Library for Geometric Computer Vision

    Authors: Ankur Handa, Michael Bloesch, Viorica Patraucean, Simon Stent, John McCormac, Andrew Davison

    Abstract: We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spi…

    Submitted 12 August, 2016; v1 submitted 25 July, 2016; originally announced July 2016.

    Comments: Submitted to ECCV Workshop on Deep Geometry

  14. arXiv:1604.01545  [pdf, ps, other]

    cs.CV

    Training Constrained Deconvolutional Networks for Road Scene Semantic Segmentation

    Authors: German Ros, Simon Stent, Pablo F. Alcantarilla, Tomoki Watanabe

    Abstract: In this work we investigate the problem of road scene semantic segmentation using Deconvolutional Networks (DNs). Several constraints limit the practical performance of DNs in this context: firstly, the paucity of existing pixel-wise labelled training data, and secondly, the memory constraints of embedded hardware, which rule out the practical use of state-of-the-art DN architectures such as fully…

    Submitted 6 April, 2016; originally announced April 2016.

    Comments: submitted as a conference paper

  15. arXiv:1511.07041  [pdf, other]

    cs.CV

    SceneNet: Understanding Real World Indoor Scenes With Synthetic Data

    Authors: Ankur Handa, Viorica Patraucean, Vijay Badrinarayanan, Simon Stent, Roberto Cipolla

    Abstract: Scene understanding is a prerequisite to many high level tasks for any automated intelligent machine operating in real world environments. Recent attempts with supervised learning have shown promise in this direction but also highlighted the need for enormous quantity of supervised data --- performance increases in proportion to the amount of data used. However, this quickly becomes prohibitive wh…

    Submitted 26 November, 2015; v1 submitted 22 November, 2015; originally announced November 2015.

  16. arXiv:1505.00171  [pdf, other]

    cs.CV

    SynthCam3D: Semantic Understanding With Synthetic Indoor Scenes

    Authors: Ankur Handa, Viorica Patraucean, Vijay Badrinarayanan, Simon Stent, Roberto Cipolla

    Abstract: We are interested in automatic scene understanding from geometric cues. To this end, we aim to bring semantic segmentation in the loop of real-time reconstruction. Our semantic segmentation is built on a deep autoencoder stack trained exclusively on synthetic depth data generated from our novel 3D scene library, SynthCam3D. Importantly, our network is able to segment real world scenes without any…

    Submitted 1 May, 2015; originally announced May 2015.