Showing 1–4 of 4 results for author: Ramanishka, V

  1. arXiv:1811.02307

    cs.CV

    Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning

    Authors: Vasili Ramanishka, Yi-Ting Chen, Teruhisa Misu, Kate Saenko

    Abstract: Driving scene understanding is a key ingredient for intelligent transportation systems. To achieve systems that can operate in a complex physical and social environment, they need to understand and learn how humans drive and interact with traffic scenes. We present the Honda Research Institute Driving Dataset (HDD), a challenging dataset to enable research on learning driver behavior in real-life…

    Submitted 6 November, 2018; originally announced November 2018.

    Comments: The dataset is available at https://usa.honda-ri.com/hdd

    Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7699-7707

  2. arXiv:1802.10250

    cs.CV

    Joint Event Detection and Description in Continuous Video Streams

    Authors: Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

    Abstract: Dense video captioning is a fine-grained video understanding task that involves two sub-problems: localizing distinct events in a long video stream, and generating captions for the localized events. We propose the Joint Event Detection and Description Network (JEDDi-Net), which solves the dense video captioning task in an end-to-end fashion. Our model continuously encodes the input video stream wi…

    Submitted 25 December, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: WACV 2019

  3. arXiv:1612.07360

    cs.CV

    Top-down Visual Saliency Guided by Captions

    Authors: Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko

    Abstract: Neural image/video captioning models can generate accurate descriptions, but their internal process of mapping regions to words is a black box and therefore difficult to explain. Top-down neural saliency methods can find important regions given a high-level semantic task such as object classification, but cannot use a natural language sentence as the top-down input for the task. In this paper, we…

    Submitted 12 April, 2017; v1 submitted 21 December, 2016; originally announced December 2016.

    Comments: CVPR 2017 camera ready version

  4. arXiv:1505.05914

    cs.CV

    A Multi-scale Multiple Instance Video Description Network

    Authors: Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, Kate Saenko

    Abstract: Generating natural language descriptions for in-the-wild videos is a challenging task. Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (AlexNet, GoogLeNet) to extract a visual representation of the input video. However, these deep CNN architectures are designed for single-label centered-positioned object classification. W…

    Submitted 18 March, 2016; v1 submitted 21 May, 2015; originally announced May 2015.

    Comments: ICCV15 workshop on Closing the Loop Between Vision and Language