Showing 1–50 of 63 results for author: Corso, J J

  1. arXiv:2207.05845  [pdf, other]

    cs.CV

    Learning to Estimate External Forces of Human Motion in Video

    Authors: Nathan Louis, Tylan N. Templin, Travis D. Eliason, Daniel P. Nicolella, Jason J. Corso

    Abstract: Analyzing sports performance or preventing injuries requires capturing ground reaction forces (GRFs) exerted by the human body during certain movements. Standard practice uses physical markers paired with force plates in a controlled environment, but this is marred by high costs, lengthy implementation time, and variance in repeat experiments; hence, we propose GRF inference from video. While rece…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Accepted to ACMMM 2022

  2. arXiv:2204.07024  [pdf, other]

    cs.CV cs.LG

    Q-TART: Quickly Training for Adversarial Robustness and in-Transferability

    Authors: Madan Ravi Ganesh, Salimeh Yasaei Sekeh, Jason J. Corso

    Abstract: Raw deep neural network (DNN) performance is not enough; in real-world settings, computational load, training efficiency and adversarial security are just as or even more important. We propose to simultaneously tackle Performance, Efficiency, and Robustness, using our proposed algorithm Q-TART, Quickly Train for Adversarial Robustness and in-Transferability. Q-TART follows the intuition that sampl…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: 13 pages

  3. arXiv:2110.10206  [pdf, other]

    cs.CV

    Evaluating and Improving Interactions with Hazy Oracles

    Authors: Stephan J. Lemmer, Jason J. Corso

    Abstract: Many AI systems integrate sensor inputs, world knowledge, and human-provided information to perform inference. While such systems often treat the human input as flawless, humans are better thought of as hazy oracles whose input may be ambiguous or outside of the AI system's understanding. In such situations it makes sense for the AI system to defer its inference while it disambiguates the human-pr…

    Submitted 30 November, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: To be published in the Proceedings of the 2023 AAAI Conference on Artificial Intelligence (AAAI). 14 pages, 5 tables, 9 figures

  4. arXiv:2105.05332  [pdf, other]

    cs.CV

    The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting

    Authors: Ryan Szeto, Jason J. Corso

    Abstract: Quantitative evaluation has increased dramatically among recent video inpainting work, but the video and mask content used to gauge performance has received relatively little attention. Although attributes such as camera and background scene motion inherently change the difficulty of the task and affect methods differently, existing evaluation schemes fail to control for them, thereby providing mi…

    Submitted 25 April, 2022; v1 submitted 11 May, 2021; originally announced May 2021.

    Comments: Accepted to CVPR 2022

  5. arXiv:2103.01468  [pdf, other]

    cs.CV cs.RO

    Depth from Camera Motion and Object Detection

    Authors: Brent A. Griffin, Jason J. Corso

    Abstract: This paper addresses the problem of learning to estimate the depth of detected objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry). We achieve this by 1) designing a recurrent neural network (DBox) that estimates the depth of objects using a generalized representation of bounding boxes and uncalibrated camera movement and 2) introducing the Object Dept…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: CVPR 2021

  6. arXiv:2101.04281  [pdf, other]

    cs.CV

    Temporally Guided Articulated Hand Pose Tracking in Surgical Videos

    Authors: Nathan Louis, Luowei Zhou, Steven J. Yule, Roger D. Dias, Milisa Manojlovich, Francis D. Pagani, Donald S. Likosky, Jason J. Corso

    Abstract: Articulated hand pose tracking is an under-explored problem that carries the potential for use in an extensive number of applications, especially in the medical domain. With a robust and accurate tracking system on in-vivo surgical videos, the motion dynamics and movement patterns of the hands can be captured and analyzed for many rich tasks. In this work, we propose a novel hand pose estimation m…

    Submitted 20 October, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: 10 pages

  7. arXiv:2011.03920  [pdf, other]

    cs.CV

    Integrating Human Gaze into Attention for Egocentric Activity Recognition

    Authors: Kyle Min, Jason J. Corso

    Abstract: It is well known that human gaze carries significant information about visual attention. However, there are three main difficulties in incorporating the gaze data in an attention mechanism of deep neural networks: 1) the gaze fixation points are likely to have measurement errors due to blinking and rapid eye movements; 2) it is unclear when and how much the gaze data is correlated with visual atte…

    Submitted 8 November, 2020; originally announced November 2020.

    Comments: WACV 2021 camera ready (Supplementary material: on CVF soon)

  8. arXiv:2010.12639  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV

    The RobotSlang Benchmark: Dialog-guided Robot Localization and Navigation

    Authors: Shurjo Banerjee, Jesse Thomason, Jason J. Corso

    Abstract: Autonomous robot systems for applications from search and rescue to assistive guidance should be able to engage in natural language dialog with people. To study such cooperative communication, we introduce Robot Simultaneous Localization and Mapping with Natural Language (RobotSlang), a benchmark of 169 natural language dialogs between a human Driver controlling a robot and a human Commander provi…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: Conference on Robot Learning 2020

  9. arXiv:2009.07414  [pdf, other]

    cs.CV

    Ground-truth or DAER: Selective Re-query of Secondary Information

    Authors: Stephan J. Lemmer, Jason J. Corso

    Abstract: Many vision tasks use secondary information at inference time -- a seed -- to assist a computer vision model in solving a problem. For example, an initial bounding box is needed to initialize visual object tracking. To date, all such work makes the assumption that the seed is a good one. However, in practice, from crowdsourcing to noisy automated seeds, this is often not the case. We hence propose…

    Submitted 2 September, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

    Comments: Accepted to ICCV2021 Main: 12 pages, 7 figures. Supplementary: 4 pages, 4 figures

  10. Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection

    Authors: Duygu Sarikaya, Jason J. Corso, Khurshid A. Guru

    Abstract: Video understanding of robot-assisted surgery (RAS) videos is an active research area. Modeling the gestures and skill level of surgeons presents an interesting problem. The insights drawn may be applied in effective skill acquisition, objective skill assessment, real-time feedback, and human-robot collaborative surgeries. We propose a solution to the tool detection and localization open problem i…

    Submitted 29 July, 2020; originally announced August 2020.

    Journal ref: IEEE Transactions on Medical Imaging 36 (2017) 1542-1549

  11. arXiv:2007.06643  [pdf, other]

    cs.CV

    Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization

    Authors: Kyle Min, Jason J. Corso

    Abstract: Temporally localizing activities within untrimmed videos has been extensively studied in recent years. Despite recent advances, existing methods for weakly-supervised temporal activity localization struggle to recognize when an activity is not occurring. To address this issue, we propose a novel method named A2CL-PT. Two triplets of the feature space are considered in our approach: one triplet is…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: ECCV 2020 camera ready (Supplementary material: on ECVA soon)

  12. arXiv:2007.05676  [pdf, other]

    cs.CV

    Learning Object Depth from Camera Motion and Video Object Segmentation

    Authors: Brent A. Griffin, Jason J. Corso

    Abstract: Video object segmentation, i.e., the separation of a target object from background in video, has made significant progress on real and challenging videos in recent years. To leverage this progress in 3D applications, this paper addresses the problem of learning to estimate the depth of segmented objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry). We a…

    Submitted 18 December, 2020; v1 submitted 10 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  13. arXiv:2006.12463  [pdf, other]

    cs.CV cs.IT cs.LG

    Slimming Neural Networks using Adaptive Connectivity Scores

    Authors: Madan Ravi Ganesh, Dawsin Blanchard, Jason J. Corso, Salimeh Yasaei Sekeh

    Abstract: In general, deep neural network (DNN) pruning methods fall into two categories: 1) Weight-based deterministic constraints, and 2) Probabilistic frameworks. While each approach has its merits and limitations there are a set of common practical issues such as, trial-and-error to analyze sensitivity and hyper-parameters to prune DNNs, which plague them both. In this work, we propose a new single-shot…

    Submitted 17 December, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: 18 pages

  14. arXiv:2006.03586  [pdf, other]

    cs.CV

    Novel Object Viewpoint Estimation through Reconstruction Alignment

    Authors: Mohamed El Banani, Jason J. Corso, David F. Fouhey

    Abstract: The goal of this paper is to estimate the viewpoint for a novel object. Standard viewpoint estimation approaches generally fail on this task due to their reliance on a 3D model for alignment or large amounts of class-specific training data and their corresponding canonical pose. We overcome those limitations by learning a reconstruct and align approach. Our key insight is that although we do not h…

    Submitted 5 June, 2020; originally announced June 2020.

    Comments: To appear at CVPR 2020. Project page: https://mbanani.github.io/novelviewpoints/

  15. arXiv:2003.08472  [pdf, other]

    cs.LG cs.CV cs.IT

    MINT: Deep Network Compression via Mutual Information-based Neuron Trimming

    Authors: Madan Ravi Ganesh, Jason J. Corso, Salimeh Yasaei Sekeh

    Abstract: Most approaches to deep neural network compression via pruning either evaluate a filter's importance using its weights or optimize an alternative objective function with sparsity constraints. While these methods offer a useful way to approximate contributions from similar filters, they often either ignore the dependency between layers or solve a more difficult optimization objective than standard…

    Submitted 18 March, 2020; originally announced March 2020.

    Comments: 12 pages

  16. arXiv:2001.04529  [pdf, other]

    cs.CV cs.AI

    Rethinking Curriculum Learning with Incremental Labels and Adaptive Compensation

    Authors: Madan Ravi Ganesh, Jason J. Corso

    Abstract: Like humans, deep networks have been shown to learn better when samples are organized and introduced in a meaningful order or curriculum. Conventional curriculum learning schemes introduce samples in their order of difficulty. This forces models to begin learning from a subset of the available data while adding the external overhead of evaluating the difficulty of samples. In this work, we propose…

    Submitted 13 August, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: 15 pages

  17. arXiv:1912.04950  [pdf, other]

    cs.CV

    HyperCon: Image-To-Video Model Transfer for Video-To-Video Translation Tasks

    Authors: Ryan Szeto, Mostafa El-Khamy, Jungwon Lee, Jason J. Corso

    Abstract: Video-to-video translation is more difficult than image-to-image translation due to the temporal consistency problem that, if unaddressed, leads to distracting flickering effects. Although video models designed from scratch produce temporally consistent results, training them to match the vast visual knowledge captured by image models requires an intractable number of videos. To combine the benefi…

    Submitted 10 November, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Accepted to WACV 2021

  18. arXiv:1910.01182  [pdf, other]

    cs.LG cs.CV stat.ML

    A Geometric Approach to Online Streaming Feature Selection

    Authors: Salimeh Yasaei Sekeh, Madan Ravi Ganesh, Shurjo Banerjee, Jason J. Corso, Alfred O. Hero

    Abstract: Online Streaming Feature Selection (OSFS) is a sequential learning problem where individual features across all samples are made available to algorithms in a streaming fashion. In this work, firstly, we assert that OSFS's main assumption of having data from all the samples available at runtime is unrealistic and introduce a new setting where features and samples are streamed concurrently called OS…

    Submitted 16 March, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: 10 pages, 5 figures, 4 tables

  19. arXiv:1909.11059  [pdf, other]

    cs.CV

    Unified Vision-Language Pre-Training for Image Captioning and VQA

    Authors: Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao

    Abstract: This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and d…

    Submitted 4 December, 2019; v1 submitted 24 September, 2019; originally announced September 2019.

    Comments: AAAI 2020 camera-ready version. The code and the pre-trained models are available at https://github.com/LuoweiZhou/VLP

  20. arXiv:1908.05786  [pdf, other]

    cs.CV

    TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection

    Authors: Kyle Min, Jason J. Corso

    Abstract: TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: first, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames, and then the following prediction network decodes the encoded features spatially while aggregating all the temporal information. As a result, a single…

    Submitted 15 August, 2019; originally announced August 2019.

    Comments: ICCV 2019 camera ready (Supplementary material: on CVF soon)

  21. arXiv:1904.06807  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation

    Authors: Hao Tang, Dan Xu, Nicu Sebe, Yanzhi Wang, Jason J. Corso, Yan Yan

    Abstract: Cross-view image translation is challenging because it involves images with drastically different views and severe deformation. In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible to generate images of natural scenes in arbitrary viewpoints, based on an image of the scene and a novel semantic map. The proposed SelectionGAN exp…

    Submitted 16 April, 2019; v1 submitted 14 April, 2019; originally announced April 2019.

    Comments: 20 pages, 16 figures, accepted to CVPR 2019 as an oral paper

    Journal ref: CVPR 2019

  22. arXiv:1904.00952  [pdf, other]

    cs.RO

    Robot-Supervised Learning for Object Segmentation

    Authors: Victoria Florence, Jason J. Corso, Brent Griffin

    Abstract: To be effective in unstructured and changing environments, robots must learn to recognize new objects. Deep learning has enabled rapid progress for object detection and segmentation in computer vision; however, this progress comes at the price of human annotators labeling many training examples. This paper addresses the problem of extending learning-based segmentation methods to robotics applicati…

    Submitted 4 March, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

  23. arXiv:1903.11779  [pdf, other]

    cs.CV

    BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames

    Authors: Brent A. Griffin, Jason J. Corso

    Abstract: Semi-supervised video object segmentation has made significant progress on real and challenging videos in recent years. The current paradigm for segmentation methods and benchmark datasets is to segment objects in video provided a single annotation in the first frame. However, we find that segmentation performance across the entire video varies dramatically when selecting an alternative frame for…

    Submitted 23 November, 2020; v1 submitted 27 March, 2019; originally announced March 2019.

    Comments: CVPR 2019

  24. arXiv:1903.08336  [pdf, other]

    cs.RO

    Video Object Segmentation-based Visual Servo Control and Object Depth Estimation on a Mobile Robot

    Authors: Brent A. Griffin, Victoria Florence, Jason J. Corso

    Abstract: To be useful in everyday environments, robots must be able to identify and locate real-world objects. In recent years, video object segmentation has made significant progress on densely separating such objects from background in real and challenging videos. Building off of this progress, this paper addresses the problem of identifying generic objects and locating them in 3D using a mobile robot wi…

    Submitted 9 January, 2020; v1 submitted 20 March, 2019; originally announced March 2019.

    Comments: WACV 2020

  25. arXiv:1901.09774  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Attribute-Guided Sketch Generation

    Authors: Hao Tang, Xinya Chen, Wei Wang, Dan Xu, Jason J. Corso, Nicu Sebe, Yan Yan

    Abstract: Facial attributes are important since they provide a detailed description and determine the visual appearance of human faces. In this paper, we aim at converting a face image to a sketch while simultaneously generating facial attributes. To this end, we propose a novel Attribute-Guided Sketch Generative Adversarial Network (ASGAN) which is an end-to-end framework and contains two pairs of generato…

    Submitted 14 April, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

    Comments: 7 pages, 6 figures, accepted to FG 2019

  26. arXiv:1901.05580  [pdf, other]

    cs.RO

    Kinematically-Informed Interactive Perception: Robot-Generated 3D Models for Classification

    Authors: Abhishek Venkataraman, Brent Griffin, Jason J. Corso

    Abstract: To be useful in everyday environments, robots must be able to observe and learn about objects. Recent datasets enable progress for classifying data into known object categories; however, it is unclear how to collect reliable object data when operating in cluttered, partially-observable environments. In this paper, we address the problem of building complete 3D models for real-world objects using a…

    Submitted 16 January, 2019; originally announced January 2019.

  27. arXiv:1812.06587  [pdf, other]

    cs.CV

    Grounded Video Description

    Authors: Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach

    Abstract: Video description is one of the most challenging problems in vision and language understanding due to the large variability both on the video and language side. Models, hence, typically shortcut the difficulty in recognition and generate plausible sentences that are based on priors but are not necessarily grounded in the video. In this work, we explicitly link the sentence to the evidence in the v…

    Submitted 5 May, 2019; v1 submitted 16 December, 2018; originally announced December 2018.

    Comments: CVPR 2019 oral, camera-ready version including appendix

  28. arXiv:1812.05637  [pdf, other]

    cs.CV

    Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition

    Authors: Hao Huang, Luowei Zhou, Wei Zhang, Jason J. Corso, Chenliang Xu

    Abstract: Video action recognition, a critical problem in video understanding, has been gaining increasing attention. To identify actions induced by complex object-object interactions, we need to consider not only spatial relations among objects in a single frame, but also temporal relations among different or the same objects across multiple frames. However, existing approaches that model video representat…

    Submitted 7 May, 2019; v1 submitted 13 December, 2018; originally announced December 2018.

    Comments: 14 pages, 5 figures

  29. arXiv:1811.07958  [pdf, other]

    cs.CV

    Tukey-Inspired Video Object Segmentation

    Authors: Brent A. Griffin, Jason J. Corso

    Abstract: We investigate the problem of strictly unsupervised video object segmentation, i.e., the separation of a primary object from background in video without a user-provided object mask or any training on an annotated dataset. We find foreground objects in low-level vision data using a John Tukey-inspired measure of "outlierness". This Tukey-inspired measure also estimates the reliability of each data…

    Submitted 29 November, 2018; v1 submitted 19 November, 2018; originally announced November 2018.

  30. arXiv:1809.09318  [pdf, other]

    cs.LG stat.ML

    Floyd-Warshall Reinforcement Learning: Learning from Past Experiences to Reach New Goals

    Authors: Vikas Dhiman, Shurjo Banerjee, Jeffrey M. Siskind, Jason J. Corso

    Abstract: Consider multi-goal tasks that involve static environments and dynamic goals. Examples of such tasks, such as goal-directed navigation and pick-and-place in robotics, abound. Two types of Reinforcement Learning (RL) algorithms are used for such tasks: model-free or model-based. Each of these approaches has limitations. Model-free RL struggles to transfer learned information when the goal location…

    Submitted 4 January, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

  31. arXiv:1805.02834  [pdf, other]

    cs.CV

    Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction

    Authors: Luowei Zhou, Nathan Louis, Jason J. Corso

    Abstract: We study weakly-supervised video object grounding: given a video segment and a corresponding descriptive sentence, the goal is to localize objects that are mentioned from the sentence in the video. During training, no object bounding boxes are available, but the set of possible objects to be grounded is known beforehand. Existing approaches in the image domain use Multiple Instance Learning (MIL)…

    Submitted 20 July, 2018; v1 submitted 8 May, 2018; originally announced May 2018.

    Comments: 16 pages including Appendix

  32. arXiv:1805.00721  [pdf, ps, other]

    cs.CV

    Joint Surgical Gesture and Task Classification with Multi-Task and Multimodal Learning

    Authors: Duygu Sarikaya, Khurshid A. Guru, Jason J. Corso

    Abstract: We propose a novel multi-modal and multi-task architecture for simultaneous low level gesture and surgical task classification in Robot Assisted Surgery (RAS) videos. Our end-to-end architecture is based on the principles of a long short-term memory network (LSTM) that jointly learns temporal dynamics on rich representations of visual and motion features, while simultaneously classifying activities…

    Submitted 2 May, 2018; originally announced May 2018.

    Comments: Keywords: Robot-Assisted Surgery, Surgical Gesture Classification, Multi-task Learning, Multimodal Learning, Long Short-term Recurrent Neural Networks, Convolutional Neural Networks

  33. arXiv:1804.05879  [pdf, other]

    cs.CV

    M-PACT: An Open Source Platform for Repeatable Activity Classification Research

    Authors: Eric Hofesmann, Madan Ravi Ganesh, Jason J. Corso

    Abstract: There are many hurdles that prevent the replication of existing work which hinders the development of new activity classification models. These hurdles include switching between multiple deep learning libraries and the development of boilerplate experimental pipelines. We present M-PACT to overcome existing issues by removing the need to develop boilerplate code which allows users to quickly proto…

    Submitted 5 October, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

  34. arXiv:1804.00819  [pdf, other]

    cs.CV

    End-to-End Dense Video Captioning with Masked Transformer

    Authors: Luowei Zhou, Yingbo Zhou, Jason J. Corso, Richard Socher, Caiming Xiong

    Abstract: Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events. Therefore, all previous methods on dense video captioning tackle this problem by building two models, i.e. an event proposal and a captioning model, for these two sub-problems. The models are either trained separately or in alternation. This prevents d…

    Submitted 3 April, 2018; originally announced April 2018.

    Comments: To appear at CVPR18

  35. arXiv:1803.11147  [pdf, other]

    cs.CV cs.RO

    Learning Kinematic Descriptions using SPARE: Simulated and Physical ARticulated Extendable dataset

    Authors: Abhishek Venkataraman, Brent Griffin, Jason J. Corso

    Abstract: Next generation robots will need to understand intricate and articulated objects as they cooperate in human environments. To do so, these robots will need to move beyond their current abilities--- working with relatively simple objects in a task-indifferent manner--- toward more sophisticated abilities that dynamically estimate the properties of complex, articulated objects. To that end, we make t…

    Submitted 29 March, 2018; originally announced March 2018.

  36. arXiv:1803.08094  [pdf, other]

    cs.CV

    T-RECS: Training for Rate-Invariant Embeddings by Controlling Speed for Action Recognition

    Authors: Madan Ravi Ganesh, Eric Hofesmann, Byungsu Min, Nadha Gafoor, Jason J. Corso

    Abstract: An action should remain identifiable when modifying its speed: consider the contrast between an expert chef and a novice chef each chopping an onion. Here, we expect the novice chef to have a relatively measured and slow approach to chopping when compared to the expert. In general, the speed at which actions are performed, whether slower or faster than average, should not dictate how they are reco…

    Submitted 23 March, 2018; v1 submitted 21 March, 2018; originally announced March 2018.

  37. arXiv:1803.07218  [pdf, other]

    cs.CV

    A Temporally-Aware Interpolation Network for Video Frame Inpainting

    Authors: Ximeng Sun, Ryan Szeto, Jason J. Corso

    Abstract: We propose the first deep learning solution to video frame inpainting, a challenging instance of the general video inpainting problem with applications in video editing, manipulation, and forensics. Our task is less ambiguous than frame interpolation and video prediction because we have access to both the temporal context and a partial glimpse of the future, allowing us to better evaluate the qual…

    Submitted 3 November, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

  38. arXiv:1802.08936  [pdf, other]

    cs.CV

    A Dataset To Evaluate The Representations Learned By Video Prediction Models

    Authors: Ryan Szeto, Simon Stent, German Ros, Jason J. Corso

    Abstract: We present a parameterized synthetic dataset called Moving Symbols to support the objective study of video prediction networks. Using several instantiations of the dataset in which variation is explicitly controlled, we highlight issues in an existing state-of-the-art approach and propose the use of a performance metric with greater semantic meaning to improve experimental interpretability. Our da…

    Submitted 21 March, 2018; v1 submitted 24 February, 2018; originally announced February 2018.

    Comments: Accepted to ICLR 2018 Workshop Track. Fixed Figure 2

  39. arXiv:1802.02274  [pdf, other]

    cs.RO cs.AI

    A Critical Investigation of Deep Reinforcement Learning for Navigation

    Authors: Vikas Dhiman, Shurjo Banerjee, Brent Griffin, Jeffrey M Siskind, Jason J Corso

    Abstract: The navigation problem is classically approached in two steps: an exploration step, where map-information about the environment is gathered; and an exploitation step, where this information is used to navigate efficiently. Deep reinforcement learning (DRL) algorithms, alternatively, approach the problem of navigation in an end-to-end fashion. Inspired by the classical approach, we ask whether DRL…

    Submitted 4 January, 2019; v1 submitted 6 February, 2018; originally announced February 2018.

  40. arXiv:1802.01666  [pdf, other]

    cs.CV

    Adviser Networks: Learning What Question to Ask for Human-In-The-Loop Viewpoint Estimation

    Authors: Mohamed El Banani, Jason J. Corso

    Abstract: Humans have an unparalleled visual intelligence and can overcome visual ambiguities that machines currently cannot. Recent works have shown that incorporating guidance from humans during inference for monocular viewpoint-estimation can help overcome difficult cases in which the computer-alone would have otherwise failed. These hybrid intelligence approaches are hence gaining traction. However, dec…

    Submitted 25 October, 2018; v1 submitted 5 February, 2018; originally announced February 2018.

    Comments: 15 pages, 3 figures. Updated Acknowledgment

  41. arXiv:1801.04651  [pdf, other]

    cs.CV

    Deep Net Triage: Analyzing the Importance of Network Layers via Structural Compression

    Authors: Theodore S. Nowak, Jason J. Corso

    Abstract: Despite their prevalence, deep networks are poorly understood. This is due, at least in part, to their highly parameterized nature. As such, while certain structures have been found to work better than others, the significance of a model's unique structure, or the importance of a given layer, and how these translate to overall accuracy, remains unclear. In this paper, we analyze these properties o…

    Submitted 21 March, 2018; v1 submitted 14 January, 2018; originally announced January 2018.

  42. arXiv:1801.04340  [pdf, other]

    cs.RO cs.LG

    Predicting Future Lane Changes of Other Highway Vehicles using RNN-based Deep Models

    Authors: Sajan Patel, Brent Griffin, Kristofer Kusano, Jason J. Corso

    Abstract: In the event of sensor failure, autonomous vehicles need to safely execute emergency maneuvers while avoiding other vehicles on the road. To accomplish this, the sensor-failed vehicle must predict the future semantic behaviors of other drivers, such as lane changes, as well as their future trajectories given a recent window of past sensor observations. We address the first issue of semantic behavi…

    Submitted 16 May, 2019; v1 submitted 12 January, 2018; originally announced January 2018.

  43. arXiv:1704.08723  [pdf, other]

    cs.CV

    Action Understanding with Multiple Classes of Actors

    Authors: Chenliang Xu, Caiming Xiong, Jason J. Corso

    Abstract: Despite the rapid progress, existing works on action understanding focus strictly on one type of action agent, which we call actor---a human adult, ignoring the diversity of actions performed by other actors. To overcome this narrow viewpoint, our paper marks the first effort in the computer vision community to jointly consider algorithmic understanding of various types of actors undergoing variou…

    Submitted 27 April, 2017; originally announced April 2017.

  44. arXiv:1704.05165  [pdf, other]

    cs.CV

    Video Object Segmentation using Supervoxel-Based Gerrymandering

    Authors: Brent A. Griffin, Jason J. Corso

    Abstract: Pixels operate locally. Superpixels have some potential to collect information across many pixels; supervoxels have more potential by implicitly operating across time. In this paper, we explore this well-established notion, thoroughly analyzing how supervoxels can be used in place of and in conjunction with other means of aggregating information across space-time. Focusing on the problem of strictl…

    Submitted 17 April, 2017; originally announced April 2017.

  45. arXiv:1703.09859  [pdf, other]

    cs.CV

    Click Here: Human-Localized Keypoints as Guidance for Viewpoint Estimation

    Authors: Ryan Szeto, Jason J. Corso

    Abstract: We motivate and address a human-in-the-loop variant of the monocular viewpoint estimation task in which the location and class of one semantic object keypoint is available at test time. In order to leverage the keypoint information, we devise a Convolutional Neural Network called Click-Here CNN (CH-CNN) that integrates the keypoint information with activations from the layers that process the imag…

    Submitted 4 August, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: To appear in ICCV 2017

  46. arXiv:1703.09788  [pdf, other]

    cs.CV

    Towards Automatic Learning of Procedures from Web Instructional Videos

    Authors: Luowei Zhou, Chenliang Xu, Jason J. Corso

    Abstract: The potential for agents, whether embodied or software, to learn by observing other agents performing procedures involving objects and actions is rich. Current research on automatic procedure learning heavily relies on action labels or video subtitles, even during the evaluation phase, which makes them infeasible in real-world scenarios. This leads to our question: can the human-consensus structur…

    Submitted 21 November, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: AAAI 2018 Camera-ready version. See http://youcook2.eecs.umich.edu for YouCook2 dataset

  47. arXiv:1612.04468  [pdf, other]

    cs.CV cs.AI stat.ML

    Sparse Factorization Layers for Neural Networks with Limited Supervision

    Authors: Parker Koch, Jason J. Corso

    Abstract: Whereas CNNs have demonstrated immense progress in many vision problems, they suffer from a dependence on monumental amounts of labeled training data. On the other hand, dictionary learning does not scale to the size of problems that CNNs can handle, despite being very effective at low-level vision tasks such as denoising and inpainting. Recently, interest has grown in adapting dictionary learning…

    Submitted 13 December, 2016; originally announced December 2016.

  48. arXiv:1606.04621  [pdf, other]

    cs.CV

    Watch What You Just Said: Image Captioning with Text-Conditional Attention

    Authors: Luowei Zhou, Chenliang Xu, Parker Koch, Jason J. Corso

    Abstract: Attention mechanisms have attracted considerable interest in image captioning due to their powerful performance. However, existing methods use only visual content as attention and whether textual context can improve attention in image captioning remains unsolved. To explore this problem, we propose a novel attention mechanism, called \textit{text-conditional attention}, which allows the caption gene…

    Submitted 23 November, 2016; v1 submitted 14 June, 2016; originally announced June 2016.

    Comments: source code is available online

  49. arXiv:1604.03526  [pdf, other]

    cs.RO cs.AI

    Spatiotemporal Articulated Models for Dynamic SLAM

    Authors: Suren Kumar, Vikas Dhiman, Madan Ravi Ganesh, Jason J. Corso

    Abstract: We propose an online spatiotemporal articulation model estimation framework that estimates both articulated structure as well as a temporal prediction model solely using passive observations. The resulting model can predict future motions of an articulated object with high confidence because of the spatial and temporal structure. We demonstrate the effectiveness of the predictive model by incorp…

    Submitted 12 April, 2016; originally announced April 2016.

  50. arXiv:1604.03130  [pdf]

    cs.CY

    Video Analysis for Body-worn Cameras in Law Enforcement

    Authors: Jason J. Corso, Alexandre Alahi, Kristen Grauman, Gregory D. Hager, Louis-Philippe Morency, Harpreet Sawhney, Yaser Sheikh

    Abstract: The social conventions and expectations around the appropriate use of imaging and video have been transformed by the availability of video cameras in our pockets. The impact on law enforcement can easily be seen by watching the nightly news; more and more arrests, interventions, or even routine stops are being caught on cell phones or surveillance video, with both positive and negative consequences…

    Submitted 7 May, 2018; v1 submitted 11 April, 2016; originally announced April 2016.

    Comments: A Computing Community Consortium (CCC) white paper, 9 pages