
Showing 1–46 of 46 results for author: Derpanis, K G

Searching in archive cs.
  1. arXiv:2404.02233  [pdf, other]

    cs.CV

    Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models

    Authors: Matthew Kowal, Richard P. Wildes, Konstantinos G. Derpanis

    Abstract: Understanding what deep network models capture in their learned representations is a fundamental challenge in computer vision. We present a new methodology for understanding such vision models, the Visual Concept Connectome (VCC), which discovers human-interpretable concepts and their interlayer connections in a fully unsupervised manner. Our approach simultaneously reveals fine-grained concepts at…

    Submitted 10 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 (Highlight)

  2. arXiv:2402.17986  [pdf, other]

    cs.CV

    PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis

    Authors: Jason J. Yu, Tristan Aumentado-Armstrong, Fereshteh Forghani, Konstantinos G. Derpanis, Marcus A. Brubaker

    Abstract: This paper considers the problem of generative novel view synthesis (GNVS), generating novel, plausible views of a scene given a limited number of known views. Here, we propose a set-based generative model that can simultaneously generate multiple, self-consistent new views, conditioned on any number of views. Our approach is not limited to generating a single image at a time and can condition on…

    Submitted 18 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  3. arXiv:2401.10831  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Understanding Video Transformers via Universal Concept Discovery

    Authors: Matthew Kowal, Achal Dave, Rares Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, Pavel Tokmakov

    Abstract: This paper studies the problem of concept-based interpretability of transformer representations for videos. Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered. Prior research on concept-based interpretability has concentrated solely on image-level tasks. Comparatively, video models deal wit…

    Submitted 10 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: CVPR 2024 (Highlight)

  4. arXiv:2310.17880  [pdf, other]

    cs.CV

    Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations

    Authors: Tristan Aumentado-Armstrong, Ashkan Mirzaei, Marcus A. Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G. Derpanis, Igor Gilitschenski

    Abstract: Neural Radiance Fields (NeRFs) have proven to be powerful 3D representations, capable of high quality novel view synthesis of complex scenes. While NeRFs have been applied to graphics, vision, and robotics, problems with slow rendering speed and characteristic visual artifacts prevent adoption in many use cases. In this work, we investigate combining an autoencoder (AE) with a NeRF, in which laten…

    Submitted 26 October, 2023; originally announced October 2023.

    ACM Class: I.2.10

  5. arXiv:2310.08312  [pdf, other]

    cs.CV cs.LG

    GePSAn: Generative Procedure Step Anticipation in Cooking Videos

    Authors: Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly

    Abstract: We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focuses on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations i…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: published at ICCV 2023

  6. arXiv:2309.08826  [pdf, other]

    cs.CV

    Dual-Camera Joint Deblurring-Denoising

    Authors: Shayan Shekarforoush, Amanpreet Walia, Marcus A. Brubaker, Konstantinos G. Derpanis, Alex Levinshtein

    Abstract: Recent image enhancement methods have shown the advantages of using a pair of long and short-exposure images for low-light photography. These image modalities offer complementary strengths and weaknesses. The former yields an image that is clean but blurry due to camera or object motion, whereas the latter is sharp but noisy due to low photon count. Motivated by the fact that modern smartphones co…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Project webpage: http://shekshaa.github.io/Joint-Deblurring-Denoising/

  7. arXiv:2308.08947  [pdf, other]

    cs.CV

    Watch Your Steps: Local Image and Scene Editing by Text Instructions

    Authors: Ashkan Mirzaei, Tristan Aumentado-Armstrong, Marcus A. Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G. Derpanis, Igor Gilitschenski

    Abstract: Denoising diffusion models have enabled high-quality image generation and editing. We present a method to localize the desired edit region implicit in a text instruction. We leverage InstructPix2Pix (IP2P) and identify the discrepancy between IP2P predictions with and without the instruction. This discrepancy is referred to as the relevance map. The relevance map conveys the importance of changing…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Project page: https://ashmrz.github.io/WatchYourSteps/

    Journal ref: European Conference on Computer Vision (ECCV) 2024

  8. arXiv:2304.13265  [pdf, other]

    cs.CV

    StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos

    Authors: Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, Allan D. Jepson

    Abstract: Instructional videos are an important resource to learn procedural tasks from human demonstrations. However, the instruction steps in such videos are typically short and sparse, with most of the video being irrelevant to the procedure. This motivates the need to temporally localize the instruction steps in such videos, i.e., the task called key-step localization. Traditional methods for key-step lo…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: CVPR'23

  9. arXiv:2304.10700  [pdf, other]

    cs.CV

    Long-Term Photometric Consistent Novel View Synthesis with Diffusion Models

    Authors: Jason J. Yu, Fereshteh Forghani, Konstantinos G. Derpanis, Marcus A. Brubaker

    Abstract: Novel view synthesis from a single input image is a challenging task, where the goal is to generate a new view of a scene from a desired camera pose that may be separated by a large motion. The highly uncertain nature of this synthesis task due to unobserved elements within the scene (i.e., occlusion) and outside the field-of-view makes the use of generative models appealing to capture the variety…

    Submitted 21 August, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Project page: https://yorkucvil.github.io/Photoconsistent-NVS/

  10. arXiv:2304.09677  [pdf, other]

    cs.CV

    Reference-guided Controllable Inpainting of Neural Radiance Fields

    Authors: Ashkan Mirzaei, Tristan Aumentado-Armstrong, Marcus A. Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G. Derpanis, Igor Gilitschenski

    Abstract: The popularity of Neural Radiance Fields (NeRFs) for view synthesis has led to a desire for NeRF editing tools. Here, we focus on inpainting regions in a view-consistent and controllable manner. In addition to the typical NeRF inputs and masks delineating the unwanted region in each view, we require only a single inpainted view of the scene, i.e., a reference view. We use monocular depth estimator…

    Submitted 20 April, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: Project Page: https://ashmrz.github.io/reference-guided-3d

  11. arXiv:2211.12254  [pdf, other]

    cs.CV

    SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

    Authors: Ashkan Mirzaei, Tristan Aumentado-Armstrong, Konstantinos G. Derpanis, Jonathan Kelly, Marcus A. Brubaker, Igor Gilitschenski, Alex Levinshtein

    Abstract: Neural Radiance Fields (NeRFs) have emerged as a popular approach for novel view synthesis. While NeRFs are quickly being adapted for a wider set of applications, intuitively editing NeRF scenes is still an open challenge. One important editing task is the removal of unwanted objects from a 3D scene, such that the replaced region is visually plausible and consistent with its context. We refer to t…

    Submitted 15 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Project Page: https://spinnerf3d.github.io

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023

  12. arXiv:2211.01783  [pdf, other]

    cs.CV

    Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks

    Authors: Matthew Kowal, Mennatullah Siam, Md Amirul Islam, Neil D. B. Bruce, Richard P. Wildes, Konstantinos G. Derpanis

    Abstract: There is limited understanding of the information captured by deep spatiotemporal models in their intermediate representations. For example, while evidence suggests that action recognition algorithms are heavily influenced by visual appearance in single frames, no quantitative methodology exists for evaluating such static bias in the latent representation compared to bias toward dynamics. We tackl…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.02846

  13. arXiv:2211.00113  [pdf, other]

    cs.LG cs.CV

    SAGE: Saliency-Guided Mixup with Optimal Rearrangements

    Authors: Avery Ma, Nikita Dvornik, Ran Zhang, Leila Pishdad, Konstantinos G. Derpanis, Afsaneh Fazly

    Abstract: Data augmentation is a key element for training accurate models by reducing overfitting and improving generalization. For image classification, the most popular data augmentation techniques range from simple photometric and geometrical transformations to more complex methods that use visual saliency to craft new training examples. As augmentation methods get more complex, their ability to increas…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Accepted at British Machine Vision Conference (BMVC) 2022. Code: https://github.com/SamsungLabs/SAGE

  14. arXiv:2206.02846  [pdf, other]

    cs.CV

    A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information

    Authors: Matthew Kowal, Mennatullah Siam, Md Amirul Islam, Neil D. B. Bruce, Richard P. Wildes, Konstantinos G. Derpanis

    Abstract: Deep spatiotemporal models are used in a variety of computer vision tasks, such as action recognition and video object segmentation. Currently, there is a limited understanding of what information is captured by these models in their intermediate representations. For example, while it has been observed that action recognition algorithms are heavily influenced by visual appearance in single static…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: CVPR 2022

  15. arXiv:2205.02300  [pdf, other]

    cs.CV

    P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision

    Authors: He Zhao, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Richard P. Wildes, Allan D. Jepson

    Abstract: In this paper, we study the problem of procedure planning in instructional videos. Here, an agent must produce a plausible sequence of actions that can transform the environment from a given start to a desired goal state. When learning procedure planning from instructional videos, most recent work leverages intermediate visual observations as supervision, which requires expensive annotation effort…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted as an oral paper at CVPR 2022

  16. arXiv:2204.09268  [pdf, other]

    cs.LG cs.CL cs.CV cs.IR

    Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations

    Authors: Leila Pishdad, Ran Zhang, Konstantinos G. Derpanis, Allan Jepson, Afsaneh Fazly

    Abstract: Probabilistic embeddings have proven useful for capturing polysemous word meanings, as well as ambiguity in image matching. In this paper, we study the advantages of probabilistic embeddings in a cross-modal setting (i.e., text and images), and propose a simple approach that replaces the standard vector point embeddings in extant image-text matching models with probabilistic distributions that are…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: 13 pages, 7 figures

  17. Semantic keypoint-based pose estimation from single RGB frames

    Authors: Karl Schmeckpeper, Philip R. Osteen, Yufu Wang, Georgios Pavlakos, Kenneth Chaney, Wyatt Jordan, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: This paper presents an approach to estimating the continuous 6-DoF pose of an object from a single RGB image. The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model. Unlike prior investigators, we are agnostic to whether the object is textured or textureless, as the convnet learns the optimal representation from the available training-…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: https://sites.google.com/view/rcta-object-keypoints-dataset/home. arXiv admin note: substantial text overlap with arXiv:1703.04670

    Journal ref: Field Robotics, 2, 147-171, 2022

  18. arXiv:2203.14308  [pdf, other]

    cs.CV

    Temporal Transductive Inference for Few-Shot Video Object Segmentation

    Authors: Mennatullah Siam, Konstantinos G. Derpanis, Richard P. Wildes

    Abstract: Few-shot video object segmentation (FS-VOS) aims at segmenting video frames using a few labelled examples of classes not seen during initial training. In this paper, we present a simple but effective temporal transductive inference (TTI) approach that leverages temporal consistency in the unlabelled video frames during few-shot inference. Key to our approach is the use of both global and local tem…

    Submitted 16 July, 2023; v1 submitted 27 March, 2022; originally announced March 2022.

    Comments: IJCV submission under review

  19. arXiv:2110.10335  [pdf, other]

    cs.CV

    Simpler Does It: Generating Semantic Labels with Objectness Guidance

    Authors: Md Amirul Islam, Matthew Kowal, Sen Jia, Konstantinos G. Derpanis, Neil D. B. Bruce

    Abstract: Existing weakly or semi-supervised semantic segmentation methods utilize image or box-level supervision to generate pseudo-labels for weakly labeled images. However, due to the lack of strong supervision, the generated pseudo-labels are often noisy near the object boundaries, which severely impacts the network's ability to learn strong representations. To address this problem, we present a novel f…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: BMVC 2021

  20. arXiv:2108.11996  [pdf, other]

    cs.CV

    Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers

    Authors: Nikita Dvornik, Isma Hadji, Konstantinos G. Derpanis, Animesh Garg, Allan D. Jepson

    Abstract: In this work, we consider the problem of sequence-to-sequence alignment for signals containing outliers. Assuming the absence of outliers, the standard Dynamic Time Warping (DTW) algorithm efficiently computes the optimal alignment between two (generally) variable-length sequences. While DTW is robust to temporal shifts and dilations of the signal, it fails to align sequences in a meaningful way i…

    Submitted 26 August, 2021; originally announced August 2021.

  21. arXiv:2108.09929  [pdf, other]

    cs.CV

    SegMix: Co-occurrence Driven Mixup for Semantic Segmentation and Adversarial Robustness

    Authors: Md Amirul Islam, Matthew Kowal, Konstantinos G. Derpanis, Neil D. B. Bruce

    Abstract: In this paper, we present a strategy for training convolutional neural networks to effectively resolve interference arising from competing hypotheses relating to inter-categorical information throughout the network. The premise is based on the notion of feature binding, which is defined as the process by which activations spread across space and layers in the network are successfully integrated to…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: Under submission at IJCV (BMVC 2020 Extension). arXiv admin note: substantial text overlap with arXiv:2008.05667

  22. arXiv:2108.07884  [pdf, other]

    cs.CV

    Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs

    Authors: Md Amirul Islam, Matthew Kowal, Sen Jia, Konstantinos G. Derpanis, Neil D. B. Bruce

    Abstract: In this paper, we challenge the common assumption that collapsing the spatial dimensions of a 3D (spatial-channel) tensor in a convolutional neural network (CNN) into a vector via global pooling removes all spatial information. Specifically, we demonstrate that positional information is encoded based on the ordering of the channel dimensions, while semantic information is largely not. Following th…

    Submitted 17 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  23. arXiv:2105.05217  [pdf, other]

    cs.CV

    Representation Learning via Global Temporal Alignment and Cycle-Consistency

    Authors: Isma Hadji, Konstantinos G. Derpanis, Allan D. Jepson

    Abstract: We introduce a weakly supervised method for representation learning based on aligning temporal sequences (e.g., videos) of the same process (e.g., human action). The main idea is to use the global temporal ordering of latent correspondences across sequence pairs as a supervisory signal. In particular, we propose a loss based on scoring the optimal sequence alignment to train an embedding network.…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: accepted to CVPR 2021

  24. arXiv:2105.04551  [pdf, other]

    cs.CV

    Stochastic Image-to-Video Synthesis using cINNs

    Authors: Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Konstantinos G. Derpanis, Björn Ommer

    Abstract: Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: Given an image, the model must be able to predict a future progression of the portrayed scene and, conversely, a video should be explained in terms of its static image content and all the remaining characteristics not present in the initial frame. This naturally suggests a bij…

    Submitted 17 June, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted to CVPR 2021

  25. arXiv:2101.12322  [pdf, other]

    cs.CV

    Position, Padding and Predictions: A Deeper Look at Position Information in CNNs

    Authors: Md Amirul Islam, Matthew Kowal, Sen Jia, Konstantinos G. Derpanis, Neil D. B. Bruce

    Abstract: In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent. An implication of this is that a filter may know what it is looking at, but not where it is positioned in the image. In this paper, we first test this hypothesis and reveal that a surprising degree of absolute position info…

    Submitted 28 January, 2021; originally announced January 2021.

  26. arXiv:2101.11604  [pdf, other]

    cs.CV

    Shape or Texture: Understanding Discriminative Features in CNNs

    Authors: Md Amirul Islam, Matthew Kowal, Patrick Esser, Sen Jia, Bjorn Ommer, Konstantinos G. Derpanis, Neil Bruce

    Abstract: Contrasting the previous evidence that neurons in the later layers of a Convolutional Neural Network (CNN) respond to complex object shapes, recent studies have shown that CNNs actually exhibit a 'texture bias': given an image with both texture and shape cues (e.g., a stylized image), a CNN is biased towards predicting the category corresponding to the texture. However, these previous studies cond…

    Submitted 27 January, 2021; originally announced January 2021.

    Comments: Accepted to ICLR 2021

  27. arXiv:2011.08026  [pdf, other]

    cs.CV cs.LG

    Cycle-Consistent Generative Rendering for 2D-3D Modality Translation

    Authors: Tristan Aumentado-Armstrong, Alex Levinshtein, Stavros Tsogkas, Konstantinos G. Derpanis, Allan D. Jepson

    Abstract: For humans, visual understanding is inherently generative: given a 3D shape, we can postulate how it would look in the world; given a 2D image, we can infer the 3D structure that likely gave rise to it. We can thus translate between the 2D visual and 3D structural modalities of a given object. In the context of computer vision, this corresponds to a learnable module that serves two purposes: (i) g…

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: 3DV 2020 (oral). Project page: https://ttaa9.github.io/genren/

    ACM Class: I.2.10; I.2.6

  28. arXiv:2010.13821  [pdf, other]

    cs.CV cs.LG

    Wavelet Flow: Fast Training of High Resolution Normalizing Flows

    Authors: Jason J. Yu, Konstantinos G. Derpanis, Marcus A. Brubaker

    Abstract: Normalizing flows are a class of probabilistic generative models which allow for both fast density computation and efficient sampling and are effective at modelling complex distributions like images. A drawback among current methods is their significant training cost, sometimes requiring months of GPU training time to achieve state-of-the-art results. This paper introduces Wavelet Flow, a multi-sc…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: Manuscript appendix images compressed with JPEG to meet arXiv size limits. Visit the project page for PNG versions: https://yorkucvil.github.io/Wavelet-Flow

  29. arXiv:2008.05667  [pdf, other]

    cs.CV

    Feature Binding with Category-Dependant MixUp for Semantic Segmentation and Adversarial Robustness

    Authors: Md Amirul Islam, Matthew Kowal, Konstantinos G. Derpanis, Neil D. B. Bruce

    Abstract: In this paper, we present a strategy for training convolutional neural networks to effectively resolve interference arising from competing hypotheses relating to inter-categorical information throughout the network. The premise is based on the notion of feature binding, which is defined as the process by which activations spread across space and layers in the network are successfully integrated t…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: Accepted to BMVC 2020 (Oral)

  30. arXiv:2003.11596  [pdf, other]

    eess.IV cs.CV

    Learning Multi-Scale Photo Exposure Correction

    Authors: Mahmoud Afifi, Konstantinos G. Derpanis, Björn Ommer, Michael S. Brown

    Abstract: Capturing photographs with wrong exposures remains a major source of errors in camera-based imaging. Exposure problems are categorized as either: (i) overexposed, where the camera exposure was too long, resulting in bright and washed-out image regions, or (ii) underexposed, where the exposure was too short, resulting in dark regions. Both under- and overexposure greatly reduce the contrast and vis…

    Submitted 30 March, 2021; v1 submitted 25 March, 2020; originally announced March 2020.

    Comments: CVPR 2021

  31. arXiv:1904.08245  [pdf, other]

    cs.CV

    End-to-End Learning of Representations for Asynchronous Event-Based Data

    Authors: Daniel Gehrig, Antonio Loquercio, Konstantinos G. Derpanis, Davide Scaramuzza

    Abstract: Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregat…

    Submitted 20 August, 2019; v1 submitted 17 April, 2019; originally announced April 2019.

    Comments: To appear at ICCV 2019

  32. arXiv:1904.05869  [pdf, other]

    cs.LG cs.CV stat.ML

    Keyframing the Future: Keyframe Discovery for Visual Prediction and Planning

    Authors: Karl Pertsch, Oleh Rybkin, **gyun Yang, Shenghao Zhou, Konstantinos G. Derpanis, Kostas Daniilidis, Joseph Lim, Andrew Jaegle

    Abstract: Temporal observations such as videos contain essential information about the dynamics of the underlying scene, but they are often interleaved with inessential, predictable details. One way of dealing with this problem is by focusing on the most informative moments in a sequence. We propose a model that learns to discover these important events and the times when they occur and uses them to represe…

    Submitted 7 May, 2020; v1 submitted 11 April, 2019; originally announced April 2019.

    Comments: Conference on Learning for Dynamics and Control, 2020. Website: https://sites.google.com/view/keyin/home

  33. arXiv:1901.05376  [pdf, other]

    cs.CV

    Joint Spatial and Layer Attention for Convolutional Networks

    Authors: Tony Joseph, Konstantinos G. Derpanis, Faisal Z. Qureshi

    Abstract: In this paper, we propose a novel approach that learns to sequentially attend to different Convolutional Neural Networks (CNN) layers (i.e., "what" feature abstraction to attend to) and different spatial locations of the selected feature map (i.e., "where") to perform the task at hand. Specifically, at each Recurrent Neural Network (RNN) step, both a CNN layer and localized spatial region with…

    Submitted 31 May, 2019; v1 submitted 16 January, 2019; originally announced January 2019.

  34. arXiv:1806.09655  [pdf, other]

    cs.LG cs.CV stat.ML

    Learning what you can do before doing anything

    Authors: Oleh Rybkin, Karl Pertsch, Konstantinos G. Derpanis, Kostas Daniilidis, Andrew Jaegle

    Abstract: Intelligent agents can learn to represent the action spaces of other agents simply by observing them act. Such representations help agents quickly learn to predict the effects of their own actions on the environment and to plan complex action sequences. In this work, we address the problem of learning an agent's action space purely from visual observation. We use stochastic video prediction to lea…

    Submitted 12 February, 2019; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: Published at ICLR 2019. 10 pages + 15 pages of references and appendices

    Journal ref: International Conference on Learning Representations, 2019

  35. arXiv:1805.01358  [pdf, other]

    cs.CV

    SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning

    Authors: Titus Cieslewski, Konstantinos G. Derpanis, Davide Scaramuzza

    Abstract: A wide range of computer vision algorithms rely on identifying sparse interest points in images and establishing correspondences between them. However, only a subset of the initially identified interest points results in true correspondences (inliers). In this paper, we seek a detector that finds the minimum number of points that are likely to result in an application-dependent "sufficient" number…

    Submitted 19 August, 2019; v1 submitted 3 May, 2018; originally announced May 2018.

    Comments: 8 pages, 2 pages of references, 1 page of supplementary material. Accepted for publication at the IEEE International Conference on 3D Vision (3DV), Québec City, 2019. v2 contains significant changes vs. v1

    Journal ref: IEEE International Conference on 3D Vision (3DV), Québec City, 2019

  36. arXiv:1803.09760  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    Predicting the Future with Transformational States

    Authors: Andrew Jaegle, Oleh Rybkin, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: An intelligent observer looks at the world and sees not only what is, but what is moving and what can be moved. In other words, the observer sees how the present state of the world can transform in the future. We propose a model that predicts future images by learning to represent the present state and its transformation given only a sequence of images. To do so, we introduce an architecture with…

    Submitted 26 March, 2018; originally announced March 2018.

    Comments: 24 pages, including supplement

  37. arXiv:1708.04607  [pdf, other]

    cs.CV

    Segmentation-Aware Convolutional Networks Using Local Attention Masks

    Authors: Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos

    Abstract: We introduce an approach to integrate segmentation information within a convolutional neural network (CNN). This counteracts the tendency of CNNs to smooth information across regions and increases their spatial precision. To obtain segmentation information, we set up a CNN to provide an embedding space where region co-membership can be estimated based on Euclidean distance. We use these embedding…

    Submitted 15 August, 2017; originally announced August 2017.

  38. arXiv:1706.06982  [pdf, other]

    cs.CV

    Two-Stream Convolutional Networks for Dynamic Texture Synthesis

    Authors: Matthew Tesfaldet, Marcus A. Brubaker, Konstantinos G. Derpanis

    Abstract: We introduce a two-stream model for dynamic texture synthesis. Our model is based on pre-trained convolutional networks (ConvNets) that target two independent tasks: (i) object recognition, and (ii) optical flow prediction. Given an input dynamic texture, statistics of filter responses from the object recognition ConvNet encapsulate the per-frame appearance of the input texture, while statistics o…

    Submitted 12 April, 2018; v1 submitted 21 June, 2017; originally announced June 2017.

    Comments: In proc. CVPR 2018. Full results available at https://ryersonvisionlab.github.io/two-stream-projpage/

  39. arXiv:1704.04793  [pdf, other]

    cs.CV

    Harvesting Multiple Views for Marker-less 3D Human Pose Annotations

    Authors: Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: Recent advances with Convolutional Networks (ConvNets) have shifted the bottleneck for many computer vision tasks to annotated data collection. In this paper, we present a geometry-driven approach to automatically collect annotations for human pose prediction tasks. Starting from a generic ConvNet for 2D human pose, and assuming a multi-view setup, we describe an automatic way to collect accurate…

    Submitted 16 April, 2017; originally announced April 2017.

    Comments: CVPR 2017 Camera Ready

  40. arXiv:1703.04670  [pdf, other]

    cs.CV cs.RO

    6-DoF Object Pose from Semantic Keypoints

    Authors: Georgios Pavlakos, Xiaowei Zhou, Aaron Chan, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: This paper presents a novel approach to estimating the continuous six degree of freedom (6-DoF) pose (3D translation and rotation) of an object from a single RGB image. The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model. Unlike prior work, we are agnostic to whether the object is textured or textureless, as the convnet learns the o…

    Submitted 14 March, 2017; originally announced March 2017.

    Comments: IEEE International Conference on Robotics and Automation (ICRA), 2017

  41. Building Usage Profiles Using Deep Neural Nets

    Authors: Domenic Curro, Konstantinos G. Derpanis, Andriy V. Miranskyy

    Abstract: To improve software quality, one needs to build test scenarios resembling the usage of a software product in the field. This task is rendered challenging when a product's customer base is large and diverse. In this scenario, existing profiling approaches, such as operational profiling, are difficult to apply. In this work, we consider publicly available video tutorials of a product to profile usag…

    Submitted 23 February, 2017; originally announced February 2017.

    Journal ref: Proceedings of the 39th International Conference on Software Engineering: New Ideas and Emerging Results Track (ICSE-NIER '17). IEEE Press, Piscataway, NJ, USA, 43-46, 2017

  42. arXiv:1701.02354  [pdf, other]

    cs.CV

    MonoCap: Monocular Human Motion Capture using a CNN Coupled with a Geometric Prior

    Authors: Xiaowei Zhou, Menglong Zhu, Georgios Pavlakos, Spyridon Leonardos, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: Recovering 3D full-body human pose is a challenging problem with many applications. It has been successfully addressed by motion capture systems with body-worn markers and multiple cameras. In this paper, we address the more challenging case of not only using a single camera but also not leveraging markers: going directly from 2D appearance to 3D geometry. Deep learning approaches have shown remar…

    Submitted 9 March, 2018; v1 submitted 9 January, 2017; originally announced January 2017.

    Comments: Accepted by PAMI. Extended version of the following paper: Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video. X Zhou, M Zhu, S Leonardos, K Derpanis, K Daniilidis. CVPR 2016. arXiv admin note: substantial text overlap with arXiv:1511.09439

  43. arXiv:1611.07828  [pdf, other]

    cs.CV

    Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

    Authors: Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis

    Abstract: This paper addresses the challenge of 3D human pose estimation from a single color image. Despite the general success of the end-to-end learning paradigm, top performing approaches employ a two-step solution consisting of a Convolutional Network (ConvNet) for 2D joint localization and a subsequent optimization step to recover 3D pose. In this paper, we identify the representation of 3D pose as a c…

    Submitted 26 July, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: CVPR 2017 Camera Ready. Project Page: https://www.seas.upenn.edu/~pavlakos/projects/volumetric/

  44. arXiv:1608.05842  [pdf, other]

    cs.CV

    Back to Basics: Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness

    Authors: Jason J. Yu, Adam W. Harley, Konstantinos G. Derpanis

    Abstract: Recently, convolutional networks (convnets) have proven useful for predicting optical flow. Much of this success is predicated on the availability of large datasets that require expensive and involved data acquisition and laborious labeling. To bypass these challenges, we propose an unsupervised approach (i.e., without leveraging ground-truth flow) to train a convnet end-to-end for predicting o…

    Submitted 20 August, 2016; originally announced August 2016.

  45. arXiv:1511.04377  [pdf, other]

    cs.CV

    Learning Dense Convolutional Embeddings for Semantic Segmentation

    Authors: Adam W. Harley, Konstantinos G. Derpanis, Iasonas Kokkinos

    Abstract: This paper proposes a new deep convolutional neural network (DCNN) architecture that learns pixel embeddings, such that pairwise distances between the embeddings can be used to infer whether or not the pixels lie on the same region. That is, for any two pixels on the same object, the embeddings are trained to be similar; for any pair that straddles an object boundary, the embeddings are trained to…

    Submitted 7 January, 2016; v1 submitted 13 November, 2015; originally announced November 2015.

  46. arXiv:1502.07058  [pdf, other]

    cs.CV cs.IR cs.LG cs.NE

    Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

    Authors: Adam W. Harley, Alex Ufkes, Konstantinos G. Derpanis

    Abstract: This paper presents a new state-of-the-art for document image classification and retrieval, using features learned by deep convolutional neural networks (CNNs). In object and scene analysis, deep neural nets are capable of learning a hierarchical chain of abstraction from pixel inputs to concise and descriptive representations. The current work explores this capacity in the realm of document analy…

    Submitted 25 February, 2015; originally announced February 2015.
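    Entry 44 (arXiv:1608.05842) describes training a convnet to predict optical flow without ground-truth flow, relying instead on brightness constancy and motion smoothness. The following is a minimal NumPy sketch of one common way to write such an objective. It is not the authors' implementation. The bilinear warping, the Charbonnier penalty, the border clamping, the default weights, and the function names `warp_bilinear`, `charbonnier`, and `unsupervised_flow_loss` are all illustrative assumptions. In practice the loss would be computed on the network's predicted flow inside an autodiff framework.

```python
import numpy as np

# Illustrative sketch (assumption): a generic brightness-constancy +
# smoothness loss, not the exact loss used in arXiv:1608.05842.


def warp_bilinear(img, flow):
    """Sample img at (x + u, y + v) using bilinear interpolation.

    img:  (H, W) grayscale image.
    flow: (H, W, 2) array, flow[..., 0] = u (x offset), flow[..., 1] = v (y offset).
    Sample coordinates are clamped to the image border (an assumption).
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0  # fractional offsets used as interpolation weights
    wy = y - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def charbonnier(x, eps=1e-3):
    """Robust penalty sqrt(x^2 + eps^2), applied elementwise (eps is an assumption)."""
    return np.sqrt(x * x + eps * eps)


def unsupervised_flow_loss(img1, img2, flow, smooth_weight=1.0):
    """Photometric (brightness constancy) term plus a first-order smoothness term."""
    # Brightness constancy: img1(x) should match img2(x + flow(x)).
    photometric = charbonnier(img1 - warp_bilinear(img2, flow)).mean()
    # Motion smoothness: penalize differences between horizontally and
    # vertically adjacent flow vectors, per flow component.
    dx = flow[:, 1:, :] - flow[:, :-1, :]
    dy = flow[1:, :, :] - flow[:-1, :, :]
    smoothness = charbonnier(dx).mean() + charbonnier(dy).mean()
    return photometric + smooth_weight * smoothness
```

    As a usage example, take `img2 = np.roll(img1, 1, axis=1)` (a one-pixel shift to the right). A constant flow of u = 1 then scores a lower loss than zero flow. Only the clamped right-hand border column still violates brightness constancy, and both constant flows incur the same smoothness penalty.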