
Showing 1–14 of 14 results for author: Dvornik, N

  1. arXiv:2311.01444

    cs.CV cs.RO

    LabelFormer: Object Trajectory Refinement for Offboard Perception from LiDAR Point Clouds

    Authors: Anqi Joyce Yang, Sergio Casas, Nikita Dvornik, Sean Segal, Yuwen Xiong, Jordan Sir Kwang Hu, Carter Fang, Raquel Urtasun

    Abstract: A major bottleneck to scaling up the training of self-driving perception systems is the human annotation required for supervision. A promising alternative is to leverage "auto-labelling" offboard perception models that are trained to automatically generate annotations from raw LiDAR point clouds at a fraction of the cost. Auto-labels are most commonly generated via a two-stage approach -- first obje…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 20 pages, 8 figures, 7 tables

    Journal ref: CoRL 2023

  2. arXiv:2310.08312

    cs.CV cs.LG

    GePSAn: Generative Procedure Step Anticipation in Cooking Videos

    Authors: Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly

    Abstract: We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focuses on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations i…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: published at ICCV 2023

  3. arXiv:2305.17565

    cs.CV cs.RO

    Self-Supervised Learning of Action Affordances as Interaction Modes

    Authors: Liquan Wang, Nikita Dvornik, Rafael Dubeau, Mayank Mittal, Animesh Garg

    Abstract: When humans perform a task with an articulated object, they interact with the object only in a handful of ways, while the space of all possible interactions is nearly endless. This is because humans have prior knowledge about what interactions are likely to be successful, i.e., to open a new door we first try the handle. While learning such priors without supervision is easy for humans, it is noto…

    Submitted 27 May, 2023; originally announced May 2023.

    Journal ref: 2023 International Conference on Robotics and Automation

  4. arXiv:2304.13265

    cs.CV

    StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos

    Authors: Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, Allan D. Jepson

    Abstract: Instructional videos are an important resource to learn procedural tasks from human demonstrations. However, the instruction steps in such videos are typically short and sparse, with most of the video being irrelevant to the procedure. This motivates the need to temporally localize the instruction steps in such videos, i.e., the task called key-step localization. Traditional methods for key-step lo…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: CVPR'23

  5. arXiv:2211.00113

    cs.LG cs.CV

    SAGE: Saliency-Guided Mixup with Optimal Rearrangements

    Authors: Avery Ma, Nikita Dvornik, Ran Zhang, Leila Pishdad, Konstantinos G. Derpanis, Afsaneh Fazly

    Abstract: Data augmentation is a key element for training accurate models by reducing overfitting and improving generalization. For image classification, the most popular data augmentation techniques range from simple photometric and geometrical transformations to more complex methods that use visual saliency to craft new training examples. As augmentation methods get more complex, their ability to increas…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Accepted at British Machine Vision Conference (BMVC) 2022. Code: https://github.com/SamsungLabs/SAGE

  6. arXiv:2210.05861

    cs.CV cs.AI cs.LG

    SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models

    Authors: Ziyi Wu, Nikita Dvornik, Klaus Greff, Thomas Kipf, Animesh Garg

    Abstract: Understanding dynamics from visual observations is a challenging problem that requires disentangling individual objects from the scene and learning their interactions. While recent object-centric models can successfully decompose a scene into objects, modeling their dynamics effectively remains a challenge. We address this problem by introducing SlotFormer -- a Transformer-based autoregressi…

    Submitted 20 January, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted by ICLR 2023. Project page: https://slotformer.github.io/

  7. arXiv:2210.04996

    cs.CV cs.AI

    Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization

    Authors: Nikita Dvornik, Isma Hadji, Hai Pham, Dhaivat Bhatt, Brais Martinez, Afsaneh Fazly, Allan D. Jepson

    Abstract: In this work, we consider the problem of weakly-supervised multi-step localization in instructional videos. An established approach to this problem is to rely on a given list of steps. However, in reality, there is often more than one way to execute a procedure successfully, by following the set of steps in slightly varying orders. Thus, for successful localization in a given video, recent works r…

    Submitted 31 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: ECCV'22, oral

    Journal ref: ECCV 2022

  8. arXiv:2205.02300

    cs.CV

    P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision

    Authors: He Zhao, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Richard P. Wildes, Allan D. Jepson

    Abstract: In this paper, we study the problem of procedure planning in instructional videos. Here, an agent must produce a plausible sequence of actions that can transform the environment from a given start to a desired goal state. When learning procedure planning from instructional videos, most recent work leverages intermediate visual observations as supervision, which requires expensive annotation effort…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted as an oral paper at CVPR 2022

  9. arXiv:2108.11996

    cs.CV

    Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers

    Authors: Nikita Dvornik, Isma Hadji, Konstantinos G. Derpanis, Animesh Garg, Allan D. Jepson

    Abstract: In this work, we consider the problem of sequence-to-sequence alignment for signals containing outliers. Assuming the absence of outliers, the standard Dynamic Time Warping (DTW) algorithm efficiently computes the optimal alignment between two (generally) variable-length sequences. While DTW is robust to temporal shifts and dilations of the signal, it fails to align sequences in a meaningful way i…

    Submitted 26 August, 2021; originally announced August 2021.

  10. arXiv:2003.09338

    cs.CV

    Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification

    Authors: Nikita Dvornik, Cordelia Schmid, Julien Mairal

    Abstract: Popular approaches for few-shot classification consist of first learning a generic data representation based on a large annotated dataset, before adapting the representation to new classes given only a few labeled samples. In this work, we propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches. First, we obtain a mult…

    Submitted 20 July, 2020; v1 submitted 20 March, 2020; originally announced March 2020.

    Comments: ECCV'20

  11. arXiv:1903.11341

    cs.CV cs.AI

    Diversity with Cooperation: Ensemble Methods for Few-Shot Classification

    Authors: Nikita Dvornik, Cordelia Schmid, Julien Mairal

    Abstract: Few-shot classification consists of learning a predictive model that is able to effectively adapt to a new class, given only a few annotated samples. To solve this challenging problem, meta-learning has become a popular paradigm that advocates the ability to "learn to adapt". Recent works have shown, however, that simple learning strategies without meta-learning could be competitive. In this paper…

    Submitted 30 August, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

    Comments: Added experiments for different network architectures across different input image resolutions

  12. arXiv:1809.02492

    cs.CV

    On the Importance of Visual Context for Data Augmentation in Scene Understanding

    Authors: Nikita Dvornik, Julien Mairal, Cordelia Schmid

    Abstract: Performing data augmentation for learning deep neural networks is known to be important for training visual recognition systems. By artificially increasing the number of training examples, it helps reduce overfitting and improve generalization. While simple image transformations can already improve predictive performance in most vision tasks, larger gains can be obtained by leveraging task-spec…

    Submitted 19 September, 2019; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: Updated the experimental section. arXiv admin note: substantial text overlap with arXiv:1807.07428

  13. arXiv:1807.07428

    cs.CV

    Modeling Visual Context is Key to Augmenting Object Detection Datasets

    Authors: Nikita Dvornik, Julien Mairal, Cordelia Schmid

    Abstract: Performing data augmentation for learning deep neural networks is well known to be important for training visual recognition systems. By artificially increasing the number of training examples, it helps reduce overfitting and improve generalization. For object detection, classical approaches for data augmentation consist of generating images obtained by basic geometrical transformations and col…

    Submitted 19 July, 2018; originally announced July 2018.

    Journal ref: ECCV 2018, September 2018, Munich, Germany

  14. arXiv:1708.02813

    cs.CV

    BlitzNet: A Real-Time Deep Network for Scene Understanding

    Authors: Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid

    Abstract: Real-time scene understanding has become crucial in many applications such as autonomous driving. In this paper, we propose a deep architecture, called BlitzNet, that jointly performs object detection and semantic segmentation in one forward pass, allowing real-time computations. Besides the computational gain of having a single network to perform several tasks, we show that object detection and s…

    Submitted 9 August, 2017; originally announced August 2017.
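The abstract of item 9 (Drop-DTW) builds on the standard Dynamic Time Warping (DTW) algorithm, which computes an optimal alignment between two variable-length sequences. As a minimal sketch only, the Python below implements textbook DTW by dynamic programming, assuming scalar sequences and absolute difference as the local matching cost. The function name `dtw` and its `dist` parameter are illustrative choices, not taken from the paper, and the sketch does not implement Drop-DTW's outlier handling.

```python
import math


def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Standard DTW: return (total_cost, path) aligning sequences x and y.

    Illustrative sketch; assumes scalar elements and an absolute-difference cost.
    """
    n, m = len(x), len(y)
    # D[i][j] holds the minimal cumulative cost of aligning x[:i] with y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(x[i - 1], y[j - 1])
            # Extend the cheapest of: diagonal match, repeat x[i-1], or repeat y[j-1].
            D[i][j] = c + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # Backtrack from (n, m) to (0, 0) to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = {
            (i - 1, j - 1): D[i - 1][j - 1],  # preferred on ties (listed first)
            (i - 1, j): D[i - 1][j],
            (i, j - 1): D[i][j - 1],
        }
        i, j = min(candidates, key=candidates.get)
    path.reverse()
    return D[n][m], path


if __name__ == "__main__":
    cost, path = dtw([0, 1, 2], [0, 1, 1, 2])
    print(cost, path)
```

Running the example aligns the two sequences at zero cost, with `x[1]` matched to both repeated `1`s in `y`: the temporal dilation is absorbed by the warping path. Note that standard DTW must place every element of both sequences on the path, so it has no way to skip an outlier; that limitation is the setting the Drop-DTW abstract describes.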