Showing 1–23 of 23 results for author: Sevilla-Lara, L

  1. arXiv:2405.07723  [pdf, other]

    cs.CV

    Coarse or Fine? Recognising Action End States without Labels

    Authors: Davide Moltisanti, Hakan Bilen, Laura Sevilla-Lara, Frank Keller

    Abstract: We focus on the problem of recognising the end state of an action in an image, which is critical for understanding what action is performed and in which manner. We study this focusing on the task of predicting the coarseness of a cut, i.e., deciding whether an object was cut "coarsely" or "finely". No dataset with these annotated end states is available, so we propose an augmentation method to syn…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: The Eleventh Workshop on Fine-Grained Visual Categorization (CVPR 24)

  2. arXiv:2311.17776  [pdf, other]

    cs.CV

    One-Shot Open Affordance Learning with Foundation Models

    Authors: Gen Li, Deqing Sun, Laura Sevilla-Lara, Varun Jampani

    Abstract: We introduce One-shot Open Affordance Learning (OOAL), where a model is trained with just one example per base object category, but is expected to identify novel objects and affordances. While vision-language models excel at recognizing novel objects and scenes, they often struggle to understand finer levels of granularity such as affordances. To handle this issue, we conduct a comprehensive analy…

    Submitted 29 November, 2023; originally announced November 2023.

  3. arXiv:2311.15964  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Efficient Pre-training for Localized Instruction Generation of Videos

    Authors: Anil Batra, Davide Moltisanti, Laura Sevilla-Lara, Marcus Rohrbach, Frank Keller

    Abstract: Procedural videos, exemplified by recipe demonstrations, are instrumental in conveying step-by-step instructions. However, understanding such videos is challenging as it involves the precise localization of steps and the generation of textual instructions. Manually annotating steps and writing instructions is costly, which limits the size of current datasets and hinders effective learning. Leverag…

    Submitted 23 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: updated version

  4. arXiv:2310.06522  [pdf, other]

    cs.LG cs.CV

    Watt For What: Rethinking Deep Learning's Energy-Performance Relationship

    Authors: Shreyank N Gowda, Xinyue Hao, Gen Li, Laura Sevilla-Lara, Shashank Narayana Gowda

    Abstract: Deep learning models have revolutionized various fields, from image recognition to natural language processing, by achieving unprecedented levels of accuracy. However, their increasing energy consumption has raised concerns about their environmental impact, disadvantaging smaller entities in research and exacerbating global energy consumption. In this paper, we explore the trade-off between model…

    Submitted 10 October, 2023; originally announced October 2023.

  5. arXiv:2309.17327  [pdf, other]

    cs.CV

    Telling Stories for Common Sense Zero-Shot Action Recognition

    Authors: Shreyank N Gowda, Laura Sevilla-Lara

    Abstract: Video understanding has long suffered from reliance on large labeled datasets, motivating research into zero-shot learning. Recent progress in language modeling presents opportunities to advance zero-shot video analysis, but constructing an effective semantic space relating action classes remains challenging. We address this by introducing a novel dataset, Stories, which contains rich textual desc…

    Submitted 29 September, 2023; originally announced September 2023.

  6. arXiv:2303.15086  [pdf, other]

    cs.CV

    Learning Action Changes by Measuring Verb-Adverb Textual Relationships

    Authors: Davide Moltisanti, Frank Keller, Hakan Bilen, Laura Sevilla-Lara

    Abstract: The goal of this work is to understand the way actions are performed in videos. That is, given a video, we aim to predict an adverb indicating a modification applied to the action (e.g. cut "finely"). We cast this problem as a regression task. We measure textual relationships between verbs and adverbs to generate a regression target representing the action change we aim to learn. We test our appro…

    Submitted 23 May, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: CVPR 23. Version 2 updates some results due to an errata (see code repository for more details). Code and dataset available at https://github.com/dmoltisanti/air-cvpr23

  7. arXiv:2303.09665  [pdf, other]

    cs.CV

    LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding

    Authors: Gen Li, Varun Jampani, Deqing Sun, Laura Sevilla-Lara

    Abstract: Humans excel at acquiring knowledge through observation. For example, we can learn to use new tools by watching demonstrations. This skill is fundamental for intelligent systems to interact with the world. A key step to acquire this skill is to identify what part of the object affords each action, which is called affordance grounding. In this paper, we address this problem and propose a framework…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: CVPR 2023, Project page: https://reagan1311.github.io/locate/, Video: https://www.youtube.com/watch?v=RLHansdFxII

  8. arXiv:2210.04933  [pdf, other]

    cs.CV

    An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition

    Authors: Kiyoon Kim, Davide Moltisanti, Oisin Mac Aodha, Laura Sevilla-Lara

    Abstract: Precisely naming the action depicted in a video can be a challenging and oftentimes ambiguous task. In contrast to object instances represented as nouns (e.g. dog, cat, chair, etc.), in the case of actions, human annotators typically lack a consensus as to what constitutes a specific action (e.g. jogging versus running). In practice, a given video can contain multiple valid positive annotations fo…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: BMVC 2022

  9. arXiv:2209.15501  [pdf, other]

    cs.CV

    A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos

    Authors: Anil Batra, Shreyank N Gowda, Frank Keller, Laura Sevilla-Lara

    Abstract: Understanding the steps required to perform a task is an important skill for AI systems. Learning these steps from instructional videos involves two subproblems: (i) identifying the temporal boundary of sequentially occurring segments and (ii) summarizing these steps in natural language. We refer to this task as Procedure Segmentation and Summarization (PSS). In this paper, we take a closer look a…

    Submitted 7 October, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Accepted at BMVC 2022

  10. arXiv:2206.04790  [pdf, other]

    cs.CV

    Learn2Augment: Learning to Composite Videos for Data Augmentation in Action Recognition

    Authors: Shreyank N Gowda, Marcus Rohrbach, Frank Keller, Laura Sevilla-Lara

    Abstract: We address the problem of data augmentation for video action recognition. Standard augmentation strategies in video are hand-designed and sample the space of possible augmented data points either at random, without knowing which augmented points will be better, or through heuristics. We propose to learn what makes a good video for action recognition and select only high-quality samples for augment…

    Submitted 23 July, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted to ECCV-2022

  11. arXiv:2201.10394  [pdf, other]

    cs.CV

    Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition

    Authors: Kiyoon Kim, Shreyank N Gowda, Oisin Mac Aodha, Laura Sevilla-Lara

    Abstract: We address the problem of capturing temporal information for video classification in 2D networks, without increasing their computational cost. Existing approaches focus on modifying the architecture of 2D networks (e.g. by including filters in the temporal dimension to turn them into 3D networks, or using optical flow, etc.), which increases computation cost. Instead, we propose a novel sampling s…

    Submitted 10 October, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: BMVC 2022

  12. arXiv:2107.13029  [pdf, other]

    cs.CV

    A New Split for Evaluating True Zero-Shot Action Recognition

    Authors: Shreyank N Gowda, Laura Sevilla-Lara, Kiyoon Kim, Frank Keller, Marcus Rohrbach

    Abstract: Zero-shot action recognition is the task of classifying action categories that are not available in the training set. In this setting, the standard evaluation protocol is to use existing action recognition datasets (e.g. UCF101) and randomly split the classes into seen and unseen. However, most recent work builds on representations pre-trained on the Kinetics dataset, where classes largely overlap…

    Submitted 13 September, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

    Comments: Accepted to GCPR 2021

  13. arXiv:2104.01893  [pdf, other]

    cs.CV

    Adaptive Prototype Learning and Allocation for Few-Shot Segmentation

    Authors: Gen Li, Varun Jampani, Laura Sevilla-Lara, Deqing Sun, Jonghyun Kim, Joongkyu Kim

    Abstract: Prototype learning is extensively used for few-shot segmentation. Typically, a single prototype is obtained from the support feature by averaging the global object information. However, using one prototype to represent all the information may lead to ambiguities. In this paper, we propose two novel modules, named superpixel-guided clustering (SGC) and guided prototype allocation (GPA), for multipl…

    Submitted 16 May, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

    Comments: Accepted to CVPR2021

  14. arXiv:2101.07042  [pdf, other]

    cs.CV

    CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition

    Authors: Shreyank N Gowda, Laura Sevilla-Lara, Frank Keller, Marcus Rohrbach

    Abstract: Zero-shot action recognition is the task of recognizing action classes without visual examples, only with a semantic embedding which relates unseen to seen classes. The problem can be seen as learning a function which generalizes well to instances of unseen classes without losing discrimination between classes. Neural networks can model the complex boundaries between visual classes, which explain…

    Submitted 23 July, 2022; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: Accepted to ECCV-22

  15. arXiv:2012.10671  [pdf, other]

    cs.CV

    SMART Frame Selection for Action Recognition

    Authors: Shreyank N Gowda, Marcus Rohrbach, Laura Sevilla-Lara

    Abstract: Action recognition is computationally expensive. In this paper, we address the problem of frame selection to improve the accuracy of action recognition. In particular, we show that selecting good frames helps in action recognition performance even in the trimmed videos domain. Recent work has successfully leveraged frame selection for long, untrimmed videos, where much of the content is not releva…

    Submitted 19 December, 2020; originally announced December 2020.

    Comments: To be published in AAAI-21

  16. arXiv:2005.13039  [pdf, other]

    cs.CV

    ALBA: Reinforcement Learning for Video Object Segmentation

    Authors: Shreyank N Gowda, Panagiotis Eustratiadis, Timothy Hospedales, Laura Sevilla-Lara

    Abstract: We consider the challenging problem of zero-shot video object segmentation (VOS). That is, segmenting and tracking multiple moving objects within a video fully automatically, without any manual initialization. We treat this as a grouping problem by exploiting object proposals and making a joint inference about grouping over both space and time. We propose a network architecture for tractably perfo…

    Submitted 14 August, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  17. arXiv:2004.11051   

    cs.CV cs.AI

    Proceedings of the ICLR Workshop on Computer Vision for Agriculture (CV4A) 2020

    Authors: Yannis Kalantidis, Laura Sevilla-Lara, Ernest Mwebaze, Dina Machuve, Hamed Alemohammad, David Guerena

    Abstract: This is the proceedings of the Computer Vision for Agriculture (CV4A) Workshop that was held in conjunction with the International Conference on Learning Representations (ICLR) 2020. The Computer Vision for Agriculture (CV4A) 2020 workshop was scheduled to be held in Addis Ababa, Ethiopia, on April 26th, 2020. It was held virtually that same day due to the COVID-19 pandemic. The workshop was hel…

    Submitted 17 May, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

    Comments: 14 papers accepted, 4 as oral, 10 as spotlights

  18. arXiv:1907.08340  [pdf, other]

    cs.CV cs.LG

    Only Time Can Tell: Discovering Temporal Data for Temporal Modeling

    Authors: Laura Sevilla-Lara, Shengxin Zha, Zhicheng Yan, Vedanuj Goswami, Matt Feiszli, Lorenzo Torresani

    Abstract: Understanding temporal information and how the visual world changes over time is a fundamental ability of intelligent systems. In video understanding, temporal information is at the core of many current challenges, including compression, efficient inference, motion estimation or summarization. However, in current video datasets it has been observed that action classes can often be recognized witho…

    Submitted 29 October, 2019; v1 submitted 18 July, 2019; originally announced July 2019.

  19. arXiv:1906.04226  [pdf, other]

    cs.CV

    FASTER Recurrent Networks for Efficient Video Classification

    Authors: Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang

    Abstract: Typical video classification methods often divide a video into short clips, do inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence, and increases the computational cost at inference time. In this paper, we propose a novel fra…

    Submitted 8 September, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

  20. arXiv:1901.03460  [pdf, other]

    cs.CV

    DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

    Authors: Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan

    Abstract: Motion has shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming. Recent works directly leverage the motion vectors and residuals readily available in the compressed video to represent motion at no cost. While this avoids flow computation, it also hurts accuracy since the motion vector is…

    Submitted 7 May, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

    Comments: Accepted by CVPR'19

  21. arXiv:1712.08416  [pdf, other]

    cs.CV

    On the Integration of Optical Flow and Action Recognition

    Authors: Laura Sevilla-Lara, Yiyi Liao, Fatma Guney, Varun Jampani, Andreas Geiger, Michael J. Black

    Abstract: Most of the top performing action recognition methods use optical flow as a "black box" input. Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better. In particular, we investigate the impact of different flow algorithms and input transformations t…

    Submitted 22 December, 2017; originally announced December 2017.

  22. arXiv:1705.01352  [pdf, other]

    cs.CV

    Optical Flow in Mostly Rigid Scenes

    Authors: Jonas Wulff, Laura Sevilla-Lara, Michael J. Black

    Abstract: The optical flow of natural scenes is a combination of the motion of the observer and the independent motion of objects. Existing algorithms typically focus on either recovering motion and structure under the assumption of a purely static world or optical flow for general unconstrained scenes. We combine these approaches in an optical flow algorithm that estimates an explicit segmentation of movin…

    Submitted 3 May, 2017; originally announced May 2017.

    Comments: 15 pages, 10 figures; accepted for publication at CVPR 2017

  23. arXiv:1603.03911  [pdf, other]

    cs.CV

    Optical Flow with Semantic Segmentation and Localized Layers

    Authors: Laura Sevilla-Lara, Deqing Sun, Varun Jampani, Michael J. Black

    Abstract: Existing optical flow methods make generic, spatially homogeneous, assumptions about the spatial structure of the flow. In reality, optical flow varies across an image depending on object class. Simply put, different objects move differently. Here we exploit recent advances in static semantic scene segmentation to segment the image into objects of different types. We define different models of ima…

    Submitted 11 April, 2016; v1 submitted 12 March, 2016; originally announced March 2016.