Showing 51–100 of 108 results for author: Snoek, C G M

  1. arXiv:2108.03329

    cs.CV

    Feature-Supervised Action Modality Transfer

    Authors: Fida Mohammad Thoker, Cees G. M. Snoek

    Abstract: This paper strives for action recognition and detection in video modalities like RGB, depth maps or 3D-skeleton sequences when only limited modality-specific labeled examples are available. For the RGB modality, and the derived optical-flow modality, many large-scale labeled datasets have been made available. They have become the de facto pre-training choice when recognizing or detecting new actions from RGB d…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: IEEE International Conference on Pattern Recognition (ICPR), 2020

  2. arXiv:2107.05757

    cs.LG cs.AI

    Kernel Continual Learning

    Authors: Mohammad Mahdi Derakhshani, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: This paper introduces kernel continual learning, a simple but effective variant of continual learning that leverages the non-parametric nature of kernel methods to tackle catastrophic forgetting. We deploy an episodic memory unit that stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression. This does not require memory replay and systematically…

    Submitted 14 July, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: accepted to ICML 2021

  3. arXiv:2107.01125

    eess.IV cs.CV

    On Measuring and Controlling the Spectral Bias of the Deep Image Prior

    Authors: Zenglin Shi, Pascal Mettes, Subhransu Maji, Cees G. M. Snoek

    Abstract: The deep image prior showed that a randomly initialized network with a suitable architecture can be trained to solve inverse imaging problems by simply optimizing its parameters to reconstruct a single degraded image. However, it suffers from two practical limitations. First, it remains unclear how to control the prior beyond the choice of the network architecture. Second, training requires an or…

    Submitted 30 December, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: IJCV 2022; Spectral bias; Deep image prior; 24 pages

  4. arXiv:2106.13919

    cs.LG

    Pruning Edges and Gradients to Learn Hypergraphs from Larger Sets

    Authors: David W. Zhang, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: This paper aims for set-to-hypergraph prediction, where the goal is to infer the set of relations for a given set of entities. This is a common abstraction for applications in particle physics, biological systems, and combinatorial optimization. We address two common scaling problems encountered in set-to-hypergraph tasks that limit the size of the input set: the exponentially growing number of hy…

    Submitted 16 January, 2023; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: A previous version was named "Recurrently Predicting Hypergraphs". See https://github.com/davzha/recurrently_predicting_hypergraphs for code

  5. arXiv:2106.02960

    cs.CL

    Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation

    Authors: Yingjun Du, Nithin Holla, Xiantong Zhen, Cees G. M. Snoek, Ekaterina Shutova

    Abstract: A critical challenge faced by supervised word sense disambiguation (WSD) is the lack of large annotated datasets with sufficient coverage of words in their diversity of senses. This inspired recent research on few-shot WSD using meta-learning. While such work has successfully applied meta-learning to learn new word senses from very few examples, its performance still lags behind its fully supervis…

    Submitted 5 June, 2021; originally announced June 2021.

    Comments: 15 pages, 5 figures

    Journal ref: ACL-IJCNLP 2021

  6. Unsharp Mask Guided Filtering

    Authors: Zenglin Shi, Yunlu Chen, Efstratios Gavves, Pascal Mettes, Cees G. M. Snoek

    Abstract: The goal of this paper is guided image filtering, which emphasizes the importance of structure transfer during filtering by means of an additional guidance image. Where classical guided filters transfer structures using hand-designed functions, recent guided filters have been considerably advanced through parametric learning of deep networks. The state-of-the-art leverages deep networks to estimat…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: IEEE Transactions on Image Processing, 2021

  7. arXiv:2105.06668

    cs.CV

    Attentional Prototype Inference for Few-Shot Segmentation

    Authors: Haoliang Sun, Xiankai Lu, Haochen Wang, Yilong Yin, Xiantong Zhen, Cees G. M. Snoek, Ling Shao

    Abstract: This paper aims to address few-shot segmentation. While existing prototype-based methods have achieved considerable success, they suffer from uncertainty and ambiguity caused by limited labeled examples. In this work, we propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation. We define a global latent variable to represent the prototype o…

    Submitted 29 May, 2023; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: Pattern Recognition Journal

  8. arXiv:2105.04030

    cs.LG

    A Bit More Bayesian: Domain-Invariant Learning with Uncertainty

    Authors: Zehao Xiao, Jiayi Shen, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: Domain generalization is challenging due to the domain shift and the uncertainty caused by the inaccessibility of target domain data. In this paper, we address both challenges with a probabilistic framework based on variational Bayesian inference, by incorporating uncertainty into neural network weights. We couple domain invariance in a probabilistic formula with the variational Bayesian inference…

    Submitted 14 July, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

    Comments: accepted to ICML 2021

  9. arXiv:2105.03781

    cs.LG cs.CV

    MetaKernel: Learning Variational Random Features with Limited Labels

    Authors: Yingjun Du, Haoliang Sun, Xiantong Zhen, Jun Xu, Yilong Yin, Ling Shao, Cees G. M. Snoek

    Abstract: Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples, while being able to generalize well on new tasks. The crux of few-shot learning is to extract prior knowledge from related tasks to enable fast adaptation to a new task with a limited amount of data. In this paper, we propose meta-learning kernels with random Fourier features for few-shot…

    Submitted 8 May, 2021; originally announced May 2021.

    Comments: 19 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:2006.06707

  10. arXiv:2105.01646

    cs.CV

    Motion-Augmented Self-Training for Video Recognition at Smaller Scale

    Authors: Kirill Gavrilyuk, Mihir Jain, Ilia Karmanov, Cees G. M. Snoek

    Abstract: The goal of this paper is to self-train a 3D convolutional neural network on an unlabeled video collection for deployment on small-scale video collections. As smaller video datasets benefit more from motion than appearance, we strive to train our network using optical flow, but avoid its computation during inference. We propose the first motion-augmented self-training regime, which we call MotionFit. We…

    Submitted 4 May, 2021; originally announced May 2021.

  11. arXiv:2104.11721

    cs.CV

    Safe Fakes: Evaluating Face Anonymizers for Face Detectors

    Authors: Sander R. Klomp, Matthew van Rijn, Rob G. J. Wijnhoven, Cees G. M. Snoek, Peter H. N. de With

    Abstract: Since the introduction of the GDPR and CCPA legislation, both public and private facial image datasets are increasingly scrutinized. Several datasets have been taken offline completely and some have been anonymized. However, it is unclear how anonymization impacts face detection performance. To our knowledge, this paper presents the first empirical study on the effect of image anonymization on sup…

    Submitted 23 April, 2021; originally announced April 2021.

    ACM Class: I.5.4

  12. arXiv:2104.04715

    cs.CV

    Object Priors for Classifying and Localizing Unseen Actions

    Authors: Pascal Mettes, William Thong, Cees G. M. Snoek

    Abstract: This work strives for the classification and localization of human actions in videos, without the need for any labeled video training examples. Where existing work relies on transferring global attribute or object information from seen to unseen action videos, we seek to classify and spatio-temporally localize unseen actions in videos from image-based object information only. We propose three spat…

    Submitted 10 April, 2021; originally announced April 2021.

    Comments: Accepted to IJCV

  13. arXiv:2104.02439

    cs.CV

    Few-Shot Transformation of Common Actions into Time and Space

    Authors: Pengwan Yang, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper introduces the task of few-shot common action localization in time and space. Given a few trimmed support videos containing the same but unknown action, we strive for spatio-temporal localization of that action in a long untrimmed query video. We do not require any class labels, interval bounds, or bounding boxes. To address this challenging task, we introduce a novel few-shot transform…

    Submitted 6 April, 2021; originally announced April 2021.

  14. arXiv:2104.00996

    cs.CV

    LiftPool: Bidirectional ConvNet Pooling

    Authors: Jiaojiao Zhao, Cees G. M. Snoek

    Abstract: Pooling is a critical operation in convolutional neural networks for increasing receptive fields and improving robustness to input variations. Most existing pooling operations downsample the feature maps, which is a lossy process. Moreover, they are not invertible: upsampling a downscaled feature map cannot recover the lost information in the downsampling. By adopting the philosophy of the classi…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: Published at ICLR 2021

  15. arXiv:2104.00969

    cs.CV

    TubeR: Tubelet Transformer for Video Action Detection

    Authors: Jiaojiao Zhao, Yanyi Zhang, Xinyu Li, Hao Chen, Shuai Bing, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide Modolo, Ivan Marsic, Cees G. M. Snoek, Joseph Tighe

    Abstract: We propose TubeR: a simple solution for spatio-temporal video action detection. Different from existing methods that depend on either an off-line actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly detect an action tubelet in a video by simultaneously performing action localization and recognition from a single representation. TubeR learns…

    Submitted 10 May, 2022; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: Accepted at CVPR 2022 (Oral)

  16. arXiv:2103.13096

    cs.CV

    Repetitive Activity Counting by Sight and Sound

    Authors: Yunhua Zhang, Ling Shao, Cees G. M. Snoek

    Abstract: This paper strives for repetitive activity counting in videos. Different from existing works, which all analyze the visual video content only, we incorporate for the first time the corresponding sound into the repetition counting process. This benefits accuracy in challenging vision conditions such as occlusion, dramatic camera view changes, low resolution, etc. We propose a model that starts with…

    Submitted 17 April, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted at CVPR 2021

  17. arXiv:2010.10341

    cs.LG

    Learning to Learn Variational Semantic Memory

    Authors: Xiantong Zhen, Yingjun Du, Huan Xiong, Qiang Qiu, Cees G. M. Snoek, Ling Shao

    Abstract: In this paper, we introduce variational semantic memory into meta-learning to acquire long-term knowledge for few-shot learning. The variational semantic memory accrues and stores semantic information for the probabilistic inference of class prototypes in a hierarchical Bayesian framework. The semantic memory is grown from scratch and gradually consolidated by absorbing information from tasks it e…

    Submitted 14 July, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: accepted to NeurIPS 2020; code is available at https://github.com/YDU-uva/VSM

  18. arXiv:2010.04109

    cs.LG stat.ML

    Set Prediction without Imposing Structure as Conditional Density Estimation

    Authors: David W. Zhang, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Set prediction is about learning to predict a collection of unordered variables with unknown interrelations. Training such models with set losses imposes the structure of a metric space over sets. We focus on stochastic and underdefined cases, where an incorrectly chosen loss function leads to implausible predictions. Example tasks include conditional point-cloud reconstruction and predicting futu…

    Submitted 21 February, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

  19. arXiv:2008.11185

    cs.CV

    Bias-Awareness for Zero-Shot Learning the Seen and Unseen

    Authors: William Thong, Cees G. M. Snoek

    Abstract: Generalized zero-shot learning recognizes inputs from both seen and unseen classes. Yet, existing methods tend to be biased towards the classes seen during training. In this paper, we strive to mitigate this bias. We propose a bias-aware learner to map inputs to a semantic embedding space for generalized zero-shot learning. During training, the model learns to regress to real-valued class prototyp…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: Accepted at British Machine Vision Conference (BMVC) 2020

  20. arXiv:2008.06374

    cs.CV

    PointMixup: Augmentation for Point Clouds

    Authors: Yunlu Chen, Vincent Tao Hu, Efstratios Gavves, Thomas Mensink, Pascal Mettes, Pengwan Yang, Cees G. M. Snoek

    Abstract: This paper introduces data augmentation for point clouds by interpolation between examples. Data augmentation by interpolation has been shown to be a simple and effective approach in the image domain. Such a mixup is however not directly transferable to point clouds, as we do not have a one-to-one correspondence between the points of two different objects. In this paper, we define data augmentation bet…

    Submitted 14 August, 2020; originally announced August 2020.

    Comments: Accepted as Spotlight presentation at European Conference on Computer Vision (ECCV), 2020

  21. arXiv:2008.05826

    cs.CV cs.LG eess.IV

    Localizing the Common Action Among a Few Videos

    Authors: Pengwan Yang, Vincent Tao Hu, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper strives to localize the temporal extent of an action in a long untrimmed video. Where existing work leverages many examples with their start, their ending, and/or the class of the action during training time, we propose few-shot common action localization. The start and end of an action in a long untrimmed video are determined based on just a handful of trimmed video examples containin…

    Submitted 25 August, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  22. arXiv:2007.07645

    cs.CV

    Learning to Learn with Variational Information Bottleneck for Domain Generalization

    Authors: Yingjun Du, Jun Xu, Huan Xiong, Qiang Qiu, Xiantong Zhen, Cees G. M. Snoek, Ling Shao

    Abstract: Domain generalization models learn to generalize to previously unseen domains, but suffer from prediction uncertainty and domain shift. In this paper, we address both problems. We introduce a probabilistic meta-learning model for domain generalization, in which classifier parameters shared across domains are modeled as distributions. This enables better handling of prediction uncertainty on unseen…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: 15 pages, 4 figures, ECCV 2020

  23. arXiv:2003.12737

    cs.CV

    Actor-Transformers for Group Activity Recognition

    Authors: Kirill Gavrilyuk, Ryan Sanford, Mehrsan Javan, Cees G. M. Snoek

    Abstract: This paper strives to recognize individual actions and group activities from videos. While existing solutions for this challenging problem explicitly model spatial and temporal relationships based on location of individual actors, we propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition. We feed the transformer with rich actor-…

    Submitted 28 March, 2020; originally announced March 2020.

    Comments: CVPR 2020

  24. arXiv:2003.07833

    cs.CV

    Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification

    Authors: Sanath Narayan, Akshita Gupta, Fahad Shahbaz Khan, Cees G. M. Snoek, Ling Shao

    Abstract: Zero-shot learning strives to classify unseen categories for which no data is available during training. In the generalized variant, the test samples can further belong to seen or unseen categories. The state-of-the-art relies on Generative Adversarial Networks that synthesize unseen class features by leveraging class-specific semantic embeddings. During training, they generate semantically consis…

    Submitted 18 July, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Comments: Accepted for publication at ECCV 2020

  25. arXiv:2003.05065

    cs.CV

    Cloth in the Wind: A Case Study of Physical Measurement through Simulation

    Authors: Tom F. H. Runia, Kirill Gavrilyuk, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: For many of the physical phenomena around us, we have developed sophisticated models explaining their behavior. Nevertheless, measuring physical properties from visual observations is challenging due to the high number of causally underlying physical parameters -- including material properties and external forces. In this paper, we propose to measure latent physical properties for cloth in the win…

    Submitted 9 March, 2020; originally announced March 2020.

    Comments: CVPR 2020. arXiv admin note: substantial text overlap with arXiv:1910.07861

  26. arXiv:1911.08621

    cs.CV

    Open Cross-Domain Visual Search

    Authors: William Thong, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper addresses cross-domain visual search, where visual queries retrieve category samples from a different domain. For example, we may want to sketch an airplane and retrieve photographs of airplanes. Despite considerable progress, the search occurs in a closed setting between two pre-defined domains. In this paper, we take a step toward an open setting where multiple visual domains are a…

    Submitted 28 July, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

    Comments: Accepted at Computer Vision and Image Understanding (CVIU)

  27. arXiv:1910.07861

    cs.CV

    Go with the Flow: Perception-refined Physics Simulation

    Authors: Tom F. H. Runia, Kirill Gavrilyuk, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: For many of the physical phenomena around us, we have developed sophisticated models explaining their behavior. Nevertheless, inferring specifics from visual observations is challenging due to the high number of causally underlying physical parameters -- including material properties and external forces. This paper addresses the problem of inferring such latent physical properties from observation…

    Submitted 17 October, 2019; originally announced October 2019.

  28. arXiv:1904.05404

    cs.CV cs.LG

    Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on n-Spheres

    Authors: Shuai Liao, Efstratios Gavves, Cees G. M. Snoek

    Abstract: Many computer vision challenges require continuous outputs, but tend to be solved by discrete classification. The reason is classification's natural containment within a probability $n$-simplex, as defined by the popular softmax activation function. Regular regression lacks such a closed geometry, leading to unstable training and convergence to suboptimal local minima. Starting from this insight w…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: CVPR 2019 camera ready

  29. arXiv:1904.01421

    cs.CV

    Cooperative Embeddings for Instance, Attribute and Category Retrieval

    Authors: William Thong, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: The goal of this paper is to retrieve an image based on instance, attribute and category similarity notions. Different from existing works, which usually address only one of these entities in isolation, we introduce a cooperative embedding to integrate them while preserving their specific level of semantic representation. An algebraic structure defines a superspace filled with instances. Attribute…

    Submitted 2 April, 2019; originally announced April 2019.

  30. arXiv:1904.00696

    cs.CV

    Dance with Flow: Two-in-One Stream Action Detection

    Authors: Jiaojiao Zhao, Cees G. M. Snoek

    Abstract: The goal of this paper is to detect the spatio-temporal extent of an action. The two-stream detection network based on RGB and flow provides state-of-the-art accuracy at the expense of a large model-size and heavy computation. We propose to embed RGB and optical-flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images, whic…

    Submitted 11 June, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: Accepted at CVPR 2019

  31. arXiv:1903.12206

    cs.CV

    Counting with Focus for Free

    Authors: Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper aims to count arbitrary objects in images. The leading counting approaches start from point annotations per object from which they construct density maps. Then, their training objective transforms input images to density maps through deep convolutional networks. We posit that the point annotations serve more supervision purposes than just constructing density maps. We introduce ways to…

    Submitted 6 August, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

    Comments: ICCV, 2019

  32. arXiv:1901.10889

    cs.CV

    Pixelated Semantic Colorization

    Authors: Jiaojiao Zhao, Jungong Han, Ling Shao, Cees G. M. Snoek

    Abstract: While many image colorization algorithms have recently shown the capability of producing plausible color versions from gray-scale photographs, they still suffer from limited semantic understanding. To address this shortcoming, we propose to exploit pixelated object semantics to guide image colorization. The rationale is that human beings perceive and distinguish colors based on the semantic catego…

    Submitted 7 February, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

  33. arXiv:1901.10514

    cs.LG stat.ML

    Hyperspherical Prototype Networks

    Authors: Pascal Mettes, Elise van der Pol, Cees G. M. Snoek

    Abstract: This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces. For classification, a common approach is to define prototypes as the mean output vector over training examples per class. Here, we propose to use hyperspheres as output spaces, with class prototypes defined a priori with large margin separation. We pos…

    Submitted 25 October, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: NeurIPS 2019

  34. arXiv:1901.10364

    cs.CV

    Anomaly Locality in Video Surveillance

    Authors: Federico Landi, Cees G. M. Snoek, Rita Cucchiara

    Abstract: This paper strives for the detection of real-world anomalies such as burglaries and assaults in surveillance videos. Although anomalies are generally local, as they happen in a limited portion of the frame, none of the previous works on the subject has ever studied the contribution of locality. In this work, we explore the impact of considering spatiotemporal tubes instead of whole-frame video seg…

    Submitted 29 January, 2019; originally announced January 2019.

    Comments: Submitted to International Conference on Image Processing, 2019

  35. arXiv:1808.01597

    cs.CV

    Pixel-level Semantics Guided Image Colorization

    Authors: Jiaojiao Zhao, Li Liu, Cees G. M. Snoek, Jungong Han, Ling Shao

    Abstract: While many image colorization algorithms have recently shown the capability of producing plausible color versions from gray-scale photographs, they still suffer from the problems of context confusion and edge color bleeding. To address context confusion, we propose to incorporate the pixel-level object semantics to guide the image colorization. The rationale is that human beings perceive and disti…

    Submitted 5 August, 2018; originally announced August 2018.

  36. arXiv:1807.06980

    cs.CV

    Video Time: Properties, Encoders and Evaluation

    Authors: Amir Ghodrati, Efstratios Gavves, Cees G. M. Snoek

    Abstract: Time-aware encoding of frame sequences in a video is a fundamental problem in video understanding. While many have attempted to model time in videos, an explicit study on quantifying video time is missing. To fill this lacuna, we aim to evaluate video time explicitly. We describe three properties of video time, namely a) temporal asymmetry, b) temporal continuity and c) temporal causality. Based on each…

    Submitted 18 July, 2018; originally announced July 2018.

    Comments: 14 pages, BMVC 2018

  37. arXiv:1807.02800

    cs.CV

    Spatio-Temporal Instance Learning: Action Tubes from Class Supervision

    Authors: Pascal Mettes, Cees G. M. Snoek

    Abstract: The goal of this work is spatio-temporal action localization in videos, using only the supervision from video-level class labels. The state-of-the-art casts this weakly-supervised action localization regime as a Multiple Instance Learning problem, where instances are a priori computed spatio-temporal proposals. Rather than disconnecting the spatio-temporal learning from the training, we propose Sp…

    Submitted 21 November, 2018; v1 submitted 8 July, 2018; originally announced July 2018.

  38. arXiv:1806.06984

    cs.CV

    Repetition Estimation

    Authors: Tom F. H. Runia, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: Visual repetition is ubiquitous in our world. It appears in human activity (sports, cooking), animal behavior (a bee's waggle dance), natural phenomena (leaves in the wind) and in urban environments (flashing lights). Estimating visual repetition from realistic video is challenging as periodic motion is rarely perfectly static and stationary. To better deal with realistic video, we elevate the sta…

    Submitted 18 June, 2018; originally announced June 2018.

  39. Pointly-Supervised Action Localization

    Authors: Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper strives for spatio-temporal localization of human actions in videos. In the literature, the consensus is to achieve localization by training on bounding box annotations provided for each frame of each training video. As annotating boxes in video is expensive, cumbersome and error-prone, we propose to bypass box-supervision. Instead, we introduce action localization based on point-superv…

    Submitted 1 October, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: International Journal of Computer Vision, 2018

  40. arXiv:1803.07485

    cs.CV

    Actor and Action Video Segmentation from a Sentence

    Authors: Kirill Gavrilyuk, Amir Ghodrati, Zhenyang Li, Cees G. M. Snoek

    Abstract: This paper strives for pixel-level segmentation of actors and their actions in video content. Different from existing works, which all learn to segment from a fixed vocabulary of actor and action pairs, we infer the segmentation from a natural language input sentence. This allows us to distinguish between fine-grained actors in the same super-category, identify actor and action instances, and segment…

    Submitted 20 March, 2018; originally announced March 2018.

    Comments: Accepted to CVPR 2018 as oral

  41. arXiv:1802.09971

    cs.CV

    Real-World Repetition Estimation by Div, Grad and Curl

    Authors: Tom F. H. Runia, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: We consider the problem of estimating repetition in video, such as performing push-ups, cutting a melon or playing violin. Existing work shows good results under the assumption of static and stationary periodicity. As realistic video is rarely perfectly static and stationary, the often preferred Fourier-based measurements are inapt. Instead, we adopt the wavelet transform to better handle non-stati…

    Submitted 27 February, 2018; originally announced February 2018.

  42. arXiv:1801.10253

    cs.CL cs.IR cs.MM

    The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval

    Authors: Spencer Cappallo, Stacey Svetlichnaya, Pierre Garrigues, Thomas Mensink, Cees G. M. Snoek

    Abstract: Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages. We propose to treat these ideograms as a new modality in their own right, distinct in their semantic structure from both the text in which they are often embedded as well as the images which they resemble. As a new modality, emoji present rich novel…

    Submitted 2 February, 2018; v1 submitted 30 January, 2018; originally announced January 2018.

  43. Predicting Visual Features from Text for Image and Video Caption Retrieval

    Authors: Jianfeng Dong, Xirong Li, Cees G. M. Snoek

    Abstract: This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do so in a visual space exclusively. Apart from this conceptual novelty, we contribute Word2VisualVec, a deep neural network architecture that learns to pre…

    Submitted 14 July, 2018; v1 submitted 5 September, 2017; originally announced September 2017.

    Comments: Accepted by Transactions on Multimedia. Code is available at https://github.com/danieljf24/w2vv

  44. arXiv:1707.09145

    cs.CV

    Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions

    Authors: Pascal Mettes, Cees G. M. Snoek

    Abstract: We aim for zero-shot localization and classification of human actions in video. Where traditional approaches rely on global attribute or object classification scores for their zero-shot knowledge transfer, our main contribution is a spatial-aware object embedding. To arrive at spatial awareness, we build our embedding on top of freely available actor and object detectors. Relevance of objects is d…

    Submitted 28 July, 2017; originally announced July 2017.

    Comments: ICCV

    Report number: ICCV/2017/10

  45. arXiv:1707.09143

    cs.CV

    Localizing Actions from Video Labels and Pseudo-Annotations

    Authors: Pascal Mettes, Cees G. M. Snoek, Shih-Fu Chang

    Abstract: The goal of this paper is to determine the spatio-temporal location of actions in video. Where training from hard-to-obtain box annotations is the norm, we propose an intuitive and effective algorithm that localizes actions from their class label only. We are inspired by recent work showing that unsupervised action proposals selected with human point-supervision perform as well as using expensive…

    Submitted 28 July, 2017; originally announced July 2017.

    Comments: BMVC

    Report number: BMVC/2017/09

  46. arXiv:1612.06753

    cs.IR cs.MM

    Video Stream Retrieval of Unseen Queries using Semantic Memory

    Authors: Spencer Cappallo, Thomas Mensink, Cees G. M. Snoek

    Abstract: Retrieval of live, user-broadcast video streams is an under-addressed and increasingly relevant challenge. The on-line nature of the problem requires temporal evaluation and the unforeseeable scope of potential queries motivates an approach which can accommodate arbitrary search queries. To account for the breadth of possible queries, we adopt a no-example approach to query retrieval, which uses a…

    Submitted 20 December, 2016; originally announced December 2016.

    Comments: Presented at BMVC 2016, British Machine Vision Conference, 2016

  47. arXiv:1610.01801

    cs.CV

    Searching Scenes by Abstracting Things

    Authors: Svetlana Kordumova, Jan C. van Gemert, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: In this paper we propose to represent a scene as an abstraction of 'things'. We start from 'things' as generated by modern object proposals, and we investigate their immediately observable properties: position, size, aspect ratio and color, and those only. Where the recent successes and excitement of the field lie in object identification, we represent the scene composition independent of object i…

    Submitted 6 October, 2016; originally announced October 2016.

  48. arXiv:1607.02003

    cs.CV

    Tubelets: Unsupervised action proposals from spatiotemporal super-voxels

    Authors: Mihir Jain, Jan van Gemert, Hervé Jégou, Patrick Bouthemy, Cees G. M. Snoek

    Abstract: This paper considers the problem of localizing actions in videos as a sequence of bounding boxes. The objective is to generate action proposals that are likely to include the action of interest, ideally achieving high recall with few proposals. Our contributions are threefold. First, inspired by selective search for object proposals, we introduce an approach to generate action proposals from spat…

    Submitted 7 July, 2016; originally announced July 2016.

    Comments: submitted to International Journal of Computer Vision

  49. arXiv:1607.01794

    cs.CV

    VideoLSTM Convolves, Attends and Flows for Action Recognition

    Authors: Zhenyang Li, Efstratios Gavves, Mihir Jain, Cees G. M. Snoek

    Abstract: We present a new architecture for end-to-end sequence learning of actions in video, which we call VideoLSTM. Rather than adapting the video to the peculiarities of established recurrent or convolutional architectures, we adapt the architecture to fit the requirements of the video medium. Starting from the soft-Attention LSTM, VideoLSTM makes three novel contributions. First, video has a spatial layout.…

    Submitted 6 July, 2016; originally announced July 2016.

  50. arXiv:1604.07602

    cs.CV

    Spot On: Action Localization from Pointly-Supervised Proposals

    Authors: Pascal Mettes, Jan C. van Gemert, Cees G. M. Snoek

    Abstract: We strive for spatio-temporal localization of actions in videos. The state-of-the-art relies on action proposals at test time and selects the best one with a classifier trained on carefully annotated box annotations. Annotating action boxes in video is cumbersome, tedious, and error-prone. Rather than annotating boxes, we propose to annotate actions in video with points on a sparse subset of frame…

    Submitted 25 July, 2016; v1 submitted 26 April, 2016; originally announced April 2016.

    Report number: ECCV/2016/10