Showing 51–100 of 119 results for author: Snoek, C

  1. arXiv:2110.14336  [pdf, other]

    cs.CV

    Feature and Label Embedding Spaces Matter in Addressing Image Classifier Bias

    Authors: William Thong, Cees G. M. Snoek

    Abstract: This paper strives to address image classifier bias, with a focus on both feature and label embedding spaces. Previous works have shown that spurious correlations from protected attributes, such as age, gender, or skin tone, can cause adverse decisions. To balance potential harms, there is a growing need to identify and mitigate image classifier bias. First, we identify in the feature space a bias…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted at British Machine Vision Conference (BMVC) 2021

  2. arXiv:2110.13110  [pdf, other]

    cs.CV

    Diagnosing Errors in Video Relation Detectors

    Authors: Shuo Chen, Pascal Mettes, Cees G. M. Snoek

    Abstract: Video relation detection forms a new and challenging problem in computer vision, where subjects and objects need to be localized spatio-temporally and a predicate label needs to be assigned if and only if there is an interaction between the two. Despite recent progress in video relation detection, overall performance is still marginal and it remains unclear what the key factors are towards solving…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: BMVC 2021

  3. arXiv:2108.08363  [pdf, other]

    cs.CV

    Social Fabric: Tubelet Compositions for Video Relation Detection

    Authors: Shuo Chen, Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper strives to classify and detect the relationship between object tubelets appearing within a video as a <subject-predicate-object> triplet. Where existing works treat object proposals or tubelets as single entities and model their relations a posteriori, we propose to classify and detect predicates for pairs of object tubelets a priori. We also propose Social Fabric: an encoding that repr…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  4. arXiv:2108.03656  [pdf, other]

    cs.CV

    Skeleton-Contrastive 3D Action Representation Learning

    Authors: Fida Mohammad Thoker, Hazel Doughty, Cees G. M. Snoek

    Abstract: This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. Our proposal is built upon learning invariances to input skeleton representations and various skeleton augmentations via a noise contrastive estimation. In particular, we propose inter-skeleton contrastive learning, which learns from multiple different input skeleton representations i…

    Submitted 8 August, 2021; originally announced August 2021.

    Comments: Accepted in ACM Multimedia 2021

  5. arXiv:2108.03329  [pdf, other]

    cs.CV

    Feature-Supervised Action Modality Transfer

    Authors: Fida Mohammad Thoker, Cees G. M. Snoek

    Abstract: This paper strives for action recognition and detection in video modalities like RGB, depth maps or 3D-skeleton sequences when only limited modality-specific labeled examples are available. For the RGB, and derived optical-flow, modality many large-scale labeled datasets have been made available. They have become the de facto pre-training choice when recognizing or detecting new actions from RGB d…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: IEEE International Conference on Pattern Recognition (ICPR), 2020

  6. arXiv:2107.08962  [pdf, other]

    eess.IV cs.CV

    Frequency-Supervised MR-to-CT Image Synthesis

    Authors: Zenglin Shi, Pascal Mettes, Guoyan Zheng, Cees Snoek

    Abstract: This paper strives to generate a synthetic computed tomography (CT) image from a magnetic resonance (MR) image. The synthetic CT image is valuable for radiotherapy planning when only an MR image is available. Recent approaches have made large strides in solving this challenging synthesis problem with convolutional neural networks that learn a mapping from MR inputs to CT outputs. In this paper, we…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: MICCAI workshop on Deep Generative Models, 2021

  7. arXiv:2107.05757  [pdf, other]

    cs.LG cs.AI

    Kernel Continual Learning

    Authors: Mohammad Mahdi Derakhshani, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: This paper introduces kernel continual learning, a simple but effective variant of continual learning that leverages the non-parametric nature of kernel methods to tackle catastrophic forgetting. We deploy an episodic memory unit that stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression. This does not require memory replay and systematically…

    Submitted 14 July, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: accepted to ICML 2021

  8. arXiv:2107.01125  [pdf, other]

    eess.IV cs.CV

    On Measuring and Controlling the Spectral Bias of the Deep Image Prior

    Authors: Zenglin Shi, Pascal Mettes, Subhransu Maji, Cees G. M. Snoek

    Abstract: The deep image prior showed that a randomly initialized network with a suitable architecture can be trained to solve inverse imaging problems by simply optimizing its parameters to reconstruct a single degraded image. However, it suffers from two practical limitations. First, it remains unclear how to control the prior beyond the choice of the network architecture. Second, training requires an or…

    Submitted 30 December, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: IJCV 2022; Spectral bias; Deep image prior; 24 pages

  9. arXiv:2106.13919  [pdf, other]

    cs.LG

    Pruning Edges and Gradients to Learn Hypergraphs from Larger Sets

    Authors: David W. Zhang, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: This paper aims for set-to-hypergraph prediction, where the goal is to infer the set of relations for a given set of entities. This is a common abstraction for applications in particle physics, biological systems, and combinatorial optimization. We address two common scaling problems encountered in set-to-hypergraph tasks that limit the size of the input set: the exponentially growing number of hy…

    Submitted 16 January, 2023; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: A previous version was named "Recurrently Predicting Hypergraphs". See https://github.com/davzha/recurrently_predicting_hypergraphs for code

  10. arXiv:2106.02960  [pdf, other]

    cs.CL

    Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation

    Authors: Yingjun Du, Nithin Holla, Xiantong Zhen, Cees G. M. Snoek, Ekaterina Shutova

    Abstract: A critical challenge faced by supervised word sense disambiguation (WSD) is the lack of large annotated datasets with sufficient coverage of words in their diversity of senses. This inspired recent research on few-shot WSD using meta-learning. While such work has successfully applied meta-learning to learn new word senses from very few examples, its performance still lags behind its fully supervis…

    Submitted 5 June, 2021; originally announced June 2021.

    Comments: 15 pages, 5 figures

    Journal ref: ACL-IJCNLP 2021

  11. Unsharp Mask Guided Filtering

    Authors: Zenglin Shi, Yunlu Chen, Efstratios Gavves, Pascal Mettes, Cees G. M. Snoek

    Abstract: The goal of this paper is guided image filtering, which emphasizes the importance of structure transfer during filtering by means of an additional guidance image. Where classical guided filters transfer structures using hand-designed functions, recent guided filters have been considerably advanced through parametric learning of deep networks. The state-of-the-art leverages deep networks to estimat…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: IEEE Transactions on Image Processing, 2021

  12. arXiv:2105.06668  [pdf, other]

    cs.CV

    Attentional Prototype Inference for Few-Shot Segmentation

    Authors: Haoliang Sun, Xiankai Lu, Haochen Wang, Yilong Yin, Xiantong Zhen, Cees G. M. Snoek, Ling Shao

    Abstract: This paper aims to address few-shot segmentation. While existing prototype-based methods have achieved considerable success, they suffer from uncertainty and ambiguity caused by limited labeled examples. In this work, we propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation. We define a global latent variable to represent the prototype o…

    Submitted 29 May, 2023; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: Pattern Recognition Journal

  13. arXiv:2105.04030  [pdf, other]

    cs.LG

    A Bit More Bayesian: Domain-Invariant Learning with Uncertainty

    Authors: Zehao Xiao, Jiayi Shen, Xiantong Zhen, Ling Shao, Cees G. M. Snoek

    Abstract: Domain generalization is challenging due to the domain shift and the uncertainty caused by the inaccessibility of target domain data. In this paper, we address both challenges with a probabilistic framework based on variational Bayesian inference, by incorporating uncertainty into neural network weights. We couple domain invariance in a probabilistic formula with the variational Bayesian inference…

    Submitted 14 July, 2021; v1 submitted 9 May, 2021; originally announced May 2021.

    Comments: accepted to ICML 2021

  14. arXiv:2105.03781  [pdf, other]

    cs.LG cs.CV

    MetaKernel: Learning Variational Random Features with Limited Labels

    Authors: Yingjun Du, Haoliang Sun, Xiantong Zhen, Jun Xu, Yilong Yin, Ling Shao, Cees G. M. Snoek

    Abstract: Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples, while being able to generalize well on new tasks. The crux of few-shot learning is to extract prior knowledge from related tasks to enable fast adaptation to a new task with a limited amount of data. In this paper, we propose meta-learning kernels with random Fourier features for few-shot…

    Submitted 8 May, 2021; originally announced May 2021.

    Comments: 19 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:2006.06707

  15. arXiv:2105.01646  [pdf, other]

    cs.CV

    Motion-Augmented Self-Training for Video Recognition at Smaller Scale

    Authors: Kirill Gavrilyuk, Mihir Jain, Ilia Karmanov, Cees G. M. Snoek

    Abstract: The goal of this paper is to self-train a 3D convolutional neural network on an unlabeled video collection for deployment on small-scale video collections. As smaller video datasets benefit more from motion than appearance, we strive to train our network using optical flow, but avoid its computation during inference. We propose the first motion-augmented self-training regime, which we call MotionFit. We…

    Submitted 4 May, 2021; originally announced May 2021.

  16. arXiv:2104.11721  [pdf, other]

    cs.CV

    Safe Fakes: Evaluating Face Anonymizers for Face Detectors

    Authors: Sander R. Klomp, Matthew van Rijn, Rob G. J. Wijnhoven, Cees G. M. Snoek, Peter H. N. de With

    Abstract: Since the introduction of the GDPR and CCPA legislation, both public and private facial image datasets are increasingly scrutinized. Several datasets have been taken offline completely and some have been anonymized. However, it is unclear how anonymization impacts face detection performance. To our knowledge, this paper presents the first empirical study on the effect of image anonymization on sup…

    Submitted 23 April, 2021; originally announced April 2021.

    ACM Class: I.5.4

  17. arXiv:2104.04715  [pdf, other]

    cs.CV

    Object Priors for Classifying and Localizing Unseen Actions

    Authors: Pascal Mettes, William Thong, Cees G. M. Snoek

    Abstract: This work strives for the classification and localization of human actions in videos, without the need for any labeled video training examples. Where existing work relies on transferring global attribute or object information from seen to unseen action videos, we seek to classify and spatio-temporally localize unseen actions in videos from image-based object information only. We propose three spat…

    Submitted 10 April, 2021; originally announced April 2021.

    Comments: Accepted to IJCV

  18. arXiv:2104.02439  [pdf, other]

    cs.CV

    Few-Shot Transformation of Common Actions into Time and Space

    Authors: Pengwan Yang, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper introduces the task of few-shot common action localization in time and space. Given a few trimmed support videos containing the same but unknown action, we strive for spatio-temporal localization of that action in a long untrimmed query video. We do not require any class labels, interval bounds, or bounding boxes. To address this challenging task, we introduce a novel few-shot transform…

    Submitted 6 April, 2021; originally announced April 2021.

  19. arXiv:2104.00996  [pdf, other]

    cs.CV

    LiftPool: Bidirectional ConvNet Pooling

    Authors: Jiaojiao Zhao, Cees G. M. Snoek

    Abstract: Pooling is a critical operation in convolutional neural networks for increasing receptive fields and improving robustness to input variations. Most existing pooling operations downsample the feature maps, which is a lossy process. Moreover, they are not invertible: upsampling a downscaled feature map cannot recover the information lost in the downsampling. By adopting the philosophy of the classi…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: published at ICLR 2021

  20. arXiv:2104.00969  [pdf, other]

    cs.CV

    TubeR: Tubelet Transformer for Video Action Detection

    Authors: Jiaojiao Zhao, Yanyi Zhang, Xinyu Li, Hao Chen, Shuai Bing, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide Modolo, Ivan Marsic, Cees G. M. Snoek, Joseph Tighe

    Abstract: We propose TubeR: a simple solution for spatio-temporal video action detection. Different from existing methods that depend on either an off-line actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly detect an action tubelet in a video by simultaneously performing action localization and recognition from a single representation. TubeR learns…

    Submitted 10 May, 2022; v1 submitted 2 April, 2021; originally announced April 2021.

    Comments: Accepted at CVPR 2022 (Oral)

  21. arXiv:2103.13096  [pdf, other]

    cs.CV

    Repetitive Activity Counting by Sight and Sound

    Authors: Yunhua Zhang, Ling Shao, Cees G. M. Snoek

    Abstract: This paper strives for repetitive activity counting in videos. Different from existing works, which all analyze the visual video content only, we incorporate for the first time the corresponding sound into the repetition counting process. This benefits accuracy in challenging vision conditions such as occlusion, dramatic camera view changes, low resolution, etc. We propose a model that starts with…

    Submitted 17 April, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted at CVPR 2021

  22. arXiv:2010.10341  [pdf, other]

    cs.LG

    Learning to Learn Variational Semantic Memory

    Authors: Xiantong Zhen, Yingjun Du, Huan Xiong, Qiang Qiu, Cees G. M. Snoek, Ling Shao

    Abstract: In this paper, we introduce variational semantic memory into meta-learning to acquire long-term knowledge for few-shot learning. The variational semantic memory accrues and stores semantic information for the probabilistic inference of class prototypes in a hierarchical Bayesian framework. The semantic memory is grown from scratch and gradually consolidated by absorbing information from tasks it e…

    Submitted 14 July, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: accepted to NeurIPS 2020; code is available at https://github.com/YDU-uva/VSM

  23. arXiv:2010.04109  [pdf, other]

    cs.LG stat.ML

    Set Prediction without Imposing Structure as Conditional Density Estimation

    Authors: David W. Zhang, Gertjan J. Burghouts, Cees G. M. Snoek

    Abstract: Set prediction is about learning to predict a collection of unordered variables with unknown interrelations. Training such models with set losses imposes the structure of a metric space over sets. We focus on stochastic and underdefined cases, where an incorrectly chosen loss function leads to implausible predictions. Example tasks include conditional point-cloud reconstruction and predicting futu…

    Submitted 21 February, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

  24. arXiv:2008.11185  [pdf, other]

    cs.CV

    Bias-Awareness for Zero-Shot Learning the Seen and Unseen

    Authors: William Thong, Cees G. M. Snoek

    Abstract: Generalized zero-shot learning recognizes inputs from both seen and unseen classes. Yet, existing methods tend to be biased towards the classes seen during training. In this paper, we strive to mitigate this bias. We propose a bias-aware learner to map inputs to a semantic embedding space for generalized zero-shot learning. During training, the model learns to regress to real-valued class prototyp…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: Accepted at British Machine Vision Conference (BMVC) 2020

  25. arXiv:2008.06374  [pdf, other]

    cs.CV

    PointMixup: Augmentation for Point Clouds

    Authors: Yunlu Chen, Vincent Tao Hu, Efstratios Gavves, Thomas Mensink, Pascal Mettes, Pengwan Yang, Cees G. M. Snoek

    Abstract: This paper introduces data augmentation for point clouds by interpolation between examples. Data augmentation by interpolation has been shown to be a simple and effective approach in the image domain. Such a mixup is however not directly transferable to point clouds, as we do not have a one-to-one correspondence between the points of two different objects. In this paper, we define data augmentation bet…

    Submitted 14 August, 2020; originally announced August 2020.

    Comments: Accepted as Spotlight presentation at European Conference on Computer Vision (ECCV), 2020

  26. arXiv:2008.05826  [pdf, other]

    cs.CV cs.LG eess.IV

    Localizing the Common Action Among a Few Videos

    Authors: Pengwan Yang, Vincent Tao Hu, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper strives to localize the temporal extent of an action in a long untrimmed video. Where existing work leverages many examples with their start, their ending, and/or the class of the action during training time, we propose few-shot common action localization. The start and end of an action in a long untrimmed video are determined based on just a handful of trimmed video examples containin…

    Submitted 25 August, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  27. arXiv:2007.07645  [pdf, other]

    cs.CV

    Learning to Learn with Variational Information Bottleneck for Domain Generalization

    Authors: Yingjun Du, Jun Xu, Huan Xiong, Qiang Qiu, Xiantong Zhen, Cees G. M. Snoek, Ling Shao

    Abstract: Domain generalization models learn to generalize to previously unseen domains, but suffer from prediction uncertainty and domain shift. In this paper, we address both problems. We introduce a probabilistic meta-learning model for domain generalization, in which classifier parameters shared across domains are modeled as distributions. This enables better handling of prediction uncertainty on unseen…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: 15 pages, 4 figures, ECCV2020

  28. arXiv:2006.06707  [pdf, other]

    cs.LG stat.ML

    Learning to Learn Kernels with Variational Random Features

    Authors: Xiantong Zhen, Haoliang Sun, Yingjun Du, Jun Xu, Yilong Yin, Ling Shao, Cees Snoek

    Abstract: In this work, we introduce kernels with random Fourier features in the meta-learning framework to leverage their strong few-shot learning ability. We propose meta variational random features (MetaVRF) to learn adaptive kernels for the base-learner, which is developed in a latent variable model by treating the random feature basis as the latent variable. We formulate the optimization of MetaVRF as…

    Submitted 13 August, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: ICML'2020; code is available in: https://github.com/Yingjun-Du/MetaVRF

  29. arXiv:2003.12737  [pdf, other]

    cs.CV

    Actor-Transformers for Group Activity Recognition

    Authors: Kirill Gavrilyuk, Ryan Sanford, Mehrsan Javan, Cees G. M. Snoek

    Abstract: This paper strives to recognize individual actions and group activities from videos. While existing solutions for this challenging problem explicitly model spatial and temporal relationships based on location of individual actors, we propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition. We feed the transformer with rich actor-…

    Submitted 28 March, 2020; originally announced March 2020.

    Comments: CVPR 2020

  30. arXiv:2003.07833  [pdf, other]

    cs.CV

    Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification

    Authors: Sanath Narayan, Akshita Gupta, Fahad Shahbaz Khan, Cees G. M. Snoek, Ling Shao

    Abstract: Zero-shot learning strives to classify unseen categories for which no data is available during training. In the generalized variant, the test samples can further belong to seen or unseen categories. The state-of-the-art relies on Generative Adversarial Networks that synthesize unseen class features by leveraging class-specific semantic embeddings. During training, they generate semantically consis…

    Submitted 18 July, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Comments: Accepted for publication at ECCV 2020

  31. arXiv:2003.05065  [pdf, other]

    cs.CV

    Cloth in the Wind: A Case Study of Physical Measurement through Simulation

    Authors: Tom F. H. Runia, Kirill Gavrilyuk, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: For many of the physical phenomena around us, we have developed sophisticated models explaining their behavior. Nevertheless, measuring physical properties from visual observations is challenging due to the high number of causally underlying physical parameters -- including material properties and external forces. In this paper, we propose to measure latent physical properties for cloth in the win…

    Submitted 9 March, 2020; originally announced March 2020.

    Comments: CVPR 2020. arXiv admin note: substantial text overlap with arXiv:1910.07861

  32. arXiv:1911.08621  [pdf, other]

    cs.CV

    Open Cross-Domain Visual Search

    Authors: William Thong, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper addresses cross-domain visual search, where visual queries retrieve category samples from a different domain. For example, we may want to sketch an airplane and retrieve photographs of airplanes. Despite considerable progress, the search occurs in a closed setting between two pre-defined domains. In this paper, we make the step towards an open setting where multiple visual domains are a…

    Submitted 28 July, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

    Comments: Accepted at Computer Vision and Image Understanding (CVIU)

  33. arXiv:1910.07861  [pdf, other]

    cs.CV

    Go with the Flow: Perception-refined Physics Simulation

    Authors: Tom F. H. Runia, Kirill Gavrilyuk, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: For many of the physical phenomena around us, we have developed sophisticated models explaining their behavior. Nevertheless, inferring specifics from visual observations is challenging due to the high number of causally underlying physical parameters -- including material properties and external forces. This paper addresses the problem of inferring such latent physical properties from observation…

    Submitted 17 October, 2019; originally announced October 2019.

  34. arXiv:1904.05404  [pdf, other]

    cs.CV cs.LG

    Spherical Regression: Learning Viewpoints, Surface Normals and 3D Rotations on n-Spheres

    Authors: Shuai Liao, Efstratios Gavves, Cees G. M. Snoek

    Abstract: Many computer vision challenges require continuous outputs, but tend to be solved by discrete classification. The reason is classification's natural containment within a probability $n$-simplex, as defined by the popular softmax activation function. Regular regression lacks such a closed geometry, leading to unstable training and convergence to suboptimal local minima. Starting from this insight w…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: CVPR 2019 camera ready

  35. arXiv:1904.01421  [pdf, other]

    cs.CV

    Cooperative Embeddings for Instance, Attribute and Category Retrieval

    Authors: William Thong, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: The goal of this paper is to retrieve an image based on instance, attribute and category similarity notions. Different from existing works, which usually address only one of these entities in isolation, we introduce a cooperative embedding to integrate them while preserving their specific level of semantic representation. An algebraic structure defines a superspace filled with instances. Attribute…

    Submitted 2 April, 2019; originally announced April 2019.

  36. arXiv:1904.00696  [pdf, other]

    cs.CV

    Dance with Flow: Two-in-One Stream Action Detection

    Authors: Jiaojiao Zhao, Cees G. M. Snoek

    Abstract: The goal of this paper is to detect the spatio-temporal extent of an action. The two-stream detection network based on RGB and flow provides state-of-the-art accuracy at the expense of a large model-size and heavy computation. We propose to embed RGB and optical-flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images, whic…

    Submitted 11 June, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: Accepted by CVPR2019

  37. arXiv:1903.12206  [pdf, other]

    cs.CV

    Counting with Focus for Free

    Authors: Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper aims to count arbitrary objects in images. The leading counting approaches start from point annotations per object from which they construct density maps. Then, their training objective transforms input images to density maps through deep convolutional networks. We posit that the point annotations serve more supervision purposes than just constructing density maps. We introduce ways to…

    Submitted 6 August, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

    Comments: ICCV, 2019

  38. arXiv:1901.10889  [pdf, other]

    cs.CV

    Pixelated Semantic Colorization

    Authors: Jiaojiao Zhao, Jungong Han, Ling Shao, Cees G. M. Snoek

    Abstract: While many image colorization algorithms have recently shown the capability of producing plausible color versions from gray-scale photographs, they still suffer from limited semantic understanding. To address this shortcoming, we propose to exploit pixelated object semantics to guide image colorization. The rationale is that human beings perceive and distinguish colors based on the semantic catego…

    Submitted 7 February, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

  39. arXiv:1901.10514  [pdf, other]

    cs.LG stat.ML

    Hyperspherical Prototype Networks

    Authors: Pascal Mettes, Elise van der Pol, Cees G. M. Snoek

    Abstract: This paper introduces hyperspherical prototype networks, which unify classification and regression with prototypes on hyperspherical output spaces. For classification, a common approach is to define prototypes as the mean output vector over training examples per class. Here, we propose to use hyperspheres as output spaces, with class prototypes defined a priori with large margin separation. We pos…

    Submitted 25 October, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: NeurIPS 2019

  40. arXiv:1901.10364  [pdf, other]

    cs.CV

    Anomaly Locality in Video Surveillance

    Authors: Federico Landi, Cees G. M. Snoek, Rita Cucchiara

    Abstract: This paper strives for the detection of real-world anomalies such as burglaries and assaults in surveillance videos. Although anomalies are generally local, as they happen in a limited portion of the frame, none of the previous works on the subject has ever studied the contribution of locality. In this work, we explore the impact of considering spatiotemporal tubes instead of whole-frame video seg…

    Submitted 29 January, 2019; originally announced January 2019.

    Comments: Submitted to International Conference on Image Processing, 2019

  41. arXiv:1808.03766  [pdf, ps, other]

    cs.CV

    The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary

    Authors: Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao

    Abstract: The 3rd annual installment of the ActivityNet Large-Scale Activity Recognition Challenge, held as a full-day workshop in CVPR 2018, focused on the recognition of daily life, high-level, goal-oriented activities from user-generated videos as those found in internet video portals. The 2018 challenge hosted six diverse tasks which aimed to push the limits of semantic visual understanding of videos a…

    Submitted 23 August, 2018; v1 submitted 11 August, 2018; originally announced August 2018.

    Comments: CVPR Workshop 2018 challenge summary

  42. arXiv:1808.01597  [pdf, other]

    cs.CV

    Pixel-level Semantics Guided Image Colorization

    Authors: Jiaojiao Zhao, Li Liu, Cees G. M. Snoek, Jungong Han, Ling Shao

    Abstract: While many image colorization algorithms have recently shown the capability of producing plausible color versions from gray-scale photographs, they still suffer from the problems of context confusion and edge color bleeding. To address context confusion, we propose to incorporate the pixel-level object semantics to guide the image colorization. The rationale is that human beings perceive and disti…

    Submitted 5 August, 2018; originally announced August 2018.

  43. arXiv:1807.06980  [pdf, other]

    cs.CV

    Video Time: Properties, Encoders and Evaluation

    Authors: Amir Ghodrati, Efstratios Gavves, Cees G. M. Snoek

    Abstract: Time-aware encoding of frame sequences in a video is a fundamental problem in video understanding. While many have attempted to model time in videos, an explicit study on quantifying video time is missing. To fill this lacuna, we aim to evaluate video time explicitly. We describe three properties of video time, namely a) temporal asymmetry, b) temporal continuity and c) temporal causality. Based on each…

    Submitted 18 July, 2018; originally announced July 2018.

    Comments: 14 pages, BMVC 2018

  44. arXiv:1807.02800  [pdf, other]

    cs.CV

    Spatio-Temporal Instance Learning: Action Tubes from Class Supervision

    Authors: Pascal Mettes, Cees G. M. Snoek

    Abstract: The goal of this work is spatio-temporal action localization in videos, using only the supervision from video-level class labels. The state-of-the-art casts this weakly-supervised action localization regime as a Multiple Instance Learning problem, where instances are a priori computed spatio-temporal proposals. Rather than disconnecting the spatio-temporal learning from the training, we propose Sp…

    Submitted 21 November, 2018; v1 submitted 8 July, 2018; originally announced July 2018.

  45. arXiv:1806.06984  [pdf, other]

    cs.CV

    Repetition Estimation

    Authors: Tom F. H. Runia, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: Visual repetition is ubiquitous in our world. It appears in human activity (sports, cooking), animal behavior (a bee's waggle dance), natural phenomena (leaves in the wind) and in urban environments (flashing lights). Estimating visual repetition from realistic video is challenging as periodic motion is rarely perfectly static and stationary. To better deal with realistic video, we elevate the sta…

    Submitted 18 June, 2018; originally announced June 2018.

  46. Pointly-Supervised Action Localization

    Authors: Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper strives for spatio-temporal localization of human actions in videos. In the literature, the consensus is to achieve localization by training on bounding box annotations provided for each frame of each training video. As annotating boxes in video is expensive, cumbersome and error-prone, we propose to bypass box-supervision. Instead, we introduce action localization based on point-superv…

    Submitted 1 October, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: International Journal of Computer Vision, 2018

  47. arXiv:1804.01824  [pdf, other]

    cs.CV

    Guess Where? Actor-Supervision for Spatiotemporal Action Localization

    Authors: Victor Escorcia, Cuong D. Dao, Mihir Jain, Bernard Ghanem, Cees Snoek

    Abstract: This paper addresses the problem of spatiotemporal localization of actions in videos. Compared to leading approaches, which all learn to localize based on carefully annotated boxes on training video frames, we adhere to a weakly-supervised solution that only requires a video class label. We introduce an actor-supervised architecture that exploits the inherent compositionality of actions in terms o…

    Submitted 5 April, 2018; originally announced April 2018.

    Comments: cvpr version

  48. arXiv:1803.07485  [pdf, other]

    cs.CV

    Actor and Action Video Segmentation from a Sentence

    Authors: Kirill Gavrilyuk, Amir Ghodrati, Zhenyang Li, Cees G. M. Snoek

    Abstract: This paper strives for pixel-level segmentation of actors and their actions in video content. Different from existing works, which all learn to segment from a fixed vocabulary of actor and action pairs, we infer the segmentation from a natural language input sentence. This allows to distinguish between fine-grained actors in the same super-category, identify actor and action instances, and segment…

    Submitted 20 March, 2018; originally announced March 2018.

    Comments: Accepted to CVPR 2018 as oral

  49. arXiv:1802.09971  [pdf, other]

    cs.CV

    Real-World Repetition Estimation by Div, Grad and Curl

    Authors: Tom F. H. Runia, Cees G. M. Snoek, Arnold W. M. Smeulders

    Abstract: We consider the problem of estimating repetition in video, such as performing push-ups, cutting a melon or playing violin. Existing work shows good results under the assumption of static and stationary periodicity. As realistic video is rarely perfectly static and stationary, the often preferred Fourier-based measurements is inapt. Instead, we adopt the wavelet transform to better handle non-stati…

    Submitted 27 February, 2018; originally announced February 2018.

  50. arXiv:1801.10253  [pdf, other]

    cs.CL cs.IR cs.MM

    The New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval

    Authors: Spencer Cappallo, Stacey Svetlichnaya, Pierre Garrigues, Thomas Mensink, Cees G. M. Snoek

    Abstract: Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages. We propose to treat these ideograms as a new modality in their own right, distinct in their semantic structure from both the text in which they are often embedded as well as the images which they resemble. As a new modality, emoji present rich novel…

    Submitted 2 February, 2018; v1 submitted 30 January, 2018; originally announced January 2018.