
Showing 51–70 of 70 results for author: Damen, D

  1. arXiv:1908.00867  [pdf, other]

    cs.CV

    An Evaluation of Action Recognition Models on EPIC-Kitchens

    Authors: Will Price, Dima Damen

    Abstract: We benchmark contemporary action recognition models (TSN, TRN, and TSM) on the recently introduced EPIC-Kitchens dataset and release pretrained models on GitHub (https://github.com/epic-kitchens/action-models) for others to build upon. In contrast to popular action recognition datasets like Kinetics, Something-Something, UCF101, and HMDB51, EPIC-Kitchens is shot from an egocentric perspective and…

    Submitted 2 August, 2019; originally announced August 2019.

    Comments: 6 pages, 3 figures, 3 tables. Models released at https://github.com/epic-kitchens/action-models

  2. arXiv:1907.11117  [pdf, other]

    cs.CV

    Learning Visual Actions Using Multiple Verb-Only Labels

    Authors: Michael Wray, Dima Damen

    Abstract: This work introduces verb-only representations for both recognition and retrieval of visual actions, in video. Current methods neglect legitimate semantic ambiguities between verbs, instead choosing unambiguous subsets of verbs along with objects to disambiguate the actions. We instead propose multiple verb-only labels, which we learn through hard or soft assignment as a regression. This enables l…

    Submitted 1 August, 2019; v1 submitted 25 July, 2019; originally announced July 2019.

    Comments: Accepted at BMVC 2019. More information can be found at https://mwray.github.io/MVOL/. Annotations can be found at https://github.com/mwray/Multi-Verb-Labels

  3. arXiv:1904.08634  [pdf, other]

    cs.CV

    DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition

    Authors: Toby Perrett, Dima Damen

    Abstract: Domain alignment in convolutional networks aims to learn the degree of layer-specific feature alignment beneficial to the joint learning of source and target datasets. While increasingly popular in convolutional networks, there have been no previous attempts to achieve domain alignment in recurrent networks. Similar to spatial features, both source and target domains are likely to exhibit temporal…

    Submitted 18 April, 2019; originally announced April 2019.

    Comments: To appear in CVPR 2019

  4. arXiv:1904.04689  [pdf, other]

    cs.CV

    Action Recognition from Single Timestamp Supervision in Untrimmed Videos

    Authors: Davide Moltisanti, Sanja Fidler, Dima Damen

    Abstract: Recognising actions in videos relies on labelled supervision during training, typically the start and end times of each action instance. This supervision is not only subjective, but also expensive to acquire. Weak video-level supervision has been successfully exploited for recognition in untrimmed videos, however it is challenged when the number of different actions in training videos increases. W…

    Submitted 9 April, 2019; originally announced April 2019.

    Comments: CVPR 2019

  5. arXiv:1812.05538  [pdf, other]

    cs.CV

    The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos

    Authors: Hazel Doughty, Walterio Mayol-Cuevas, Dima Damen

    Abstract: We present a new model to determine relative skill from long videos, through learnable temporal attention modules. Skill determination is formulated as a ranking problem, making it suitable for common and generic tasks. However, for long videos, parts of the video are irrelevant for assessing skill, and there may be variability in the skill exhibited throughout a video. We therefore propose a meth…

    Submitted 10 April, 2019; v1 submitted 13 December, 2018; originally announced December 2018.

    Comments: CVPR 2019

  6. arXiv:1806.08152  [pdf, other]

    cs.CV

    CaloriNet: From silhouettes to calorie estimation in private environments

    Authors: Alessandro Masullo, Tilo Burghardt, Dima Damen, Sion Hannuna, Victor Ponce-López, Majid Mirmehdi

    Abstract: We propose a novel deep fusion architecture, CaloriNet, for the online estimation of energy expenditure for free living monitoring in private environments, where RGB data is discarded and replaced by silhouettes. Our fused convolutional neural network architecture is trainable end-to-end, to estimate calorie expenditure, using temporal foreground silhouettes alongside accelerometer data. The netwo…

    Submitted 21 June, 2018; originally announced June 2018.

    Comments: 11 pages, 7 figures

  7. arXiv:1806.04074  [pdf, other]

    cs.CV

    Semantically Selective Augmentation for Deep Compact Person Re-Identification

    Authors: Víctor Ponce-López, Tilo Burghardt, Sion Hannuna, Dima Damen, Alessandro Masullo, Majid Mirmehdi

    Abstract: We present a deep person re-identification approach that combines semantically selective, deep data augmentation with clustering-based network compression to generate high performance, light and fast inference networks. In particular, we propose to augment limited training data via sampling from a deep convolutional generative adversarial network (DCGAN), whose discriminator is constrained by a se…

    Submitted 18 June, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

  8. arXiv:1805.11907  [pdf, other]

    cs.OH

    A Guide to the SPHERE 100 Homes Study Dataset

    Authors: Atis Elsts, Tilo Burghardt, Dallan Byrne, Massimo Camplani, Dima Damen, Xenofon Fafoutis, Sion Hannuna, William Harwin, Michael Holmes, Balazs Janko, Victor Ponce Lopez, Alessandro Masullo, Majid Mirmehdi, George Oikonomou, Robert Piechocki, R. Simon Sherratt, Emma Tonkin, Niall Twomey, Antonis Vafeas, Przemyslaw Woznowski, Ian Craddock

    Abstract: The SPHERE project has developed a multi-modal sensor platform for health and behavior monitoring in residential environments. So far, the SPHERE platform has been deployed for data collection in approximately 50 homes for duration up to one year. This technical document describes the format and the expected content of the SPHERE dataset(s) under preparation. It includes a list of some data qualit…

    Submitted 30 October, 2018; v1 submitted 30 May, 2018; originally announced May 2018.

  9. arXiv:1805.06749  [pdf, ps, other]

    cs.CV

    Action Completion: A Temporal Model for Moment Detection

    Authors: Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen

    Abstract: We introduce completion moment detection for actions - the problem of locating the moment of completion, when the action's goal is confidently considered achieved. The paper proposes a joint classification-regression recurrent model that predicts completion from a given frame, and then integrates frame-level contributions to detect sequence-level completion moment. We introduce a recurrent voting…

    Submitted 23 July, 2018; v1 submitted 17 May, 2018; originally announced May 2018.

  10. arXiv:1805.04026  [pdf, other]

    cs.CV

    Towards an Unequivocal Representation of Actions

    Authors: Michael Wray, Davide Moltisanti, Dima Damen

    Abstract: This work introduces verb-only representations for actions and interactions; the problem of describing similar motions (e.g. 'open door', 'open cupboard'), and distinguish differing ones (e.g. 'open door' vs 'open bottle') using verb-only labels. Current approaches for action recognition neglect legitimate semantic ambiguities and class overlaps between verbs (Fig. 1), relying on the objects to di…

    Submitted 10 May, 2018; originally announced May 2018.

  11. arXiv:1804.02748  [pdf, other]

    cs.CV

    Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

    Authors: Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, Michael Wray

    Abstract: First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen…

    Submitted 31 July, 2018; v1 submitted 8 April, 2018; originally announced April 2018.

    Comments: European Conference on Computer Vision (ECCV) 2018 Dataset and Project page: http://epic-kitchens.github.io

  12. arXiv:1710.02310  [pdf, ps, other]

    cs.CV

    Detecting the Moment of Completion: Temporal Models for Localising Action Completion

    Authors: Farnoosh Heidarivincheh, Majid Mirmehdi, Dima Damen

    Abstract: Action completion detection is the problem of modelling the action's progression towards localising the moment of completion - when the action's goal is confidently considered achieved. In this work, we assess the ability of two temporal models, namely Hidden Markov Models (HMM) and Long-Short Term Memory (LSTM), to localise completion for six object interactions: switch, plug, open, pull, pick an…

    Submitted 6 October, 2017; originally announced October 2017.

  13. arXiv:1703.09913  [pdf, other]

    cs.CV

    Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination

    Authors: Hazel Doughty, Dima Damen, Walterio Mayol-Cuevas

    Abstract: We present a method for assessing skill from video, applicable to a variety of tasks, ranging from surgery to drawing and rolling pizza dough. We formulate the problem as pairwise (who's better?) and overall (who's best?) ranking of video collections, using supervised deep ranking. We propose a novel loss function that learns discriminative features when a pair of videos exhibit variance in skill,…

    Submitted 29 March, 2018; v1 submitted 29 March, 2017; originally announced March 2017.

    Comments: CVPR 2018

  14. arXiv:1703.09026  [pdf, other]

    cs.CV

    Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video

    Authors: Davide Moltisanti, Michael Wray, Walterio Mayol-Cuevas, Dima Damen

    Abstract: Manual annotations of temporal bounds for object interactions (i.e. start and end times) are typical training input to recognition, localization and detection algorithms. For three publicly available egocentric datasets, we uncover inconsistencies in ground truth temporal bounds within and across annotators and datasets. We systematically assess the robustness of state-of-the-art approaches to cha…

    Submitted 26 July, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

    Comments: ICCV 2017

  15. arXiv:1703.08338  [pdf, other]

    cs.CV

    Improving Classification by Improving Labelling: Introducing Probabilistic Multi-Label Object Interaction Recognition

    Authors: Michael Wray, Davide Moltisanti, Walterio Mayol-Cuevas, Dima Damen

    Abstract: This work deviates from easy-to-define class boundaries for object interactions. For the task of object interaction recognition, often captured using an egocentric view, we show that semantic ambiguities in verbs and recognising sub-interactions along with concurrent interactions result in legitimate class overlaps (Figure 1). We thus aim to model the mapping between observations and interaction c…

    Submitted 21 April, 2017; v1 submitted 24 March, 2017; originally announced March 2017.

  16. arXiv:1701.02586  [pdf, other]

    cs.HC

    Automated capture and delivery of assistive task guidance with an eyewear computer: The GlaciAR system

    Authors: Teesid Leelasawassuk, Dima Damen, Walterio Mayol-Cuevas

    Abstract: In this paper we describe and evaluate a mixed reality system that aims to augment users in task guidance applications by combining automated and unsupervised information collection with minimally invasive video guides. The result is a self-contained system that we call GlaciAR (Glass-enabled Contextual Interactions for Augmented Reality), that operates by extracting contextual interactions from o…

    Submitted 28 December, 2016; originally announced January 2017.

  17. arXiv:1607.08414  [pdf, other]

    cs.CV

    SEMBED: Semantic Embedding of Egocentric Action Videos

    Authors: Michael Wray, Davide Moltisanti, Walterio Mayol-Cuevas, Dima Damen

    Abstract: We present SEMBED, an approach for embedding an egocentric object interaction video in a semantic-visual graph to estimate the probability distribution over its potential semantic labels. When object interactions are annotated using unbounded choice of verbs, we embrace the wealth and ambiguity of these labels by capturing the semantic relationships as well as the visual similarities over motion a…

    Submitted 29 July, 2016; v1 submitted 28 July, 2016; originally announced July 2016.

  18. arXiv:1607.08196  [pdf, other]

    cs.CV

    Calorie Counter: RGB-Depth Visual Estimation of Energy Expenditure at Home

    Authors: Lili Tao, Tilo Burghardt, Majid Mirmehdi, Dima Damen, Ashley Cooper, Sion Hannuna, Massimo Camplani, Adeline Paiement, Ian Craddock

    Abstract: We present a new framework for vision-based estimation of calorific expenditure from RGB-D data - the first that is validated on physical gas exchange measurements and applied to daily living scenarios. Deriving a person's energy expenditure from sensors is an important tool in tracking physical activity levels for health and lifestyle monitoring. Most existing methods use metabolic lookup tables…

    Submitted 27 July, 2016; originally announced July 2016.

  19. arXiv:1606.04450  [pdf, other]

    cs.CV

    Multiple Human Tracking in RGB-D Data: A Survey

    Authors: Massimo Camplani, Adeline Paiement, Majid Mirmehdi, Dima Damen, Sion Hannuna, Tilo Burghardt, Lili Tao

    Abstract: Multiple human tracking (MHT) is a fundamental task in many computer vision applications. Appearance-based approaches, primarily formulated on RGB data, are constrained and affected by problems arising from occlusions and/or illumination variations. In recent years, the arrival of cheap RGB-Depth (RGB-D) devices has led to many new approaches to MHT, and many of these integrate color and depth c…

    Submitted 14 June, 2016; originally announced June 2016.

  20. arXiv:1510.04862  [pdf, other]

    cs.CV

    You-Do, I-Learn: Unsupervised Multi-User egocentric Approach Towards Video-Based Guidance

    Authors: Dima Damen, Teesid Leelasawassuk, Walterio Mayol-Cuevas

    Abstract: This paper presents an unsupervised approach towards automatically extracting video-based guidance on object usage, from egocentric video and wearable gaze tracking, collected from multiple users while performing tasks. The approach i) discovers task relevant objects, ii) builds a model for each, iii) distinguishes different ways in which each discovered object has been used and iv) discovers the…

    Submitted 19 March, 2016; v1 submitted 16 October, 2015; originally announced October 2015.