Showing 1–50 of 55 results for author: Fernando, B

  1. arXiv:2407.02420  [pdf]

    astro-ph.EP astro-ph.IM physics.geo-ph

    Geophysical Observations of the 24 September 2023 OSIRIS-REx Sample Return Capsule Re-Entry

    Authors: Elizabeth A. Silber, Daniel C. Bowman, Chris G. Carr, David P. Eisenberg, Brian R. Elbing, Benjamin Fernando, Milton A. Garcés, Robert Haaser, Siddharth Krishnamoorthy, Charles A. Langston, Yasuhiro Nishikawa, Jeremy Webster, Jacob F. Anderson, Stephen Arrowsmith, Sonia Bazargan, Luke Beardslee, Brant Beck, Jordan W. Bishop, Philip Blom, Grant Bracht, David L. Chichester, Anthony Christe, Kenneth Cummins, James Cutts, Lisa Danielson, et al. (57 additional authors not shown)

    Abstract: Sample Return Capsules (SRCs) entering Earth's atmosphere at hypervelocity from interplanetary space are a valuable resource for studying meteor phenomena. The 24 September 2023 arrival of the OSIRIS-REx (Origins, Spectral Interpretation, Resource Identification, and Security-Regolith Explorer) SRC provided an unprecedented chance for geophysical observations of a well-characterized source with kn…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 87 pages, 14 figures

  2. arXiv:2404.01299  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes

    Authors: Paritosh Parmar, Eric Peh, Ruirui Chen, Ting En Lam, Yuhan Chen, Elston Tan, Basura Fernando

    Abstract: Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. Cartoons use the principles of animation that allow animators to create…

    Submitted 14 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Project Page: https://github.com/LUNAProject22/CausalChaos

  3. arXiv:2403.03966  [pdf, other]

    astro-ph.EP astro-ph.IM astro-ph.SR physics.ao-ph physics.geo-ph

    Seismic and acoustic signals from the 2014 'Interstellar Meteor'

    Authors: Benjamin Fernando, Pierrick Mialle, Göram Ekström, Constantinos Charalambous, Steven Desch, Alan Jackson, Eleanor K. Sansom

    Abstract: We conduct a thorough analysis of seismic and acoustic data from the so-called 'Interstellar Meteor' which entered the Earth's atmosphere off the coast of Papua New Guinea on 2014-01-08. We conclude that both previously-reported seismic signals are spurious -- one has characteristics suggesting a local vehicular-traffic based origin; whilst the other is statistically indistinguishable from the bac…

    Submitted 3 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 24 pages, 7 figures

  4. arXiv:2401.12471  [pdf, ps, other]

    cs.CV

    Zero Shot Open-ended Video Inference

    Authors: Ee Yeo Keat, Zhang Hao, Alexander Matyasko, Basura Fernando

    Abstract: Zero-shot open-ended inference on untrimmed videos poses a significant challenge, especially when no annotated data is utilized to navigate the inference direction. In this work, we aim to address this underexplored domain by introducing an adaptable framework that efficiently combines both the frozen vision-language (VL) model and off-the-shelf large language model (LLM) for conducting zero-shot…

    Submitted 22 January, 2024; originally announced January 2024.

  5. arXiv:2401.10805  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Learning to Visually Connect Actions and their Effects

    Authors: Eric Peh, Paritosh Parmar, Basura Fernando

    Abstract: In this work, we introduce the novel concept of visually Connecting Actions and Their Effects (CATE) in video understanding. CATE can have applications in areas like task planning and learning from demonstration. We identify and explore two different aspects of the concept of CATE: Action Selection and Effect-Affinity Assessment, where video understanding models connect actions and effects at sema…

    Submitted 26 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

  6. arXiv:2312.08895  [pdf, other]

    cs.CV

    Motion Flow Matching for Human Motion Synthesis and Editing

    Authors: Vincent Tao Hu, Wenzhe Yin, Pingchuan Ma, Yunlu Chen, Basura Fernando, Yuki M Asano, Efstratios Gavves, Pascal Mettes, Bjorn Ommer, Cees G. M. Snoek

    Abstract: Human motion synthesis is a fundamental task in computer animation. Recent methods based on diffusion models or GPT structure demonstrate commendable performance but exhibit drawbacks in terms of slow sampling speeds and error accumulation. In this paper, we propose Motion Flow Matching, a novel generative model designed for human motion generation featuring efficient sampling and effective…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: WIP

  7. arXiv:2310.13619  [pdf, other]

    cs.CL cs.CV

    Semi-supervised multimodal coreference resolution in image narrations

    Authors: Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

    Abstract: In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration is paired with an image. This poses significant challenges due to fine-grained image-text alignment, inherent ambiguity present in narrative language, and unavailability of large annotated training sets. To tackle these challenges, we present a data efficient semi-supervised a…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Long paper at EMNLP'23-Main

  8. arXiv:2307.00586  [pdf, other]

    cs.CV

    ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition

    Authors: Debaditya Roy, Dhruv Verma, Basura Fernando

    Abstract: Situation Recognition is the task of generating a structured summary of what is happening in an image using an activity verb and the semantic roles played by actors and objects. In this task, the same activity verb can describe a diverse set of situations as well as the same actor or object category can play a diverse set of semantic roles depending on the situation depicted in the image. Hence a…

    Submitted 11 September, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

    Comments: State-of-the-art results on Grounded Situation Recognition

  9. arXiv:2306.08889  [pdf, other]

    cs.CV cs.AI

    Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion

    Authors: Ishaan Singh Rawal, Alexander Matyasko, Shantanu Jaiswal, Basura Fernando, Cheston Tan

    Abstract: While VideoQA Transformer models demonstrate competitive performance on standard benchmarks, the reasons behind their success are not fully understood. Do these models capture the rich multimodal structures and dynamics from video and text jointly? Or are they achieving high scores by exploiting biases and spurious features? Hence, to provide insights, we design QUAG (QUadrant AveraGe),…

    Submitted 7 June, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted at ICML 2024

  10. arXiv:2305.02673  [pdf, other]

    cs.CV

    Modelling Spatio-Temporal Interactions for Compositional Action Recognition

    Authors: Ramanathan Rajendiran, Debaditya Roy, Basura Fernando

    Abstract: Humans have the natural ability to recognize actions even if the objects involved in the action or the background are changed. Humans can abstract away the action from the appearance of the objects and their context which is referred to as compositionality of actions. Compositional action recognition deals with imparting human-like compositional generalization abilities to action-recognition model…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  11. arXiv:2303.10428  [pdf, other]

    cs.CV

    A Region-Prompted Adapter Tuning for Visual Abductive Reasoning

    Authors: Hao Zhang, Yeo Keat Ee, Basura Fernando

    Abstract: Visual Abductive Reasoning is an emerging vision-language (VL) topic where the model needs to retrieve/generate a likely textual hypothesis from a visual input (image or its part) using backward reasoning based on commonsense. Unlike in conventional VL retrieval or captioning tasks, where entities of texts appear in the image, in abductive inferences, the relevant facts about inferences are not re…

    Submitted 7 January, 2024; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: 13 pages, 11 figures, Under Review of IEEE Transaction

  12. arXiv:2211.14563  [pdf, other]

    cs.CV cs.CL

    Who are you referring to? Coreference resolution in image narrations

    Authors: Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

    Abstract: Coreference resolution aims to identify words and phrases which refer to the same entity in a text, a core task in natural language processing. In this paper, we extend this task to resolving coreferences in long-form narrations of visual scenes. First we introduce a new dataset with annotated coreference chains and their bounding boxes, as most existing image-text datasets only contain short sentence…

    Submitted 17 March, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: 15 pages

  13. arXiv:2211.14154  [pdf, other]

    cs.CV

    Interaction Region Visual Transformer for Egocentric Action Anticipation

    Authors: Debaditya Roy, Ramanathan Rajendiran, Basura Fernando

    Abstract: Human-object interaction is one of the most important visual cues and we propose a novel way to represent human-object interactions for egocentric action anticipation. We propose a novel transformer variant to model interactions by computing the change in the appearance of objects and human hands due to the execution of the actions and use those changes to refine the video representation. Specific…

    Submitted 11 January, 2024; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: Top of the public leaderboard on EK100 Action Anticipation https://codalab.lisn.upsaclay.fr/competitions/702#results. Accepted at IEEE/CVF WACV 2024

  14. arXiv:2210.13984  [pdf, other]

    cs.CV

    Abductive Action Inference

    Authors: Clement Tan, Chai Kiat Yeo, Cheston Tan, Basura Fernando

    Abstract: Abductive reasoning aims to make the most likely inference for a given set of incomplete observations. In this paper, we introduce a novel research task known as "abductive action inference" which addresses the question of which actions were executed by a human to reach a specific state shown in a single snapshot. The research explores three key abductive inference problems: action set prediction,…

    Submitted 7 August, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: 16 pages, 9 figures

  15. arXiv:2209.05044  [pdf, other]

    cs.CV cs.AI

    Predicting the Next Action by Modeling the Abstract Goal

    Authors: Debaditya Roy, Basura Fernando

    Abstract: The problem of anticipating human actions is an inherently uncertain one. However, we can reduce this uncertainty if we have a sense of the goal that the actor is trying to achieve. Here, we present an action anticipation model that leverages goal information for the purpose of reducing the uncertainty in future predictions. Since we do not possess goal information or the observed actions during i…

    Submitted 8 June, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  16. arXiv:2208.11084  [pdf, other]

    cs.CV

    Consistency Regularization for Domain Adaptation

    Authors: Kian Boon Koh, Basura Fernando

    Abstract: Collection of real world annotations for training semantic segmentation models is an expensive process. Unsupervised domain adaptation (UDA) tries to solve this problem by studying how more accessible data such as synthetic data can be used to train and adapt models to real world images without requiring their annotations. Recent UDA methods apply self-learning by training on pixel-wise classifi…

    Submitted 23 August, 2022; originally announced August 2022.

    Comments: ECCV 2022 workshop paper

  17. arXiv:2203.17178  [pdf, other]

    cs.CV cs.AI

    3D Equivariant Graph Implicit Functions

    Authors: Yunlu Chen, Basura Fernando, Hakan Bilen, Matthias Nießner, Efstratios Gavves

    Abstract: In recent years, neural implicit representations have made remarkable progress in modeling of 3D shapes with arbitrary topology. In this work, we address two key limitations of such representations, in failing to capture local 3D geometric fine details, and to learn from and generalize to shapes with unseen 3D transformations. To this end, we introduce a novel family of graph implicit functions wi…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: Video: https://youtu.be/W7goOzZP2Kc

  18. arXiv:2112.10066  [pdf, other]

    cs.CV

    LocFormer: Enabling Transformers to Perform Temporal Moment Localization on Long Untrimmed Videos With a Feature Sampling Approach

    Authors: Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hiroya Takamura, Qi Wu

    Abstract: We propose LocFormer, a Transformer-based model for video grounding which operates at a constant memory footprint regardless of the video length, i.e. number of frames. LocFormer is designed for tasks where it is necessary to process the entire long video and at its core lie two main contributions. First, our model incorporates a new sampling technique that splits the input feature sequence into a…

    Submitted 19 December, 2021; originally announced December 2021.

  19. arXiv:2111.13517  [pdf, other]

    cs.CV

    Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation

    Authors: Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

    Abstract: Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding. Existing SGG methods trained on the entire set of relations fail to acquire complex reasoning about visual and textual correlations due to various biases in training data. Learning on trivial relations that indicate generic spatial configuration lik…

    Submitted 4 April, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: 16 pages

    Journal ref: CVPR 2022

  20. arXiv:2111.13470  [pdf, other]

    cs.CV

    TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs

    Authors: Shantanu Jaiswal, Basura Fernando, Cheston Tan

    Abstract: Attention modules for Convolutional Neural Networks (CNNs) are an effective method to enhance performance on multiple computer-vision tasks. While existing methods appropriately model channel-, spatial- and self-attention, they primarily operate in a feedforward bottom-up manner. Consequently, the attention mechanism strongly depends on the local information of a single input feature map and does…

    Submitted 21 October, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: ECCV 2022 Camera Ready

  21. arXiv:2107.08579  [pdf, other]

    cs.CV

    Action Forecasting with Feature-wise Self-Attention

    Authors: Yan Bin Ng, Basura Fernando

    Abstract: We present a new architecture for human action forecasting from videos. A temporal recurrent encoder captures temporal information of input videos while a self-attention model is used to attend on relevant feature dimensions of the input space. To handle temporal variations in observed video data, a feature masking technique is employed. We classify observed actions accurately using an auxiliary…

    Submitted 18 July, 2021; originally announced July 2021.

  22. arXiv:2105.12414  [pdf, other]

    cs.CV

    Anticipating human actions by correlating past with the future with Jaccard similarity measures

    Authors: Basura Fernando, Samitha Herath

    Abstract: We propose a framework for early action recognition and anticipation by correlating past features with the future using three novel similarity measures called Jaccard vector similarity, Jaccard cross-correlation and Jaccard Frobenius inner product over covariances. Using these combinations of novel losses and using our framework, we obtain state-of-the-art results for early action recognition in U…

    Submitted 26 May, 2021; originally announced May 2021.

    Comments: Accepted to CVPR 2021

  23. arXiv:2012.06123  [pdf, other]

    cs.CV

    A Log-likelihood Regularized KL Divergence for Video Prediction with A 3D Convolutional Variational Recurrent Network

    Authors: Haziq Razali, Basura Fernando

    Abstract: The use of latent variable models has been shown to be a powerful tool for modeling probability distributions over sequences. In this paper, we introduce a new variational model that extends the recurrent network in two ways for the task of video frame prediction. First, we introduce 3D convolutions inside all modules including the recurrent model for future frame prediction, inputting and outputting a…

    Submitted 11 December, 2020; originally announced December 2020.

  24. arXiv:2011.03958  [pdf, other]

    cs.CV

    FlowCaps: Optical Flow Estimation with Capsule Networks For Action Recognition

    Authors: Vinoj Jayasundara, Debaditya Roy, Basura Fernando

    Abstract: Capsule networks (CapsNets) have recently shown promise to excel in most computer vision tasks, especially pertaining to scene understanding. In this paper, we explore CapsNet's capabilities in optical flow estimation, a task at which convolutional neural networks (CNNs) have already outperformed other approaches. We propose a CapsNet-based architecture, termed FlowCaps, which attempts to a) achie…

    Submitted 8 November, 2020; originally announced November 2020.

  25. arXiv:2010.06260  [pdf, other]

    cs.CV

    DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video

    Authors: Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould

    Abstract: This paper studies the task of temporal moment localization in a long untrimmed video using natural language query. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm suitable for temporal moment localization which captures the…

    Submitted 13 October, 2020; originally announced October 2020.

  26. arXiv:2004.13217  [pdf, other]

    cs.CV

    Inferring Temporal Compositions of Actions Using Probabilistic Automata

    Authors: Rodrigo Santa Cruz, Anoop Cherian, Basura Fernando, Dylan Campbell, Stephen Gould

    Abstract: This paper presents a framework to recognize temporal compositions of atomic actions in videos. Specifically, we propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata to recognize complex actions as satisfying these expressions on the input video features. Our approach is different from existing works that…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: Accepted in Workshop on Compositionality in Computer Vision at CVPR, 2020

  27. Forecasting future action sequences with attention: a new approach to weakly supervised action forecasting

    Authors: Yan Bin Ng, Basura Fernando

    Abstract: Future human action forecasting from partial observations of activities is an important problem in many practical applications such as assistive robotics, video surveillance and security. We present a method to forecast actions for the unseen future of the video using a neural machine translation technique that uses encoder-decoder architecture. The input to this model is the observed RGB video, a…

    Submitted 3 February, 2022; v1 submitted 10 December, 2019; originally announced December 2019.

    Journal ref: in IEEE Transactions on Image Processing, vol. 29, pp. 8880-8891, 2020

  28. arXiv:1911.10082  [pdf, other]

    cs.CL cs.CV

    Injecting Prior Knowledge into Image Caption Generation

    Authors: Arushi Goel, Basura Fernando, Thanh-Son Nguyen, Hakan Bilen

    Abstract: Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them. The state-of-the-art methods in image captioning struggle to approach human level performance, especially when data is limited. In this paper, we propose to improve the perfo…

    Submitted 6 August, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

    Comments: ECCV20 VIPriors Workshop; 14 pages, 5 figures, 4 tables

  29. arXiv:1911.07806  [pdf, other]

    cs.CV

    Action Anticipation with RBF Kernelized Feature Mapping RNN

    Authors: Yuge Shi, Basura Fernando, Richard Hartley

    Abstract: We introduce a novel Recurrent Neural Network-based algorithm for future video feature generation and action anticipation called feature mapping RNN. Our novel RNN architecture builds upon three effective principles of machine learning, namely parameter sharing, Radial Basis Function kernels and adversarial training. Using only some of the earliest frames of a video, the feature mapping RNN is abl…

    Submitted 11 July, 2021; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: Accepted for publication in ECCV2018

  30. arXiv:1910.02602  [pdf, other]

    cs.CV

    Human Action Sequence Classification

    Authors: Yan Bin Ng, Basura Fernando

    Abstract: This paper classifies human action sequences from videos using a machine translation model. In contrast to classical human action classification which outputs a set of actions, our method outputs a sequence of actions in the chronological order of the actions performed by the human. Therefore our method is evaluated using sequential performance measures such as Bilingual Evaluation Understudy (BLEU)…

    Submitted 7 October, 2019; originally announced October 2019.

  31. arXiv:1904.07774  [pdf, other]

    cs.CV cs.LG stat.ML

    Weakly Supervised Gaussian Networks for Action Detection

    Authors: Basura Fernando, Cheston Tan Yin Chet, Hakan Bilen

    Abstract: Detecting temporal extents of human actions in videos is a challenging computer vision problem that requires detailed manual supervision including frame-level labels. This expensive annotation process limits deploying action detectors to a limited number of categories. We propose a novel method, called WSGN, that learns to detect actions from weak supervision, using only video-level labels.…

    Submitted 5 January, 2020; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: Accepted in WACV 2020

  32. arXiv:1810.09044  [pdf, other]

    cs.CV

    VIENA2: A Driving Anticipation Dataset

    Authors: Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Basura Fernando, Lars Petersson, Lars Andersson

    Abstract: Action anticipation is critical in scenarios where one needs to react before the action is finalized. This is, for instance, the case in automated driving, where a car needs to, e.g., avoid hitting pedestrians and respect traffic lights. While solutions have been proposed to tackle subsets of the driving anticipation tasks, by making use of diverse, task-specific sensors, there is no single datase…

    Submitted 29 October, 2018; v1 submitted 21 October, 2018; originally announced October 2018.

    Comments: Accepted in ACCV 2018

  33. arXiv:1808.00141  [pdf, other]

    cs.CV

    Action Anticipation By Predicting Future Dynamic Images

    Authors: Cristian Rodriguez, Basura Fernando, Hongdong Li

    Abstract: Human action-anticipation methods predict the future action by observing only a small portion of an action in progress. This is critical for applications where computers have to react to human actions as early as possible such as autonomous driving, human-robotic interaction, assistive robotics among others. In this paper, we present a method for human action anticipation by predicting the m…

    Submitted 31 July, 2018; originally announced August 2018.

    Comments: 14 pages

  34. arXiv:1801.08676  [pdf, other]

    cs.CV cs.LG

    Neural Algebra of Classifiers

    Authors: Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould

    Abstract: The world is fundamentally compositional, so it is natural to think of visual recognition as the recognition of basic visual primitives that are composed according to well-defined rules. This strategy allows us to recognize unseen complex concepts from simple visual primitives. However, the current trend in visual recognition follows a data greedy approach where huge amounts of data are required…

    Submitted 26 January, 2018; originally announced January 2018.

    Comments: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV)

  35. arXiv:1705.10420  [pdf, other]

    cs.CV

    Discriminatively Learned Hierarchical Rank Pooling Networks

    Authors: Basura Fernando, Stephen Gould

    Abstract: In this work, we present novel temporal encoding methods for action and activity classification by extending the unsupervised rank pooling temporal encoding method in two ways. First, we present "discriminative rank pooling" in which the shared weights of our video representation and the parameters of the action classifiers are estimated jointly for a given training dataset of labelled vector sequ…

    Submitted 29 May, 2017; originally announced May 2017.

    Comments: International Journal of Computer Vision

  36. arXiv:1704.02729  [pdf, other]

    cs.CV

    DeepPermNet: Visual Permutation Learning

    Authors: Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould

    Abstract: We present a principled approach to uncover the structure of visual data by solving a novel deep learning task coined visual permutation learning. The goal of this task is to find the permutation that recovers the structure of data from shuffled versions of it. In the case of natural images, this task boils down to recovering the original image from patches shuffled by an unknown permutation matri…

    Submitted 10 April, 2017; originally announced April 2017.

    Comments: Accepted in IEEE International Conference on Computer Vision and Pattern Recognition CVPR 2017

  37. arXiv:1704.02112  [pdf, other]

    cs.CV

    Generalized Rank Pooling for Activity Recognition

    Authors: Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould

    Abstract: Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity. Usually, this pooling step discards the temporal order of the frames, which could otherwise be used for better recognition. Towards this end, we propose a novel pooling method, generalized rank pooling (GRP), t…

    Submitted 22 July, 2017; v1 submitted 7 April, 2017; originally announced April 2017.

    Comments: Accepted at IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2017

  38. arXiv:1703.07023  [pdf, other]

    cs.CV

    Encouraging LSTMs to Anticipate Actions Very Early

    Authors: Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Basura Fernando, Lars Petersson, Lars Andersson

    Abstract: In contrast to the widely studied problem of recognizing an action given a complete sequence, action anticipation aims to identify the action from only partially available videos. As such, it is therefore key to the success of computer vision applications requiring to react as early as possible, such as autonomous navigation. In this paper, we propose a new action anticipation method that achieves…

    Submitted 13 August, 2017; v1 submitted 20 March, 2017; originally announced March 2017.

    Comments: 13 Pages, 7 Figures, 11 Tables. Accepted in ICCV 2017. arXiv admin note: text overlap with arXiv:1611.05520

  39. arXiv:1703.02511  [pdf]

    cs.CV

    Deep Learning for Automated Quality Assessment of Color Fundus Images in Diabetic Retinopathy Screening

    Authors: Sajib Kumar Saha, Basura Fernando, Jorge Cuadros, Di Xiao, Yogesan Kanagasingam

    Abstract: Purpose: To develop a computer based method for the automated assessment of image quality in the context of diabetic retinopathy (DR) to guide the photographer. Methods: A deep learning framework was trained to grade the images automatically. A large representative set of 7000 color fundus images were used for the experiment which were obtained from the EyePACS that were made available by the Cali…

    Submitted 7 March, 2017; originally announced March 2017.

    Comments: 23 pages, 9 figures

  40. arXiv:1612.00738  [pdf, other]

    cs.CV

    Action Recognition with Dynamic Image Networks

    Authors: Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi

    Abstract: We introduce the concept of "dynamic image", a novel compact representation of videos useful for video analysis, particularly in combination with convolutional neural networks (CNNs). A dynamic image encodes temporal data such as RGB or optical flow videos by using the concept of 'rank pooling'. The idea is to learn a ranking machine that captures the temporal evolution of the data and to use the…

    Submitted 19 August, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

    Comments: 14 pages, 9 figures, 9 tables

  41. arXiv:1612.00576  [pdf, other]

    cs.CV

    Guided Open Vocabulary Image Captioning with Constrained Beam Search

    Authors: Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

    Abstract: Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real world applications dealing with images in the wild. We address this problem using a flexible approach that enables existing deep captioning architectures to take advantage of image taggers at test time, without re-train…

    Submitted 19 July, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

    Comments: EMNLP 2017

  42. arXiv:1612.00558  [pdf, other]

    cs.CV

    Unsupervised Human Action Detection by Action Matching

    Authors: Basura Fernando, Sareh Shirazi, Stephen Gould

    Abstract: We propose a new task of unsupervised action detection by action matching. Given two long videos, the objective is to temporally detect all pairs of matching video segments. A pair of video segments are matched if they share the same human action. The task is category independent---it does not matter what action is being performed---and no supervision is used to discover such video segments. Unsup…

    Submitted 15 May, 2017; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: IEEE International Conference on Computer Vision and Pattern Recognition CVPR 2017 Workshops

  43. arXiv:1611.06646  [pdf, other]

    cs.CV

    Self-Supervised Video Representation Learning With Odd-One-Out Networks

    Authors: Basura Fernando, Hakan Bilen, Efstratios Gavves, Stephen Gould

    Abstract: We propose a new self-supervised CNN pre-training technique based on a novel auxiliary task called "odd-one-out learning". In this task, the machine is asked to identify the unrelated or odd element from a set of otherwise related elements. We apply this technique to self-supervised video representation learning where we sample subsequences from videos and ask the network to learn to predict the o…

    Submitted 5 April, 2017; v1 submitted 20 November, 2016; originally announced November 2016.

    Comments: Accepted in IEEE International Conference on Computer Vision and Pattern Recognition CVPR 2017

  44. arXiv:1611.05927  [pdf, other]

    cs.CV

    Generalized BackPropagation, Étude De Cas: Orthogonality

    Authors: Mehrtash Harandi, Basura Fernando

    Abstract: This paper introduces an extension of the backpropagation algorithm that enables us to have layers with constrained weights in a deep network. In particular, we make use of the Riemannian geometry and optimization techniques on matrix manifolds to step outside of normal practice in training deep networks, equipping the network with structures such as orthogonality or positive definiteness. Based o…

    Submitted 17 November, 2016; originally announced November 2016.

  45. arXiv:1611.05520  [pdf, other]

    cs.CV

    Deep Action- and Context-Aware Sequence Learning for Activity Recognition and Anticipation

    Authors: Mohammad Sadegh Aliakbarian, Fatemehsadat Saleh, Basura Fernando, Mathieu Salzmann, Lars Petersson, Lars Andersson

    Abstract: Action recognition and anticipation are key to the success of many computer vision applications. Existing methods can roughly be grouped into those that extract global, context-aware representations of the entire image or sequence, and those that aim at focusing on the regions where the action occurs. While the former may suffer from the fact that context is not always reliable, the latter complet…

    Submitted 17 November, 2016; v1 submitted 16 November, 2016; originally announced November 2016.

    Comments: 10 pages, 4 figures, 7 tables

  46. arXiv:1607.08822  [pdf, other]

    cs.CV cs.CL

    SPICE: Semantic Propositional Image Caption Evaluation

    Authors: Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

    Abstract: There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and p…

    Submitted 29 July, 2016; originally announced July 2016.

    Comments: 14 pages plus references, accepted to ECCV 2016

  47. arXiv:1607.05447  [pdf, other]

    cs.CV math.OC

    On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization

    Authors: Stephen Gould, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz, Edison Guo

    Abstract: Some recent works in machine learning and computer vision involve the solution of a bi-level optimization problem. Here the solution of a parameterized lower-level problem binds variables that appear in the objective of an upper-level problem. The lower-level problem typically appears as an argmin or argmax optimization problem. Many techniques have been proposed to solve bi-level optimization pro…

    Submitted 20 July, 2016; v1 submitted 19 July, 2016; originally announced July 2016.

    Comments: 16 pages, 6 figures

  48. Rank Pooling for Action Recognition

    Authors: Basura Fernando, Efstratios Gavves, Jose Oramas, Amir Ghodrati, Tinne Tuytelaars

    Abstract: We propose a function-based temporal pooling method that captures the latent structure of the video sequence data - e.g. how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to the video data can serve as a robust new video representation. As a specific example, we learn a pooling function via ranking machines. By learning to rank the fra…

    Submitted 15 May, 2016; v1 submitted 6 December, 2015; originally announced December 2015.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence

  49. arXiv:1511.08951  [pdf, other]

    cs.CV cs.LG

    MidRank: Learning to rank based on subsequences

    Authors: Basura Fernando, Efstratios Gavves, Damien Muselet, Tinne Tuytelaars

    Abstract: We present a supervised learning to rank algorithm that effectively orders images by exploiting the structure in image sequences. Most often in the supervised learning to rank literature, ranking is approached either by analyzing pairs of images or by optimizing a list-wise surrogate loss function on full sequences. In this work we propose MidRank, which learns from moderately sized sub-sequences…

    Submitted 28 November, 2015; originally announced November 2015.

    Comments: To appear in ICCV 2015

  50. arXiv:1509.04942  [pdf, ps, other]

    cs.CV

    Guiding Long-Short Term Memory for Image Caption Generation

    Authors: Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars

    Abstract: In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we…

    Submitted 16 September, 2015; originally announced September 2015.

    Comments: accepted by ICCV 2015