Showing 1–42 of 42 results for author: Moeslund, T

Searching in archive cs.
  1. arXiv:2406.02465  [pdf, other]

    cs.LG cs.AI cs.CV

    An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders

    Authors: Scott C. Lowe, Joakim Bruslund Haurum, Sageev Oore, Thomas B. Moeslund, Graham W. Taylor

    Abstract: Can pretrained models generalize to new datasets without any retraining? We deploy pretrained image models on datasets they were not trained for, and investigate whether their embeddings form meaningful clusters. Our suite of benchmarking experiments use encoders pretrained solely on ImageNet-1k with either supervised or self-supervised training techniques, deployed on image datasets that were not…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2405.03770  [pdf, other]

    cs.CV

    Foundation Models for Video Understanding: A Survey

    Authors: Neelu Madan, Andreas Moegelmose, Rajat Modi, Yogesh S. Rawat, Thomas B. Moeslund

    Abstract: Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs achieve this by capturing robust and generic features from video data. This survey analyzes over 200 video foundational models, offering a comprehensive overview of benchmarks and evaluation metrics across 14 distinct video…

    Submitted 6 May, 2024; originally announced May 2024.

  3. arXiv:2404.09703  [pdf, other]

    cs.LG stat.ML

    AI Competitions and Benchmarks: Dataset Development

    Authors: Romain Egele, Julio C. S. Jacques Junior, Jan N. van Rijn, Isabelle Guyon, Xavier Baró, Albert Clapés, Prasanna Balaprakash, Sergio Escalera, Thomas Moeslund, Jun Wan

    Abstract: Machine learning is now used in many applications thanks to its ability to predict, generate, or discover patterns from large quantities of data. However, the process of collecting and transforming data for practical use is intricate. Even in today's digital era, where substantial data is generated daily, it is uncommon for it to be readily usable; most often, it necessitates meticulous manual dat…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Preprint version of the 3rd Chapter of the book: Competitions and Benchmarks, the science behind the contests (https://sites.google.com/chalearn.org/book/home)

  4. arXiv:2404.07711  [pdf, other]

    cs.CV

    OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities

    Authors: Lasse H. Hansen, Simon B. Jensen, Mark P. Philipsen, Andreas Møgelmose, Lars Bodum, Thomas B. Moeslund

    Abstract: Identifying and classifying underground utilities is an important task for efficient and effective urban planning and infrastructure maintenance. We present OpenTrench3D, a novel and comprehensive 3D Semantic Segmentation point cloud dataset, designed to advance research and development in underground utility surveying and mapping. OpenTrench3D covers a completely novel domain for public 3D point…

    Submitted 11 April, 2024; originally announced April 2024.

  5. arXiv:2404.05392  [pdf, other]

    cs.CV

    T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos

    Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés

    Abstract: In this paper, we introduce T-DEED, a Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in sports videos. T-DEED addresses multiple challenges in the task, including the need for discriminability among frame representations, high output temporal resolution to maintain prediction precision, and the necessity to capture information at different temporal scales to handle e…

    Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  6. arXiv:2404.01891  [pdf, other]

    cs.CV

    ASTRA: An Action Spotting TRAnsformer for Soccer Videos

    Authors: Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés

    Abstract: In this paper, we introduce ASTRA, a Transformer-based model designed for the task of Action Spotting in soccer matches. ASTRA addresses several challenges inherent in the task and dataset, including the requirement for precise action localization, the presence of a long-tail data distribution, non-visibility in certain actions, and inherent label noise. To do so, ASTRA incorporates (a) a Transfor…

    Submitted 2 April, 2024; originally announced April 2024.

  7. arXiv:2404.01775  [pdf, other]

    cs.CV cs.AI cs.LG

    A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?

    Authors: Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

    Abstract: The ability to detect unfamiliar or unexpected images is essential for safe deployment of computer vision systems. In the context of classification, the task of detecting images outside of a model's training domain is known as out-of-distribution (OOD) detection. While there has been a growing research interest in developing post-hoc OOD detection methods, there has been comparably little discussi…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  8. Raw Instinct: Trust Your Classifiers and Skip the Conversion

    Authors: Christos Kantas, Bjørk Antoniussen, Mathias V. Andersen, Rasmus Munksø, Shobhit Kotnala, Simon B. Jensen, Andreas Møgelmose, Lau Nørgaard, Thomas B. Moeslund

    Abstract: Using RAW-images in computer vision problems is surprisingly underexplored considering that converting from RAW to RGB does not introduce any new capture information. In this paper, we show that a sufficiently advanced classifier can yield equivalent results on RAW input compared to RGB and present a new public dataset consisting of RAW images and the corresponding converted RGB images. Classifyin…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: https://www.kaggle.com/datasets/mathiasviborg/raw-instinct

    Journal ref: 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)

  9. arXiv:2402.03043  [pdf, other]

    cs.CL cs.LG

    SIDU-TXT: An XAI Algorithm for NLP with a Holistic Assessment Approach

    Authors: Mohammad N. S. Jahromi, Satya. M. Muddamsetty, Asta Sofie Stage Jarlner, Anna Murphy Høgenhaug, Thomas Gammeltoft-Hansen, Thomas B. Moeslund

    Abstract: Explainable AI (XAI) aids in deciphering 'black-box' models. While several methods have been proposed and evaluated primarily in the image domain, the exploration of explainability in the text domain remains a growing research area. In this paper, we delve into the applicability of XAI methods for the text domain. In this context, the 'Similarity Difference and Uniqueness' (SIDU) XAI method, recog…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Preprint submitted to Elsevier on Jan 5th, 2024

  10. arXiv:2309.06006  [pdf, ps, other]

    cs.CV cs.AI

    SoccerNet 2023 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

    Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, fo…

    Submitted 12 September, 2023; originally announced September 2023.

  11. arXiv:2308.16572  [pdf, other]

    cs.CV cs.AI cs.LG

    CL-MAE: Curriculum-Learned Masked Autoencoders

    Authors: Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B. Moeslund, Radu Tudor Ionescu

    Abstract: Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches (tokens) in input images, with the masking strategy remaining unchanged during training. In this paper, we propose a curriculum learning approach that updates the…

    Submitted 28 February, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted at WACV 2024

  12. arXiv:2308.04657  [pdf, other]

    cs.CV

    Which Tokens to Use? Investigating Token Reduction in Vision Transformers

    Authors: Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund

    Abstract: Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 NIVT Workshop. Project webpage https://vap.aau.dk/tokens

  13. arXiv:2306.14658  [pdf, other]

    cs.CV cs.AI cs.LG

    Beyond AUROC & co. for evaluating out-of-distribution detection performance

    Authors: Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

    Abstract: While there has been a growing research interest in developing out-of-distribution (OOD) detection methods, there has been comparably little discussion around how these methods should be evaluated. Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs. In this work, we take a closer look at the go-t…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: published in SAIAD CVPRW'23 (Safe Artificial Intelligence for All Domains CVPR workshop)

  14. arXiv:2302.10645  [pdf, other]

    cs.CV

    BrackishMOT: The Brackish Multi-Object Tracking Dataset

    Authors: Malte Pedersen, Daniel Lehotský, Ivan Nikolov, Thomas B. Moeslund

    Abstract: There exist no publicly available annotated underwater multi-object tracking (MOT) datasets captured in turbid environments. To remedy this we propose the BrackishMOT dataset with focus on tracking schools of small fish, which is a notoriously difficult MOT task. BrackishMOT consists of 98 sequences captured in the wild. Alongside the novel dataset, we present baseline results by training a state-…

    Submitted 21 February, 2023; originally announced February 2023.

  15. SoccerNet 2022 Challenges Results

    Authors: Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, et al. (69 additional authors not shown)

    Abstract: The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on det…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted at ACM MMSports 2022

  16. Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection

    Authors: Neelu Madan, Nicolae-Catalin Ristea, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

    Abstract: Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where t…

    Submitted 5 October, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

    Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence

  17. arXiv:2207.10031  [pdf, other]

    cs.CV

    MOTCOM: The Multi-Object Tracking Dataset Complexity Metric

    Authors: Malte Pedersen, Joakim Bruslund Haurum, Patrick Dendorfer, Thomas B. Moeslund

    Abstract: There exists no comprehensive metric for describing the complexity of Multi-Object Tracking (MOT) sequences. This lack of metrics decreases explainability, complicates comparison of datasets, and reduces the conversation on tracker performance to a matter of leader board position. As a remedy, we present the novel MOT dataset complexity metric (MOTCOM), which is a combination of three sub-metrics…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV 2022. Project webpage https://vap.aau.dk/motcom

  18. arXiv:2207.08003  [pdf, other]

    cs.CV cs.LG

    SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection

    Authors: Antonio Barbalau, Radu Tudor Ionescu, Mariana-Iuliana Georgescu, Jacob Dueholm, Bharathkumar Ramachandra, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

    Abstract: A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework, proposing several updates to the original method. First, we study various detection methods, e.g. based on de…

    Submitted 12 February, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

    Comments: Accepted in Computer Vision and Image Understanding

  19. Deep Learning-based Anomaly Detection on X-ray Images of Fuel Cell Electrodes

    Authors: Simon B. Jensen, Thomas B. Moeslund, Søren J. Andreasen

    Abstract: Anomaly detection in X-ray images has been an active and lasting research area in the last decades, especially in the domain of medical X-ray images. For this work, we created a real-world labeled anomaly dataset, consisting of 16-bit X-ray image data of fuel cell electrodes coated with a platinum catalyst solution and perform anomaly detection on the dataset using a deep learning approach. The da…

    Submitted 15 February, 2022; originally announced February 2022.

    Comments: 10 pages, 9 figures, VISAPP2022

    Journal ref: Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP 2022

  20. Video Transformers: A Survey

    Authors: Javier Selva, Anders S. Johansen, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund, Albert Clapés

    Abstract: Transformer models have shown great success handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale quadratically with input length. These limitations are further exacerbated when dealing with the high dimensionality introduced by the temporal dimension. While there are surveys analyzing the advances of Transformers for visio…

    Submitted 13 February, 2023; v1 submitted 16 January, 2022; originally announced January 2022.

  21. arXiv:2111.09099  [pdf, other]

    cs.CV cs.LG

    Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection

    Authors: Nicolae-Catalin Ristea, Neelu Madan, Radu Tudor Ionescu, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

    Abstract: Anomaly detection is commonly pursued as a one-class classification problem, where models can only learn from normal training samples, while being evaluated on both normal and abnormal test samples. Among the successful approaches for anomaly detection, a distinguished category of methods relies on predicting masked information (e.g. patches, future frames, etc.) and leveraging the reconstruction…

    Submitted 14 March, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: Accepted at CVPR 2022. Paper + supplementary (14 pages, 9 figures)

  22. arXiv:2111.07846  [pdf, other]

    cs.CV

    Multi-Task Classification of Sewer Pipe Defects and Properties using a Cross-Task Graph Neural Network Decoder

    Authors: Joakim Bruslund Haurum, Meysam Madadi, Sergio Escalera, Thomas B. Moeslund

    Abstract: The sewerage infrastructure is one of the most important and expensive infrastructures in modern society. In order to efficiently manage the sewerage infrastructure, automated sewer inspection has to be utilized. However, while sewer defect classification has been investigated for decades, little attention has been given to classifying sewer pipe properties such as water level, pipe material, and…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: WACV 2022

  23. arXiv:2109.09487  [pdf]

    cs.CV cs.AI cs.LG

    Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions

    Authors: David Curto, Albert Clapés, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, David Gallardo-Pujol, Georgina Guilera, David Leiva, Thomas B. Moeslund, Sergio Escalera, Cristina Palmero

    Abstract: Personality computing has become an emerging topic in computer vision, due to the wide range of applications it can be used for. However, most works on the topic have focused on analyzing the individual, even when applied to interaction scenarios, and for short periods of time. To address these limitations, we present the Dyadformer, a novel multi-modal multi-subject Transformer architecture to mo…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: Accepted to the 2021 ICCV Workshop on Understanding Social Behavior in Dyadic and Small Group Interactions

  24. Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning to Segment Driveability in Egocentric Images

    Authors: Galadrielle Humblot-Renaux, Letizia Marchegiani, Thomas B. Moeslund, Rikke Gade

    Abstract: This work tackles scene understanding for outdoor robotic navigation, solely relying on images captured by an on-board camera. Conventional visual scene understanding interprets the environment based on specific descriptive categories. However, such a representation is not directly interpretable for decision-making and constrains robot operation to a specific domain. Thus, we propose to segment eg…

    Submitted 23 January, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: Accepted in Robotics and Automation Letters (RA-L 2022). Supplementary video available at https://youtu.be/q_XfjUDO39Y

    Journal ref: Robotics and Automation Letters 7(2) (2022) 2913-2920

  25. arXiv:2103.10895  [pdf, other]

    cs.CV

    Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

    Authors: Joakim Bruslund Haurum, Thomas B. Moeslund

    Abstract: Perhaps surprisingly sewerage infrastructure is one of the most costly infrastructures in modern society. Sewer pipes are manually inspected to determine whether the pipes are defective. However, this process is limited by the number of qualified inspectors and the time it takes to inspect a pipe. Automatization of this process is therefore of high interest. So far, the success of computer vision…

    Submitted 19 March, 2021; originally announced March 2021.

    Comments: CVPR 2021. Project webpage: https://vap.aau.dk/sewer-ml/

  26. arXiv:2102.03113  [pdf, other]

    cs.CV

    Real-World Super-Resolution of Face-Images from Surveillance Cameras

    Authors: Andreas Aakerberg, Kamal Nasrollahi, Thomas B. Moeslund

    Abstract: Most existing face image Super-Resolution (SR) methods assume that the Low-Resolution (LR) images were artificially downsampled from High-Resolution (HR) images with bicubic interpolation. This operation changes the natural image characteristics and reduces noise. Hence, SR methods trained on such data most often fail to produce good results when applied to real LR images. To solve this problem, w…

    Submitted 5 February, 2021; originally announced February 2021.

  27. arXiv:2101.10710  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG

    Visual explanation of black-box model: Similarity Difference and Uniqueness (SIDU) method

    Authors: Satya M. Muddamsetty, Mohammad N. S. Jahromi, Andreea E. Ciontos, Laura M. Fenoy, Thomas B. Moeslund

    Abstract: Explainable Artificial Intelligence (XAI) has in recent years become a well-suited framework to generate human understandable explanations of "black-box" models. In this paper, a novel XAI visual explanation algorithm known as the Similarity Difference and Uniqueness (SIDU) method that can effectively localize entire object regions responsible for prediction is presented in full detail. The SIDU a…

    Submitted 10 July, 2022; v1 submitted 26 January, 2021; originally announced January 2021.

    Journal ref: Pattern Recognition 127 (2022): 108604

  28. arXiv:2011.13367  [pdf, other]

    cs.CV

    SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos

    Authors: Adrien Deliège, Anthony Cioppa, Silvio Giancola, Meisam J. Seikavandi, Jacob V. Dueholm, Kamal Nasrollahi, Bernard Ghanem, Thomas B. Moeslund, Marc Van Droogenbroeck

    Abstract: Understanding broadcast videos is a challenging task in computer vision, as it requires generic reasoning capabilities to appreciate the content offered by the video editing. In this work, we propose SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production.…

    Submitted 19 April, 2021; v1 submitted 26 November, 2020; originally announced November 2020.

    Comments: Paper accepted for the CVsports workshop at CVPR2021. This document contains 8 pages + references + supplementary material

  29. arXiv:2006.08466  [pdf, other]

    cs.CV

    3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset

    Authors: Malte Pedersen, Joakim Bruslund Haurum, Stefan Hein Bengtson, Thomas B. Moeslund

    Abstract: In this work we present a novel publicly available stereo based 3D RGB dataset for multi-object zebrafish tracking, called 3D-ZeF. Zebrafish is an increasingly popular model organism used for studying neurological disorders, drug addiction, and more. Behavioral analysis is often a critical part of such research. However, visual similarity, occlusion, and erratic movement of the zebrafish makes rob…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: CVPR 2020. Project webpage: https://vap.aau.dk/3d-zef/

  30. arXiv:2006.03122  [pdf, other]

    cs.CV cs.AI cs.LG

    SIDU: Similarity Difference and Uniqueness Method for Explainable AI

    Authors: Satya M. Muddamsetty, Mohammad N. S. Jahromi, Thomas B. Moeslund

    Abstract: A new brand of technical artificial intelligence (Explainable AI) research has focused on trying to open up the 'black box' and provide some explainability. This paper presents a novel visual explanation method for deep learning networks in the form of a saliency map that can effectively localize entire object regions. In contrast to the current state-of-the art methods, the proposed method show…

    Submitted 4 June, 2020; originally announced June 2020.

    Comments: Accepted manuscript in IEEE International Conference on Image Processing

  31. arXiv:2004.07544  [pdf, other]

    cs.CV eess.IV

    Multimodal and multiview distillation for real-time player detection on a football field

    Authors: Anthony Cioppa, Adrien Deliège, Noor Ul Huda, Rikke Gade, Marc Van Droogenbroeck, Thomas B. Moeslund

    Abstract: Monitoring the occupancy of public sports facilities is essential to assess their use and to motivate their construction in new places. In the case of a football field, the area to cover is large, thus several regular cameras should be used, which makes the setup expensive and complex. As an alternative, we developed a system that detects players from a unique cheap and wide-angle fisheye camera a…

    Submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted for the CVSports workshop of CVPR 2020 ; 8 pages + references

  32. arXiv:2004.01382  [pdf, other]

    cs.CV cs.LG eess.IV

    Effective Fusion of Deep Multitasking Representations for Robust Visual Tracking

    Authors: Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei, Kamal Nasrollahi, Thomas B. Moeslund

    Abstract: Visual object tracking remains an active research field in computer vision due to persisting challenges with various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. However, using deep feature maps extracted from FENs…

    Submitted 20 September, 2021; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: To be appeared in The Visual Computer (International Journal of Computer Graphics), Springer, 2021

  33. arXiv:2004.00292  [pdf, other]

    cs.CV

    Evaluation of Model Selection for Kernel Fragment Recognition in Corn Silage

    Authors: Christoffer Bøgelund Rasmussen, Thomas B. Moeslund

    Abstract: Model selection when designing deep learning systems for specific use-cases can be a challenging task as many options exist and it can be difficult to know the trade-off between them. Therefore, we investigate a number of state of the art CNN models for the task of measuring kernel fragmentation in harvested corn silage. The models are evaluated across a number of feature extractors and image size…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: Paper presented at the ICLR 2020 Workshop on Computer Vision for Agriculture (CV4A)

  34. arXiv:2003.14047  [pdf, other]

    cs.CV

    Prediction Confidence from Neighbors

    Authors: Mark Philip Philipsen, Thomas Baltzer Moeslund

    Abstract: The inability of Machine Learning (ML) models to successfully extrapolate correct predictions from out-of-distribution (OoD) samples is a major hindrance to the application of ML in critical applications. Until the generalization ability of ML methods is improved it is necessary to keep humans in the loop. The need for human supervision can only be reduced if it is possible to determining a level…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: work in progress

  35. arXiv:2003.14043  [pdf, other]

    cs.CV

    Distance in Latent Space as Novelty Measure

    Authors: Mark Philip Philipsen, Thomas Baltzer Moeslund

    Abstract: Deep Learning performs well when training data densely covers the experience space. For complex problems this makes data collection prohibitively expensive. We propose to intelligently select samples when constructing data sets in order to best utilize the available labeling budget. The selection methodology is based on the presumption that two dissimilar samples are worth more than two similar sa…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: work in progress

  36. arXiv:1912.01326  [pdf, other]

    cs.CV cs.LG eess.IV

    A Context-Aware Loss Function for Action Spotting in Soccer Videos

    Authors: Anthony Cioppa, Adrien Deliège, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck, Rikke Gade, Thomas B. Moeslund

    Abstract: In video understanding, action spotting consists in temporally localizing human-induced events annotated with single timestamps. In this paper, we propose a novel loss function that specifically considers the temporal context naturally present around each action, rather than focusing on the single annotated frame to spot. We benchmark our loss on a large dataset of soccer videos, SoccerNet, and ac…

    Submitted 30 March, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Accepted for CVPR2020 main conference. This document contains 8 pages + references + supplementary material

  37. Is it Raining Outside? Detection of Rainfall using General-Purpose Surveillance Cameras

    Authors: Joakim Bruslund Haurum, Chris H. Bahnsen, Thomas B. Moeslund

    Abstract: In integrated surveillance systems based on visual cameras, the mitigation of adverse weather conditions is an active research topic. Within this field, rain removal algorithms have been developed that artificially remove rain streaks from images or video. In order to deploy such rain removal algorithms in a surveillance setting, one must detect if rain is present in the scene. In this paper, we d…

    Submitted 3 September, 2021; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: 10 pages, 7 figures, CVPR2019 V4AS workshop. Updated to include Zenodo data repository reference

  38. Rain Removal in Traffic Surveillance: Does it Matter?

    Authors: Chris H. Bahnsen, Thomas B. Moeslund

    Abstract: Varying weather conditions, including rainfall and snowfall, are generally regarded as a challenge for computer vision algorithms. One proposed solution to the challenges induced by rain and snowfall is to artificially remove the rain from images or video using rain removal algorithms. It is the promise of these algorithms that the rain-removed image frames will improve the performance of subseque…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Published in IEEE Transactions on Intelligent Transportation Systems

  39. arXiv:1809.03171  [pdf, other]

    cs.CV

    The AAU Multimodal Annotation Toolboxes: Annotating Objects in Images and Videos

    Authors: Chris H. Bahnsen, Andreas Møgelmose, Thomas B. Moeslund

    Abstract: This tech report gives an introduction to two annotation toolboxes that enable the creation of pixel and polygon-based masks as well as bounding boxes around objects of interest. Both toolboxes support the annotation of sequential images in the RGB and thermal modalities. Each annotated object is assigned a classification tag, a unique ID, and one or more optional meta data tags. The toolboxes are…

    Submitted 10 September, 2018; originally announced September 2018.

    Comments: 6 pages, 10 figures

  40. arXiv:1805.10078  [pdf]

    cs.CV

    A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition

    Authors: Alireza Sepas-Moghaddam, Mohammad A. Haque, Paulo Lobato Correia, Kamal Nasrollahi, Thomas B. Moeslund, Fernando Pereira

    Abstract: Face recognition has attracted increasing attention due to its wide range of applications, but it is still challenging when facing large variations in the biometric data characteristics. Lenslet light field cameras have recently come into prominence to capture rich spatio-angular information, thus offering new possibilities for advanced biometric recognition systems. This paper proposes a double-d…

    Submitted 24 April, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

    Comments: Submitted to IEEE Transactions on Circuits and Systems for Video Technology

  41. arXiv:1606.07219  [pdf, ps, other]

    cs.IR cs.LG

    Learning Dynamic Classes of Events using Stacked Multilayer Perceptron Networks

    Authors: Nattiya Kanhabua, Huamin Ren, Thomas B. Moeslund

    Abstract: People often use a web search engine to find information about events of interest, for example, sport competitions, political elections, festivals and entertainment news. In this paper, we study a problem of detecting event-related queries, which is the first step before selecting a suitable time-aware retrieval model. In general, event-related information needs can be observed in query streams th…

    Submitted 1 July, 2016; v1 submitted 23 June, 2016; originally announced June 2016.

    Comments: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, 6 pages, 4 figures

  42. arXiv:1603.04026  [pdf, ps, other]

    cs.CV

    A comprehensive study of sparse codes on abnormality detection

    Authors: Huamin Ren, Hong Pan, Søren Ingvor Olsen, Thomas B. Moeslund

    Abstract: Sparse representation has been applied successfully in abnormal event detection, in which the baseline is to learn a dictionary accompanied by sparse codes. While much emphasis is put on discriminative dictionary construction, there are no comparative studies of sparse codes regarding abnormality detection. We comprehensively study two types of sparse codes solutions - greedy algorithms and convex…

    Submitted 13 March, 2016; originally announced March 2016.

    Comments: 7 pages