-
Dynamic Distinction Learning: Adaptive Pseudo Anomalies for Video Anomaly Detection
Authors:
Demetris Lappas,
Vasileios Argyriou,
Dimitrios Makris
Abstract:
We introduce Dynamic Distinction Learning (DDL), a novel video anomaly detection methodology that combines pseudo-anomalies, dynamic anomaly weighting, and a distinction loss function to improve detection accuracy. By training on pseudo-anomalies, our approach adapts to the variability of normal and anomalous behaviors without fixed anomaly thresholds. Our model showcases superior performance on the Ped2, Avenue and ShanghaiTech datasets, where individual models are tailored for each scene. These results highlight DDL's effectiveness in advancing anomaly detection, offering a scalable and adaptable solution for video surveillance challenges.
Submitted 7 April, 2024;
originally announced April 2024.
-
Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision
Authors:
Sanket Kachole,
Hussain Sajwani,
Fariborz Baghaei Naeini,
Dimitrios Makris,
Yahya Zweiri
Abstract:
Spiking Neural Networks (SNNs) offer a biologically inspired approach to computer vision that can lead to more efficient processing of visual data with reduced energy consumption. However, maintaining homeostasis within these networks is challenging, as it requires continuous adjustment of neural responses to preserve equilibrium and optimal processing efficiency amidst diverse and often unpredictable input signals. In response to these challenges, we propose the Asynchronous Bioplausible Neuron (ABN), a dynamic spike firing mechanism that automatically adjusts to variations in the input signal. Comprehensive evaluation across various datasets demonstrates ABN's enhanced performance in image classification and segmentation, maintenance of neural equilibrium, and energy efficiency.
Submitted 20 November, 2023;
originally announced November 2023.
-
Gait Data Augmentation using Physics-Based Biomechanical Simulation
Authors:
Mritula Chandrasekaran,
Jarek Francik,
Dimitrios Makris
Abstract:
This paper focuses on addressing the problem of data scarcity for gait analysis. Standard augmentation methods may produce gait sequences that are not consistent with the biomechanical constraints of human walking. To address this issue, we propose a novel framework for gait data augmentation by using OpenSIM, a physics-based simulator, to synthesize biomechanically plausible walking sequences. The proposed approach is validated by augmenting the WBDS and CASIA-B datasets and then training gait-based classifiers for 3D gender gait classification and 2D gait person identification respectively. Experimental results indicate that our augmentation approach can improve the performance of model-based gait classifiers and deliver state-of-the-art results for gait-based person identification with an accuracy of up to 96.11% on the CASIA-B dataset.
Submitted 21 July, 2023; v1 submitted 16 July, 2023;
originally announced July 2023.
-
Asynchronous Events-based Panoptic Segmentation using Graph Mixer Neural Network
Authors:
Sanket Kachole,
Yusra Alkendi,
Fariborz Baghaei Naeini,
Dimitrios Makris,
Yahya Zweiri
Abstract:
In the context of robotic grasping, object segmentation encounters several difficulties when faced with dynamic conditions such as real-time operation, occlusion, low lighting, motion blur, and object size variability. In response to these challenges, we propose the Graph Mixer Neural Network that includes a novel collaborative contextual mixing layer, applied to 3D event graphs formed on asynchronous events. The proposed layer is designed to spread spatiotemporal correlation within an event graph at four nearest neighbor levels in parallel. We evaluate the effectiveness of our proposed method on the Event-based Segmentation (ESD) Dataset, which includes five unique image degradation challenges (occlusion, blur, brightness, trajectory, and scale variance) as well as segmentation of known and unknown objects. The results show that our proposed approach outperforms state-of-the-art methods in terms of mean intersection over union and pixel accuracy. Code available at: https://github.com/sanket0707/GNN-Mixer.git
Submitted 5 May, 2023;
originally announced May 2023.
-
Bimodal SegNet: Instance Segmentation Fusing Events and RGB Frames for Robotic Grasping
Authors:
Sanket Kachole,
Xiaoqian Huang,
Fariborz Baghaei Naeini,
Rajkumar Muthusamy,
Dimitrios Makris,
Yahya Zweiri
Abstract:
Object segmentation for robotic grasping under dynamic conditions often faces challenges such as occlusion, low light conditions, motion blur and object size variance. To address these challenges, we propose a Deep Learning network that fuses two types of visual signals, event-based data and RGB frame data. The proposed Bimodal SegNet network has two distinct encoders, one for each signal input, and a spatial pyramidal pooling with atrous convolutions. Encoders capture rich contextual information by pooling the concatenated features at different resolutions while the decoder obtains sharp object boundaries. The proposed method is evaluated on five unique image degradation challenges, including occlusion, blur, brightness, trajectory and scale variance, on the Event-based Segmentation (ESD) Dataset. The evaluation results show a 6-10% segmentation accuracy improvement over state-of-the-art methods in terms of mean intersection over union and pixel accuracy. The model code is available at https://github.com/sanket0707/Bimodal-SegNet.git
Submitted 14 July, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment
Authors:
Xiaoqian Huang,
Kachole Sanket,
Abdulla Ayyad,
Fariborz Baghaei Naeini,
Dimitrios Makris,
Yahya Zweiri
Abstract:
Event-based cameras can address the motion blur, low dynamic range and low time sampling of standard cameras. However, there is a lack of event-based datasets dedicated to the benchmarking of segmentation algorithms, especially those that provide depth information, which is critical for segmentation in occluded scenes. This paper proposes a new Event-based Segmentation Dataset (ESD), a high-quality 3D spatial and temporal dataset for object segmentation in an indoor cluttered environment. Our proposed dataset ESD comprises 145 sequences with 14,166 RGB frames that are manually annotated with instance masks. Overall, 21.88 million and 20.80 million events are collected, respectively, from two event-based cameras in a stereo-graphic configuration. To the best of our knowledge, this densely annotated and 3D spatial-temporal event-based segmentation benchmark of tabletop objects is the first of its kind. By releasing ESD, we expect to provide the community with a challenging, high-quality segmentation benchmark.
Submitted 17 February, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses
Authors:
Phoebe Chua,
Dimos Makris,
Dorien Herremans,
Gemma Roig,
Kat Agres
Abstract:
Although media content is increasingly produced, distributed, and consumed in multiple combinations of modalities, how individual modalities contribute to the perceived emotion of a media item remains poorly understood. In this paper we present MusicVideos (MuVi), a novel dataset for affective multimedia content analysis to study how the auditory and visual modalities contribute to the perceived emotion of media. The data were collected by presenting music videos to participants in three conditions: music, visual, and audiovisual. Participants annotated the music videos for valence and arousal over time, as well as the overall emotion conveyed. We present detailed descriptive statistics for key measures in the dataset and the results of feature importance analyses for each condition. Finally, we propose a novel transfer learning architecture to train Predictive models Augmented with Isolated modality Ratings (PAIR) and demonstrate the potential of isolated modality ratings for enhancing multimodal emotion recognition. Our results suggest that perceptions of arousal are influenced primarily by auditory information, while perceptions of valence are more subjective and can be influenced by both visual and auditory information. The dataset is made publicly available.
Submitted 19 February, 2022;
originally announced February 2022.
-
Conditional Drums Generation using Compound Word Representations
Authors:
Dimos Makris,
Guo Zixun,
Maximos Kaliakatsos-Papakostas,
Dorien Herremans
Abstract:
The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures. When using any deep learning model which considers music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle the task of conditional drums generation using a novel data encoding scheme inspired by the Compound Word representation, a tokenization process of sequential data. We present a sequence-to-sequence architecture where a Bidirectional Long Short-Term Memory (BiLSTM) Encoder receives information about the conditioning parameters (i.e., accompanying tracks and musical attributes), while a Transformer-based Decoder with relative global attention produces the generated drum sequences. We conducted experiments to thoroughly compare the effectiveness of our method to several baselines. Quantitative evaluation shows that our model is able to generate drum sequences that have similar statistical distributions and characteristics to the training corpus. These features include syncopation, compression ratio, and symmetry, among others. We also verified, through a listening test, that generated drum sequences sound pleasant, natural and coherent while they "groove" with the given accompaniment.
Submitted 21 February, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework
Authors:
Dimos Makris,
Kat R. Agres,
Dorien Herremans
Abstract:
The field of automatic music composition has seen great progress in the last few years, much of which can be attributed to advances in deep neural networks. There are numerous studies that present different strategies for generating sheet music from scratch. The inclusion of high-level musical characteristics (e.g., perceived emotional qualities), however, as conditions for controlling the generation output remains a challenge. In this paper, we present a novel approach for calculating the valence (the positivity or negativity of the perceived emotion) of a chord progression within a lead sheet, using pre-defined mood tags proposed by music experts. Based on this approach, we propose a novel strategy for conditional lead sheet generation that allows us to steer the music generation in terms of valence, phrasing, and time signature. Our approach is similar to a Neural Machine Translation (NMT) problem, as we include high-level conditions in the encoder part of the sequence-to-sequence architectures used (i.e., long short-term memory networks and a Transformer network). We conducted experiments to thoroughly analyze these two architectures. The results show that the proposed strategy is able to generate lead sheets in a controllable manner, resulting in distributions of musical attributes similar to those of the training dataset. We also verified through a subjective listening test that our approach is effective in controlling the valence of a generated chord progression.
Submitted 27 April, 2021;
originally announced April 2021.
-
Parallel Direct Domain Decomposition Methods (D3M) for Finite Elements
Authors:
Javad Moshfegh,
Dimitrios G. Makris,
Marinos N. Vouvakis
Abstract:
A parallel direct solution approach based on the domain decomposition method (DDM) and directed acyclic graph (DAG) scheduling is outlined. Computations are represented as a sequence of small tasks that operate on domains of DDM or dense matrix blocks of a reduced matrix. These tasks can be statically scheduled for parallel execution using their DAG dependencies and weights that depend on estimates of computation and communication costs. Performance comparisons with MUMPS 5.1.2 on electrically large problems suggest up to 20% better parallel efficiency, 30% lower memory use, and slightly faster run time, while maintaining the same accuracy.
Submitted 11 February, 2020;
originally announced February 2020.
-
DeepDrum: An Adaptive Conditional Neural Network
Authors:
Dimos Makris,
Maximos Kaliakatsos-Papakostas,
Katia Lida Kermanidis
Abstract:
Considering music as a sequence of events with multiple complex dependencies, the Long Short-Term Memory (LSTM) architecture has proven very efficient in learning and reproducing musical styles. However, the generation of rhythms requires additional information regarding musical structure and accompanying instruments. In this paper we present DeepDrum, an adaptive Neural Network capable of generating drum rhythms under constraints imposed by Feed-Forward (Conditional) Layers which contain musical parameters along with given instrumentation information (e.g. bass and guitar notes). Results on generated drum sequences are presented indicating that DeepDrum is effective in producing rhythms that resemble the learned style, while at the same time conforming to given constraints that were unknown during the training process.
Submitted 21 January, 2019; v1 submitted 17 September, 2018;
originally announced September 2018.