
Showing 1–45 of 45 results for author: Rigoll, G

  1. arXiv:2403.12573  [pdf, other]

    cs.CV

    Lifting Multi-View Detection and Tracking to the Bird's Eye View

    Authors: Torben Teepe, Philipp Wolters, Johannes Gilg, Fabian Herzog, Gerhard Rigoll

    Abstract: Taking advantage of multi-view aggregation presents a promising solution to tackle challenges such as occlusion and missed detection in multi-object tracking and detection. Recent advancements in multi-view detection and 3D object recognition have significantly improved performance by strategically projecting all views onto the ground plane and conducting detection analysis from a Bird's Eye View.…

    Submitted 19 March, 2024; originally announced March 2024.

  2. arXiv:2403.07746  [pdf, other]

    cs.CV

    Unleashing HyDRa: Hybrid Fusion, Depth Consistency and Radar for Unified 3D Perception

    Authors: Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Anouar Laouichi, Martin Hofmann, Gerhard Rigoll

    Abstract: Low-cost, vision-centric 3D perception systems for autonomous driving have made significant progress in recent years, narrowing the gap to expensive LiDAR-based methods. The primary challenge in becoming a fully reliable alternative lies in robust depth prediction capabilities, as camera-based systems struggle with long detection ranges and adverse lighting and weather conditions. In this work, we…

    Submitted 6 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 10 pages, 4 figures. Added eval on VoD

  3. arXiv:2310.13350  [pdf, other]

    cs.CV

    EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View

    Authors: Torben Teepe, Philipp Wolters, Johannes Gilg, Fabian Herzog, Gerhard Rigoll

    Abstract: Multi-view aggregation promises to overcome the occlusion and missed detection challenge in multi-object detection and tracking. Recent approaches in multi-view detection and 3D object detection made a huge performance leap by projecting all views to the ground plane and performing the detection in the Bird's Eye View (BEV). In this paper, we investigate if tracking in the BEV can also bring the n…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 8 pages, 3 figures

  4. arXiv:2309.03110  [pdf, other]

    cs.CV

    Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration

    Authors: Johannes Gilg, Torben Teepe, Fabian Herzog, Philipp Wolters, Gerhard Rigoll

    Abstract: Object detectors are at the heart of many semi- and fully autonomous decision systems and are poised to become even more indispensable. They are, however, still lacking in accessibility and can sometimes produce unreliable predictions. Especially concerning in this regard are the -- essentially hand-crafted -- non-maximum suppression algorithms that lead to an obfuscated prediction process and bia…

    Submitted 6 September, 2023; originally announced September 2023.

  5. arXiv:2304.08134  [pdf, other]

    cs.CV cs.LG

    Tackling Face Verification Edge Cases: In-Depth Analysis and Human-Machine Fusion Approach

    Authors: Martin Knoche, Gerhard Rigoll

    Abstract: Nowadays, face recognition systems surpass human performance on several datasets. However, there are still edge cases that the machine can't correctly classify. This paper investigates the effect of a combination of machine and human operators in the face verification task. First, we look closer at the edge cases for several state-of-the-art models to discover common datasets' challenging settings…

    Submitted 24 August, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

  6. Explainable Model-Agnostic Similarity and Confidence in Face Verification

    Authors: Martin Knoche, Torben Teepe, Stefan Hörmann, Gerhard Rigoll

    Abstract: Recently, face recognition systems have demonstrated remarkable performances and thus gained a vital role in our daily life. They already surpass human face verification accountability in many scenarios. However, they lack explanations for their predictions. Compared to human operators, typical face recognition network systems generate only binary decisions without further explanation and insights…

    Submitted 16 February, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

  7. arXiv:2208.14167  [pdf, other]

    cs.CV

    Synthehicle: Multi-Vehicle Multi-Camera Tracking in Virtual Cities

    Authors: Fabian Herzog, Junpeng Chen, Torben Teepe, Johannes Gilg, Stefan Hörmann, Gerhard Rigoll

    Abstract: Smart City applications such as intelligent traffic routing or accident prevention rely on computer vision methods for exact vehicle localization and tracking. Due to the scarcity of accurately labeled data, detecting and tracking vehicles in 3D from multiple cameras proves challenging to explore. We present a massive synthetic dataset for multiple vehicle tracking and segmentation in multiple ove…

    Submitted 30 August, 2022; originally announced August 2022.

  8. Octuplet Loss: Make Face Recognition Robust to Image Resolution

    Authors: Martin Knoche, Mohamed Elkadeem, Stefan Hörmann, Gerhard Rigoll

    Abstract: Image resolution, or in general, image quality, plays an essential role in the performance of today's face recognition systems. To address this problem, we propose a novel combination of the popular triplet loss to improve robustness against image resolution via fine-tuning of existing face recognition models. With octuplet loss, we leverage the relationship between high-resolution images and thei…

    Submitted 21 March, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

  9. arXiv:2206.03727  [pdf, other]

    cs.CV

    Wavelet Regularization Benefits Adversarial Training

    Authors: Jun Yan, Huilin Yin, Xiaoyang Deng, Ziming Zhao, Wancheng Ge, Hao Zhang, Gerhard Rigoll

    Abstract: Adversarial training methods are state-of-the-art (SOTA) empirical defense methods against adversarial examples. Many regularization methods have been proven to be effective with the combination of adversarial training. Nevertheless, such regularization methods are implemented in the time domain. Since adversarial vulnerability can be regarded as a high-frequency phenomenon, it is essential to reg…

    Submitted 8 June, 2022; originally announced June 2022.

    Comments: Preprint version

  10. arXiv:2205.13796  [pdf, other]

    cs.CV

    Face Morphing: Fooling a Face Recognition System Is Simple!

    Authors: Stefan Hörmann, Tianlin Kong, Torben Teepe, Fabian Herzog, Martin Knoche, Gerhard Rigoll

    Abstract: State-of-the-art face recognition (FR) approaches have shown remarkable results in predicting whether two faces belong to the same identity, yielding accuracies between 92% and 100% depending on the difficulty of the protocol. However, the accuracy drops substantially when exposed to morphed faces, specifically generated to look similar to two identities. To generate morphed faces, we integrate a…

    Submitted 27 May, 2022; originally announced May 2022.

  11. arXiv:2204.07855  [pdf, other]

    cs.CV

    Towards a Deeper Understanding of Skeleton-based Gait Recognition

    Authors: Torben Teepe, Johannes Gilg, Fabian Herzog, Stefan Hörmann, Gerhard Rigoll

    Abstract: Gait recognition is a promising biometric with unique properties for identifying individuals from a long distance by their walking patterns. In recent years, most gait recognition methods used the person's silhouette to extract the gait features. However, silhouette images can lose fine-grained spatial information, suffer from (self) occlusion, and be challenging to obtain in real-world scenarios.…

    Submitted 16 April, 2022; originally announced April 2022.

    Comments: 8 Pages, 5 figures, Accepted at 17th IEEE Computer Society Workshop on Biometrics 2022 (CVPRW'22)

  12. arXiv:2112.01901  [pdf, other]

    cs.CV

    The Box Size Confidence Bias Harms Your Object Detector

    Authors: Johannes Gilg, Torben Teepe, Fabian Herzog, Gerhard Rigoll

    Abstract: Countless applications depend on accurate predictions with reliable confidence estimates from modern object detectors. It is well known, however, that neural networks including object detectors produce miscalibrated confidence estimates. Recent work even suggests that detectors' confidence predictions are biased with respect to object size and position, but it is still unclear how this bias relate…

    Submitted 3 December, 2021; originally announced December 2021.

  13. Cross-Quality LFW: A Database for Analyzing Cross-Resolution Image Face Recognition in Unconstrained Environments

    Authors: Martin Knoche, Stefan Hörmann, Gerhard Rigoll

    Abstract: Real-world face recognition applications often deal with suboptimal image quality or resolution due to different capturing conditions such as various subject-to-camera distances, poor camera settings, or motion blur. This characteristic has an unignorable effect on performance. Recent cross-resolution face recognition approaches used simple, arbitrary, and unrealistic down- and up-scaling techniqu…

    Submitted 25 November, 2022; v1 submitted 23 August, 2021; originally announced August 2021.

    Comments: 9 pages, 4 figures, 2 tables

  14. Susceptibility to Image Resolution in Face Recognition and Trainings Strategies

    Authors: Martin Knoche, Stefan Hörmann, Gerhard Rigoll

    Abstract: Face recognition approaches often rely on equal image resolution for verifying faces on two images. However, in practical applications, those image resolutions are usually not in the same range due to different image capture mechanisms or sources. In this work, we first analyze the impact of image resolutions on face verification performance with a state-of-the-art face recognition model. For imag…

    Submitted 25 November, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: 19 pages, 15 figures, 2 tables

  15. arXiv:2106.06415  [pdf, ps, other]

    cs.CV

    Attention-based Partial Face Recognition

    Authors: Stefan Hörmann, Zeyuan Zhang, Martin Knoche, Torben Teepe, Gerhard Rigoll

    Abstract: Photos of faces captured in unconstrained environments, such as large crowds, still constitute challenges for current face recognition approaches as often faces are occluded by objects or people in the foreground. However, few studies have addressed the task of recognizing partial faces. In this paper, we propose a novel approach to partial face recognition capable of recognizing faces with differ…

    Submitted 14 June, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: To be published in IEEE ICIP 2021

  16. arXiv:2106.03932  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild

    Authors: Okan Köpüklü, Maja Taseska, Gerhard Rigoll

    Abstract: Successful active speaker detection requires a three-stage pipeline: (i) audio-visual encoding for all speakers in the clip, (ii) inter-speaker relation modeling between a reference speaker and the background speakers within each frame, and (iii) temporal modeling for the reference speaker. Each stage of this pipeline plays an important role for the final performance of the created architecture. B…

    Submitted 7 September, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted to ICCV 2021

  17. arXiv:2104.01471  [pdf, other]

    eess.AS

    Adversarial Joint Training with Self-Attention Mechanism for Robust End-to-End Speech Recognition

    Authors: Lujun Li, Yikai Kang, Yuchen Shi, Ludwig Kürzinger, Tobias Watzel, Gerhard Rigoll

    Abstract: Lately, the self-attention mechanism has marked a new milestone in the field of automatic speech recognition (ASR). Nevertheless, its performance is susceptible to environmental intrusions as the system predicts the next output symbol depending on the full input sequence and the previous predictions. Inspired by the extensive applications of the generative adversarial networks (GANs) in speech enh…

    Submitted 3 April, 2021; originally announced April 2021.

  18. GaitGraph: Graph Convolutional Network for Skeleton-Based Gait Recognition

    Authors: Torben Teepe, Ali Khan, Johannes Gilg, Fabian Herzog, Stefan Hörmann, Gerhard Rigoll

    Abstract: Gait recognition is a promising video-based biometric for identifying individual walking patterns from a long distance. At present, most gait recognition methods use silhouette images to represent a person in each frame. However, silhouette images can lose fine-grained spatial information, and most papers do not regard how to obtain these silhouettes in complex scenes. Furthermore, silhouette imag…

    Submitted 9 June, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

    Comments: 5 pages, 2 figures

  19. Lightweight Multi-Branch Network for Person Re-Identification

    Authors: Fabian Herzog, Xunbo Ji, Torben Teepe, Stefan Hörmann, Johannes Gilg, Gerhard Rigoll

    Abstract: Person Re-Identification aims to retrieve person identities from images captured by multiple cameras or the same cameras in different time instances and locations. Because of its importance in many vision applications from surveillance to human-machine interaction, person re-identification methods need to be reliable and fast. While more and more deep architectures are proposed for increasing perf…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: 5 pages, 1 figure

  20. arXiv:2010.07597  [pdf, other]

    eess.AS cs.SD eess.SP

    Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions

    Authors: Ludwig Kürzinger, Nicolas Lindae, Palle Klewitz, Gerhard Rigoll

    Abstract: Many end-to-end Automatic Speech Recognition (ASR) systems still rely on pre-processed frequency-domain features that are handcrafted to emulate the human hearing. Our work is motivated by recent advances in integrated learnable feature extraction. For this, we propose Lightweight Sinc-Convolutions (LSC) that integrate Sinc-convolutions with depthwise convolutions as a low-parameter machine-learna…

    Submitted 16 October, 2020; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted at INTERSPEECH 2020

  21. arXiv:2009.14660  [pdf, other]

    cs.CV cs.LG eess.IV

    Driver Anomaly Detection: A Dataset and Contrastive Learning Approach

    Authors: Okan Köpüklü, Jiapeng Zheng, Hang Xu, Gerhard Rigoll

    Abstract: Distracted drivers are more likely to fail to anticipate hazards, which results in car accidents. Therefore, detecting anomalies in drivers' actions (i.e., any action deviating from normal driving) is of the utmost importance to reduce driver-related accidents. However, there are unboundedly many anomalous actions that a driver can do while driving, which leads to an 'open set recognition' problem…

    Submitted 30 November, 2020; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Accepted to IEEE Winter Conference on Applications of Computer Vision (WACV 2021)

  22. arXiv:2009.14639  [pdf, other]

    cs.CV cs.LG eess.IV

    Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing

    Authors: Okan Köpüklü, Stefan Hörmann, Fabian Herzog, Hakan Cevikalp, Gerhard Rigoll

    Abstract: Convolutional Neural Networks with 3D kernels (3D-CNNs) currently achieve state-of-the-art results in video recognition tasks due to their supremacy in extracting spatiotemporal features within video frames. There have been many successful 3D-CNN architectures surpassing the state-of-the-art results successively. However, nearly all of them are designed to operate offline creating several serious…

    Submitted 18 October, 2021; v1 submitted 30 September, 2020; originally announced September 2020.

  23. arXiv:2007.12892  [pdf, ps, other]

    eess.AS cs.CR cs.SD

    MP3 Compression To Diminish Adversarial Noise in End-to-End Speech Recognition

    Authors: Iustina Andronic, Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Gerhard Rigoll, Bernhard U. Seeber

    Abstract: Audio Adversarial Examples (AAE) represent specially created inputs meant to trick Automatic Speech Recognition (ASR) systems into misclassification. The present work proposes MP3 compression as a means to decrease the impact of Adversarial Noise (AN) in audio samples transcribed by ASR systems. To this end, we generated AAEs with the Fast Gradient Sign Method for an end-to-end, hybrid CTC-attenti…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: Submitted and accepted at SPECOM 2020 conference

  24. arXiv:2007.10723  [pdf, ps, other]

    eess.AS cs.SD

    Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

    Authors: Ludwig Kürzinger, Edgar Ricardo Chavez Rosas, Lujun Li, Tobias Watzel, Gerhard Rigoll

    Abstract: Recent advances in Automatic Speech Recognition (ASR) demonstrated how end-to-end systems are able to achieve state-of-the-art performance. There is a trend towards deeper neural networks; however, those ASR models are also more complex and prone to specially crafted noisy data. Those Audio Adversarial Examples (AAE) were previously demonstrated on ASR systems that use Connectionist Temporal C…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: To be published at SPECOM 2020

  25. CTC-Segmentation of Large Corpora for German End-to-end Speech Recognition

    Authors: Ludwig Kürzinger, Dominik Winkelbauer, Lujun Li, Tobias Watzel, Gerhard Rigoll

    Abstract: Recent end-to-end Automatic Speech Recognition (ASR) systems demonstrated the ability to outperform conventional hybrid DNN/HMM ASR. Aside from architectural improvements in those systems, those models grew in terms of depth, parameters and model capacity. However, these models also require more training data to achieve comparable performance. In this work, we combine freely available corpora f…

    Submitted 5 October, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

    Comments: Published at SPECOM 2020

    Journal ref: Speech and Computer (2020)

  26. arXiv:2006.08506  [pdf, ps, other]

    eess.AS cs.CL

    Regularized Forward-Backward Decoder for Attention Models

    Authors: Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

    Abstract: Nowadays, attention models are one of the popular candidates for speech recognition. So far, many studies mainly focus on the encoder structure or the attention module to enhance the performance of these models. However, most ignore the decoder. In this paper, we propose a novel regularization technique incorporating a second decoder during the training phase. This decoder is optimized on time-r…

    Submitted 28 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  27. arXiv:2006.01615  [pdf, ps, other]

    cs.CV

    A Multi-Task Comparator Framework for Kinship Verification

    Authors: Stefan Hörmann, Martin Knoche, Gerhard Rigoll

    Abstract: Approaches for kinship verification often rely on cosine distances between face identification features. However, due to gender bias inherent in these features, it is hard to reliably predict whether two opposite-gender pairs are related. Instead of fine tuning the feature extractor network on kinship verification, we propose a comparator network to cope with this bias. After concatenating both fe…

    Submitted 2 June, 2020; originally announced June 2020.

    Comments: To be published in IEEE FG 2020 - RFIW Workshop

  28. arXiv:2003.00951  [pdf, ps, other]

    cs.CV

    DriverMHG: A Multi-Modal Dataset for Dynamic Recognition of Driver Micro Hand Gestures and a Real-Time Recognition Framework

    Authors: Okan Köpüklü, Thomas Ledwon, Yao Rong, Neslihan Kose, Gerhard Rigoll

    Abstract: The use of hand gestures provides a natural alternative to cumbersome interface devices for Human-Computer Interaction (HCI) systems. However, real-time recognition of dynamic micro hand gestures from video streams is challenging for in-vehicle scenarios since (i) the gestures should be performed naturally without distracting the driver, (ii) micro hand gestures occur within very short time interv…

    Submitted 19 October, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: Accepted to IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)

  29. arXiv:1912.04618  [pdf, other]

    cs.CV

    Deep Attention Based Semi-Supervised 2D-Pose Estimation for Surgical Instruments

    Authors: Mert Kayhan, Okan Köpüklü, Mhd Hasan Sarhan, Mehmet Yigitsoy, Abouzar Eslami, Gerhard Rigoll

    Abstract: For many practical problems and applications, it is not feasible to create a vast and accurately labeled dataset, which restricts the application of deep learning in many areas. Semi-supervised learning algorithms intend to improve performance by also leveraging unlabeled data. This is very valuable for the 2D-pose estimation task, where data labeling requires substantial time and is subject to noise.…

    Submitted 11 January, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

  30. arXiv:1911.06644  [pdf, other]

    cs.CV

    You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

    Authors: Okan Köpüklü, Xiangyu Wei, Gerhard Rigoll

    Abstract: Spatiotemporal action localization requires the incorporation of two sources of information into the designed architecture: (1) temporal information from the previous frames and (2) spatial information from the key frame. Current state-of-the-art approaches usually extract this information with separate networks and use an extra mechanism for fusion to get detections. In this work, we present YOW…

    Submitted 18 October, 2021; v1 submitted 15 November, 2019; originally announced November 2019.

  31. arXiv:1911.02086  [pdf, other]

    eess.AS cs.CL cs.SD

    Small-Footprint Keyword Spotting on Raw Audio Data with Sinc-Convolutions

    Authors: Simon Mittermaier, Ludwig Kürzinger, Bernd Waschneck, Gerhard Rigoll

    Abstract: Keyword Spotting (KWS) enables speech-based user interaction on smart devices. Always-on and battery-powered application scenarios for smart devices put constraints on hardware resources and power consumption, while also demanding high accuracy as well as real-time capability. Previous architectures first extracted acoustic features and then applied a neural network to classify keyword probabiliti…

    Submitted 3 May, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: Accepted at ICASSP 2020

  32. arXiv:1909.05165  [pdf, other]

    cs.CV eess.IV

    Comparative Analysis of CNN-based Spatiotemporal Reasoning in Videos

    Authors: Okan Köpüklü, Fabian Herzog, Gerhard Rigoll

    Abstract: Understanding actions and gestures in video streams requires temporal reasoning of the spatial content from different time instants, i.e., spatiotemporal (ST) modeling. In this survey paper, we have made a comparative analysis of different ST modeling techniques for action and gesture recognition tasks. Since Convolutional Neural Networks (CNNs) are proved to be an effective tool as a feature extr…

    Submitted 11 January, 2021; v1 submitted 11 September, 2019; originally announced September 2019.

  33. arXiv:1907.08009  [pdf, other]

    cs.CV cs.LG eess.IV

    Real-Time Driver State Monitoring Using a CNN Based Spatio-Temporal Approach

    Authors: Neslihan Kose, Okan Kopuklu, Alexander Unnervik, Gerhard Rigoll

    Abstract: Many road accidents occur due to distracted drivers. Today, driver monitoring is essential even for the latest autonomous vehicles to alert distracted drivers in order to take over control of the vehicle in case of emergency. In this paper, a spatio-temporal approach is applied to classify drivers' distraction level and movement decisions using convolutional neural networks (CNNs). We approach thi…

    Submitted 18 July, 2019; originally announced July 2019.

    Comments: Accepted for publication by the IEEE Intelligent Transportation Systems Conference (ITSC 2019)

  34. arXiv:1905.04668  [pdf, other]

    cs.CV

    On Flow Profile Image for Video Representation

    Authors: Mohammadreza Babaee, David Full, Gerhard Rigoll

    Abstract: Video representation is a key challenge in many computer vision applications such as video classification, video captioning, and video surveillance. In this paper, we propose a novel approach for video representation that captures meaningful information including motion and appearance from a sequence of video frames and compacts it into a single image. To this end, we compute the optical flow and…

    Submitted 12 May, 2019; originally announced May 2019.

  35. arXiv:1905.04225  [pdf, other]

    cs.CV cs.HC cs.LG

    Talking With Your Hands: Scaling Hand Gestures and Recognition With CNNs

    Authors: Okan Köpüklü, Yao Rong, Gerhard Rigoll

    Abstract: The use of hand gestures provides a natural alternative to cumbersome interface devices for Human-Computer Interaction (HCI) systems. As the technology advances and communication between humans and machines becomes more complex, HCI systems should also be scaled accordingly in order to accommodate the introduced complexities. In this paper, we propose a methodology to scale hand gestures by formin…

    Submitted 30 August, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

    Comments: Accepted to ICCV 2019 workshop - Observing and Understanding Hands in Action (HANDS 2019)

  36. arXiv:1904.02422  [pdf, other]

    cs.CV

    Resource Efficient 3D Convolutional Neural Networks

    Authors: Okan Köpüklü, Neslihan Kose, Ahmet Gunduz, Gerhard Rigoll

    Abstract: Recently, convolutional neural networks with 3D kernels (3D CNNs) have been very popular in the computer vision community as a result of their superior ability of extracting spatio-temporal features within video frames compared to 2D CNNs. Although there have been great advances recently to build resource efficient 2D CNN architectures considering memory and power budget, there is hardly any similar re…

    Submitted 18 October, 2021; v1 submitted 4 April, 2019; originally announced April 2019.

    Comments: Accepted to ICCV 2019 workshop - Neural Architects

  37. arXiv:1901.10323  [pdf, other]

    cs.CV cs.AI

    Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks

    Authors: Okan Köpüklü, Ahmet Gunduz, Neslihan Kose, Gerhard Rigoll

    Abstract: Real-time recognition of dynamic hand gestures from video streams is a challenging task since (i) there is no indication when a gesture starts and ends in the video, (ii) performed gestures should only be recognized once, and (iii) the entire architecture should be designed considering the memory and power budget. In this work, we address these challenges by proposing a hierarchical structure enab…

    Submitted 18 October, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: Published at IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019) - Best student paper award! -

  38. arXiv:1901.09615  [pdf, other]

    cs.CV

    Convolutional Neural Networks with Layer Reuse

    Authors: Okan Köpüklü, Maryam Babaee, Stefan Hörmann, Gerhard Rigoll

    Abstract: A convolutional layer in a Convolutional Neural Network (CNN) consists of many filters which apply convolution operation to the input, capture some special patterns and pass the result to the next layer. If the same patterns also occur at the deeper layers of the network, why wouldn't the same convolutional filters be used also in those layers? In this paper, we propose a CNN architecture, Layer R…

    Submitted 1 February, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

    Comments: Computer Vision and Pattern Recognition

  39. arXiv:1811.04091  [pdf, other]

    cs.CV

    Multiple People Tracking Using Hierarchical Deep Tracklet Re-identification

    Authors: Maryam Babaee, Ali Athar, Gerhard Rigoll

    Abstract: The task of multiple people tracking in monocular videos is challenging because of the numerous difficulties involved: occlusions, varying environments, crowded scenes, camera parameters and motion. In the tracking-by-detection paradigm, most approaches adopt person re-identification techniques based on computing the pairwise similarity between detections. However, these techniques are less effect…

    Submitted 17 November, 2018; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: 13 pages (8 main + 2 bibliography + 5 appendices)

    MSC Class: 68T45

    ACM Class: I.2.10; I.4.8; I.2.6; I.4.9; I.5.3

  40. arXiv:1804.08506  [pdf, other]

    cs.CV

    Person Identification from Partial Gait Cycle Using Fully Convolutional Neural Network

    Authors: Maryam Babaee, Linwei Li, Gerhard Rigoll

    Abstract: Gait as a biometric property for person identification plays a key role in video surveillance and security applications. In gait recognition, normally, a gait feature such as Gait Energy Image (GEI) is extracted from one full gait cycle. However, in many circumstances, such a full gait cycle might not be available due to occlusion. Thus, the GEI is not complete giving rise to a degrading in gait-base…

    Submitted 23 April, 2018; originally announced April 2018.

  41. arXiv:1804.07187  [pdf, other]

    cs.CV

    Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition

    Authors: Okan Köpüklü, Neslihan Köse, Gerhard Rigoll

    Abstract: Acquiring spatio-temporal states of an action is the most crucial step for action classification. In this paper, we propose a data level fusion strategy, Motion Fused Frames (MFFs), designed to fuse motion information into static images as better representatives of spatio-temporal states of an action. MFFs can be used as input to any deep learning architecture with very little modification on the…

    Submitted 26 April, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

    Comments: Accepted to CVPR 2018 as workshop paper

  42. arXiv:1702.01731  [pdf, other]

    cs.CV

    A Deep Convolutional Neural Network for Background Subtraction

    Authors: Mohammadreza Babaee, Duc Tung Dinh, Gerhard Rigoll

    Abstract: In this work, we present a novel background subtraction system that uses a deep Convolutional Neural Network (CNN) to perform the segmentation. With this approach, feature engineering and parameter tuning become unnecessary since the network parameters can be learned from data by training a single CNN that can handle various video scenes. Additionally, we propose a new approach to estimate backgro…

    Submitted 6 February, 2017; originally announced February 2017.

  43. arXiv:1609.03695  [pdf, other]

    cs.HC

    Blending Entropy: A Term for Addressing Information Density in Mediated Reality

    Authors: Philipp Tiefenbacher, Gerhard Rigoll

    Abstract: The virtuality continuum describes the degrees of positive virtuality under the umbrella term mixed reality. Besides adding virtual information within a mixed environment, diminished reality aims at reducing real world information. Mann defined the term mediated reality (MR), which also considered diminished reality, but without the possibility to describe different degrees of fusion between a mix…

    Submitted 15 September, 2016; v1 submitted 13 September, 2016; originally announced September 2016.

    Comments: 6 pages, 1 figure, "Mobile Mediated Reality", Dissertation, TUM, Philipp Tiefenbacher, 2016

  44. arXiv:1412.4616  [pdf, other]

    cs.CL cs.SD

    A Broadcast News Corpus for Evaluation and Tuning of German LVCSR Systems

    Authors: Felix Weninger, Björn Schuller, Florian Eyben, Martin Wöllmer, Gerhard Rigoll

    Abstract: Transcription of broadcast news is an interesting and challenging application for large-vocabulary continuous speech recognition (LVCSR). We present in detail the structure of a manually segmented and annotated corpus including over 160 hours of German broadcast news, and propose it as an evaluation framework of LVCSR systems. We show our own experimental results on the corpus, achieved with a sta…

    Submitted 15 December, 2014; originally announced December 2014.

    Comments: submitted to INTERSPEECH 2010 on May 3, 2010

  45. arXiv:1406.2895  [pdf, other]

    cs.HC cs.CV

    Acoustic Gait-based Person Identification using Hidden Markov Models

    Authors: Jürgen T. Geiger, Maximilian Kneißl, Björn Schuller, Gerhard Rigoll

    Abstract: We present a system for identifying humans by their walking sounds. This problem is also known as acoustic gait recognition. The goal of the system is to analyse sounds emitted by walking persons (mostly the step sounds) and identify those persons. These sounds are characterised by the gait pattern and are influenced by the movements of the arms and legs, but also depend on the type of shoe. We ex…

    Submitted 11 June, 2014; originally announced June 2014.