Showing 1–27 of 27 results for author: Schindler, A

  1. arXiv:2407.03110  [pdf, other]

    cs.SD cs.AI eess.AS

    A Toolchain for Comprehensive Audio/Video Analysis Using Deep Learning Based Multimodal Approach (A use case of riot or violent context detection)

    Authors: Lam Pham, Phat Lam, Tin Nguyen, Hieu Tang, Alexander Schindler

    Abstract: In this paper, we present a toolchain for comprehensive audio/video analysis by leveraging a deep learning based multimodal approach. To this end, different specific tasks of Speech to Text (S2T), Acoustic Scene Classification (ASC), Acoustic Event Detection (AED), Visual Object Detection (VOD), Image Captioning (IC), and Video Captioning (VC) are conducted and integrated into the toolchain. By co…

    Submitted 2 May, 2024; originally announced July 2024.

  2. arXiv:2407.01777  [pdf, other]

    cs.SD cs.AI eess.AS

    Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models

    Authors: Lam Pham, Phat Lam, Truong Nguyen, Huyen Nguyen, Alexander Schindler

    Abstract: In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the raw input audio is first transformed into various spectrograms using three transformation methods of Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), and Wavelet Transform (WT) combined with different auditory-based filters of Mel, Gammatone, linear filters (LF), and dis…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2403.00379  [pdf, other]

    eess.AS cs.SD

    The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

    Authors: Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

    Abstract: In this paper, we propose a deep learning based model for Acoustic Anomaly Detection of Machines, the task of detecting abnormal machines by analysing the machine sound. By conducting extensive experiments, we indicate that multiple techniques of pseudo audios, audio segment, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effect…

    Submitted 1 March, 2024; originally announced March 2024.

  4. arXiv:2401.15854  [pdf, other]

    cs.CL

    LSTM-based Deep Neural Network With A Focus on Sentence Representation for Sequential Sentence Classification in Medical Scientific Abstracts

    Authors: Phat Lam, Lam Pham, Tin Nguyen, Hieu Tang, Michael Seidl, Medina Andresel, Alexander Schindler

    Abstract: The Sequential Sentence Classification task within the domain of medical abstracts, termed as SSC, involves the categorization of sentences into pre-defined headings based on their roles in conveying critical information in the abstract. In the SSC task, sentences are sequentially related to each other. For this reason, the role of sentence embeddings is crucial for capturing both the semantic inf…

    Submitted 31 May, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: Submitted to FedCSIS 2024

  5. arXiv:2312.16717  [pdf, other]

    cs.CV cs.LG eess.IV

    Landslide Detection and Segmentation Using Remote Sensing Images and Deep Neural Network

    Authors: Cam Le, Lam Pham, Jasmin Lampert, Matthias Schlögl, Alexander Schindler

    Abstract: Knowledge about historic landslide event occurrence is important for supporting disaster risk reduction strategies. Building upon findings from the 2022 Landslide4Sense Competition, we propose a deep neural network based system for landslide detection and segmentation from multisource remote sensing image input. We use a U-Net trained with Cross Entropy loss as the baseline model. We then improve the U-Ne…

    Submitted 27 December, 2023; originally announced December 2023.

  6. arXiv:2305.09463  [pdf, other]

    cs.SD cs.AI eess.AS

    Low-complexity deep learning frameworks for acoustic scene classification using teacher-student scheme and multiple spectrograms

    Authors: Lam Pham, Dat Ngo, Cam Le, Anahid Jalali, Alexander Schindler

    Abstract: In this technical report, a low-complexity deep learning system for acoustic scene classification (ASC) is presented. The proposed system comprises two main phases: (Phase I) Training a teacher network; and (Phase II) training a student network using distilled knowledge from the teacher. In the first phase, the teacher, which presents a large footprint model, is trained. After training the teacher…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2206.06057

  7. arXiv:2305.01476  [pdf, other]

    cs.SD cs.MM eess.AS

    Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification

    Authors: Lam Pham, Trang Le, Cam Le, Dat Ngo, Weissenfeld Axel, Alexander Schindler

    Abstract: In this paper, we present a deep learning based multimodal system for classifying daily life videos. To train the system, we propose a two-phase training strategy. In the first training phase (Phase I), we extract the audio and visual (image) data from the original video. We then train the audio data and the visual data with independent deep learning based models. After the training processes, we…

    Submitted 30 April, 2023; originally announced May 2023.

  8. arXiv:2302.13028  [pdf, other]

    cs.CV cs.AI cs.LG

    A Light-weight Deep Learning Model for Remote Sensing Image Classification

    Authors: Lam Pham, Cam Le, Dat Ngo, Anh Nguyen, Jasmin Lampert, Alexander Schindler, Ian McLoughlin

    Abstract: In this paper, we present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC), the task of identifying the aerial scene of a remote sensing image. To this end, we first evaluate various benchmark convolutional neural network (CNN) architectures: MobileNet V1/V2, ResNet 50/151V2, InceptionV3/InceptionResNetV2, EfficientNet B0/B7, DenseNet 121/201, C…

    Submitted 25 February, 2023; originally announced February 2023.

  9. arXiv:2210.08610  [pdf, other]

    cs.SD cs.AI eess.AS

    Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context

    Authors: Lam Pham, Dusan Salovic, Anahid Jalali, Alexander Schindler, Khoa Tran, Canh Vu, Phu X. Nguyen

    Abstract: In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we firstly propose an inception-based and low footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark and high-complexity network architectures of Mobile…

    Submitted 16 October, 2022; originally announced October 2022.

  10. arXiv:2206.13392  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Remote Sensing Image Classification using Transfer Learning and Attention Based Deep Neural Network

    Authors: Lam Pham, Khoa Tran, Dat Ngo, Jasmin Lampert, Alexander Schindler

    Abstract: The task of remote sensing image scene classification (RSISC), which aims at classifying remote sensing images into groups of semantic categories based on their contents, has taken an important role in a wide range of applications such as urban planning, natural hazards detection, environment monitoring, vegetation mapping, or geospatial object detection. During the past years, research community…

    Submitted 20 June, 2022; originally announced June 2022.

  11. arXiv:2206.06057  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Low-complexity deep learning frameworks for acoustic scene classification

    Authors: Lam Pham, Dat Ngo, Anahid Jalali, Alexander Schindler

    Abstract: In this report, we present low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we initially transform audio recordings into Mel, Gammatone, and CQT spectrograms. N…

    Submitted 13 June, 2022; originally announced June 2022.

  12. arXiv:2203.12314  [pdf, other]

    cs.SD cs.LG eess.AS

    Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices

    Authors: Lam Pham, Khoa Dinh, Dat Ngo, Hieu Tang, Alexander Schindler

    Abstract: In this paper, we present a robust and low complexity system for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording. We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue. To further improve the performance but still satisfy the low complexity…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: This paper was submitted to INTERSPEECH 2022

  13. arXiv:2201.03054  [pdf, ps, other]

    cs.SD eess.AS

    An Ensemble of Deep Learning Frameworks Applied For Predicting Respiratory Anomalies

    Authors: Lam Pham, Dat Ngo, Truong Hoang, Alexander Schindler, Ian McLoughlin

    Abstract: In this paper, we evaluate various deep learning frameworks for detecting respiratory anomalies from input audio recordings. To this end, we firstly transform audio respiratory cycles collected from patients into spectrograms where both temporal and spectral features are presented, referred to as the front-end feature extraction. We then feed the spectrograms into back-end deep learning networks f…

    Submitted 9 January, 2022; originally announced January 2022.

  14. arXiv:2112.09172  [pdf, ps, other]

    cs.CV cs.LG eess.IV

    An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification

    Authors: Lam Pham, Dat Ngo, Phu X. Nguyen, Truong Hoang, Alexander Schindler

    Abstract: This paper presents a task of audio-visual scene classification (SC) where input videos are classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'. To this end, we firstly collect an audio-visual dataset (videos) of these five crowded contexts from YouTube (in-the-wild scenes). Then, a wide range of deep learning framew…

    Submitted 16 December, 2021; originally announced December 2021.

  15. arXiv:2106.06840  [pdf, ps, other]

    cs.SD eess.AS

    Deep Learning Frameworks Applied For Audio-Visual Scene Classification

    Authors: Lam Pham, Alexander Schindler, Mina Schütz, Jasmin Lampert, Sven Schlarb, Ross King

    Abstract: In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features as well as their combination affect SC performance. Our extensive experiments, which are conducted on DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve the best classification…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: 6 pages

  16. arXiv:2106.06838  [pdf, ps, other]

    cs.SD eess.AS

    A Low-Compexity Deep Learning Framework For Acoustic Scene Classification

    Authors: Lam Pham, Hieu Tang, Anahid Jalali, Alexander Schindler, Ross King

    Abstract: In this paper, we present a low-complexity deep learning framework for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: Front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. First, we use Mel filter, Gammatone filter and Constant Q Transform (CQT) to transform raw audio signal into spectrograms, w…

    Submitted 12 June, 2021; originally announced June 2021.

  17. arXiv:2106.04908  [pdf, other]

    cs.CL cs.AI

    Automatic Sexism Detection with Multilingual Transformer Models

    Authors: Mina Schütz, Jaqueline Boeck, Daria Liakhovets, Djordje Slijepčević, Armin Kirchknopf, Manuel Hecht, Johannes Bogensperger, Sven Schlarb, Alexander Schindler, Matthias Zeppelzauer

    Abstract: Sexism has become an increasingly major problem on social networks during the last years. The first shared task on sEXism Identification in Social neTworks (EXIST) at IberLEF 2021 is an international competition in the field of Natural Language Processing (NLP) with the aim to automatically identify sexism in social media content by applying machine learning methods. Thereby sexism detection is fo…

    Submitted 8 February, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: Technical Report to the AIT_FHSTP EXIST 2021 Challenge contribution (under review) http://nlp.uned.es/exist2021/

  18. arXiv:2004.01023  [pdf, other]

    cs.MM cs.CV cs.CY cs.SD eess.AS

    Multi-Modal Video Forensic Platform for Investigating Post-Terrorist Attack Scenarios

    Authors: Alexander Schindler, Andrew Lindley, Anahid Jalali, Martin Boyer, Sergiu Gordea, Ross King

    Abstract: The forensic investigation of a terrorist attack poses a significant challenge to the investigative authorities, as often several thousand hours of video footage must be viewed. Large scale Video Analytic Platforms (VAP) assist law enforcement agencies (LEA) in identifying suspects and securing evidence. Current platforms focus primarily on the integration of different computer vision methods and…

    Submitted 2 April, 2020; originally announced April 2020.

    Journal ref: In Proceedings of the 11th ACM Multimedia Systems Conference (MMSys2020), June 06-11, 2020, Istanbul, Turkey

  19. arXiv:2003.12265  [pdf, other]

    cs.MM cs.IR cs.LG

    Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text

    Authors: Alexander Schindler, Sergiu Gordea, Peter Knees

    Abstract: We present an approach to unsupervised audio representation learning. Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness. By applying Latent Semantic Indexing (LSI) we embed corresponding textual information into a latent vector space from which we derive track relatedness for online triplet selection. This…

    Submitted 27 March, 2020; originally announced March 2020.

    Comments: This is the long version of our SAC2020 poster presentation

    Journal ref: In Proceedings of the 35th ACM/SIGAPP Symposium On Applied Computing (SAC2020), March 30-April 3, 2020, Brno, Czech Republic

  20. arXiv:2002.00251  [pdf, other]

    cs.MM cs.CV cs.IR cs.SD eess.AS

    Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis

    Authors: Alexander Schindler

    Abstract: This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective. This thesis focuses on the information provided by the visual layer of music videos and how it can be harnessed to augment and improve tasks of the MIR research domain. The main hypothesis of this work is based on the observation that certain expressive categ…

    Submitted 1 February, 2020; originally announced February 2020.

    Comments: Dissertation at TU Wien

  21. arXiv:2001.05266  [pdf, other]

    cs.IR cs.LG cs.SD eess.AS

    Deep Learning for MIR Tutorial

    Authors: Alexander Schindler, Thomas Lidy, Sebastian Böck

    Abstract: Deep Learning has become state of the art in visual computing and continuously emerges into the Music Information Retrieval (MIR) and audio retrieval domain. In order to bring attention to this topic we propose an introductory tutorial on deep learning for MIR. Besides a general introduction to neural networks, the proposed tutorial covers a wide range of MIR relevant deep learning approaches. \te…

    Submitted 15 January, 2020; originally announced January 2020.

    Comments: This is a description of a tutorial held at the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27, 2018.

  22. arXiv:1909.07730  [pdf, other]

    cs.MM

    Multi-Task Music Representation Learning from Multi-Label Embeddings

    Authors: Alexander Schindler, Peter Knees

    Abstract: This paper presents a novel approach to music representation learning. Triplet loss based networks have become popular for representation learning in various multimedia retrieval domains. Yet, one of the most crucial parts of this approach is the appropriate selection of triplets, which is indispensable, considering that the number of possible triplets grows cubically. We present an approach to ha…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: Best Student Paper award

    Journal ref: Proceedings of the International Conference on Content-Based Multimedia Indexing (CBMI2019)

  23. arXiv:1904.07686  [pdf, other]

    cs.LG stat.ML

    Predicting Time-to-Failure of Plasma Etching Equipment using Machine Learning

    Authors: Anahid Jalali, Clemens Heistracher, Alexander Schindler, Bernhard Haslhofer, Tanja Nemeth, Robert Glawar, Wilfried Sihn, Peter De Boer

    Abstract: Predicting unscheduled breakdowns of plasma etching equipment can reduce maintenance costs and production losses in the semiconductor industry. However, plasma etching is a complex procedure and it is hard to capture all relevant equipment properties and behaviors in a single physical model. Machine learning offers an alternative for predicting upcoming machine failures based on relevant data poin…

    Submitted 16 April, 2019; originally announced April 2019.

    Comments: 8 pages, 10 figures, accepted in IEEE/PHM 2019 Conference

  24. arXiv:1811.11623  [pdf, other]

    cs.AI cs.CV

    Large Scale Audio-Visual Video Analytics Platform for Forensic Investigations of Terroristic Attacks

    Authors: Alexander Schindler, Martin Boyer, Andrew Lindley, David Schreiber, Thomas Philipp

    Abstract: The forensic investigation of a terrorist attack poses a huge challenge to the investigative authorities, as several thousand hours of video footage need to be reviewed. To assist law enforcement agencies (LEA) in identifying suspects and securing evidence, we present a platform which fuses information of surveillance cameras and video uploads from eyewitnesses. The platform integrates analytical…

    Submitted 28 November, 2018; originally announced November 2018.

    Journal ref: 25th International Conference on MultiMedia Modeling (MMM2019)

  25. arXiv:1811.04448  [pdf, ps, other]

    cs.SD eess.AS

    A Multi-modal Deep Neural Network approach to Bird-song identification

    Authors: Botond Fazeka, Alexander Schindler, Thomas Lidy, Andreas Rauber

    Abstract: We present a multi-modal Deep Neural Network (DNN) approach for bird song identification. The presented approach takes both audio samples and metadata as input. The audio is fed into a Convolutional Neural Network (CNN) using four convolutional layers. The additionally provided metadata is processed using fully connected layers. The flattened convolutional layers and the fully connected layer of t…

    Submitted 11 November, 2018; originally announced November 2018.

    Comments: LifeCLEF 2017 working notes, Dublin, Ireland

  26. arXiv:1811.04419  [pdf, other]

    cs.SD cs.MM eess.AS

    Multi-Temporal Resolution Convolutional Neural Networks for Acoustic Scene Classification

    Authors: Alexander Schindler, Thomas Lidy, Andreas Rauber

    Abstract: In this paper we present a Deep Neural Network architecture for the task of acoustic scene classification which harnesses information from increasing temporal resolutions of Mel-Spectrogram segments. This architecture is composed of separated parallel Convolutional Neural Networks which learn spectral and temporal representations for each input resolution. The resolutions are chosen to cover fine-…

    Submitted 11 November, 2018; originally announced November 2018.

    Comments: In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), November 2017

  27. arXiv:1811.04374  [pdf, other]

    cs.CV

    Fashion and Apparel Classification using Convolutional Neural Networks

    Authors: Alexander Schindler, Thomas Lidy, Stephan Karner, Matthias Hecker

    Abstract: We present an empirical study of applying deep Convolutional Neural Networks (CNN) to the task of fashion and apparel image classification to improve meta-data enrichment of e-commerce applications. Five different CNN architectures were analyzed using clean and pre-trained models. The models were evaluated in three different tasks: person detection, product and gender classification, on two small a…

    Submitted 11 November, 2018; originally announced November 2018.

    Comments: Proceedings of the 10th Forum Media Technology and 3rd All Around Audio Symposium, St. Poelten, Austria, November 29-30, 2017