-
Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context
Authors:
Lam Pham,
Dusan Salovic,
Anahid Jalali,
Alexander Schindler,
Khoa Tran,
Canh Vu,
Phu X. Nguyen
Abstract:
In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we first propose an inception-based, low-footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark, high-complexity network architectures: MobileNetV1, MobileNetV2, VGG16, VGG19, ResNet50V2, ResNet152V2, DenseNet121, DenseNet201, and Xception. Next, we improve the ASC baseline by proposing a novel deep neural network architecture that leverages residual-inception architectures and multiple kernels. Given the novel residual-inception (NRI) model, we further evaluate the trade-off between model complexity and model accuracy. Finally, we evaluate whether sound events occurring in a sound scene recording can help to improve ASC accuracy, and show how a sound scene context can be effectively presented by combining sound scene and sound event information. We conduct extensive experiments on various ASC datasets, including Crowded Scenes and the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, and 2022 Task 1. The experimental results on several different ASC challenges highlight two main achievements: first, robust, general, and low-complexity ASC systems that are suitable for real-life applications on a wide range of edge devices and mobile phones; second, an effective visualization method for comprehensively presenting a sound scene context.
Submitted 16 October, 2022;
originally announced October 2022.
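The abstract above ties the low-footprint claim to an inception-style design with multiple kernels. The abstract does not give the NRI layer configuration, so the following is only a minimal sketch of how such complexity can be counted: the kernel sizes, channel widths, and the helper names `conv2d_params` and `multi_kernel_residual_params` are hypothetical. It compares a residual block with parallel 1x1/3x3/5x5 branches against a single 5x5 convolution of the same output width.

```python
from typing import Sequence, Tuple


def conv2d_params(kh: int, kw: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Trainable parameters of a standard 2-D convolution (weights plus optional bias)."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)


def multi_kernel_residual_params(c_in: int, branches: Sequence[Tuple[int, int]]) -> int:
    """Parameters of a hypothetical residual block with parallel square-kernel branches.

    `branches` holds (kernel_size, out_channels) pairs whose outputs are concatenated.
    A 1x1 projection is added on the shortcut when the concatenated width != c_in.
    """
    total = sum(conv2d_params(k, k, c_in, c_out) for k, c_out in branches)
    width = sum(c_out for _, c_out in branches)
    if width != c_in:
        total += conv2d_params(1, 1, c_in, width)  # shortcut projection
    return total


# Hypothetical configuration: 64 input channels, three 32-channel branches.
branches = [(1, 32), (3, 32), (5, 32)]
print(multi_kernel_residual_params(64, branches))  # 78016
print(conv2d_params(5, 5, 64, 96))                 # 153696
```

Under these assumed widths, the multi-kernel residual block uses roughly half the parameters of one wide 5x5 layer while still mixing several receptive-field sizes. This is the kind of arithmetic behind a complexity/accuracy trade-off study, not the paper's actual model.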
-
Sound-Dr: Reliable Sound Dataset and Baseline Artificial Intelligence System for Respiratory Illnesses
Authors:
Truong V. Hoang,
Quang H. Nguyen,
Cuong Q. Nguyen,
Phong X. Nguyen,
Hoang D. Nguyen
Abstract:
As the burden of respiratory diseases continues to fall on society worldwide, this paper presents a high-quality, reliable dataset of human sounds for studying respiratory illnesses, including pneumonia and COVID-19. It consists of coughing, mouth breathing, and nose breathing sounds together with metadata on related clinical characteristics. We also develop a proof-of-concept system for establishing baselines and benchmarking against multiple datasets, such as Coswara and COUGHVID. Our comprehensive experiments show that the Sound-Dr dataset has richer features, yields better performance, and is more robust to dataset shifts across various machine learning tasks. It is promising for a wide range of real-time applications on mobile devices. The proposed dataset and system will serve as practical tools to support healthcare professionals in diagnosing respiratory disorders. The dataset and code are publicly available here: https://github.com/ReML-AI/Sound-Dr/.
Submitted 4 August, 2023; v1 submitted 12 January, 2022;
originally announced January 2022.
-
An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification
Authors:
Lam Pham,
Dat Ngo,
Phu X. Nguyen,
Truong Hoang,
Alexander Schindler
Abstract:
This paper presents a task of audio-visual scene classification (SC) in which input videos are classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'. To this end, we first collect an audio-visual dataset (videos) of these five crowded contexts from YouTube (in-the-wild scenes). Then, we propose a wide range of deep learning frameworks that use either audio or visual input data independently. Finally, the results of the best-performing frameworks are fused to achieve the highest accuracy. Our experimental results indicate that the audio and visual inputs each contribute independently to SC performance. Notably, an ensemble of deep learning frameworks using either audio or visual input data achieves the best accuracy of 95.7%.
Submitted 16 December, 2021;
originally announced December 2021.
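The abstract above fuses the outputs of audio-only and visual-only frameworks but does not state the fusion rule. One common choice is late fusion by (weighted) averaging of per-model class probabilities followed by an argmax. The sketch below assumes that rule; the functions `late_fusion` and `predict`, the equal default weights, and the example scores are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

CLASSES = ["Riot", "Noise-Street", "Firework-Event", "Music-Event", "Sport-Atmosphere"]


def late_fusion(prob_vectors: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted mean of per-model class-probability vectors (equal weights by default)."""
    if not prob_vectors:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0 / len(prob_vectors)] * len(prob_vectors)
    if len(weights) != len(prob_vectors):
        raise ValueError("one weight per model is required")
    for p in prob_vectors:
        if len(p) != len(CLASSES):
            raise ValueError("each vector needs one score per class")
    return [sum(w * p[c] for w, p in zip(weights, prob_vectors))
            for c in range(len(CLASSES))]


def predict(prob_vectors: Sequence[Sequence[float]],
            weights: Optional[Sequence[float]] = None) -> Tuple[str, List[float]]:
    """Return the top fused class label and the fused scores."""
    fused = late_fusion(prob_vectors, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best], fused


# Hypothetical outputs of an audio model and a visual model for one video.
audio_probs = [0.10, 0.50, 0.05, 0.30, 0.05]
visual_probs = [0.05, 0.20, 0.05, 0.60, 0.10]
print(predict([audio_probs])[0])                # Noise-Street
print(predict([audio_probs, visual_probs])[0])  # Music-Event
```

In this toy case the audio model alone picks 'Noise-Street', while the fused scores favour 'Music-Event'. This shows how complementary modalities can change the ensemble decision.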
-
UAV-Assisted Secure Communications in Terrestrial Cognitive Radio Networks: Joint Power Control and 3D Trajectory Optimization
Authors:
Phu X. Nguyen,
Van-Dinh Nguyen,
Hieu V. Nguyen,
Oh-Soon Shin
Abstract:
This paper considers secure communications for an underlay cognitive radio network (CRN) in the presence of an external eavesdropper (Eve). The secrecy performance of CRNs is usually limited by the primary receiver's interference power constraint. To overcome this issue, we propose to use an unmanned aerial vehicle (UAV) as a friendly jammer that interferes with Eve's decoding of the confidential message from the secondary transmitter (ST). Our goal is to jointly optimize the transmit power and the UAV's trajectory in three-dimensional (3D) space to maximize the average achievable secrecy rate of the secondary system. The formulated optimization problem is nonconvex in both its objective and its constraints, which makes it very challenging to solve. To obtain a suboptimal but efficient solution, we first transform the original problem into a more tractable form and then develop an iterative algorithm by leveraging the inner approximation framework. We further extend the proposed algorithm to the case of imperfect location information of Eve, where the average worst-case secrecy rate is considered as the objective function. Extensive numerical results are provided to demonstrate the merits of the proposed algorithms over existing approaches.
Submitted 25 March, 2020; v1 submitted 21 March, 2020;
originally announced March 2020.
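The objective named in the abstract above, the average achievable secrecy rate, can be illustrated with a small evaluation sketch. The following are assumptions, not the paper's exact system model: the flight time is split into N equal slots, the per-slot secrecy rate is the standard $[\log_2(1+\gamma_{SR}) - \log_2(1+\gamma_{E})]^+$, the UAV jamming is treated as noise, and the UAV-to-ground link follows a free-space line-of-sight gain $\beta_0/d^2$. All function names are hypothetical. The sketch only evaluates the objective for a given power/trajectory; it does not perform the paper's inner-approximation optimization.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def los_gain(uav: Point3D, node: Point3D, beta0: float = 1e-3) -> float:
    """Assumed free-space LoS power gain beta0 / d^2 between the UAV and a ground node."""
    d2 = sum((a - b) ** 2 for a, b in zip(uav, node))
    return beta0 / d2


def sinr(p_sig: float, g_sig: float, p_jam: float, g_jam: float, noise: float) -> float:
    """SINR when the UAV jamming signal is treated as additional noise."""
    return p_sig * g_sig / (p_jam * g_jam + noise)


def secrecy_rate(sinr_legit: float, sinr_eve: float) -> float:
    """Per-slot secrecy rate [log2(1 + SINR_SR) - log2(1 + SINR_Eve)]^+ in bit/s/Hz."""
    return max(0.0, math.log2(1.0 + sinr_legit) - math.log2(1.0 + sinr_eve))


def average_secrecy_rate(sinr_legit: Sequence[float], sinr_eve: Sequence[float]) -> float:
    """Average of the per-slot secrecy rates over N equal-length time slots."""
    if len(sinr_legit) != len(sinr_eve) or not sinr_legit:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(map(secrecy_rate, sinr_legit, sinr_eve)) / len(sinr_legit)


# Two slots: in the second, Eve's SINR exceeds the secondary receiver's, so that slot adds 0.
print(average_secrecy_rate([3.0, 1.0], [1.0, 3.0]))  # 0.5
```

Moving the UAV closer to Eve increases `los_gain` on the jamming link, lowers Eve's SINR, and raises the per-slot secrecy rate. This coupling is why power and 3D trajectory are optimized jointly.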
-
An Efficient Spectral Leakage Filtering for IEEE 802.11af in TV White Space
Authors:
Phu Xuan Nguyen,
Thinh Hung Pham,
Trang Hoang,
Oh-Soon Shin
Abstract:
Orthogonal frequency division multiplexing (OFDM) has been widely adopted in modern wireless standards and has become a key enabling technology for cognitive radios. However, one of its main drawbacks is significant spectral leakage due to the accumulation of multiple sinc-shaped subcarriers. In this paper, we present a novel pulse shaping scheme for efficient spectral leakage suppression in the OFDM-based physical layer of the IEEE 802.11af standard. With conventional pulse shaping filters such as a raised-cosine filter, vestigial symmetry can be used to reduce spectral leakage very effectively. However, these pulse shaping filters require a long guard interval, i.e., cyclic prefix in an OFDM system, to avoid inter-symbol interference (ISI), resulting in a loss of spectral efficiency. The proposed method, based on asymmetric pulse shaping, achieves better spectral leakage suppression and reduces the ISI caused by filtering compared with conventional pulse shaping filters.
Submitted 22 December, 2017;
originally announced December 2017.
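The abstract above contrasts the proposed asymmetric scheme with conventional raised-cosine pulse shaping. The asymmetric design itself is not specified in the abstract, so this is only a minimal sketch of the conventional baseline it mentions: time-domain raised-cosine windowing of a cyclically extended OFDM symbol. It assumes a prefix of `n_cp + n_roll` samples, a postfix of `n_roll` samples, and an overlap of `n_roll` samples between consecutive symbols. The names `rc_ramp` and `window_symbol` are hypothetical. The ramp satisfies the vestigial-symmetry property r[k] + r[n_roll - 1 - k] = 1, so overlapping tails of adjacent symbols add up to unity.

```python
import math
from typing import List, Sequence


def rc_ramp(n_roll: int) -> List[float]:
    """Rising raised-cosine ramp of n_roll samples; r[k] + r[n_roll-1-k] == 1."""
    return [0.5 * (1.0 - math.cos(math.pi * (k + 0.5) / n_roll)) for k in range(n_roll)]


def window_symbol(x: Sequence[complex], n_cp: int, n_roll: int) -> List[complex]:
    """Cyclically extend one OFDM symbol and taper both edges with raised-cosine ramps.

    The prefix has n_cp + n_roll samples and the postfix n_roll samples; consecutive
    windowed symbols are meant to overlap by n_roll samples at the transmitter.
    """
    n = len(x)
    if n_cp + n_roll > n:
        raise ValueError("extension longer than the symbol")
    ext = list(x[n - n_cp - n_roll:]) + list(x) + list(x[:n_roll])
    ramp = rc_ramp(n_roll)
    w = ramp + [1.0] * (len(ext) - 2 * n_roll) + ramp[::-1]
    return [s * wk for s, wk in zip(ext, w)]


# Hypothetical sizes: 16-sample symbol, 4-sample CP, 2-sample roll-off.
sym = window_symbol([1.0] * 16, n_cp=4, n_roll=2)
print(len(sym))  # 24
```

A longer `n_roll` gives a smoother edge and a faster spectral roll-off. However, each symbol then occupies `n_roll` extra samples on air (or eats into the guard interval), which is the spectral-efficiency cost the abstract attributes to conventional pulse shaping.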