-
Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor
Authors:
Yongjie Si,
Yanxiong Li,
Jialong Li,
Jiaxin Tan,
Qianhua He
Abstract:
It's assumed that training data is sufficient in base session of few-shot class-incremental audio classification. However, it's difficult to collect abundant samples for model training in base session in some practical scenarios due to the data scarcity of some classes. This paper explores a new problem of fully few-shot class-incremental audio classification with few training samples in all sessions. Moreover, we propose a method using expandable dual-embedding extractor to solve it. The proposed model consists of an embedding extractor and an expandable classifier. The embedding extractor consists of a pretrained Audio Spectrogram Transformer (AST) and a finetuned AST. The expandable classifier consists of prototypes and each prototype represents a class. Experiments are conducted on three datasets (LS-100, NSynth-100 and FSC-89). Results show that our method exceeds seven baseline ones in average accuracy with statistical significance. Code is at: https://github.com/YongjieSi/EDE.
Submitted 12 June, 2024;
originally announced June 2024.
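The abstract above describes an expandable classifier made of prototypes, one per class. A minimal sketch of that general idea follows, under stated assumptions: each prototype is taken as the mean of a class's few support embeddings, and prediction uses cosine similarity. Neither choice is confirmed by the abstract, and the class name `PrototypeClassifier`, its methods, and the toy 2-D embeddings are hypothetical, not taken from the authors' code.

```python
import math


class PrototypeClassifier:
    """Expandable nearest-prototype classifier (illustrative sketch only)."""

    def __init__(self):
        # Maps class label -> prototype vector (list of floats).
        self.prototypes = {}

    def add_class(self, label, embeddings):
        # Assumption: a prototype is the element-wise mean of the
        # few support embeddings available for this class.
        n = len(embeddings)
        dim = len(embeddings[0])
        self.prototypes[label] = [
            sum(e[i] for e in embeddings) / n for i in range(dim)
        ]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def predict(self, embedding):
        # Assumption: the predicted class is the one whose prototype
        # has the highest cosine similarity to the query embedding.
        return max(
            self.prototypes,
            key=lambda label: self._cosine(embedding, self.prototypes[label]),
        )


# Base session: two classes, two support samples each (toy 2-D embeddings).
clf = PrototypeClassifier()
clf.add_class("dog_bark", [[1.0, 0.1], [0.9, 0.0]])
clf.add_class("siren", [[0.0, 1.0], [0.1, 0.8]])

# Incremental session: the classifier expands by adding one prototype;
# existing prototypes are left untouched.
clf.add_class("rain", [[-1.0, -1.0]])
```

In this sketch, adding a class never modifies earlier prototypes, which is one simple way a prototype-based classifier can grow across sessions without retraining.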
-
Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network
Authors:
Yanxiong Li,
Jiaxin Tan,
Guoqing Chen,
Jialong Li,
Yongjie Si,
Qianhua He
Abstract:
This work is an improved system that we submitted to task 1 of DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextual information from each audio clip. In addition, we integrate other techniques into our method, such as knowledge distillation, data augmentation, and adaptive residual normalization. When evaluated on the official dataset of DCASE2023 challenge, our method obtains the highest accuracy of 56.10% with parameter number of 5.21 kilo and multiply-accumulate operations of 1.44 million. It exceeds the top two systems of DCASE2023 challenge in accuracy and complexity, and obtains state-of-the-art result. Code is at: https://github.com/Jessytan/Low-complexity-ASC.
Submitted 12 June, 2024;
originally announced June 2024.
-
Patterned Beam Training: A Novel Low-Complexity and Low-Overhead Scheme for ELAA
Authors:
Hongkang Yu,
Yuan Si,
Shujuan Zhang,
Yijian Chen
Abstract:
Extremely large antenna arrays (ELAAs) can provide higher spectral efficiency. However, the use of narrower beams for data transmission significantly increases the overhead associated with beam training. In this letter, we propose a novel patterned beam training (PBT) scheme characterized by its low overhead and complexity. This scheme requires only a single linear operation by both the base station and the user equipment to determine the optimal beam, reducing the training overhead to half or even less compared to traditional exhaustive search methods. Furthermore, we discuss the pattern design principles in detail and provide specific forms. Simulation results demonstrate that the proposed scheme outperforms the compared methods in terms of beam alignment accuracy and achieves a balance between signal-to-noise ratio (SNR) conditions and training overhead, making it a promising alternative.
Submitted 1 June, 2024;
originally announced June 2024.
-
Fast Beam Training and Performance Analysis for Extremely Large Aperture Array
Authors:
Yuan Si,
Hongkang Yu,
Yijian Chen
Abstract:
Extremely large aperture array (ELAA) can significantly enhance beamforming gain and spectral efficiency. Unfortunately, the use of narrower beams for data transmission results in a substantial increase in the cost of beam training. In this paper, we study a high-efficiency and low-overhead scheme named hash beam training. Specifically, two improved hash codebook design methods, random and fixed, are proposed. Moreover, we analyze beam alignment performance. Since the derived beam alignment success probability is a complex function, we also propose a heuristic metric to evaluate the impact of codebook parameter on performance. Finally, simulation results validate the theoretical analysis, indicating that the proposed beam training scheme can achieve fast beam alignment with lower overhead and higher accuracy.
Submitted 27 April, 2024;
originally announced April 2024.
-
AOSR-Net: All-in-One Sandstorm Removal Network
Authors:
Yazhong Si,
Xulong Zhang,
Fan Yang,
Jianzong Wang,
Ning Cheng,
Jing Xiao
Abstract:
Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios. In addition, these approaches often adopt a strategy of color correction followed by dust removal, which makes the algorithm structure too complex. To solve the issue, we introduce a novel image restoration model, named all-in-one sandstorm removal network (AOSR-Net). This model is developed based on a re-formulated sandstorm scattering model, which directly establishes the image mapping relationship by integrating intermediate parameters. Such an integration scheme effectively addresses the problems of over-enhancement and weak generalization in the field of sand dust image enhancement. Experimental results on synthetic and real-world sandstorm images demonstrate the superiority of the proposed AOSR-Net over state-of-the-art (SOTA) algorithms.
Submitted 15 September, 2023;
originally announced September 2023.
-
Speech Emotion Recognition with Co-Attention based Multi-level Acoustic Information
Authors:
Heqing Zou,
Yuke Si,
Chen Chen,
Deepu Rajan,
Eng Siong Chng
Abstract:
Speech Emotion Recognition (SER) aims to help machines understand human subjective emotion from audio information alone. However, extracting and utilizing comprehensive in-depth audio information is still a challenging task. In this paper, we propose an end-to-end speech emotion recognition system using multi-level acoustic information with a newly designed co-attention module. We first extract multi-level acoustic information, including MFCC, spectrogram, and the embedded high-level acoustic information with CNN, BiLSTM and wav2vec2, respectively. Then these extracted features are treated as multimodal inputs and fused by the proposed co-attention mechanism. Experiments are carried out on the IEMOCAP dataset, and our model achieves competitive performance with two different speaker-independent cross-validation strategies. Our code is available on GitHub.
Submitted 29 March, 2022;
originally announced March 2022.
-
A comprehensive benchmark analysis for sand dust image reconstruction
Authors:
Yazhong Si,
Fan Yang,
Ya Guo,
Wei Zhang,
Yipu Yang
Abstract:
Numerous sand dust image enhancement algorithms have been proposed in recent years. To the best of our knowledge, however, most methods evaluated their performance in a no-reference way using a few selected real-world images from the internet. It is unclear how to quantitatively analyze the performance of the algorithms in a supervised way and how we could gauge the progress in the field. Moreover, due to the absence of large-scale benchmark datasets, there are no well-known reports of data-driven methods for sand dust image enhancement up till now. To advance the development of deep learning-based algorithms for sand dust image reconstruction while enabling supervised objective evaluation of algorithm performance, in this paper we presented a comprehensive perceptual study and analysis of real-world sand dust images, then constructed a Sand-dust Image Reconstruction Benchmark (SIRB) for training Convolutional Neural Networks (CNNs) and evaluating algorithm performance. In addition, we adopted an existing image transformation neural network trained on SIRB as a baseline to illustrate the generalization of SIRB for training CNNs. Finally, we conducted qualitative and quantitative evaluations to demonstrate the performance and limitations of state-of-the-art (SOTA) methods, which sheds light on future research in sand dust image reconstruction.
Submitted 19 April, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
ECG Beats Fast Classification Base on Sparse Dictionaries
Authors:
Nanyu Li,
Yujuan Si,
Di Wang,
Tong Liu,
Jinrun Yu
Abstract:
Feature extraction plays an important role in Electrocardiogram (ECG) beat classification systems. Compared to other popular methods, the VQ method performs well in feature extraction from ECG, with the advantage of dimensionality reduction. In the VQ method, a set of dictionaries corresponding to segments of ECG beats is trained, and VQ codes are used to represent each heartbeat. However, in practice, VQ codes optimized by k-means or k-means++ exhibit large quantization errors, which results in VQ codes for two heartbeats of the same type being very different. As a result, the essential differences between different types of heartbeats cannot be represented well. On the other hand, VQ uses too much data during codebook construction, which limits the speed of dictionary learning. In this paper, we propose a new method to improve the speed and accuracy of the VQ method. To reduce the computation of codebook construction, a set of sparse dictionaries corresponding to wave segments of ECG beats is constructed. After initialization, the sparse dictionaries are updated efficiently by the Feature-sign and Lagrange dual algorithms. Based on those dictionaries, a set of codes can be computed to represent the original ECG beats. Experimental results show that features extracted from ECG by our method are more efficient and separable. The accuracy of our method is higher than that of other methods, with less time spent on feature extraction.
Submitted 8 September, 2020;
originally announced September 2020.
-
Hierarchical emotion-recognition framework based on discriminative brain neural network topology and ensemble co-decision strategy
Authors:
Cunbo Li,
Peiyang Li,
Yangsong Zhang,
Ning Li,
Yajing Si,
Fali Li,
Dezhong Yao,
Peng Xu
Abstract:
Brain neural networks characterize various information propagation patterns for different emotional states. However, the statistical features based on traditional graph theory may ignore the spatial network difference. To reveal these inherent spatial features and increase the stability of emotional recognition, we proposed a hierarchical framework that can perform multiple emotion recognition with the multiple emotion-related spatial network topology patterns (MESNP) by combining supervised learning with an ensemble co-decision strategy. To evaluate the performance of our proposed MESNP approach, we conducted both off-line and simulated on-line experiments with two public datasets, i.e., MAHNOB and DEAP. The experimental results demonstrated that MESNP can significantly enhance the classification performance for multiple emotions. The highest accuracies of off-line experiments for MAHNOB-HCI and DEAP reached 99.93% (3 classes) and 83.66% (4 classes), respectively. For simulated on-line experiments, we also obtained the best classification accuracies of 100% (3 classes) for MAHNOB and 99.22% (4 classes) for DEAP with the proposed MESNP. These results further proved the efficiency of MESNP for structured feature extraction in multi-classification emotional tasks.
Submitted 25 February, 2020;
originally announced February 2020.
-
Reconfiguration of Brain Network between Resting-state and Oddball Paradigm
Authors:
Fali Li,
Chanlin Yi,
Yuanyuan Liao,
Yuanling Jiang,
Yajing Si,
Limeng Song,
Tao Zhang,
Dezhong Yao,
Yangsong Zhang,
Zehong Cao,
Peng Xu
Abstract:
The oddball paradigm is widely applied to the investigation of multiple cognitive functions. Prior studies have explored how cortical oscillation and power spectra differ between the resting-state condition and the oddball paradigm, but whether brain networks exhibit a significant difference is still unclear. Our study addressed how the brain reconfigures its architecture from a resting-state condition (i.e., baseline) to the P300 stimulus task in the visual oddball paradigm. In this study, electroencephalogram (EEG) datasets were collected from 24 postgraduate students, who were required only to mentally count the number of target stimuli; afterwards, the functional EEG networks constructed in different frequency bands were compared between baseline and oddball task conditions to evaluate the reconfiguration of the functional network in the brain. Compared to the baseline, our results showed significantly (p < 0.05) enhanced delta/theta EEG connectivity and a decreased alpha default mode network in the process of brain reconfiguration to the P300 task. Furthermore, the reconfigured coupling strengths were demonstrated to relate to P300 amplitudes, and were then used as input features to train a classifier to differentiate the high and low P300 amplitude groups with an accuracy of 77.78%. The findings of our study help us to understand the changes in functional brain connectivity from resting-state to the oddball stimulus task, and the reconfigured network pattern has the potential for the selection of good subjects for P300-based brain-computer interfaces.
Submitted 18 September, 2018;
originally announced September 2018.