-
Revisiting Interpolation Augmentation for Speech-to-Text Generation
Authors:
Chen Xu,
Jie Wang,
Xiaoqian Liu,
Qianqian Dong,
Chunliang Zhang,
Tong Xiao,
Jingbo Zhu,
Dapeng Man,
Wu Yang
Abstract:
Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings.
Submitted 22 June, 2024;
originally announced June 2024.
-
Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning
Authors:
Tao Liu,
Yuhang Zhang,
Zhu Feng,
Zhiqin Yang,
Chen Xu,
Dapeng Man,
Wu Yang
Abstract:
Backdoors in federated learning are diluted by subsequent benign updates. This is reflected in a significant drop in attack success rate as iterations increase, until the attack ultimately fails. We use a new metric, called attack persistence, to quantify the degree of this weakening. Given that research on improving this property has received little attention, we propose a Full Combination Backdoor Attack (FCBA) method. It aggregates more combined trigger information to form a more complete backdoor pattern in the global model. The trained backdoored global model is more resilient to benign updates, leading to a higher attack success rate on the test set. We test on three datasets and evaluate with two models across various settings. FCBA's persistence outperforms SOTA federated learning backdoor attacks. On GTSRB, 120 rounds after the attack, our attack success rate rose by over 50% from the baseline. The core code of our method is available at https://github.com/PhD-TaoLiu/FCBA.
Submitted 26 April, 2024;
originally announced April 2024.
-
Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition
Authors:
Chen Xu,
Xiaoqian Liu,
Erfeng He,
Yuhao Zhang,
Qianqian Dong,
Tong Xiao,
Jingbo Zhu,
Dapeng Man,
Wu Yang
Abstract:
In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task. Utilizing transcript and translation as concurrent objectives for CTC, our model bridges the gap between audio and text as well as between source and target languages. Building upon the recent advances in CTC application, we develop an enhanced variant, BiL-CTC+, that establishes new state-of-the-art performances on the MuST-C ST benchmarks under resource-constrained scenarios. Intriguingly, our method also yields significant improvements in speech recognition performance, revealing the effect of cross-lingual learning on transcription and demonstrating its broad applicability. The source code is available at https://github.com/xuchennlp/S2T.
Submitted 21 September, 2023;
originally announced September 2023.
-
Comparison of various image fusion methods for impervious surface classification from VNREDSat-1
Authors:
Hung V. Luu,
Manh V. Pham,
Chuc D. Man,
Hung Q. Bui,
Thanh T. N. Nguyen
Abstract:
Impervious surface is an important indicator for urban development monitoring. Accurate mapping of urban impervious surfaces with VNREDSat-1 remains challenging because their spectral diversity is not captured by an individual PAN image. In this article, five multi-resolution image fusion techniques were compared for the task of classifying urban impervious surfaces. The results show that for the VNREDSat-1 dataset, the UNB and Wavelet transform methods are the best techniques for preserving the spatial and spectral information of the original MS image, respectively. However, the UNB technique gives the best results for impervious surface classification, especially when shadow areas are included in the non-impervious surface group.
Submitted 4 May, 2018; v1 submitted 6 March, 2018;
originally announced March 2018.