Search | arXiv e-print repository

Revisiting Interpolation Augmentation for Speech-to-Text Generation

Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, **gbo Zhu, Dapeng Man, Wu Yang

Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under… ▽ More Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: ACL 2024 Findings

arXiv:2404.17617 [pdf, other]

Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning

Authors: Tao Liu, Yuhang Zhang, Zhu Feng, Zhiqin Yang, Chen Xu, Dapeng Man, Wu Yang

Abstract: Backdoors on federated learning will be diluted by subsequent benign updates. This is reflected in the significant reduction of attack success rate as iterations increase, ultimately failing. We use a new metric to quantify the degree of this weakened backdoor effect, called attack persistence. Given that research to improve this performance has not been widely noted,we propose a Full Combination… ▽ More Backdoors on federated learning will be diluted by subsequent benign updates. This is reflected in the significant reduction of attack success rate as iterations increase, ultimately failing. We use a new metric to quantify the degree of this weakened backdoor effect, called attack persistence. Given that research to improve this performance has not been widely noted,we propose a Full Combination Backdoor Attack (FCBA) method. It aggregates more combined trigger information for a more complete backdoor pattern in the global model. Trained backdoored global model is more resilient to benign updates, leading to a higher attack success rate on the test set. We test on three datasets and evaluate with two models across various settings. FCBA's persistence outperforms SOTA federated learning backdoor attacks. On GTSRB, postattack 120 rounds, our attack success rate rose over 50% from baseline. The core code of our method is available at https://github.com/PhD-TaoLiu/FCBA. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(19): 21359-21367

arXiv:2309.12234 [pdf, ps, other]

Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition

Authors: Chen Xu, Xiaoqian Liu, Erfeng He, Yuhao Zhang, Qianqian Dong, Tong Xiao, **gbo Zhu, Dapeng Man, Wu Yang

Abstract: In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task. Utilizing transcript and translation as concurrent objectives for CTC, our model bridges the gap between audio and text as well as between source and target languages. Build… ▽ More In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task. Utilizing transcript and translation as concurrent objectives for CTC, our model bridges the gap between audio and text as well as between source and target languages. Building upon the recent advances in CTC application, we develop an enhanced variant, BiL-CTC+, that establishes new state-of-the-art performances on the MuST-C ST benchmarks under resource-constrained scenarios. Intriguingly, our method also yields significant improvements in speech recognition performance, revealing the effect of cross-lingual learning on transcription and demonstrating its broad applicability. The source code is available at https://github.com/xuchennlp/S2T. △ Less

Submitted 21 September, 2023; originally announced September 2023.

Comments: Submitted to ICASSP 2024

arXiv:1803.02326 [pdf]

Comparison of various image fusion methods for impervious surface classification from VNREDSat-1

Authors: Hung V. Luu, Manh V. Pham, Chuc D. Man, Hung Q. Bui, Thanh T. N. Nguyen

Abstract: Impervious surface is an important indicator for urban development monitoring. Accurate urban impervious surfaces map** with VNREDSat-1 remains challenging due to their spectral diversity not captured by individual PAN image. In this artical, five multi-resolution image fusion techniques were compared for classification task of urban impervious surface. The result shows that for VNREDSat-1 datas… ▽ More Impervious surface is an important indicator for urban development monitoring. Accurate urban impervious surfaces map** with VNREDSat-1 remains challenging due to their spectral diversity not captured by individual PAN image. In this artical, five multi-resolution image fusion techniques were compared for classification task of urban impervious surface. The result shows that for VNREDSat-1 dataset, UNB and Wavelet tranform methods are the best techniques reserving spatial and spectral information of original MS image, respectively. However, the UNB technique gives best results when it comes to impervious surface classification especially in the case of shadow area included in non-impervious surface group. △ Less

Submitted 4 May, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

Showing 1–4 of 4 results for author: Man, D