Showing 1–50 of 67 results for author: Tian, Z

Searching in archive eess.
  1. arXiv:2406.18301  [pdf, other]

    eess.AS cs.CL cs.SD

    MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

    Authors: Song Li, Yongbin You, Xuezhi Wang, Zhengkun Tian, Ke Ding, Guanglu Wan

    Abstract: Recently, multilingual artificial intelligence assistants, exemplified by ChatGPT, have gained immense popularity. As a crucial gateway to human-computer interaction, multilingual automatic speech recognition (ASR) has also garnered significant attention, as evidenced by systems like Whisper. However, the proprietary nature of the training data has impeded researchers' efforts to study multilingua… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  2. arXiv:2405.16965  [pdf, ps, other]

    cs.IT eess.SP

    Timeliness of Status Update System: The Effect of Parallel Transmission Using Heterogeneous Updating Devices

    Authors: Zhengchuan Chen, Kang Lang, Nikolaos Pappas, Howard H. Yang, Min Wang, Zhong Tian, Tony Q. S. Quek

    Abstract: Timely status updating is the premise of emerging interaction-based applications in the Internet of Things (IoT). Using redundant devices to update the status of interest is a promising method to improve the timeliness of information. However, parallel status updating leads to out-of-order arrivals at the monitor, significantly challenging timeliness analysis. This work studies the Age of Informat… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2404.18081  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C… ▽ More

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  4. arXiv:2402.17723  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

    Authors: Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen

    Abstract: Video and audio content creation serves as the core technique for the movie industry and professional users. Recently, existing diffusion-based methods tackle video and audio generation separately, which hinders the technique transfer from academia to industry. In this work, we aim at filling the gap, with a carefully designed optimization-based framework for cross-visual-audio and joint-visual-au… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024. Project website: https://yzxing87.github.io/Seeing-and-Hearing/

  5. arXiv:2402.17550  [pdf, other]

    cs.NI cs.AI eess.SP

    Emergency Caching: Coded Caching-based Reliable Map Transmission in Emergency Networks

    Authors: Zeyu Tian, Lianming Xu, Liang Li, Li Wang, Aiguo Fei

    Abstract: Many rescue missions demand effective perception and real-time decision making, which highly rely on effective data collection and processing. In this study, we propose a three-layer architecture of emergency caching networks focusing on data collection and reliable transmission, by leveraging efficient perception and edge caching technologies. Based on this architecture, we propose a disaster map… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

  6. arXiv:2402.16153  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  7. arXiv:2310.00687  [pdf, ps, other]

    eess.SP

    DISCO Might Not Be Funky: Random Intelligent Reflective Surface Configurations That Attack

    Authors: Huan Huang, Lipeng Dai, Hongliang Zhang, Chongfu Zhang, Zhongxing Tian, Yi Cai, A. Lee Swindlehurst, Zhu Han

    Abstract: Emerging intelligent reflective surfaces (IRSs) significantly improve system performance, but also pose a significant risk for physical layer security (PLS). Unlike the extensive research on legitimate IRS-enhanced communications, in this article we present an adversarial IRS-based fully-passive jammer (FPJ). We describe typical application scenarios for Disco IRS (DIRS)-based FPJ, where an illegi… ▽ More

    Submitted 10 June, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by IEEE Wireless Communications. The code for the DISCO RIS is available on GitHub (https://github.com/huanhuan1799/Disco-Intelligent-Reflecting-Surfaces-Active-Channel-Aging-for-Fully-Passive-Jamming-Attacks)

  8. arXiv:2309.07413  [pdf, other]

    cs.CL cs.SD eess.AS

    CPPF: A contextual and post-processing-free model for automatic speech recognition

    Authors: Lei Zhang, Zhengkun Tian, Xiang Chen, Jiaming Sun, Hongyu Xiang, Ke Ding, Guanglu Wan

    Abstract: ASR systems have become increasingly widespread in recent years. However, their textual outputs often require post-processing tasks before they can be practically utilized. To address this issue, we draw inspiration from the multifaceted capabilities of LLMs and Whisper, and focus on integrating multiple ASR text processing tasks related to speech recognition into the ASR model. This integration n… ▽ More

    Submitted 20 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP2024

  9. arXiv:2309.04156  [pdf, other]

    cs.SD cs.CL eess.AS

    Cross-Utterance Conditioned VAE for Speech Generation

    Authors: Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun

    Abstract: Speech synthesis systems powered by neural networks hold promise for multimedia production, but frequently face issues with producing expressive speech and seamless editing. In response, we present the Cross-Utterance Conditioned Variational Autoencoder speech synthesis (CUC-VAE S2) framework to enhance prosody and ensure natural speech generation. This framework leverages the powerful representat… ▽ More

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 13 pages

  10. arXiv:2308.15716  [pdf, ps, other]

    eess.SP

    Anti-Jamming Precoding Against Disco Intelligent Reflecting Surfaces Based Fully-Passive Jamming Attacks

    Authors: Huan Huang, Lipeng Dai, Hongliang Zhang, Zhongxing Tian, Yi Cai, Chongfu Zhang, A. Lee Swindlehurst, Zhu Han

    Abstract: Emerging intelligent reflecting surfaces (IRSs) significantly improve system performance, but also pose a huge risk for physical layer security. Existing works have illustrated that a disco IRS (DIRS), i.e., an illegitimate IRS with random time-varying reflection properties (like a "disco ball"), can be employed by an attacker to actively age the channels of legitimate users (LUs). Such active cha… ▽ More

    Submitted 24 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: This paper has been submitted for possible publication

  11. arXiv:2308.01138  [pdf, other]

    cs.LG cs.AI eess.SP

    Can We Transfer Noise Patterns? A Multi-environment Spectrum Analysis Model Using Generated Cases

    Authors: Haiwen Du, Zheng Ju, Yu An, Honghui Du, Dongjie Zhu, Zhaoshuo Tian, Aonghus Lawlor, Ruihai Dong

    Abstract: Spectrum analysis systems in online water quality testing are designed to detect types and concentrations of pollutants and enable regulatory agencies to respond promptly to pollution incidents. However, spectral data-based testing devices suffer from complex noise patterns when deployed in non-laboratory environments. To make the analysis model applicable to more environments, we propose a noise… ▽ More

    Submitted 14 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  12. arXiv:2307.08323  [pdf, other]

    cs.SD eess.AS

    TST: Time-Sparse Transducer for Automatic Speech Recognition

    Authors: Xiaohui Zhang, Mangui Liang, Zhengkun Tian, Jiangyan Yi, Jianhua Tao

    Abstract: End-to-end model, especially Recurrent Neural Network Transducer (RNN-T), has achieved great success in speech recognition. However, transducer requires a great memory footprint and computing time when processing a long decoding sequence. To solve this problem, we propose a model named time-sparse transducer, which introduces a time-sparse mechanism into transducer. In this mechanism, we obtain th… ▽ More

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 10 pages

    Journal ref: International Conference on Artificial Intelligence (CICAI 2023)

  13. arXiv:2306.10548  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    MARBLE: Music Audio Representation Benchmark for Universal Evaluation

    Authors: Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu

    Abstract: In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue… ▽ More

    Submitted 23 November, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: camera-ready version for NeurIPS 2023

  14. arXiv:2305.13041  [pdf, ps, other]

    cs.DC cs.LG eess.SP

    Distributed Learning over Networks with Graph-Attention-Based Personalization

    Authors: Zhuojun Tian, Zhaoyang Zhang, Zhaohui Yang, Richeng Jin, Huaiyu Dai

    Abstract: In conventional distributed learning over a network, multiple agents collaboratively build a common machine learning model. However, due to the underlying non-i.i.d. data distribution among agents, the unified learning model becomes inefficient for each agent to process its locally accessible data. To address this problem, we propose a graph-attention-based personalized training algorithm (GATTA)… ▽ More

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted for publication in IEEE TSP; with supplementary details for the derivations

  15. arXiv:2301.06948  [pdf, ps, other]

    eess.SP

    Super-Resolution Harmonic Retrieval of Non-Circular Signals

    Authors: Yu Zhang, Yue Wang, Zhi Tian, Geert Leus, Gong Zhang

    Abstract: This paper proposes a super-resolution harmonic retrieval method for uncorrelated strictly non-circular signals, whose covariance and pseudo-covariance present Toeplitz and Hankel structures, respectively. Accordingly, the augmented covariance matrix constructed by the covariance and pseudo-covariance matrices is not only low rank but also jointly Toeplitz-Hankel structured. To efficiently exploit… ▽ More

    Submitted 17 January, 2023; originally announced January 2023.

  16. arXiv:2211.12783  [pdf, other]

    eess.SP

    Semantic-Aware Sensing Information Transmission for Metaverse: A Contest Theoretic Approach

    Authors: Jiacheng Wang, Hongyang Du, Zengshan Tian, Dusit Niyato, Jiawen Kang, Xuemin Shen

    Abstract: With the advancement of network and computer technologies, virtual cyberspace keeps evolving, and Metaverse is the main representative. As an irreplaceable technology that supports Metaverse, the sensing information transmission from the physical world to Metaverse is vital. Inspired by emerging semantic communication, in this paper, we propose a semantic transmission framework for transmitting se… ▽ More

    Submitted 23 November, 2022; originally announced November 2022.

  17. arXiv:2211.07182  [pdf, other]

    eess.SP

    Compressive Spectrum Sensing Using Blind-Block Orthogonal Least Squares

    Authors: Liyang Lu, Wenbo Xu, Yue Wang, Zhi Tian

    Abstract: Compressive sensing (CS) has recently emerged as an extremely efficient technology of the wideband spectrum sensing. In compressive spectrum sensing (CSS), it is necessary to know the sparsity or the noise information in advance for reliable reconstruction. However, such information is usually absent in practical applications. In this paper, we propose a blind-block orthogonal least squares-based… ▽ More

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: 14 figures, submitted to Signal Processing for possible publication

  18. Compressive Spectrum Sensing Using Sampling-Controlled Block Orthogonal Matching Pursuit

    Authors: Liyang Lu, Wenbo Xu, Yue Wang, Zhi Tian

    Abstract: This paper proposes two novel schemes of wideband compressive spectrum sensing (CSS) via block orthogonal matching pursuit (BOMP) algorithm, for achieving high sensing accuracy in real time. These schemes aim to reliably recover the spectrum by adaptively adjusting the number of required measurements without inducing unnecessary sampling redundancy. To this end, the minimum number of required meas… ▽ More

    Submitted 13 April, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: Accepted by IEEE Transactions on Communications

    Journal ref: IEEE Trans. Commun., vol. 71, no. 2, pp. 1096-1111, Feb. 2023

  19. arXiv:2211.06073  [pdf, other]

    cs.SD cs.CL eess.AS

    SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

    Authors: Jiangyan Yi, Chenglong Wang, Jianhua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu

    Abstract: Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering timbre, prosody, linguistic content or channel noise of original audio. These datasets leave out a scenario, in which the acoustic scene of an original audio is manipulated with a forged one. It will pose a major threat to our society i… ▽ More

    Submitted 4 April, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: Accepted by Pattern Recognition, 1 April 2024

  20. arXiv:2211.03284  [pdf, other]

    eess.AS cs.SD

    Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization

    Authors: Zhengkun Tian, Hongyu Xiang, Min Li, Feifei Lin, Ke Ding, Guanglu Wan

    Abstract: The CTC model has been widely applied to many application scenarios because of its simple structure, excellent performance, and fast inference speed. There are many peaks in the probability distribution predicted by the CTC models, and each peak represents a non-blank token. The recognition latency of CTC models can be reduced by encouraging the model to predict peaks earlier. Existing methods to… ▽ More

    Submitted 15 March, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2023 (5 pages, 2 figures)

  21. arXiv:2210.16705  [pdf, other]

    eess.SP

    Distributed Swarm Learning for Internet of Things at the Edge: Where Artificial Intelligence Meets Biological Intelligence

    Authors: Yue Wang, Zhi Tian, Xin Fan, Yan Huo, Cameron Nowzari, Kai Zeng

    Abstract: With the proliferation of versatile Internet of Things (IoT) services, smart IoT devices are increasingly deployed at the edge of wireless networks to perform collaborative machine learning tasks using locally collected data, giving rise to the edge learning paradigm. Due to device restrictions and resource constraints, edge learning among massive IoT devices faces major technical challenges cause… ▽ More

    Submitted 29 October, 2022; originally announced October 2022.

  22. arXiv:2210.16682  [pdf, ps, other]

    cs.LG eess.SP

    Robust Distributed Learning Against Both Distributional Shifts and Byzantine Attacks

    Authors: Guanqiang Zhou, Ping Xu, Yue Wang, Zhi Tian

    Abstract: In distributed learning systems, robustness issues may arise from two sources. On one hand, due to distributional shifts between training data and test data, the trained model could exhibit poor out-of-sample performance. On the other hand, a portion of working nodes might be subject to byzantine attacks which could invalidate the learning result. Existing works mostly deal with these two issues s… ▽ More

    Submitted 29 October, 2022; originally announced October 2022.

  23. arXiv:2210.12857  [pdf, other]

    cs.CL cs.SD eess.AS

    Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings

    Authors: Jian Zhu, Zuoyu Tian, Yadong Liu, Cong Zhang, Chia-wen Lo

    Abstract: Inducing semantic representations directly from speech signals is a highly challenging task but has many useful applications in speech mining and spoken language understanding. This study tackles the unsupervised learning of semantic representations for spoken utterances. Through converting speech signals into hidden units generated from acoustic unit discovery, we propose WavEmbed, a multimodal s… ▽ More

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  24. arXiv:2208.09618  [pdf, other]

    cs.SD cs.AI eess.AS

    Fully Automated End-to-End Fake Audio Detection

    Authors: Chenglong Wang, Jiangyan Yi, Jianhua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

    Abstract: The existing fake audio detection systems often rely on expert experience to design the acoustic features or manually design the hyperparameters of the network structure. However, artificial adjustment of the parameters can have a relatively obvious influence on the results. It is almost impossible to manually set the best set of parameters. Therefore this paper proposes a fully automated end-toen… ▽ More

    Submitted 20 August, 2022; originally announced August 2022.

  25. arXiv:2208.05578  [pdf, other]

    eess.SP

    CB-DSL: Communication-efficient and Byzantine-robust Distributed Swarm Learning on Non-i.i.d. Data

    Authors: Xin Fan, Yue Wang, Yan Huo, Zhi Tian

    Abstract: The valuable data collected by IoT devices in edge networks together with the resurgence of ML stimulate the latest trend of edge AI. However, recent FL methods face major challenges including communication bottleneck, data heterogeneity and security concerns in edge IoT scenarios, especially when being adopted for distributed learning among massive IoT devices equipped with limited data and trans… ▽ More

    Submitted 20 October, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: update theoretical and simulation results

  26. arXiv:2206.12420  [pdf, other]

    cs.LG eess.SP

    SCAI: A Spectral data Classification framework with Adaptive Inference for the IoT platform

    Authors: Yundong Sun, Dongjie Zhu, Haiwen Du, Yansong Wang, Zhaoshuo Tian

    Abstract: Currently, it is a hot research topic to realize accurate, efficient, and real-time identification of massive spectral data with the help of deep learning and IoT technology. Deep neural networks played a key role in spectral analysis. However, the inference of deeper models is performed in a static manner, and cannot be adjusted according to the device. Not all samples need to allocate all comput… ▽ More

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: 14 pages, 11 figures

  27. Blind Orthogonal Least Squares based Compressive Spectrum Sensing

    Authors: Liyang Lu, Wenbo Xu, Yue Wang, Zhi Tian

    Abstract: As an enabling technique of cognitive radio (CR), compressive spectrum sensing (CSS) based on compressive sensing (CS) can detect the spectrum opportunities from wide frequency bands efficiently and accurately by using sub-Nyquist sampling rate. However, the sensing performance of most existing CSS excessively relies on the prior information such as spectrum sparsity or noise variance. Thus, a key… ▽ More

    Submitted 13 November, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: 14 figures, IEEE Transactions on Vehicular Technology (early access), Jun. 2023

  28. arXiv:2203.00508  [pdf, ps, other]

    cs.IT eess.SP

    Reconfigurable Intelligent Surface-Aided Spectrum Sharing Coexisting with Multiple Primary Networks

    Authors: Zhong Tian, Zhengchuan Chen, Min Wang, Yunjian Jia, Wanli Wen

    Abstract: Considering the spectrum sharing system (SSS) coexisting with multiple primary networks, we have employed a well-designed reconfigurable intelligent surface (RIS) to control the radio environments of wireless channels and relieve the scarcity of the spectrum resource in this work. Specifically, the enhancement of the spectral efficiency of the secondary user in the considered SSS is decomposed int… ▽ More

    Submitted 4 November, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

  29. arXiv:2202.11693  [pdf, other]

    cs.IT eess.SP

    Hybrid Mechanical and Electronic Beam Steering for Maximizing OAM Channel Capacity

    Authors: Rui Chen, Zhenyang Tian, Wen-Xuan Long, Xiaodong Wang, Wei Zhang

    Abstract: Radio frequency-orbital angular momentum (RF-OAM) is a novel approach of multiplexing a set of orthogonal modes on the same frequency channel to achieve high spectrum efficiencies. Since OAM requires precise alignment of the transmit and the receive antennas, the electronic beam steering approach has been proposed for the uniform circular array (UCA)-based OAM communication system to circumvent la… ▽ More

    Submitted 5 August, 2022; v1 submitted 28 January, 2022; originally announced February 2022.

  30. arXiv:2202.08433  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    ADD 2022: the First Audio Deep Synthesis Detection Challenge

    Authors: Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

    Abstract: Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021. However, the recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill in the gap. The ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake gam… ▽ More

    Submitted 26 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  31. arXiv:2201.12155  [pdf, other]

    cs.CL cs.SD eess.AS

    Reducing language context confusion for end-to-end code-switching automatic speech recognition

    Authors: Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng

    Abstract: Code-switching deals with alternative languages in communication process. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging as code-switching training data are always insufficient to combat the increased multilingual context confusion due to the presence of more than one language. We propose a language-related attention mechanism to r… ▽ More

    Submitted 29 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: text overlap with arXiv:2010.14798; the paper has been accepted by Interspeech 2022

  32. Recovery Conditions of Sparse Signals Using Orthogonal Least Squares-Type Algorithms

    Authors: L. Lu, W. Xu, Y. Wang, Z. Tian

    Abstract: Orthogonal least squares (OLS)-type algorithms are efficient in reconstructing sparse signals, which include the well-known OLS, multiple OLS (MOLS) and block OLS (BOLS). In this paper, we first investigate the noiseless exact recovery conditions of these algorithms. Specifically, based on mutual incoherence property (MIP), we provide theoretical analysis of OLS and MOLS to ensure that the correct… ▽ More

    Submitted 12 October, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

    Comments: 13 pages, 11 figures, published in IEEE Transactions on Signal Processing

    Journal ref: IEEE Trans. Signal Process., vol. 70, pp. 4727-4741, Sep. 2022

  33. arXiv:2112.00457  [pdf, other]

    cs.IT eess.SP

    Broadband beam steering for misaligned multi-mode OAM communication systems

    Authors: Zhengjuan Tian, Rui Chen, Wen-Xuan Long, Hong Zhou, Marco Moretti

    Abstract: Orbital angular momentum (OAM) at radio frequency (RF) has attracted more and more attention as a novel approach of multiplexing a set of orthogonal OAM modes on the same frequency channel to achieve high spectral efficiency (SE). However, the precondition for maintaining the orthogonality among different OAM modes is perfect alignment of the transmit and receive uniform circular arrays (UCAs), wh… ▽ More

    Submitted 1 December, 2021; originally announced December 2021.

  34. A Unified 3D Beam Training and Tracking Procedure for Terahertz Communication

    Authors: Boyu Ning, Zhi Chen, Zhongbao Tian, Chong Han, Shaoqian Li

    Abstract: Terahertz (THz) communication is considered as an attractive way to overcome the bandwidth bottleneck and satisfy the ever-increasing capacity demand in the future. Due to the high directivity and propagation loss of THz waves, a massive MIMO system using beamforming is envisioned as a promising technology in THz communication to realize high-gain and directional transmission. However, pilots, whi… ▽ More

    Submitted 3 October, 2021; originally announced October 2021.

    Comments: in IEEE Transactions on Wireless Communications, 2021. arXiv admin note: text overlap with arXiv:2104.02885

  35. arXiv:2109.13833  [pdf, other]

    eess.SY math.OC

    Convex Optimization of Speed and Energy Management System for Fuel Cell Hybrid Trains

    Authors: Rabee Jibrin, Stuart Hillmansen, Clive Roberts, Ning Zhao, Zhongbei Tian

    Abstract: We look into minimizing the hydrogen fuel consumption of hydrogen hybrid trains by optimizing their operation. The powertrain considered is a fuel cell charge-sustaining hybrid. Convex optimization is utilized to compute optimal speed and energy management trajectories. The barrier method is used to solve the optimization problems quickly on the order of tens of seconds for the entire journey. Sim… ▽ More

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: submitted to IEEE VPPC 2021

  36. arXiv:2107.05135  [pdf, other]

    eess.IV

    Generative adversarial network based single pixel imaging

    Authors: Ming Zhao, Fengqiang Li, Fengyue Huo, Zhiming Tian

    Abstract: Single pixel imaging can reconstruct two-dimensional images of a scene with only a single-pixel detector. It has been widely used for imaging in non-visible bandwidth (e.g., near-infrared and X-ray) where focal-plane array sensors are challenging to be manufactured. In this paper, we propose a generative adversarial network based reconstruction algorithm for single pixel imaging, which demonstrate… ▽ More

    Submitted 11 July, 2021; originally announced July 2021.

  37. Beamforming Technologies for Ultra-Massive MIMO in Terahertz Communications

    Authors: Boyu Ning, Zhongbao Tian, Weidong Mei, Zhi Chen, Chong Han, Shaoqian Li, Jinhong Yuan, Rui Zhang

    Abstract: Terahertz (THz) communications with a frequency band $0.1-10$ THz are envisioned as a promising solution to future high-speed wireless communication. Although with tens of gigahertz available bandwidth, THz signals suffer from severe free-spreading loss and molecular-absorption loss, which limit the wireless transmission distance. To compensate for the propagation loss, the ultra-massive multiple-… ▽ More

    Submitted 13 March, 2023; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: This paper has been published: B. Ning et al., "Beamforming Technologies for Ultra-Massive MIMO in Terahertz Communications," in IEEE Open Journal of the Communications Society, vol. 4, pp. 614-658, 2023, doi: 10.1109/OJCOMS.2023.3245669. URL:https://ieeexplore.ieee.org/document/10045774

  38. Continual Learning for Fake Audio Detection

    Authors: Haoxin Ma, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengkun Tian, Chenglong Wang

    Abstract: Fake audio attack becomes a major threat to the speaker verification system. Although current detection approaches have achieved promising results on dataset-specific scenarios, they encounter difficulties on unseen spoofing data. Fine-tuning and retraining from scratch have been applied to incorporate new data. However, fine-tuning leads to performance degradation on previous data. Retraining tak… ▽ More

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: 5 pages, conference

    Journal ref: Proc. Interspeech 2021, 886-890

  39. arXiv:2104.03617  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Half-Truth: A Partially Fake Audio Detection Dataset

    Authors: Jiangyan Yi, Ye Bai, Jianhua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu

    Abstract: Diverse promising datasets have been designed to hold back the development of fake audio detection, such as ASVspoof databases. However, previous datasets ignore an attacking situation, in which the hacker hides some small fake clips in real speech audio. This poses a serious threat since that it is difficult to distinguish the small fake clip from the whole speech utterance. Therefore, this paper… ▽ More

    Submitted 15 December, 2023; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: accepted by Interspeech 2021

  40. arXiv:2104.02882  [pdf, other]

    eess.AS cs.CL cs.SD

    FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

    Authors: Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen

    Abstract: Transducer-based models, such as RNN-Transducer and transformer-transducer, have achieved great success in speech recognition. A typical transducer model decodes the output sequence conditioned on the current acoustic state and previously predicted tokens step by step. Statistically, the number of blank tokens in the prediction results accounts for nearly 90% of all tokens. It takes a lot of comp… ▽ More

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: Submitted to INTERSPEECH2021

  41. TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition

    Authors: Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen, Xuefei Liu

    Abstract: The autoregressive (AR) models, such as attention-based encoder-decoder models and RNN-Transducer, have achieved great success in speech recognition. They predict the output sequence conditioned on the previous tokens and acoustic encoded states, which is inefficient on GPUs. The non-autoregressive (NAR) models can get rid of the temporal dependency between the output tokens and predict the entire…

    Submitted 3 April, 2021; originally announced April 2021.

    Comments: Submitted to Interspeech2021

  42. arXiv:2102.07594  [pdf, other]

    cs.CL cs.AI eess.AS

    Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT

    Authors: Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang

    Abstract: Attention-based encoder-decoder (AED) models have achieved promising performance in speech recognition. However, because the decoder predicts text tokens (such as characters or words) in an autoregressive manner, it is difficult for an AED model to predict all tokens in parallel. This makes the inference speed relatively slow. We believe that because the encoder already captures the whole speech u…

    Submitted 29 August, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: 14 pages, 7 figures

  43. arXiv:2102.05210  [pdf, other]

    eess.IV cs.CV

    D2A U-Net: Automatic Segmentation of COVID-19 Lesions from CT Slices with Dilated Convolution and Dual Attention Mechanism

    Authors: Xiangyu Zhao, Peng Zhang, Fan Song, Guangda Fan, Yangyang Sun, Yujia Wang, Zheyuan Tian, Luqi Zhang, Guanglei Zhang

    Abstract: Coronavirus Disease 2019 (COVID-19) has caused great casualties and becomes almost the most urgent public health events worldwide. Computed tomography (CT) is a significant screening tool for COVID-19 infection, and automated segmentation of lung infection in COVID-19 CT images will greatly assist diagnosis and health care of patients. However, accurate and automatic segmentation of COVID-19 lung…

    Submitted 9 February, 2021; originally announced February 2021.

  44. arXiv:2102.03026  [pdf, other]

    cs.CV eess.IV

    Instance and Panoptic Segmentation Using Conditional Convolutions

    Authors: Zhi Tian, Bowen Zhang, Hao Chen, Chunhua Shen

    Abstract: We propose a simple yet effective framework for instance and panoptic segmentation, termed CondInst (conditional convolutions for instance and panoptic segmentation). In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance. In contrast, we propose to attend to the instance…

    Submitted 19 January, 2022; v1 submitted 5 February, 2021; originally announced February 2021.

    Comments: Accepted to IEEE T. Pattern Analysis and Machine Intelligence (TPAMI). Extended version of arXiv:2003.05664

  45. arXiv:2011.04249  [pdf, other]

    cs.SD cs.CL eess.AS

    Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition

    Authors: Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen

    Abstract: The joint training framework for speech enhancement and recognition methods have obtained quite good performances for robust end-to-end automatic speech recognition (ASR). However, these methods only utilize the enhanced feature as the input of the speech recognition component, which are affected by the speech distortion problem. In order to address this problem, this paper proposes a gated recurr…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  46. arXiv:2010.14798  [pdf, other]

    cs.SD cs.CL eess.AS

    Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

    Authors: Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Ye Bai, Jianhua Tao, Zhengqi Wen

    Abstract: Despite the recent significant advances witnessed in end-to-end (E2E) ASR system for code-switching, hunger for audio-text paired data limits the further improvement of the models' performance. In this paper, we propose a decoupled transformer model to use monolingual paired data and unpaired text data to alleviate the problem of code-switching data shortage. The model is decoupled into two parts:…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: 5 pages, 1 figure

  47. arXiv:2010.14791  [pdf, other]

    eess.AS

    One In A Hundred: Select The Best Predicted Sequence from Numerous Candidates for Streaming Speech Recognition

    Authors: Zhengkun Tian, Jiangyan Yi, Ye Bai, Jianhua Tao, Shuai Zhang, Zhengqi Wen

    Abstract: The RNN-Transducers and improved attention-based encoder-decoder models are widely applied to streaming speech recognition. Compared with these two end-to-end models, the CTC model is more efficient in training and inference. However, it cannot capture the linguistic dependencies between the output tokens. Inspired by the success of two-pass end-to-end models, we introduce a transformer decoder an…

    Submitted 3 April, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

  48. arXiv:2010.09776  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG eess.SY

    SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

    Authors: Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, et al. (12 additional authors not shown)

    Abstract: Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse a…

    Submitted 31 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: 20 pages, 11 figures. Paper accepted to CoRL 2020

  49. arXiv:2009.13863  [pdf, ps, other]

    eess.SP cs.DC cs.LG

    Distributed ADMM with Synergetic Communication and Computation

    Authors: Zhuojun Tian, Zhaoyang Zhang, Jue Wang, Xiaoming Chen, Wei Wang, Huaiyu Dai

    Abstract: In this paper, we propose a novel distributed alternating direction method of multipliers (ADMM) algorithm with synergetic communication and computation, called SCCD-ADMM, to reduce the total communication and computation cost of the system. Explicitly, in the proposed algorithm, each node interacts with only part of its neighboring nodes, the number of which is progressively determined according…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: Accepted for publication in IEEE TCOM

  50. Third-Order Statistics Reconstruction from Compressive Measurements

    Authors: Yanbo Wang, Zhi Tian

    Abstract: Estimation of third-order statistics relies on the availability of a huge amount of data records, which can pose severe challenges on the data collecting hardware in terms of considerable storage costs, overwhelming energy consumption, and unaffordably high sampling rate especially when dealing with high-dimensional data such as wideband signals. To overcome these challenges, this paper focuses on…

    Submitted 13 May, 2021; v1 submitted 29 July, 2020; originally announced July 2020.