Showing 1–18 of 18 results for author: Bian, J

Searching in archive eess.

  1. arXiv:2405.03957  [pdf, ps, other]

    eess.SP

    SwinFi: a CSI Compression Method based on Swin Transformer for Wi-Fi Sensing

    Authors: Jichen Bian

    Abstract: Wi-Fi sensing is a transformative approach that enables a large number of applications through CSI analysis. The challenge lies in the high computational and communication costs with the increasing granularity of CSI data. In this letter, we propose SwinFi, a pioneering solution that compresses CSI at the edge into a succinct feature image and reconstructs at the cloud for further processing. SwinFi empl…

    Submitted 6 May, 2024; originally announced May 2024.

  2. Wi-Fi-based Personnel Identity Recognition: Addressing Dataset Imbalance with C-DDPMs

    Authors: Jichen Bian, Chong Tan, Peiyao Tang, Min Zheng

    Abstract: Wireless sensing technologies become increasingly prevalent due to the ubiquitous nature of wireless signals and their inherent privacy-friendly characteristics. Device-free personnel identity recognition, a prevalent application in wireless sensing, is susceptibly challenged by imbalanced channel state information (CSI) datasets. This letter proposes a novel method for CSI dataset augmentation th…

    Submitted 7 April, 2024; originally announced April 2024.

    Journal ref: IEEE Signal Processing Letters, 2024

  3. arXiv:2403.03100  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

    Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

    Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing di…

    Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

  4. arXiv:2312.00568  [pdf, ps, other]

    eess.SP

    A WINNER+ Based 3-D Non-Stationary Wideband MIMO Channel Model

    Authors: Ji Bian, Jian Sun, Cheng-Xiang Wang, Rui Feng, Jie Huang, Yang Yang, Minggao Zhang

    Abstract: In this paper, a three-dimensional (3-D) non-stationary wideband multiple-input multiple-output (MIMO) channel model based on the WINNER+ channel model is proposed. The angular distributions of clusters in both the horizontal and vertical planes are jointly considered. The receiver and clusters can be moving, which makes the model more general. Parameters including number of clusters, powers, dela…

    Submitted 1 December, 2023; originally announced December 2023.

  5. arXiv:2310.11954  [pdf, other]

    cs.CL cs.MM eess.AS

    MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

    Authors: Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

    Abstract: AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these tasks to satisfy their requirements in music processing, especially considering the huge differences in the representations of music data…

    Submitted 25 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

  6. arXiv:2310.05052  [pdf, other]

    eess.SP cs.AI cs.LG

    Accurate battery lifetime prediction across diverse aging conditions with deep learning

    Authors: Han Zhang, Yuqi Li, Shun Zheng, Ziheng Lu, Xiaofan Gui, Wei Xu, Jiang Bian

    Abstract: Accurately predicting the lifetime of battery cells in early cycles holds tremendous value for battery research and development as well as numerous downstream applications. This task is rather challenging because diverse conditions, such as electrode materials, operating conditions, and working environments, collectively determine complex capacity-degradation behaviors. However, current prediction…

    Submitted 24 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  7. arXiv:2310.00704  [pdf, other]

    cs.SD eess.AS

    UniAudio: An Audio Foundation Model Toward Universal Audio Generation

    Authors: Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Shinji Watanabe, Helen Meng

    Abstract: Large language models (LLMs) have demonstrated the capability to handle a variety of generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific approaches, leverages LLM techniques to generate multiple types of audio (including speech, sounds, music, and singing) with given input conditions. UniAudio 1) first tokenizes all types of target audio along with other con…

    Submitted 11 December, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

  8. arXiv:2309.02285  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    PromptTTS 2: Describing and Generating Voices with Text Prompt

    Authors: Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

    Abstract: Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text…

    Submitted 11 October, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Demo page: https://speechresearch.github.io/prompttts2

  9. arXiv:2307.01229  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    EmoGen: Eliminating Subjective Bias in Emotional Music Generation

    Authors: Chenfei Kang, Peiling Lu, Botao Yu, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian

    Abstract: Music is used to convey emotions, and thus generating emotional music is important in automatic music generation. Previous work on emotional music generation directly uses annotated emotion labels as control signals, which suffers from subjective bias: different people may annotate different emotions on the same music, and one person may feel different emotions under different situations. Therefor…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 12 pages, 7 pages

  10. arXiv:2306.00110  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    MuseCoco: Generating Symbolic Music from Text

    Authors: Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, Jiang Bian

    Abstract: Generating music from text descriptions is a user-friendly mode since the text is a relatively easy interface for user engagement. While some approaches utilize texts to control music audio generation, editing musical elements in generated audio is challenging for users. In contrast, symbolic music offers ease of editing, making it more accessible for users to manipulate specific musical elements.…

    Submitted 31 May, 2023; originally announced June 2023.

  11. arXiv:2305.10841  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

    Authors: Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan

    Abstract: Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrument tracks based on provided source tracks. In practical scenarios where there's a predefined ensemble of tracks and various composition needs, an efficient and effective generative model that can generate any target tracks based on the other tracks becomes crucial. However,…

    Submitted 29 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 13 pages, 4 figures

  12. arXiv:2304.09116  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

    Authors: Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

    Abstract: Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating is…

    Submitted 30 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: A large-scale text-to-speech and singing voice synthesis system with latent diffusion models. Update: NaturalSpeech 2 extension to voice conversion and speech enhancement

  13. arXiv:2304.00830  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

    Authors: Yuancheng Wang, Zeqian Ju, Xu Tan, Lei He, Zhizheng Wu, Jiang Bian, Sheng Zhao

    Abstract: Audio editing is applicable for various purposes, such as adding background sound effects, replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-based methods achieved zero-shot audio editing by using a diffusion and denoising process conditioned on the text description of the output audio. However, these methods still have some problems: 1) they have not been train…

    Submitted 5 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

  14. arXiv:2301.08846  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV eess.AS

    Regeneration Learning: A Learning Paradigm for Data Generation

    Authors: Xu Tan, Tao Qin, Jiang Bian, Tie-Yan Liu, Yoshua Bengio

    Abstract: Machine learning methods for conditional data generation usually build a mapping from source conditional data X to target data Y. The target Y (e.g., text, speech, music, image, video) is usually high-dimensional and complex, and contains information that does not exist in source data, which hinders effective and efficient learning on the source-target mapping. In this paper, we present a learning…

    Submitted 20 January, 2023; originally announced January 2023.

  15. arXiv:2212.14518  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD eess.SP

    ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

    Authors: Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic

    Abstract: Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up by minimizing the number of…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: 13 pages, 5 figures

  16. arXiv:2211.16934  [pdf, other]

    cs.CL cs.AI cs.LG cs.MM eess.AS

    VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

    Authors: Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian

    Abstract: Video dubbing aims to translate the original speech in a film or television program into the speech in a target language, which can be achieved with a cascaded system consisting of speech recognition, machine translation and speech synthesis. To ensure the translated speech to be well aligned with the corresponding video, the length/duration of the translated speech should be as close as possible…

    Submitted 4 December, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: AAAI 2023 camera version

  17. arXiv:2101.06610  [pdf, other]

    eess.SP

    A General 3D Non-Stationary Wireless Channel Model for 5G and Beyond

    Authors: Ji Bian, Cheng-Xiang Wang, Xiqi Gao, Xiaohu You, Minggao Zhang

    Abstract: In this paper, a novel three-dimensional (3D) non-stationary geometry-based stochastic model (GBSM) for the fifth generation (5G) and beyond 5G (B5G) systems is proposed. The proposed B5G channel model (B5GCM) is designed to capture various channel characteristics in (B)5G systems such as space-time-frequency (STF) non-stationarity, spherical wavefront (SWF), high delay resolution, time-variant ve…

    Submitted 17 January, 2021; originally announced January 2021.

  18. arXiv:2005.05650  [pdf, other]

    eess.IV cs.CV cs.LG

    Invertible Image Rescaling

    Authors: Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu

    Abstract: High-resolution digital images are usually downscaled to fit various display screens or save the cost of storage and bandwidth, meanwhile the post-upscaling is adopted to recover the original resolutions or the details in the zoom-in images. However, typical image downscaling is a non-injective mapping due to the loss of high-frequency information, which leads to the ill-posed problem of the inver…

    Submitted 12 May, 2020; originally announced May 2020.