Showing 1–44 of 44 results for author: Tian, Q

  1. arXiv:2406.05982  [pdf]

    eess.IV cs.LG physics.med-ph

    Artificial Intelligence for Neuro MRI Acquisition: A Review

    Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

    Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potenti…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Submitted to MAGMA for review

  2. arXiv:2310.04004  [pdf, other]

    cs.SD eess.AS

    U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

    Authors: Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie

    Abstract: Zero-shot speaker cloning aims to synthesize speech for any target speaker unseen during TTS system building, given only a single speech reference of the speaker at hand. Although more practical in real applications, the current zero-shot methods still produce speech with undesirable naturalness and speaker similarity. Moreover, endowing the target speaker with arbitrary speaking styles in the zer…

    Submitted 6 October, 2023; originally announced October 2023.

  3. arXiv:2309.07314  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS eess.SP

    AudioSR: Versatile Audio Super-resolution at Scale

    Authors: Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley

    Abstract: Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods have limitations such as the limited scope of audio types (e.g., music, speech) and specific bandwidth settings they can handle (e.g., 4kHz to 8kHz). In this paper, we introduce a diffusion-based generative model, AudioSR,…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Under review. Demo and code: https://audioldm.github.io/audiosr

  4. arXiv:2309.01142  [pdf, other]

    eess.AS cs.SD

    MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

    Authors: Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang

    Abstract: In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation. Previous work generally took explicit prosodic features or fixed-length style embeddin…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: This work was submitted on April 10, 2022 and accepted on August 29, 2023

  5. arXiv:2309.00883  [pdf, other]

    cs.SD eess.AS

    DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

    Authors: Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie

    Abstract: While the performance of cross-lingual TTS based on monolingual corpora has been significantly improved recently, generating cross-lingual speech still suffers from the foreign accent problem, leading to limited naturalness. Besides, current cross-lingual methods ignore modeling emotion, which is indispensable paralinguistic information in speech delivery. In this paper, we propose DiCLET-TTS, a D…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: accepted by TASLP

  6. arXiv:2308.05734  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS eess.SP

    AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

    Authors: Haohe Liu, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Qiao Tian, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley

    Abstract: Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learn…

    Submitted 11 May, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. Project page is https://audioldm.github.io/audioldm2

  7. arXiv:2306.10521  [pdf, other]

    eess.AS cs.SD

    LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

    Authors: Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang

    Abstract: Language model (LM) based audio generation frameworks, e.g., AudioLM, have recently achieved new state-of-the-art performance in zero-shot audio generation. In this paper, we explore the feasibility of LMs for zero-shot voice conversion. An intuitive approach is to follow AudioLM - Tokenizing speech into semantic and acoustic tokens respectively by HuBERT and SoundStream, and converting source sem…

    Submitted 20 August, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

  8. arXiv:2306.05704  [pdf, other]

    cs.CV cs.MM eess.IV

    Exploring Effective Mask Sampling Modeling for Neural Image Compression

    Authors: Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

    Abstract: Image compression aims to reduce the information redundancy in images. Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy, but rarely address the channel redundancy. Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: 10 pages

  9. arXiv:2306.02982  [pdf, other]

    cs.CL eess.AS

    PolyVoice: Language Models for Speech to Speech Translation

    Authors: Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

    Abstract: We propose PolyVoice, a language model-based framework for speech-to-speech translation (S2ST) system. Our framework consists of two language models: a translation language model and a speech synthesis language model. We use discretized speech units, which are generated in a fully unsupervised way, and thus our framework can be used for unwritten languages. For the speech synthesis part, we adopt…

    Submitted 13 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

  10. arXiv:2305.15719  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real…

    Submitted 25 May, 2023; originally announced May 2023.

  11. arXiv:2305.10666  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    A unified front-end framework for English text-to-speech synthesis

    Authors: Zelin Ying, Chen Li, Yu Dong, Qiuqiang Kong, Qiao Tian, Yuanyuan Huo, Yuxuan Wang

    Abstract: The front-end is a critical component of English text-to-speech (TTS) systems, responsible for extracting linguistic features that are essential for a text-to-speech model to synthesize speech, such as prosodies and phonemes. The English TTS front-end typically consists of a text normalization (TN) module, a prosody word prosody phrase (PWPP) module, and a grapheme-to-phoneme (G2P) module. However…

    Submitted 25 March, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted in ICASSP 2024

  12. arXiv:2305.07204  [pdf, other]

    eess.AS cs.SD

    Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

    Authors: Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

    Abstract: Zero-shot voice conversion (VC) converts source speech into the voice of any desired speaker using only one utterance of the speaker without requiring additional model updates. Typical methods use a speaker representation from a pre-trained speaker verification (SV) model or learn speaker representation during VC training to achieve zero-shot VC. However, existing speaker modeling methods overlook…

    Submitted 18 May, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Submitted to TASLP

  13. arXiv:2305.05203  [pdf, other]

    cs.SD eess.AS

    Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing

    Authors: Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: Automatic dubbing, which generates a corresponding version of the input speech in another language, could be widely utilized in many real-world scenarios such as video and game localization. In addition to synthesizing the translated scripts, automatic dubbing needs to further transfer the speaking style in the original language to the dubbed speeches to give audiences the impression that the char…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: Submitted to TASLP

  14. arXiv:2302.05297  [pdf]

    cs.CV eess.IV

    Objective Evaluation-based High-efficiency Learning Framework for Hyperspectral Image Classification

    Authors: Xuming Zhang, Jian Yan, Jia Tian, Wei Li, Xingfa Gu, Qingjiu Tian

    Abstract: Deep learning methods have been successfully applied to hyperspectral image (HSI) classification with remarkable performance. Because of limited labelled HSI data, earlier studies primarily adopted a patch-based classification framework, which divides images into overlapping patches for training and testing. However, this approach results in redundant computations and possible information leakage.…

    Submitted 10 January, 2023; originally announced February 2023.

  15. arXiv:2212.05751  [pdf, other]

    eess.AS

    Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network

    Authors: Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang

    Abstract: The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity. AC enables a variety of applications, such as language learning, speech content creation, and data augmentation. Previous methods rely on reference utterances in the inference phase or are unable to preserve speaker identity. To address these issues, we pr…

    Submitted 10 August, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Accepted by INTERSPEECH 2023

  16. arXiv:2212.00687  [pdf]

    eess.IV

    3D-EPI Blip-Up/Down Acquisition (BUDA) with CAIPI and Joint Hankel Structured Low-Rank Reconstruction for Rapid Distortion-Free High-Resolution T2* Mapping

    Authors: Zhifeng Chen, Congyu Liao, Xiaozhi Cao, Benedikt A. Poser, Zhongbiao Xu, Wei-Ching Lo, Manyi Wen, Jaejin Cho, Qiyuan Tian, Yaohui Wang, Yanqiu Feng, Ling Xia, Wufan Chen, Feng Liu, Berkin Bilgic

    Abstract: Purpose: This work aims to develop a novel distortion-free 3D-EPI acquisition and image reconstruction technique for fast and robust, high-resolution, whole-brain imaging as well as quantitative T2* mapping. Methods: 3D-Blip-Up and -Down Acquisition (3D-BUDA) sequence is designed for both single- and multi-echo 3D GRE-EPI imaging using multiple shots with blip-up and -down readouts to encode B0 fi…

    Submitted 1 December, 2022; originally announced December 2022.

  17. arXiv:2211.08857  [pdf, other]

    eess.AS cs.SD

    Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

    Authors: Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

    Abstract: Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC). However, in a low-resource situation, where only limited utterances from the target speaker are accessible, existing VC methods are hard to meet this requirement and capture the target speaker's timbre. In this work, a novel VC model, referred…

    Submitted 13 March, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Accepted by ICASSP 2023

  18. arXiv:2211.05360  [pdf]

    eess.IV

    SRNR: Training neural networks for Super-Resolution MRI using Noisy high-resolution Reference data

    Authors: Jiaxin Xiao, Zihan Li, Berkin Bilgic, Jonathan R. Polimeni, Susie Huang, Qiyuan Tian

    Abstract: Neural network (NN) based approaches for super-resolution MRI typically require high-SNR high-resolution reference data acquired in many subjects, which is time consuming and a barrier to feasible and accessible implementation. We propose to train NNs for Super-Resolution using Noisy Reference data (SRNR), leveraging the mechanism of the classic NN-based denoising method Noise2Noise. We systematic…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: 2 pages, 5 figures, submitted to ISMRM

  19. arXiv:2210.15158  [pdf, other]

    eess.AS cs.SD

    Streaming Voice Conversion Via Intermediate Bottleneck Features And Non-streaming Teacher Guidance

    Authors: Yuanzhe Chen, Ming Tu, Tang Li, Xin Li, Qiuqiang Kong, Jiaxin Li, Zhichao Wang, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: Streaming voice conversion (VC) is the task of converting the voice of one person to another in real-time. Previous streaming VC methods use phonetic posteriorgrams (PPGs) extracted from automatic speech recognition (ASR) systems to represent speaker-independent information. However, PPGs lack the prosody and vocalization information of the source speaker, and streaming PPGs contain undesired leak…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: The paper has been submitted to ICASSP2023

  20. arXiv:2210.07594  [pdf, other]

    cs.CV cs.LG cs.MM eess.IV

    See Blue Sky: Deep Image Dehaze Using Paired and Unpaired Training Images

    Authors: Xiaoyan Zhang, Gaoyang Tang, Yingying Zhu, Qi Tian

    Abstract: The issue of image haze removal has attracted wide attention in recent years. However, most existing haze removal methods cannot restore the scene with clear blue sky, since the color and texture information of the object in the original haze image is insufficient. To remedy this, we propose a cycle generative adversarial network to construct a novel end-to-end image dehaze model. We adopt outdoor…

    Submitted 14 October, 2022; originally announced October 2022.

  21. arXiv:2207.06088  [pdf, other]

    cs.SD eess.AS

    Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech

    Authors: Zhengxi Liu, Qiao Tian, Chenxu Hu, Xudong Liu, Menglin Wu, Yuping Wang, Hang Zhao, Yuxuan Wang

    Abstract: Some recent studies have demonstrated the feasibility of single-stage neural text-to-speech, which does not need to generate mel-spectrograms but generates the raw waveforms directly from the text. Single-stage text-to-speech often faces two problems: a) the one-to-many mapping problem due to multiple speech variations and b) insufficiency of high frequency reconstruction due to the lack of superv…

    Submitted 13 July, 2022; originally announced July 2022.

  22. VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration

    Authors: Haohe Liu, Xubo Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang

    Abstract: Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus on a single type of distortion, such as speech denoising or dereverberation. However, speech signals can be degraded by several different distortions simultaneously in the real world. It is thus important to extend speech restoration models to deal with multiple distortions. In this paper, we introduce Voic…

    Submitted 17 April, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: Submitted to INTERSPEECH 2022

    Journal ref: Proc. Interspeech 2022

  23. arXiv:2203.16838  [pdf, other]

    cs.SD eess.AS

    NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

    Authors: Jingbei Li, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: Although deep learning and end-to-end models have been widely used and shown superiority in automatic speech recognition (ASR) and text-to-speech (TTS) synthesis, state-of-the-art forced alignment (FA) models are still based on hidden Markov model (HMM). HMM has limited view of contextual information and is developed with long pipelines, leading to error accumulation and unsatisfactory performance…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by ICASSP 2022

  24. arXiv:2203.14941  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Neural Vocoder is All You Need for Speech Super-resolution

    Authors: Haohe Liu, Woosung Choi, Xubo Liu, Qiuqiang Kong, Qiao Tian, DeLiang Wang

    Abstract: Speech super-resolution (SR) is a task to increase speech sampling rate by generating high-frequency components. Existing speech SR methods are trained in constrained experimental settings, such as a fixed upsampling ratio. These strong constraints can potentially lead to poor generalization ability in mismatched real-world cases. In this paper, we propose a neural vocoder based speech super-resol…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Submitted to INTERSPEECH 2022

    Journal ref: Proc. Interspeech 2022

  25. arXiv:2202.02814  [pdf]

    eess.IV cs.LG

    Wave-Encoded Model-based Deep Learning for Highly Accelerated Imaging with Joint Reconstruction

    Authors: Jaejin Cho, Borjan Gagoski, Taehyung Kim, Qiyuan Tian, Stephen Robert Frost, Itthi Chatnuntawech, Berkin Bilgic

    Abstract: Purpose: To propose a wave-encoded model-based deep learning (wave-MoDL) strategy for highly accelerated 3D imaging and joint multi-contrast image reconstruction, and further extend this to enable rapid quantitative imaging using an interleaved look-locker acquisition sequence with T2 preparation pulse (3D-QALAS). Method: Recently introduced MoDL technique successfully incorporates convolutional…

    Submitted 6 February, 2022; originally announced February 2022.

    Comments: 8 figures, 1 table

  26. arXiv:2112.01587  [pdf]

    eess.IV cs.AI cs.CV physics.med-ph

    Improving accuracy and uncertainty quantification of deep learning based quantitative MRI using Monte Carlo dropout

    Authors: Mehmet Yigit Avci, Ziyu Li, Qiuyun Fan, Susie Huang, Berkin Bilgic, Qiyuan Tian

    Abstract: Dropout is conventionally used during the training phase as regularization method and for quantifying uncertainty in deep learning. We propose to use dropout during training as well as inference steps, and average multiple predictions to improve the accuracy, while reducing and quantifying the uncertainty. The results are evaluated for fractional anisotropy (FA) and mean diffusivity (MD) maps whic…

    Submitted 5 November, 2023; v1 submitted 2 December, 2021; originally announced December 2021.

  27. arXiv:2111.07220  [pdf]

    eess.IV cs.LG physics.med-ph

    SDnDTI: Self-supervised deep learning-based denoising for diffusion tensor MRI

    Authors: Qiyuan Tian, Ziyu Li, Qiuyun Fan, Jonathan R. Polimeni, Berkin Bilgic, David H. Salat, Susie Y. Huang

    Abstract: The noise in diffusion-weighted images (DWIs) decreases the accuracy and precision of diffusion tensor magnetic resonance imaging (DTI) derived microstructural parameters and leads to prolonged acquisition time for achieving improved signal-to-noise ratio (SNR). Deep learning-based image denoising using convolutional neural networks (CNNs) has superior performance but often requires additional hig…

    Submitted 13 November, 2021; originally announced November 2021.

  28. arXiv:2110.09788  [pdf, other]

    cs.CV eess.IV

    CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis

    Authors: Peng Zhou, Lingxi Xie, Bingbing Ni, Qi Tian

    Abstract: The style-based GAN (StyleGAN) architecture achieved state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera poses. The recently proposed NeRF-based GANs made great progress towards 3D-aware generators, but they are unable to generate high-quality images yet. This paper presents CIPS-3D, a style-based, 3D-aware generator that is composed o…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: 3D-aware GANs based on NeRF, https://github.com/PeterouZh/CIPS-3D

  29. arXiv:2110.08243  [pdf, other]

    eess.AS cs.CL cs.CV cs.LG cs.SD eess.IV

    Neural Dubber: Dubbing for Videos According to Scripts

    Authors: Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao

    Abstract: Dubbing is a post-production process of re-recording actors' dialogues, which is extensively used in filmmaking and video production. It is usually performed manually by professional voice actors who read lines with proper prosody, and in synchronization with the pre-recorded videos. In this work, we propose Neural Dubber, the first neural network model to solve a novel automatic video dubbing (AV…

    Submitted 15 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted by NeurIPS 2021; Project page at https://tsinghua-mars-lab.github.io/NeuralDubber/

  30. arXiv:2110.03347  [pdf, ps, other]

    eess.AS cs.HC cs.SD

    Cloning one's voice using very limited data in the wild

    Authors: Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang

    Abstract: With the increasing popularity of speech synthesis products, the industry has put forward more requirements for personalized speech synthesis: (1) How to use low-resource, easily accessible data to clone a person's voice. (2) How to clone a person's voice while controlling the style and prosody. To solve the above two problems, we proposed the Hieratron model framework in which the prosody and tim…

    Submitted 8 October, 2021; v1 submitted 7 October, 2021; originally announced October 2021.

  31. arXiv:2109.13731  [pdf, other]

    cs.SD eess.AS

    VoiceFixer: Toward General Speech Restoration with Neural Vocoder

    Authors: Haohe Liu, Qiuqiang Kong, Qiao Tian, Yan Zhao, DeLiang Wang, Chuanzeng Huang, Yuxuan Wang

    Abstract: Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus on single-task speech restoration (SSR), such as speech denoising or speech declipping. However, SSR systems only focus on one task and do not address the general speech restoration problem. In addition, previous SSR systems show limited performance in some speech restoration tasks such as speech super-reso…

    Submitted 5 October, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

  32. arXiv:2106.01918  [pdf]

    eess.IV eess.SP physics.bio-ph

    Highly Accelerated EPI with Wave Encoding and Multi-shot Simultaneous Multi-Slice Imaging

    Authors: Jaejin Cho, Congyu Liao, Qiyuan Tian, Zijing Zhang, Jinmin Xu, Wei-Ching Lo, Benedikt A. Poser, V. Andrew Stenger, Jason Stockmann, Kawin Setsompop, Berkin Bilgic

    Abstract: We introduce wave encoded acquisition and reconstruction techniques for highly accelerated echo planar imaging (EPI) with reduced g-factor penalty and image artifacts. Wave-EPI involves playing sinusoidal gradients during the EPI readout while employing interslice shifts as in blipped-CAIPI acquisitions. This spreads the aliasing in all spatial directions, thereby taking better advantage of 3D coi…

    Submitted 3 June, 2021; originally announced June 2021.

  33. arXiv:2105.05537  [pdf, other]

    eess.IV cs.CV

    Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation

    Authors: Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, Manning Wang

    Abstract: In the past few years, convolutional neural networks (CNNs) have achieved milestones in medical image analysis. Especially, the deep neural networks based on U-shaped architecture and skip-connections have been widely applied in a variety of medical image tasks. However, although CNN has achieved excellent performance, it cannot learn global and long-range semantic information interaction well due…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: a drafted manuscript

  34. arXiv:2102.09069  [pdf]

    eess.IV cs.LG physics.med-ph

    SRDTI: Deep learning-based super-resolution for diffusion tensor MRI

    Authors: Qiyuan Tian, Ziyu Li, Qiuyun Fan, Chanon Ngamsombat, Yuxin Hu, Congyu Liao, Fuyixue Wang, Kawin Setsompop, Jonathan R. Polimeni, Berkin Bilgic, Susie Y. Huang

    Abstract: High-resolution diffusion tensor imaging (DTI) is beneficial for probing tissue microstructure in fine neuroanatomical structures, but long scan times and limited signal-to-noise ratio pose significant barriers to acquiring DTI at sub-millimeter resolution. To address this challenge, we propose a deep learning-based super-resolution method entitled "SRDTI" to synthesize high-resolution diffusion-w…

    Submitted 17 February, 2021; originally announced February 2021.

  35. arXiv:2011.12206  [pdf, other]

    eess.AS

    TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

    Authors: Qiao Tian, Yi Chen, Zewang Zhang, Heng Lu, Linghui Chen, Lei Xie, Shan Liu

    Abstract: Recently, GAN based speech synthesis methods, such as MelGAN, have become very popular. Compared to conventional autoregressive based methods, parallel structures based generators make waveform generation process fast and stable. However, the quality of generated speech by autoregressive based neural vocoders, such as WaveRNN, is still higher than GAN. To address this issue, we propose a novel voc…

    Submitted 24 November, 2020; originally announced November 2020.

  36. arXiv:2011.02055  [pdf, other]

    eess.IV cs.CV

    Self-Adaptively Learning to Demoire from Focused and Defocused Image Pairs

    Authors: Lin Liu, Shanxin Yuan, Jianzhuang Liu, Liping Bao, Gregory Slabaugh, Qi Tian

    Abstract: Moire artifacts are common in digital photography, resulting from the interference between high-frequency scene content and the color filter array of the camera. Existing deep learning-based demoireing methods trained on large scale datasets are limited in handling various complex moire patterns, and mainly focus on demoireing of photos taken of digital displays. Moreover, obtaining moire-free gro…

    Submitted 5 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted to NeurIPS 2020. Project page: "http://home.ustc.edu.cn/~ll0825/project_FDNet.html"

  37. arXiv:2011.00935  [pdf, other]

    eess.AS cs.SD

    FeatherTTS: Robust and Efficient attention based Neural TTS

    Authors: Qiao Tian, Zewang Zhang, Chao Liu, Heng Lu, Linghui Chen, Bin Wei, Pujiang He, Shan Liu

    Abstract: Attention based neural TTS is an elegant speech synthesis pipeline and has shown a powerful ability to generate natural speech. However, it is still not robust enough to meet the stability requirements for industrial products. Besides, it suffers from slow inference speed owing to the autoregressive generation process. In this work, we propose FeatherTTS, a robust and efficient attention-based neura…

    Submitted 2 November, 2020; originally announced November 2020.

  38. arXiv:2005.05642  [pdf, other]

    cs.SD cs.CL eess.AS

    AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN

    Authors: Zewang Zhang, Qiao Tian, Heng Lu, Ling-Hui Chen, Shan Liu

    Abstract: This paper investigates how to leverage a DurIAN-based average model to enable a new speaker to have both accurate pronunciation and fluent cross-lingual speaking with very limited monolingual data. A weakness of the recently proposed end-to-end text-to-speech (TTS) systems is that robust alignment is hard to achieve, which hinders it to scale well with very limited data. To cope with this issue,…

    Submitted 12 May, 2020; originally announced May 2020.

    Comments: Submitted to InterSpeech 2020

  39. arXiv:2005.05551  [pdf, other]

    cs.SD eess.AS

    FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

    Authors: Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu

    Abstract: In this paper, we propose the FeatherWave, yet another variant of WaveRNN vocoder combining the multi-band signal processing and the linear predictive coding. The LPCNet, a recently proposed neural vocoder which utilized the linear predictive characteristic of speech signal in the WaveRNN architecture, can generate high quality speech with a speed faster than real-time on a single CPU core. Howeve…

    Submitted 3 September, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

    Comments: Accepted by INTERSPEECH 2020

  40. arXiv:1910.02593  [pdf, other]

    eess.IV cs.CV

    Unsupervised Image Super-Resolution with an Indirect Supervised Path

    Authors: Zhen Han, Enyan Dai, Xu Jia, Xiaoying Ren, Shuaijun Chen, Chunjing Xu, Jianzhuang Liu, Qi Tian

    Abstract: The task of single image super-resolution (SISR) aims at reconstructing a high-resolution (HR) image from a low-resolution (LR) image. Although significant progress has been made by deep learning models, they are trained on synthetic paired data in a supervised way and do not perform well on real data. There are several attempts that directly apply unsupervised image translation models to address…

    Submitted 13 October, 2019; v1 submitted 6 October, 2019; originally announced October 2019.

  41. arXiv:1907.10804  [pdf, other]

    cs.CV cs.LG eess.IV

    Co-Evolutionary Compression for Unpaired Image Translation

    Authors: Han Shu, Yunhe Wang, Xu Jia, Kai Han, Hanting Chen, Chunjing Xu, Qi Tian, Chang Xu

    Abstract: Generative adversarial networks (GANs) have been successfully used for considerable computer vision tasks, especially the image-to-image translation. However, generators in these networks are of complicated architectures with large number of parameters and huge computational complexities. Existing methods are mainly designed for compressing and speeding-up deep neural networks in the classificatio…

    Submitted 24 July, 2019; originally announced July 2019.

    Comments: Accepted by ICCV 2019

  42. arXiv:1812.02339  [pdf, other]

    eess.AS cs.SD

    Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder

    Authors: Qiao Tian, Xucheng Wan, Shan Liu

    Abstract: Although state-of-the-art parallel WaveNet has addressed the issue of real-time waveform generation, there remain problems. Firstly, due to the noisy input signal of the model, there is still a gap between the quality of generated and natural waveforms. Secondly, a parallel WaveNet is trained under a distillation framework, which makes it tedious to adapt a well trained model to a new speaker. To…

    Submitted 19 July, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

    Comments: 5 pages, 4 figures, 1 table, 6 equations

  43. arXiv:1811.05473  [pdf]

    physics.med-ph eess.IV

    High-fidelity, high-isotropic resolution diffusion imaging through gSlider acquisition with B1+ & T1 corrections and integrated ΔB0/Rx shim array

    Authors: Congyu Liao, Jason Stockmann, Qiyuan Tian, Berkin Bilgic, Nicolas S. Arango, Mary Kate Manhard, William A. Grissom, Lawrence L. Wald, Kawin Setsompop

    Abstract: Purpose: B1+ and T1 corrections and dynamic multi-coil shimming approaches were proposed to improve the fidelity of high isotropic resolution Generalized slice dithered enhanced resolution (gSlider) diffusion imaging. Methods: An extended reconstruction incorporating B1+ inhomogeneity and T1 recovery information was developed to mitigate slab-boundary artifacts in short-TR gSlider acquisitions. Sl…

    Submitted 26 March, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

    Comments: 7 figures

    Journal ref: Magnetic Resonance in Medicine (2019)

  44. arXiv:1808.02814  [pdf]

    eess.IV cs.LG stat.ML

    Highly Accelerated Multishot EPI through Synergistic Machine Learning and Joint Reconstruction

    Authors: Berkin Bilgic, Itthi Chatnuntawech, Mary Kate Manhard, Qiyuan Tian, Congyu Liao, Stephen F. Cauley, Susie Y. Huang, Jonathan R. Polimeni, Lawrence L. Wald, Kawin Setsompop

    Abstract: Purpose: To introduce a combined machine learning (ML) and physics-based image reconstruction framework that enables navigator-free, highly accelerated multishot echo planar imaging (msEPI), and demonstrate its application in high-resolution structural and diffusion imaging. Methods: Singleshot EPI is an efficient encoding technique, but does not lend itself well to high-resolution imaging due t…

    Submitted 24 March, 2019; v1 submitted 8 August, 2018; originally announced August 2018.