Showing 1–50 of 68 results for author: Guo, D

  1. arXiv:2406.05763  [pdf, other]

    eess.AS

    WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

    Authors: Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, Binbin Zhang, Lei Xie

    Abstract: With the development of large text-to-speech (TTS) models and scale-up of the training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the open-sourced WenetSpeech dataset. Tailored for the text-to-speech tasks, we refined WenetSpeech by adjusting segment boundaries, enhancing the audio…

    Submitted 19 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  2. arXiv:2406.05672  [pdf, other]

    eess.AS

    Text-aware and Context-aware Expressive Audiobook Speech Synthesis

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Yongmao Zhang, Wenjie Tian, Lei Xie

    Abstract: Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech. However, a major challenge remains in generating speech that captures the diverse styles exhibited by professional narrators in audiobooks without relying on manually labeled data or reference speech. To address this problem, we propose a text-aware and context-aware (TACA) style modeling approach…

    Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  3. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  4. arXiv:2402.15939  [pdf]

    eess.IV cs.LG

    Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI

    Authors: Zi Wang, Min Xiao, Yirong Zhou, Chengyan Wang, Naiming Wu, Yi Li, Yiwen Gong, Shufu Chang, Yinyin Chen, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Di Guo, Guang Yang, Xiaobo Qu

    Abstract: Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled but the image reconstruction poses a great challenge of high-dimensional processing. This challenge necessitates extensive training data in many deep learning reconstruction methods. This work proposes a novel and efficient approach, levera…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 10 pages, 11 figures, 3 tables

  5. arXiv:2312.12692  [pdf, other]

    eess.SP

    Reducing Satellite Interference to Radio Telescopes Using Beacons

    Authors: Cuneyd Ozturk, Randall A. Berry, Dongning Guo, Michael L. Honig, Frank D. Lind

    Abstract: This paper proposes the transmission of beacon signals to alert potential interferers of an ongoing or impending passive sensing measurement. We focus on the interference from Low-Earth Orbiting (LEO) satellites to a radio-telescope. We compare the beacon approach with two versions of Radio Quiet Zones (RQZs): fixed quiet zones on the ground and in the sky, and dynamic quiet zones that vary across…

    Submitted 19 December, 2023; originally announced December 2023.

  6. arXiv:2312.09746  [pdf, other]

    cs.SD eess.AS

    Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

    Authors: Bingshen Mu, Pengcheng Guo, Dake Guo, Pan Zhou, Wei Chen, Lei Xie

    Abstract: Automatic Speech Recognition (ASR) has shown remarkable progress, yet it still faces challenges in real-world distant scenarios across various array topologies each with multiple recording devices. The focal point of the CHiME-7 Distant ASR task is to devise a unified system capable of generalizing various array topologies that have multiple recording devices and offering reliable recognition perf…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  7. arXiv:2312.05746  [pdf, ps, other]

    cs.MA eess.SY

    A Scalable MARL Solution for Scheduling in Conflict Graphs

    Authors: Yiming Zhang, Dongning Guo

    Abstract: This paper proposes a fully scalable multi-agent reinforcement learning (MARL) approach for packet scheduling in conflict graphs, aiming to minimize average packet delays. Each agent autonomously manages the schedule of a single link over one or multiple sub-bands, considering its own state and states of conflicting links. The problem can be conceptualized as a decentralized partially observable…

    Submitted 9 December, 2023; originally announced December 2023.

  8. arXiv:2310.13882  [pdf]

    eess.SP

    NMR Spectra Denoising with Vandermonde Constraints

    Authors: Di Guo, Runmin Xu, **yu Wu, Mei** Lin, Xiaofeng Du, Xiaobo Qu

    Abstract: Nuclear magnetic resonance (NMR) spectroscopy serves as an important tool to analyze chemicals and proteins in bioengineering. However, NMR signals are easily contaminated by noise during the data acquisition, which can affect subsequent quantitative analysis. Therefore, denoising NMR signals has been a long-time concern. In this work, we propose an optimization model-based iterative denoising met…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 10 pages, 9 figures

  9. arXiv:2310.11641  [pdf]

    eess.IV cs.AI physics.med-ph

    Cloud-Magnetic Resonance Imaging System: In the Era of 6G and Artificial Intelligence

    Authors: Yirong Zhou, Yanhuang Wu, Yuhan Su, **g Li, Jianyun Cai, Yongfu You, Di Guo, Xiaobo Qu

    Abstract: Magnetic Resonance Imaging (MRI) plays an important role in medical diagnosis, generating petabytes of image data annually in large hospitals. This voluminous data stream requires a significant amount of network bandwidth and extensive storage infrastructure. Additionally, local data processing demands substantial manpower and hardware investments. Data isolation across different healthcare instit…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 4 pages, 5 figures, letters

  10. arXiv:2310.02855  [pdf, other]

    eess.IV cs.CV

    Multi-Resolution Fusion for Fully Automatic Cephalometric Landmark Detection

    Authors: Dongqian Guo, Wencheng Han

    Abstract: Cephalometric landmark detection on lateral skull X-ray images plays a crucial role in the diagnosis of certain dental diseases. Accurate and effective identification of these landmarks presents a significant challenge. Based on extensive data observations and quantitative analyses, we discovered that visual features from different receptive fields affect the detection accuracy of various landmark…

    Submitted 4 October, 2023; originally announced October 2023.

  11. arXiv:2309.13907  [pdf, other]

    cs.SD eess.AS

    HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie

    Abstract: Recent advances in text-to-speech, particularly those based on Graph Neural Networks (GNNs), have significantly improved the expressiveness of short-form synthetic speech. However, generating human-parity long-form speech with high dynamic prosodic variations is still challenging. To address this problem, we expand the capabilities of GNNs with a hierarchical prosody modeling approach, named HiGNN…

    Submitted 6 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted by ASRU2023

  12. arXiv:2309.12094  [pdf, other]

    eess.SP cs.NI

    RadYOLOLet: Radar Detection and Parameter Estimation Using YOLO and WaveLet

    Authors: Shamik Sarkar, Dongning Guo, Danijela Cabric

    Abstract: Detection of radar signals without assistance from the radar transmitter is a crucial requirement for emerging and future shared-spectrum wireless networks like Citizens Broadband Radio Service (CBRS). In this paper, we propose a supervised deep learning-based spectrum sensing approach called RadYOLOLet that can detect low-power radar signals in the presence of interference and estimate the radar…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 15 pages

  13. arXiv:2309.11763  [pdf]

    eess.IV

    Bloch Equation Enables Physics-informed Neural Network in Parametric Magnetic Resonance Imaging

    Authors: Qingrui Cai, Liuhong Zhu, Jianjun Zhou, Chen Qian, Di Guo, Xiaobo Qu

    Abstract: Magnetic resonance imaging (MRI) is an important non-invasive imaging method in clinical diagnosis. Beyond the common image structures, parametric imaging can provide the intrinsic tissue property thus could be used in quantitative evaluation. The emerging deep learning approach provides fast and accurate parameter estimation but still encounters the lack of network interpretation and enough train…

    Submitted 20 September, 2023; originally announced September 2023.

  14. arXiv:2309.07178  [pdf]

    q-bio.QM cs.AI cs.LG eess.SP

    CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis

    Authors: Di Guo, Si** Li, Jun Liu, Zhangren Tu, Tianyu Qiu, **g**g Xu, Liubin Feng, Donghai Lin, Qing Hong, Mei** Lin, Yanqin Lin, Xiaobo Qu

    Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and subsequent quantitative analysis involves various specialized tools, which necessitates comprehensive knowledge in programming and NMR. Particularly, the emerging deep l…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 11 pages, 13 figures

  15. arXiv:2309.06843  [pdf, other]

    cs.RO eess.SY

    Stepwise Model Reconstruction of Robotic Manipulator Based on Data-Driven Method

    Authors: Dingxu Guo, Jian Xu, Shu Zhang

    Abstract: Research on dynamics of robotic manipulators provides promising support for model-based control. In general, rigorous first-principles-based dynamics modeling and accurate identification of mechanism parameters are critical to achieving high precision in model-based control, while data-driven model reconstruction provides alternative approaches of the above process. Taking the level of activation…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 8 pages, 11 figures

  16. arXiv:2308.14360  [pdf, other]

    cs.SD cs.AI eess.AS

    InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

    Authors: Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song

    Abstract: Music editing primarily entails the modification of instrument tracks or remixing in the whole, which offers a novel reinterpretation of the original piece through a series of operations. These music processing methods hold immense potential across various applications but demand substantial expertise. Prior methodologies, although effective for image and audio modifications, falter when directly…

    Submitted 12 December, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Demo samples are available at https://musicedit.github.io/

  17. arXiv:2308.13421  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Exploiting Diverse Feature for Multimodal Sentiment Analysis

    Authors: Jia Li, Wei Qian, Kun Li, Qi Li, Dan Guo, Meng Wang

    Abstract: In this paper, we present our solution to the MuSe-Personalisation sub-challenge in the MuSe 2023 Multimodal Sentiment Analysis Challenge. The task of MuSe-Personalisation aims to predict the continuous arousal and valence values of a participant based on their audio-visual, language, and physiological signal modalities data. Considering different people have personal characteristics, the main cha…

    Submitted 25 August, 2023; originally announced August 2023.

  18. arXiv:2308.10193  [pdf, other]

    cs.NI cs.LG eess.SP

    ProSpire: Proactive Spatial Prediction of Radio Environment Using Deep Learning

    Authors: Shamik Sarkar, Dongning Guo, Danijela Cabric

    Abstract: Spatial prediction of the radio propagation environment of a transmitter can assist and improve various aspects of wireless networks. The majority of research in this domain can be categorized as 'reactive' spatial prediction, where the predictions are made based on a small set of measurements from an active transmitter whose radio environment is to be predicted. Emerging spectrum-sharing paradigm…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 9 pages

  19. arXiv:2307.13220  [pdf]

    eess.IV cs.AI physics.med-ph

    One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

    Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Mei**g Lin, Jiefeng Guo, Congbo Cai, Zhong Chen , et al. (3 additional authors not shown)

    Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep…

    Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 38 pages, 19 figures, 5 tables

  20. arXiv:2306.11021  [pdf, other]

    eess.SP

    CloudBrain-MRS: An Intelligent Cloud Computing Platform for in vivo Magnetic Resonance Spectroscopy Preprocessing, Quantification, and Analysis

    Authors: Xiaodie Chen, Jiayu Li, Dicheng Chen, Yirong Zhou, Zhangren Tu, Mei** Lin, Taishan Kang, Jianzhong Lin, Tao Gong, Liuhong Zhu, Jianjun Zhou, Lin Ou-yang, Jiefeng Guo, Jiyang Dong, Di Guo, Xiaobo Qu

    Abstract: Magnetic resonance spectroscopy (MRS) is an important clinical imaging method for diagnosis of diseases. MRS spectrum is used to observe the signal intensity of metabolites or further infer their concentrations. Although the magnetic resonance vendors commonly provide basic functions of spectra plots and metabolite quantification, the widespread clinical research of MRS is still limited due to the…

    Submitted 6 September, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: 11 pages, 12 figures

  21. arXiv:2306.09116  [pdf, other]

    eess.IV cs.CV

    Accurate Airway Tree Segmentation in CT Scans via Anatomy-aware Multi-class Segmentation and Topology-guided Iterative Learning

    Authors: Puyang Wang, Dazhou Guo, Dandan Zheng, Minghui Zhang, Haogang Yu, Xin Sun, Jia Ge, Yun Gu, Le Lu, Xianghua Ye, Dakai **

    Abstract: Intrathoracic airway segmentation in computed tomography (CT) is a prerequisite for various respiratory disease analyses such as chronic obstructive pulmonary disease (COPD), asthma and lung cancer. Unlike other organs with simpler shapes or topology, the airway's complex tree structure imposes an unbearable burden to generate the "ground truth" label (up to 7 or 3 hours of manual or semi-automati…

    Submitted 15 June, 2023; originally announced June 2023.

  22. arXiv:2305.01661  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    SIA-FTP: A Spoken Instruction Aware Flight Trajectory Prediction Framework

    Authors: Dongyue Guo, Jianwei Zhang, Yi Lin

    Abstract: Ground-air negotiation via speech communication is a vital prerequisite for ensuring safety and efficiency in air traffic control (ATC) operations. However, with the increase in traffic flow, incorrect instructions caused by human factors bring a great threat to ATC safety. Existing flight trajectory prediction (FTP) approaches primarily rely on the flight status of historical trajectory, leading…

    Submitted 2 May, 2023; originally announced May 2023.

  23. arXiv:2305.00170  [pdf, other]

    cs.SD eess.AS

    Enhancing multilingual speech recognition in air traffic control by sentence-level language identification

    Authors: Peng Fan, Dongyue Guo, JianWei Zhang, Bo Yang, Yi Lin

    Abstract: Automatic speech recognition (ASR) technique is becoming increasingly popular to improve the efficiency and safety of air traffic control (ATC) operations. However, the conversation between ATC controllers and pilots using multilingual speech brings a great challenge to building high-accuracy ASR systems. In this work, we present a two-stage multilingual ASR framework. The first stage is to train…

    Submitted 29 April, 2023; originally announced May 2023.

  24. arXiv:2303.05745  [pdf, other]

    eess.IV cs.CV

    Multi-site, Multi-domain Airway Tree Modeling (ATM'22): A Public Benchmark for Pulmonary Airway Segmentation

    Authors: Minghui Zhang, Yangqian Wu, Hanxiao Zhang, Yulei Qin, Hao Zheng, Wen Tang, Corey Arnold, Chenhao Pei, Pengxin Yu, Yang Nan, Guang Yang, Simon Walsh, Dominic C. Marshall, Matthieu Komorowski, Puyang Wang, Dazhou Guo, Dakai **, Ya'nan Wu, Shuiqing Zhao, Runsheng Chang, Boyu Zhang, Xing Lv, Abdul Qayyum, Moona Mazher, Qi Su , et al. (11 additional authors not shown)

    Abstract: Open international challenges are becoming the de facto standard for assessing computer vision and image analysis algorithms. In recent years, new methods have extended the reach of pulmonary airway segmentation that is closer to the limit of image resolution. Since EXACT'09 pulmonary airway segmentation, limited effort has been directed to quantitative comparison of newly emerged algorithms drive…

    Submitted 27 June, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: 32 pages, 16 figures. Homepage: https://atm22.grand-challenge.org/. Submitted

  25. arXiv:2212.01878  [pdf]

    eess.IV

    CloudBrain-ReconAI: An Online Platform for MRI Reconstruction and Image Quality Evaluation

    Authors: Yirong Zhou, Chen Qian, Jiayu Li, Zi Wang, Yu Hu, Biao Qu, Liuhong Zhu, Jianjun Zhou, Taishan Kang, Jianzhong Lin, Qing Hong, Jiyang Dong, Di Guo, Xiaobo Qu

    Abstract: Efficient collaboration between engineers and radiologists is important for image reconstruction algorithm development and image quality evaluation in magnetic resonance imaging (MRI). Here, we develop CloudBrain-ReconAI, an online cloud computing platform, for algorithm deployment, fast and blind reader study. This platform supports online image reconstruction using state-of-the-art artificial in…

    Submitted 4 December, 2022; originally announced December 2022.

    Comments: 8 pages, 11 figures

  26. arXiv:2211.13479  [pdf]

    eess.SP

    Alternating Deep Low-Rank Approach for Exponential Function Reconstruction and Its Biomedical Magnetic Resonance Applications

    Authors: Yihui Huang, Zi Wang, Xinlin Zhang, Jian Cao, Zhangren Tu, Mei** Lin, Di Guo, Xiaobo Qu

    Abstract: Undersampling can accelerate the signal acquisition but at the cost of bringing in artifacts. Removing these artifacts is a fundamental problem in signal processing and this task is also called signal reconstruction. Through modeling signals as the superimposed exponential functions, deep learning has achieved fast and high-fidelity signal reconstruction by training a mapping from the undersampled…

    Submitted 8 August, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: 13 pages

  27. arXiv:2210.12723  [pdf]

    eess.IV cs.AI cs.LG

    A Faithful Deep Sensitivity Estimation for Accelerated Magnetic Resonance Imaging

    Authors: Zi Wang, Haoming Fang, Chen Qian, Boxuan Shi, Lijun Bao, Liuhong Zhu, Jianjun Zhou, Wen** Wei, Jianzhong Lin, Di Guo, Xiaobo Qu

    Abstract: Magnetic resonance imaging (MRI) is an essential diagnostic tool that suffers from prolonged scan time. To alleviate this limitation, advanced fast MRI technology attracts extensive research interests. Recent deep learning has shown its great potential in improving image quality and reconstruction speed. Faithful coil sensitivity estimation is vital for MRI reconstruction. However, most deep learn…

    Submitted 24 December, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

    Comments: 12 pages, 13 figures, 7 tables

  28. arXiv:2210.11388  [pdf]

    eess.IV cs.CV

    Physics-informed Deep Diffusion MRI Reconstruction with Synthetic Data: Break Training Data Bottleneck in Artificial Intelligence

    Authors: Chen Qian, Yuncheng Gao, Mingyang Han, Zi Wang, Dan Ruan, Yu Shen, Ya** Wu, Yirong Zhou, Chengyan Wang, Boyu Jiang, Ran Tao, Zhigang Wu, Jiazheng Wang, Liuhong Zhu, Yi Guo, Taishan Kang, Jianzhong Lin, Tao Gong, Chen Yang, Guoqiang Fei, Mei** Lin, Di Guo, Jianjun Zhou, Meiyun Wang, Xiaobo Qu

    Abstract: Diffusion magnetic resonance imaging (MRI) is the only imaging modality for non-invasive movement detection of in vivo water molecules, with significant clinical and research applications. Diffusion MRI (DWI) acquired by multi-shot techniques can achieve higher resolution, better signal-to-noise ratio, and lower geometric distortion than single-shot, but suffers from inter-shot motion-induced arti…

    Submitted 5 February, 2024; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 23 pages, 16 figures

  29. arXiv:2207.05042  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS eess.IV

    Audio-Visual Segmentation

    Authors: **xing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, **g Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

    Abstract: We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with…

    Submitted 17 February, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Code is available at https://github.com/OpenNLPLab/AVSBench

  30. arXiv:2204.09917  [pdf, other]

    cs.SD cs.AI eess.AS

    SinTra: Learning an inspiration model from a single multi-track music segment

    Authors: Qingwei Song, Qiwei Sun, Dongsheng Guo, Haiyong Zheng

    Abstract: In this paper, we propose SinTra, an auto-regressive sequential generative model that can learn from a single multi-track music segment, to generate coherent, aesthetic, and variable polyphonic music of multi-instruments with an arbitrary length of bar. For this task, to ensure the relevance of generated samples and training music, we present a novel pitch-group representation. SinTra, consisting…

    Submitted 21 April, 2022; originally announced April 2022.

  31. arXiv:2203.14559  [pdf]

    eess.SP

    A Paired Phase and Magnitude Reconstruction for Advanced Diffusion-Weighted Imaging

    Authors: Chen Qian, Zi Wang, Xinlin Zhang, Boxuan Shi, Boyu Jiang, Ran Tao, **g Li, Yuwei Ge, Taishan Kang, Jianzhong Lin, Di Guo, Xiaobo Qu

    Abstract: Objective: Multi-shot interleaved echo planar imaging can obtain diffusion-weighted images (DWI) with high spatial resolution and low distortion, but suffers from ghost artifacts introduced by phase variations between shots. In this work, we aim at solving the challenging reconstructions under inter-shot motions between shots and a low signal-to-noise ratio. Methods: An explicit phase model with p…

    Submitted 8 December, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: 12 pages, 14 figures

  32. arXiv:2201.12358  [pdf, other]

    cs.LG eess.SY

    EVBattery: A Large-Scale Electric Vehicle Dataset for Battery Health and Capacity Estimation

    Authors: Haowei He, **gzhao Zhang, Yanan Wang, Benben Jiang, Shaobo Huang, Chen Wang, Yang Zhang, Gengang Xiong, Xuebing Han, Dongxu Guo, Guannan He, Minggao Ouyang

    Abstract: Electric vehicles (EVs) play an important role in reducing carbon emissions. As EV adoption accelerates, safety issues caused by EV batteries have become an important research topic. In order to benchmark and develop data-driven methods for this task, we introduce a large and comprehensive dataset of EV batteries. Our dataset includes charging records collected from hundreds of EVs from three manu…

    Submitted 1 November, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: 15 pages, 8 figures

  33. arXiv:2112.04721  [pdf]

    eess.IV cs.AI cs.CV physics.med-ph

    One-dimensional Deep Low-rank and Sparse Network for Accelerated MRI

    Authors: Zi Wang, Chen Qian, Di Guo, Hongwei Sun, Rushuai Li, Bo Zhao, Xiaobo Qu

    Abstract: Deep learning has shown astonishing performance in accelerated magnetic resonance imaging (MRI). Most state-of-the-art deep learning reconstructions adopt the powerful convolutional neural network and perform 2D convolution since many magnetic resonance images or their corresponding k-space are in 2D. In this work, we present a new approach that explores the 1D convolution, making the deep network…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: 16 pages

  34. arXiv:2111.02654  [pdf, other]

    cs.SD cs.CL eess.AS

    Speech recognition for air traffic control via feature learning and end-to-end training

    Authors: Peng Fan, Dongyue Guo, Yi Lin, Bo Yang, Jianwei Zhang

    Abstract: In this work, we propose a new automatic speech recognition (ASR) system based on feature learning and an end-to-end training procedure for air traffic control (ATC) systems. The proposed model integrates the feature learning block, recurrent neural network (RNN), and connectionist temporal classification loss to build an end-to-end ASR model. Facing the complex environments of ATC speech, instead…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: Submitted to IEEE ICASSP 2022

  35. arXiv:2111.02041  [pdf, other]

    cs.SD cs.CL eess.AS

    A Comparative Study of Speaker Role Identification in Air Traffic Communication Using Deep Learning Approaches

    Authors: Dongyue Guo, Jianwei Zhang, Bo Yang, Yi Lin

    Abstract: Automatic spoken instruction understanding (SIU) of the controller-pilot conversations in the air traffic control (ATC) requires not only recognizing the words and semantics of the speech but also determining the role of the speaker. However, few of the published works on the automatic understanding systems in air traffic communication focus on speaker role identification (SRI). In this paper, we…

    Submitted 22 August, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: This work has been submitted to the ACM TALLIP for possible publication

  36. arXiv:2111.01544  [pdf]

    eess.IV cs.CV physics.med-ph

    Comprehensive and Clinically Accurate Head and Neck Organs at Risk Delineation via Stratified Deep Learning: A Large-scale Multi-Institutional Study

    Authors: Dazhou Guo, Jia Ge, Xianghua Ye, Senxiang Yan, Yi Xin, Yuchen Song, Bing-shen Huang, Tsung-Min Hung, Zhuotun Zhu, Ling Peng, Yan** Ren, Rui Liu, Gong Zhang, Mengyuan Mao, Xiaohua Chen, Zhongjie Lu, Wenxiang Li, Yuzhen Chen, Lingyun Huang, **g Xiao, Adam P. Harrison, Le Lu, Chien-Yu Lin, Dakai **, Tsung-Ying Ho

    Abstract: Accurate organ at risk (OAR) segmentation is critical to reduce the radiotherapy post-treatment complications. Consensus guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region, however, due to the predictable prohibitive labor-cost of this task, most institutions choose a substantially simplified protocol by delineating a smaller subset of OARs and neglecting the dose di…

    Submitted 1 November, 2021; originally announced November 2021.

  37. arXiv:2110.10965  [pdf, other]

    eess.IV cs.CV

    2020 CATARACTS Semantic Segmentation Challenge

    Authors: Imanol Luengo, Maria Grammatikopoulou, Rahim Mohammadi, Chris Walsh, Chinedu Innocent Nwoye, Deepak Alapatt, Nicolas Padoy, Zhen-Liang Ni, Chen-Chen Fan, Gui-Bin Bian, Zeng-Guang Hou, Heon** Ha, Jiacheng Wang, Haojie Wang, Dong Guo, Lu Wang, Guotai Wang, Mobarakol Islam, Bharat Giddwani, Ren Hongliang, Theodoros Pissas, Claudio Ravasio, Martin Huber, Jeremy Birch, Joan M. Nunez Do Rio , et al. (15 additional authors not shown)

    Abstract: Surgical scene segmentation is essential for anatomy and instrument localization which can be further used to assess tissue-instrument interactions during a surgical procedure. In 2017, the Challenge on Automatic Tool Annotation for cataRACT Surgery (CATARACTS) released 50 cataract surgery videos accompanied by instrument usage annotations. These annotations included frame-level instrument presenc…

    Submitted 24 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

  38. arXiv:2109.11572  [pdf, other]

    eess.IV cs.CV

    SAME: Deformable Image Registration based on Self-supervised Anatomical Embeddings

    Authors: Fengze Liu, Ke Yan, Adam Harrison, Dazhou Guo, Le Lu, Alan Yuille, Lingyun Huang, Guotong Xie, **g Xiao, Xianghua Ye, Dakai **

    Abstract: In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration. This work is built on top of a recent algorithm SAM, which is capable of computing dense anatomical/semantic correspondences between two images at the pixel level. Our method is named SAME, which breaks down image registration into three steps: affine transformation, coarse deformation, and deep d…

    Submitted 23 September, 2021; originally announced September 2021.

  39. arXiv:2109.09271  [pdf, ps, other]

    eess.IV cs.CV

    DeepStationing: Thoracic Lymph Node Station Parsing in CT Scans using Anatomical Context Encoding and Key Organ Auto-Search

    Authors: Dazhou Guo, Xianghua Ye, Jia Ge, Xing Di, Le Lu, Lingyun Huang, Guotong Xie, **g Xiao, Zhongjie Liu, Ling Peng, Senxiang Yan, Dakai **

    Abstract: Lymph node station (LNS) delineation from computed tomography (CT) scans is an indispensable step in radiation oncology workflow. High inter-user variabilities across oncologists and prohibitive laboring costs motivated the automated approach. Previous works exploit anatomical priors to infer LNS based on predefined ad-hoc margins. However, without voxel-level supervision, the performance is sever…

    Submitted 19 September, 2021; originally announced September 2021.

  40. Multi-Rate Nyquist-SCM for C-Band 100Gbit/s Signal over 50km Dispersion-Uncompensated Link

    Authors: Haide Wang, Ji Zhou, Jinlong Wei, Dong Guo, Yuanhua Feng, Weiping Liu, Changyuan Yu, Dawei Wang, Zhaohui Li

    Abstract: In this paper, to the best of our knowledge, we propose the first multi-rate Nyquist-subcarriers modulation (SCM) for C-band 100Gbit/s signal transmission over 50km dispersion-uncompensated link. Chromatic dispersion (CD) introduces severe spectral nulls on optical double-sideband signal, which greatly degrades the performance of intensity-modulation and direct-detection systems. Based on the prio…

    Submitted 28 November, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

    Comments: This paper has been accepted by Journal of Lightwave Technology

  41. arXiv:2107.11650  [pdf, other]

    eess.IV eess.SP

    Accelerated MRI Reconstruction with Separable and Enhanced Low-Rank Hankel Regularization

    Authors: Xinlin Zhang, Hengfa Lu, Di Guo, Zongying Lai, Huihui Ye, Xi Peng, Bo Zhao, Xiaobo Qu

    Abstract: The combination of the sparse sampling and the low-rank structured matrix reconstruction has shown promising performance, enabling a significant reduction of the magnetic resonance imaging data acquisition time. However, the low-rank structured approaches demand considerable memory consumption and are time-consuming due to a noticeable number of matrix operations performed on the huge-size block H…

    Submitted 24 July, 2021; originally announced July 2021.

    Comments: 17 pages, 17 figures

  42. arXiv:2104.08824  [pdf]

    eess.IV

    XCloud-pFISTA: A Medical Intelligence Cloud for Accelerated MRI

    Authors: Yirong Zhou, Chen Qian, Yi Guo, Zi Wang, Jian Wang, Biao Qu, Di Guo, Yongfu You, Xiaobo Qu

    Abstract: Machine learning and artificial intelligence have shown remarkable performance in accelerated magnetic resonance imaging (MRI). Cloud computing technologies have great advantages in building an easily accessible platform to deploy advanced algorithms. In this work, we develop an open-access, easy-to-use and high-performance medical intelligence cloud computing platform (XCloud-pFISTA) to reconstru…

    Submitted 10 June, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

  43. arXiv:2102.08535  [pdf]

    cs.CL cs.SD eess.AS

    ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems

    Authors: Yi Lin, Bo Yang, Linchao Li, Dongyue Guo, Jianwei Zhang, Hu Chen, Yi Zhang

    Abstract: In this paper, a multilingual end-to-end framework, called as ATCSpeechNet, is proposed to tackle the issue of translating communication speech into human-readable text in air traffic control (ATC) systems. In the proposed framework, we focus on integrating the multilingual automatic speech recognition (ASR) into one model, in which an end-to-end paradigm is developed to convert speech waveform in…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: An improved work based on our previous Interspeech 2020 paper (https://isca-speech.org/archive/Interspeech_2020/pdfs/1020.pdf)

  44. arXiv:2101.11442  [pdf]

    physics.med-ph cs.LG eess.IV

    Magnetic Resonance Spectroscopy Deep Learning Denoising Using Few In Vivo Data

    Authors: Dicheng Chen, Wanqi Hu, Huiting Liu, Yirong Zhou, Tianyu Qiu, Yihui Huang, Zi Wang, Jiazheng Wang, Liangjie Lin, Zhigang Wu, Hao Chen, Xi Chen, Gen Yan, Di Guo, Jianzhong Lin, Xiaobo Qu

    Abstract: Magnetic Resonance Spectroscopy (MRS) is a noninvasive tool to reveal metabolic information. One challenge of 1H-MRS is the low Signal-Noise Ratio (SNR). To improve the SNR, a typical approach is to perform Signal Averaging (SA) with M repeated samples. The data acquisition time, however, is increased by M times accordingly, and a complete clinical MRS scan takes approximately 10 minutes at a comm…

    Submitted 25 October, 2022; v1 submitted 26 January, 2021; originally announced January 2021.

  45. arXiv:2012.14830  [pdf]

    cs.LG eess.IV physics.bio-ph physics.med-ph

    A Sparse Model-inspired Deep Thresholding Network for Exponential Signal Reconstruction -- Application in Fast Biological Spectroscopy

    Authors: Zi Wang, Di Guo, Zhangren Tu, Yihui Huang, Yirong Zhou, Jian Wang, Liubin Feng, Donghai Lin, Yongfu You, Tatiana Agback, Vladislav Orekhov, Xiaobo Qu

    Abstract: The non-uniform sampling is a powerful approach to enable fast acquisition but requires sophisticated reconstruction algorithms. Faithful reconstruction from partial sampled exponentials is highly expected in general signal processing and many applications. Deep learning has shown astonishing potential in this field but many existing problems, such as lack of robustness and explainability, greatly…

    Submitted 17 January, 2022; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: 30 pages

  46. Annotation-Efficient Learning for Medical Image Segmentation based on Noisy Pseudo Labels and Adversarial Learning

    Authors: Lu Wang, Dong Guo, Guotai Wang, Shaoting Zhang

    Abstract: Despite that deep learning has achieved state-of-the-art performance for medical image segmentation, its success relies on a large set of manually annotated images for training that are expensive to acquire. In this paper, we propose an annotation-efficient learning framework for segmentation tasks that avoids annotations of training images, where we use an improved Cycle-Consistent Generative Adv…

    Submitted 28 December, 2020; originally announced December 2020.

    Comments: 13 pages, 15 figures

  47. arXiv:2012.10682  [pdf, other]

    eess.SP cs.IT cs.LG

    Deep Reinforcement Learning for Joint Spectrum and Power Allocation in Cellular Networks

    Authors: Yasar Sinan Nasir, Dongning Guo

    Abstract: A wireless network operator typically divides the radio spectrum it possesses into a number of subbands. In a cellular network those subbands are then reused in many cells. To mitigate co-channel interference, a joint spectrum and power allocation problem is often formulated to maximize a sum-rate objective. The best known algorithms for solving such problems generally require instantaneous global…

    Submitted 19 December, 2020; originally announced December 2020.

    Comments: 7 pages, 3 figures, to be submitted. To reproduce the results please see https://github.com/sinannasir/Spectrum-Power-Allocation

  48. arXiv:2009.06681  [pdf, other]

    eess.SP cs.IT stat.ML

    Deep Actor-Critic Learning for Distributed Power Control in Wireless Mobile Networks

    Authors: Yasar Sinan Nasir, Dongning Guo

    Abstract: Deep reinforcement learning offers a model-free alternative to supervised deep learning and classical optimization for solving the transmit power control problem in wireless networks. The multi-agent deep reinforcement learning approach considers each transmitter as an individual learning agent that determines its transmit power level by observing the local wireless environment. Following a certai…

    Submitted 14 September, 2020; originally announced September 2020.

    Comments: 5 pages, 4 figures, to appear in the 54th Annual IEEE Asilomar Conference on Signals, Systems, and Computers, Nov 2020. This is an invited paper to the session Reinforcement Learning and Bandits for Communication Systems. To reproduce the results please see https://github.com/sinannasir/Power-Control-asilomar

  49. arXiv:2008.11870  [pdf, other]

    eess.IV cs.CV

    Lymph Node Gross Tumor Volume Detection and Segmentation via Distance-based Gating using 3D CT/PET Imaging in Radiotherapy

    Authors: Zhuotun Zhu, Dakai Jin, Ke Yan, Tsung-Ying Ho, Xianghua Ye, Dazhou Guo, Chun-Hung Chao, Jing Xiao, Alan Yuille, Le Lu

    Abstract: Finding, identifying and segmenting suspicious cancer metastasized lymph nodes from 3D multi-modality imaging is a clinical task of paramount importance. In radiotherapy, they are referred to as Lymph Node Gross Tumor Volume (GTVLN). Determining and delineating the spread of GTVLN is essential in defining the corresponding resection and irradiating regions for the downstream workflows of surgical…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: MICCAI2020

  50. Time irreversibility and amplitude irreversibility measures for nonequilibrium processes

    Authors: Wenpo Yao, Jun Wang, Matjaz Perc, Wenli Yao, Jiafei Dai, Daqing Guo, Dezhong Yao

    Abstract: Time irreversibility, which characterizes nonequilibrium processes, can be measured based on the probabilistic differences between symmetric vectors. To simplify the quantification of time irreversibility, symmetric permutations instead of symmetric vectors have been employed in some studies. However, although effective in practical applications, this approach is conceptually incorrect. Time irrev…

    Submitted 30 December, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: 16 pages, 6 figures

    Journal ref: Commun. Nonlinear Sci. Numer. Simulat. 96, 105688 (2021)