
Showing 1–50 of 165 results for author: Zhang, G

Searching in archive eess.
  1. arXiv:2406.10838

    eess.SP

    Digital Wireless Image Transmission via Distribution Matching

    Authors: Pujing Yang, Guangyi Zhang, Yunlong Cai

    Abstract: Deep learning-based joint source-channel coding (JSCC) is emerging as a potential technology to meet the demand for effective data transmission, particularly for image transmission. Nevertheless, most existing advancements only consider analog transmission, where the channel symbols are continuous, making them incompatible with practical digital communication systems. In this work, we address this…

    Submitted 16 June, 2024; originally announced June 2024.

  2. arXiv:2406.05437

    eess.SP

    From Analog to Digital: Multi-Order Digital Joint Coding-Modulation for Semantic Communication

    Authors: Guangyi Zhang, Pujing Yang, Yunlong Cai, Qiyu Hu, Guanding Yu

    Abstract: Recent studies in joint source-channel coding (JSCC) have fostered a fresh paradigm in end-to-end semantic communication. Despite notable performance achievements, present initiatives in building semantic communication systems primarily hinge on the transmission of continuous channel symbols, thus presenting challenges in compatibility with established digital systems. In this paper, we introduce…

    Submitted 8 June, 2024; originally announced June 2024.

  3. arXiv:2405.18712

    eess.SY

    Identifying the Most Influential Driver Nodes for Pinning Control of Multi-Agent Systems with Time-Varying Topology

    Authors: Guangrui Zhang, Zhaohui Liu, Xinghuo Yu, Mahdi Jalili

    Abstract: Identifying the most influential driver nodes to guarantee the fastest synchronization speed is a key topic in pinning control of multi-agent systems. This paper develops a methodology to find the most influential pinning nodes under time-varying topologies. First, we provide the pinning control synchronization conditions of multi-agent systems. Second, a method is proposed to identify the best dr…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2405.18333

    eess.SY

    On the analysis of a higher-order Lotka-Volterra model: an application of S-tensors and the polynomial complementarity problem

    Authors: Shaoxuan Cui, Qi Zhao, Guofeng Zhang, Hildeberto Jardón-Kojakhmetov, Ming Cao

    Abstract: It is known that the effect of species' density on species' growth is non-additive in real ecological systems. This challenges the conventional Lotka-Volterra model, where the interactions are always pairwise and their effects are additive. To address this challenge, we introduce HOIs (Higher-Order Interactions) which are able to capture, for example, the indirect effect of one species on a second…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2404.18081

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C…

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  6. arXiv:2404.10343

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  7. arXiv:2404.06693

    cs.CV eess.IV

    Binomial Self-compensation for Motion Error in Dynamic 3D Scanning

    Authors: Geyou Zhang, Ce Zhu, Kai Liu

    Abstract: Phase shifting profilometry (PSP) is favored in high-precision 3D scanning due to its high accuracy, robustness, and pixel-wise property. However, a fundamental assumption of PSP that the object should remain static is violated in dynamic measurement, making PSP susceptible to object moving, resulting in ripple-like errors in the point clouds. We propose a pixel-wise and frame-wise loopable binomi…

    Submitted 9 April, 2024; originally announced April 2024.

  8. arXiv:2404.06393

    cs.SD cs.AI eess.AS

    MuPT: A Generative Symbolic Music Pretrained Transformer

    Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

    Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the chal…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  9. arXiv:2403.19944

    cs.CV eess.IV

    Binarized Low-light Raw Video Enhancement

    Authors: Gengchen Zhang, Yulun Zhang, Xin Yuan, Ying Fu

    Abstract: Recently, deep neural networks have achieved excellent performance on low-light raw video enhancement. However, they often come with high computational complexity and large memory costs, which hinder their applications on resource-limited devices. In this paper, we explore the feasibility of applying the extremely compact binary neural network (BNN) to low-light raw video enhancement. Nevertheless…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  10. arXiv:2403.03416

    eess.SY

    On discrete-time polynomial dynamical systems on hypergraphs

    Authors: Shaoxuan Cui, Guofeng Zhang, Hildeberto Jardón-Kojakhmetov, Ming Cao

    Abstract: This paper studies the stability of discrete-time polynomial dynamical systems on hypergraphs by utilizing the Perron-Frobenius theorem for nonnegative tensors with respect to the tensors Z-eigenvalues and Z-eigenvectors. Firstly, for a multilinear polynomial system on a uniform hypergraph, we study the stability of the origin of the corresponding systems. Next, we extend our results to non-homoge…

    Submitted 5 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.03652

  11. arXiv:2402.16153

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  12. arXiv:2401.14614

    eess.SP

    Feature Allocation for Semantic Communication with Space-Time Importance Awareness

    Authors: Kequan Zhou, Guangyi Zhang, Yunlong Cai, Qiyu Hu, Guanding Yu, A. Lee Swindlehurst

    Abstract: In the realm of semantic communication, the significance of encoded features can vary, while wireless channels are known to exhibit fluctuations across multiple subchannels in different domains. Consequently, critical features may traverse subchannels with poor states, resulting in performance degradation. To tackle this challenge, we introduce a framework called Feature Allocation for Semantic Tr…

    Submitted 25 January, 2024; originally announced January 2024.

  13. arXiv:2401.03652

    eess.SY

    On Metzler positive systems on hypergraphs

    Authors: Shaoxuan Cui, Guofeng Zhang, Hildeberto Jardón-Kojakhmetov, Ming Cao

    Abstract: In graph-theoretical terms, an edge in a graph connects two vertices while a hyperedge of a hypergraph connects any more than one vertices. If the hypergraph's hyperedges further connect the same number of vertices, it is said to be uniform. In algebraic graph theory, a graph can be characterized by an adjacency matrix, and similarly, a uniform hypergraph can be characterized by an adjacency tenso…

    Submitted 5 June, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  14. arXiv:2312.16082

    quant-ph eess.SY

    The Quantum Kalman Decomposition: A Gramian Matrix Approach

    Authors: Guofeng Zhang, Jinghao Li, Zhiyuan Dong, Ian R. Petersen

    Abstract: The Kalman canonical form for quantum linear systems was derived in \cite{ZGPG18}. The purpose of this paper is to present an alternative derivation by means of a Gramian matrix approach. Controllability and observability Gramian matrices are defined for linear quantum systems, which are used to characterize various subspaces. Based on these characterizations, real orthogonal and block symplectic…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 22 pages, 2 figures, submitted for publication. Comments are welcome

  15. arXiv:2312.15741

    eess.SY cs.LG

    Improving the Accuracy and Interpretability of Neural Networks for Wind Power Forecasting

    Authors: Wenlong Liao, Fernando Porte-Agel, Jiannong Fang, Birgitte Bak-Jensen, Zhe Yang, Gonghao Zhang

    Abstract: Deep neural networks (DNNs) are receiving increasing attention in wind power forecasting due to their ability to effectively capture complex patterns in wind data. However, their forecasted errors are severely limited by the local optimal weight issue in optimization algorithms, and their forecasted behavior also lacks interpretability. To address these two challenges, this paper firstly proposes…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 10 pages, 10 figures

  16. arXiv:2312.01403

    eess.SY

    OplixNet: Towards Area-Efficient Optical Split-Complex Networks with Real-to-Complex Data Assignment and Knowledge Distillation

    Authors: Ruidi Qiu, Amro Eldebiky, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li

    Abstract: Having the potential for high speed, high throughput, and low energy cost, optical neural networks (ONNs) have emerged as a promising candidate for accelerating deep learning tasks. In conventional ONNs, light amplitudes are modulated at the input and detected at the output. However, the light phases are still ignored in conventional structures, although they can also carry information for computi…

    Submitted 15 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: Accepted by Design Automation and Test in Europe (DATE) 2024

  17. arXiv:2311.15583

    cs.LG eess.SP

    A Simple Geometric-Aware Indoor Positioning Interpolation Algorithm Based on Manifold Learning

    Authors: Suorong Yang, Geng Zhang, Jian Zhao, Furao Shen

    Abstract: Interpolation methodologies have been widely used within the domain of indoor positioning systems. However, existing indoor positioning interpolation algorithms exhibit several inherent limitations, including reliance on complex mathematical models, limited flexibility, and relatively low precision. To enhance the accuracy and efficiency of indoor positioning interpolation techniques, this paper p…

    Submitted 27 November, 2023; originally announced November 2023.

  18. arXiv:2311.15309

    eess.IV

    Deep Refinement-Based Joint Source Channel Coding over Time-Varying Channels

    Authors: Junyu Pan, Hanlei Li, Guangyi Zhang, Yunlong Cai, Guanding Yu

    Abstract: In recent developments, deep learning (DL)-based joint source-channel coding (JSCC) for wireless image transmission has made significant strides in performance enhancement. Nonetheless, the majority of existing DL-based JSCC methods are tailored for scenarios featuring stable channel conditions, notably a fixed signal-to-noise ratio (SNR). This specialization poses a limitation, as their performan…

    Submitted 26 November, 2023; originally announced November 2023.

  19. arXiv:2311.12427

    eess.AS

    A Distributed Algorithm for Personal Sound Zones Systems

    Authors: Sipei Zhao, Guoqiang Zhang, Eva Cheng, Ian S. Burnett

    Abstract: A Personal Sound Zones (PSZ) system aims to generate two or more independent listening zones that allow multiple users to listen to different music/audio content in a shared space without the need for wearing headphones. Most existing studies assume that the acoustic paths between loudspeakers and microphones are measured beforehand in a stationary environment. Recently, adaptive PSZ systems have…

    Submitted 21 November, 2023; originally announced November 2023.

  20. arXiv:2311.12316

    cs.CV cs.AI eess.IV

    Generating Progressive Images from Pathological Transitions via Diffusion Model

    Authors: Zeyu Liu, Tianyi Zhang, Yufang He, Yunlu Feng, Yu Zhao, Guanglei Zhang

    Abstract: Deep learning is widely applied in computer-aided pathological diagnosis, which alleviates the pathologist workload and provide timely clinical analysis. However, most models generally require large-scale annotated data for training, which faces challenges due to the sampling and annotation scarcity in pathological images. The rapid developing generative models shows potential to generate more tra…

    Submitted 9 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 13 pages, 9 figs, 4 tabs

  21. arXiv:2311.10568

    eess.IV cs.CV

    Phase Guided Light Field for Spatial-Depth High Resolution 3D Imaging

    Authors: Geyou Zhang, Ce Zhu, Kai Liu, Yipeng Liu

    Abstract: On 3D imaging, light field cameras typically are of single shot, and however, they heavily suffer from low spatial resolution and depth accuracy. In this paper, by employing an optical projector to project a group of single high-frequency phase-shifted sinusoid patterns, we propose a phase guided light field algorithm to significantly improve both the spatial and depth resolutions for off-the-shel…

    Submitted 9 April, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

  22. arXiv:2311.06712

    eess.IV

    PuzzleTuning: Explicitly Bridge Pathological and Natural Image with Puzzles

    Authors: Tianyi Zhang, Shangqing Lyu, Yanli Lei, Sicheng Chen, Nan Ying, Yufang He, Yu Zhao, Yunlu Feng, Hwee Kuan Lee, Guanglei Zhang

    Abstract: Pathological image analysis is a crucial field in computer vision. Due to the annotation scarcity in the pathological field, pre-training with self-supervised learning (SSL) is widely applied to learn on unlabeled images. However, the current SSL-based pathological pre-training: (1) does not explicitly explore the essential focuses of the pathological field, and (2) does not effectively bridge wit…

    Submitted 22 April, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

    Comments: 13 pages, 9 figures, 8 tables

  23. arXiv:2311.03863

    eess.SY cs.LG

    An Explainable Framework for Machine learning-Based Reactive Power Optimization of Distribution Network

    Authors: Wenlong Liao, Benjamin Schäfer, Dalin Qin, Gonghao Zhang, Zhixian Wang, Zhe Yang

    Abstract: To reduce the heavy computational burden of reactive power optimization of distribution networks, machine learning models are receiving increasing attention. However, most machine learning models (e.g., neural networks) are usually considered as black boxes, making it challenging for power system operators to identify and comprehend potential biases or errors in the decision-making process of mach…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: It was submitted to the 23rd Power Systems Computation Conference (PSCC 2024) in Sept. 2023

  24. arXiv:2310.17997

    physics.optics cs.AI eess.IV

    Deep Learning Enables Large Depth-of-Field Images for Sub-Diffraction-Limit Scanning Superlens Microscopy

    Authors: Hui Sun, Hao Luo, Feifei Wang, Qingjiu Chen, Meng Chen, Xiaoduo Wang, Haibo Yu, Guanglie Zhang, Lianqing Liu, Jian** Wang, Dapeng Wu, Wen Jung Li

    Abstract: Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the mapping relationship between op…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 13 pages, 7 figures

  25. arXiv:2310.17902

    eess.IV

    CPIA Dataset: A Comprehensive Pathological Image Analysis Dataset for Self-supervised Learning Pre-training

    Authors: Nan Ying, Yanli Lei, Tianyi Zhang, Shangqing Lyu, Chunhui Li, Sicheng Chen, Zeyu Liu, Yu Zhao, Guanglei Zhang

    Abstract: Pathological image analysis is a crucial field in computer-aided diagnosis, where deep learning is widely applied. Transfer learning using pre-trained models initialized on natural images has effectively improved the downstream pathological performance. However, the lack of sophisticated domain-specific pathological initialization hinders their potential. Self-supervised learning (SSL) enables pre…

    Submitted 27 October, 2023; originally announced October 2023.

  26. arXiv:2310.15011

    cs.IT eess.SP

    Interference Management by Harnessing Multi-Domain Resources in Spectrum-Sharing Aided Satellite-Ground Integrated Networks

    Authors: Xiaojin Ding, Yue Lei, Yulong Zou, Gengxin Zhang, Lajos Hanzo

    Abstract: A spectrum-sharing satellite-ground integrated network is conceived, consisting of a pair of non-geostationary orbit (NGSO) constellations and multiple terrestrial base stations, which impose the co-frequency interference (CFI) on each other. The CFI may increase upon increasing the number of satellites. To manage the potentially severe interference, we propose to rely on joint multi-domain resour…

    Submitted 29 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Submitted to IEEE Transactions on Vehicular Technology

  27. arXiv:2310.07323

    cs.LG eess.SY

    Multichannel consecutive data cross-extraction with 1DCNN-attention for diagnosis of power transformer

    Authors: Wei Zheng, Guogang Zhang, Chenchen Zhao, Qianqian Zhu

    Abstract: Power transformer plays a critical role in grid infrastructure, and its diagnosis is paramount for maintaining stable operation. However, the current methods for transformer diagnosis focus on discrete dissolved gas analysis, neglecting deep feature extraction of multichannel consecutive data. The unutilized sequential data contains the significant temporal information reflecting the transformer c…

    Submitted 11 October, 2023; originally announced October 2023.

  28. arXiv:2309.10510

    eess.SY cs.NE

    Logic Design of Neural Networks for High-Throughput and Low-Power Applications

    Authors: Kangwei Xu, Grace Li Zhang, Ulf Schlichtmann, Bing Li

    Abstract: Neural networks (NNs) have been successfully deployed in various fields. In NNs, a large number of multiply-accumulate (MAC) operations need to be performed. Most existing digital hardware platforms rely on parallel MAC units to accelerate these MAC operations. However, under a given area constraint, the number of MAC units in such platforms is limited, so MAC units have to be reused to perform MAC…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: accepted by ASPDAC 2024

  29. arXiv:2309.08730

    eess.AS cs.AI cs.CL cs.MM cs.SD

    MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

    Authors: Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos

    Abstract: Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains not well-explored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained frozen music audio…

    Submitted 2 April, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

    Journal ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics

  30. arXiv:2309.03815

    cs.CV cs.MM eess.IV

    T2IW: Joint Text to Image & Watermark Generation

    Authors: An-An Liu, Guokai Zhang, Yuting Su, Ning Xu, Yongdong Zhang, Lanjun Wang

    Abstract: Recent developments in text-conditioned image generative models have revolutionized the production of realistic results. Unfortunately, this has also led to an increase in privacy violations and the spread of false information, which requires the need for traceability, privacy protection, and other security measures. However, existing text-to-image paradigms lack the technical capabilities to link…

    Submitted 7 September, 2023; originally announced September 2023.

  31. arXiv:2309.02529

    eess.IV

    Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation

    Authors: Haisheng Fu, Feng Liang, Jie Liang, Yongqiang Wang, Guohe Zhang, Jingning Han

    Abstract: Deep learning-based image compression has made great progresses recently. However, many leading schemes use serial context-adaptive entropy model to improve the rate-distortion (R-D) performance, which is very slow. In addition, the complexities of the encoding and decoding networks are quite high and not suitable for many practical applications. In this paper, we introduce four techniques to bala…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Submitted to Trans. Journal

  32. arXiv:2309.01171

    eess.IV cs.CV

    Deep Unfolding Convolutional Dictionary Model for Multi-Contrast MRI Super-resolution and Reconstruction

    Authors: Pengcheng Lei, Faming Fang, Guixu Zhang, Ming Xu

    Abstract: Magnetic resonance imaging (MRI) tasks often involve multiple contrasts. Recently, numerous deep learning-based multi-contrast MRI super-resolution (SR) and reconstruction methods have been proposed to explore the complementary information from the multi-contrast images. However, these methods either construct parameter-sharing networks or manually design fusion rules, failing to accurately model…

    Submitted 23 January, 2024; v1 submitted 3 September, 2023; originally announced September 2023.

  33. arXiv:2308.11126

    eess.SP

    Alleviating Distortion Accumulation in Multi-Hop Semantic Communication

    Authors: Guangyi Zhang, Qiyu Hu, Yunlong Cai, Guanding Yu

    Abstract: Recently, semantic communication has been investigated to boost the performance of end-to-end image transmission systems. However, existing semantic approaches are generally based on deep learning and belong to lossy transmission. Consequently, as the receiver continues to transmit received images to another device, the distortion of images accumulates with each transmission. Unfortunately, most r…

    Submitted 21 August, 2023; originally announced August 2023.

  34. arXiv:2307.16679

    eess.AS cs.CL cs.LG

    Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

    Authors: Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba

    Abstract: Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space. Aiming to improve those assumptions, Normalizing Flows and Diffusion Probabilistic Models were recently proposed as alternatives. In this paper, we compare traditional L1/L2-based approaches to diffusion and flow-based approaches for the tasks of prosod…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, 5 tables. Interspeech 2023

  35. arXiv:2307.11778

    cs.CL cs.SD eess.AS

    Transsion TSUP's speech recognition system for ASRU 2023 MADASR Challenge

    Authors: Xiaoxiao Li, Gaosheng Zhang, An Zhu, Weiyong Li, Shuming Fang, Xiaoyue Yang, Jianchao Zhu

    Abstract: This paper presents a speech recognition system developed by the Transsion Speech Understanding Processing Team (TSUP) for the ASRU 2023 MADASR Challenge. The system focuses on adapting ASR models for low-resource Indian languages and covers all four tracks of the challenge. For tracks 1 and 2, the acoustic model utilized a squeezeformer encoder and bidirectional transformer decoder with joint CTC…

    Submitted 19 July, 2023; originally announced July 2023.

  36. arXiv:2307.05161

    cs.SD cs.AI cs.LG eess.AS

    On the Effectiveness of Speech Self-supervised Learning for Music

    Authors: Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu

    Abstract: Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Neverthele…

    Submitted 11 July, 2023; originally announced July 2023.

  37. arXiv:2306.17103

    cs.CL cs.SD eess.AS

    LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

    Authors: Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenhu Chen, Wei Xue, Yike Guo

    Abstract: We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language mo…

    Submitted 21 November, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 9 pages, 2 figures, 5 tables, accepted by ISMIR 2023

  38. arXiv:2306.15534

    eess.SP

    SCAN: Semantic Communication with Adaptive Channel Feedback

    Authors: Guangyi Zhang, Qiyu Hu, Yunlong Cai, Guanding Yu

    Abstract: In existing semantic communication systems for image transmission, some images are generally reconstructed with considerably low quality. As a result, the reliable transmission of each image cannot be guaranteed, bringing significant uncertainty to semantic communication systems. To address this issue, we propose a novel performance metric to characterize the reliability of semantic communication…

    Submitted 27 June, 2023; originally announced June 2023.

  39. arXiv:2306.10548

    cs.SD cs.AI cs.LG eess.AS

    MARBLE: Music Audio Representation Benchmark for Universal Evaluation

    Authors: Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu

    Abstract: In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue…

    Submitted 23 November, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: camera-ready version for NeurIPS 2023

  40. arXiv:2306.07787

    quant-ph eess.SY physics.atom-ph

    Quantum coherent feedback control of an N-level atom with multiple excitations

    Authors: Haijin Ding, Guofeng Zhang

    Abstract: The purpose of this paper is to study the dynamics of a quantum coherent feedback network, where an $N$-level atom is coupled with a cavity and the cavity is also coupled with single or multiple parallel waveguides. When the atom is initialized at the highest energy level, it can emit multiple photons into the cavity, and the photons can be further transmitted to the waveguides and re-interact wit…

    Submitted 23 May, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 16 pages, 6 figures

  41. arXiv:2306.07505  [pdf

    q-bio.TO eess.IV

    Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

    Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

    Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with…

    Submitted 12 June, 2023; originally announced June 2023.

  42. arXiv:2306.06440  [pdf, ps, other

    cs.NI cs.CR eess.SY

    Epidemic spreading in wireless sensor networks with node sleep scheduling

    Authors: Yanqing Wu, Cunlai Pu, Gongxuan Zhang, Lunbo Li, Yongxiang Xia, Chengyi Xia

    Abstract: Wireless Sensor Networks (WSNs) have become widely used in various fields like environmental monitoring, smart agriculture, and health care. However, their extensive usage also introduces significant vulnerabilities to cyber viruses. Addressing this security issue in WSNs is very challenging due to their inherent limitations in energy and bandwidth to implement real-time security measures. To tack…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  43. arXiv:2306.06373  [pdf, other

    quant-ph eess.SY physics.atom-ph

    Quantum feedback control of a two-atom network closed by a semi-infinite waveguide

    Authors: Hai** Ding, Guofeng Zhang, Mu-Tian Cheng, Guoqing Cai

    Abstract: The purpose of this paper is to study the dynamics of a coherent feedback network where two two-level atoms are coupled with a semi-infinite waveguide. In this set-up, the two-level atoms can work as the photon source, and the photons can be emitted into the waveguide via the nonchiral or chiral couplings between the atom and the waveguide, according to whether the coupling strengths between the a…

    Submitted 2 May, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

  44. arXiv:2306.00107  [pdf, other

    cs.SD cs.AI cs.CL cs.LG eess.AS

    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

    Authors: Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghao Xiao, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Zili Wang, Yike Guo, Jie Fu

    Abstract: Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is partially due to the distinctive challenges associated with modelling musical knowledge, part…

    Submitted 22 April, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: accepted by ICLR 2024

  45. Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models

    Authors: Yusheng Tian, Guangyan Zhang, Tan Lee

    Abstract: This paper is about developing personalized speech synthesis systems with recordings of mildly impaired speech. In particular, we consider consonant and vowel alterations resulting from partial glossectomy, the surgical removal of part of the tongue. The aim is to restore articulation in the synthesized speech and maximally preserve the target speaker's individuality. We propose to tackle the probl…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: submitted to INTERSPEECH 2023

    Journal ref: INTERSPEECH 2023

  46. arXiv:2305.10821  [pdf, other

    eess.AS

    Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

    Authors: Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang

    Abstract: Recently, stunning improvements on multi-channel speech separation have been achieved by neural beamformers when direction information is available. However, most of them neglect to utilize speaker's 2-dimensional (2D) location cues contained in mixture signal, which limits the performance when two sources come from close directions. In this paper, we propose an end-to-end beamforming network for…

    Submitted 2 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2212.03401

  47. arXiv:2305.08303  [pdf, other

    eess.SP cs.IT cs.LG

    Deep-Unfolding for Next-Generation Transceivers

    Authors: Qiyu Hu, Yunlong Cai, Guangyi Zhang, Guanding Yu, Geoffrey Ye Li

    Abstract: The stringent performance requirements of future wireless networks, such as ultra-high data rates, extremely high reliability and low latency, are spurring worldwide studies on defining the next-generation multiple-input multiple-output (MIMO) transceivers. For the design of advanced transceivers in wireless communications, optimization approaches often leading to iterative algorithms have achieve…

    Submitted 14 May, 2023; originally announced May 2023.

    Comments: 16 pages, 6 figures

  48. arXiv:2305.03274  [pdf, other

    eess.SP

    FAST: Feature Arrangement for Semantic Transmission

    Authors: Kequan Zhou, Guangyi Zhang, Yunlong Cai, Qiyu Hu, Guanding Yu

    Abstract: Although existing semantic communication systems have achieved great success, they have not considered that the channel is time-varying wherein deep fading occurs occasionally. Moreover, the importance of each semantic feature differs from each other. Consequently, the important features may be affected by channel fading and corrupted, resulting in performance degradation. Therefore, higher perfor…

    Submitted 5 May, 2023; originally announced May 2023.

  49. arXiv:2304.14508  [pdf

    eess.IV cs.CV cs.LG

    3D Brainformer: 3D Fusion Transformer for Brain Tumor Segmentation

    Authors: Rui Nian, Guoyao Zhang, Yao Sui, Yuqi Qian, Qiuying Li, Mingzhang Zhao, Jianhui Li, Ali Gholipour, Simon K. Warfield

    Abstract: Magnetic resonance imaging (MRI) is critically important for brain mapping in both scientific research and clinical studies. Precise segmentation of brain tumors facilitates clinical diagnosis, evaluations, and surgical planning. Deep learning has recently emerged to improve brain tumor segmentation and achieved impressive results. Convolutional architectures are widely used to implement those neu…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: 10 pages, 4 figures

    MSC Class: 68T07

    ACM Class: I.4.6; I.5.1

  50. arXiv:2304.07802  [pdf, other

    eess.SP

    Reconfigurable Intelligent Surface-Enabled Gridless DoA Estimation System for NLoS Scenarios

    Authors: Jiawen Yuan, Shaodan Ma, Gong Zhang, Henry Leung

    Abstract: The conventional direction-of-arrival (DoA) estimation approaches are only effective when the line-of-sight (LoS) link exists, while in the case of the non-line-of-sight (NLoS) situation, the spatial angle cannot be captured and thus the DoA estimation performance would be significantly degraded. To address this challenge, a novel reconfigurable intelligent surface (RIS)-enabled gridless DoA esti…

    Submitted 7 November, 2023; v1 submitted 16 April, 2023; originally announced April 2023.