Showing 1–50 of 161 results for author: Yang, Q

Searching in archive eess.

  1. arXiv:2406.18862  [pdf, other]

    cs.SD eess.AS

    Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study

    Authors: Peikun Chen, Sining Sun, Changhao Shan, Qing Yang, Lei Xie

    Abstract: Unified speech-text models like SpeechGPT, VioLA, and AudioPaLM have shown impressive performance across various speech-related tasks, especially in Automatic Speech Recognition (ASR). These models typically adopt a unified method to model discrete speech and text tokens, followed by training a decoder-only transformer. However, they are all designed for non-streaming ASR tasks, where the entire s…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  2. arXiv:2406.12726  [pdf, other]

    cs.SD cs.AI eess.AS

    ED-sKWS: Early-Decision Spiking Neural Networks for Rapid, and Energy-Efficient Keyword Spotting

    Authors: Zeyang Song, Qianhui Liu, Qu Yang, Yizhou Peng, Haizhou Li

    Abstract: Keyword Spotting (KWS) is essential in edge computing requiring rapid and energy-efficient responses. Spiking Neural Networks (SNNs) are well-suited for KWS for their efficiency and temporal capacity for speech. To further reduce the latency and energy consumption, this study introduces ED-sKWS, an SNN-based KWS model with an early-decision mechanism that can stop speech processing and output the…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  3. arXiv:2406.12254  [pdf, other]

    eess.IV cs.CV

    Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

    Authors: Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael Kim, Shunxing Bao, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: 2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmenta…

    Submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.11158  [pdf, other]

    eess.SY

    Dynamic Modeling and Control for an Offshore Semisubmersible Floating Wind Turbine

    Authors: Yingjie Gong, Qinmin Yang, Hua Geng, Wenchao Meng, Lin Wang

    Abstract: Floating wind turbines (FWTs) hold significant potential for the exploitation of offshore renewable energy resources. Nevertheless, prior to the construction of FWTs, it is imperative to tackle several critical challenges, especially the issue of performance degradation under combined wind and wave loads. This study initiates with the development of a simplified nonlinear dynamical model for a sem…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.10606  [pdf, other]

    eess.SP

    Semantic Communication for Edge Intelligence Enabled Autonomous Driving System

    Authors: Yunqi Feng, Hesheng Shen, Zhendong Shan, Qianqian Yang, Xiufang Shi

    Abstract: Expected to provide higher transportation efficiency and security, autonomous driving has attracted substantial attention from both industry and academia. Meanwhile, the emergence of edge intelligence has further introduced significant advancements to this field. However, the crucial demands of ultra-reliable and low-latency communications (URLLC) among the vehicles and edge servers have hindered…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Network Magazine, and is undergoing major revisions

  6. arXiv:2406.10469  [pdf, other]

    eess.IV cs.CV cs.MM

    Object-Attribute-Relation Representation based Video Semantic Communication

    Authors: Qiyuan Du, Yiping Duan, Qianqian Yang, Xiaoming Tao, Mérouane Debbah

    Abstract: With the rapid growth of multimedia data volume, there is an increasing need for efficient video transmission in applications such as virtual reality and future video streaming services. Semantic communication is emerging as a vital technique for ensuring efficient and reliable transmission in low-bandwidth, high-noise settings. However, most current approaches focus on joint source-channel coding…

    Submitted 14 June, 2024; originally announced June 2024.

  7. arXiv:2405.11493  [pdf, other]

    cs.CV cs.IT eess.SP

    Point Cloud Compression with Implicit Neural Representations: A Unified Framework

    Authors: Hongning Ruan, Yulin Shao, Qianqian Yang, Liang Zhao, Dusit Niyato

    Abstract: Point clouds have become increasingly vital across various applications thanks to their ability to realistically depict 3D objects and scenes. Nevertheless, effectively compressing unstructured, high-precision point cloud data remains a significant challenge. In this paper, we present a pioneering point cloud compression framework capable of handling both geometry and attribute components. Unlike…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 6 Pages, 6 Figures, submitted to IEEE ICCC

  8. arXiv:2405.09234  [pdf, other]

    eess.IV

    Enhancing Image Privacy in Semantic Communication over Wiretap Channels leveraging Differential Privacy

    Authors: Weixuan Chen, Shunpu Tang, Qianqian Yang

    Abstract: Semantic communication (SemCom) enhances transmission efficiency by sending only task-relevant information compared to traditional methods. However, transmitting semantic-rich data over insecure or public channels poses security and privacy risks. This paper addresses the privacy problem of transmitting images over wiretap channels and proposes a novel SemCom approach ensuring privacy through a di…

    Submitted 15 May, 2024; originally announced May 2024.

  9. arXiv:2405.00365  [pdf, other]

    cs.IT eess.SP

    Robust Continuous-Time Beam Tracking with Liquid Neural Network

    Authors: Fenghao Zhu, Xinquan Wang, Chongwen Huang, Richeng Jin, Qianqian Yang, Ahmed Alhammadi, Zhaoyang Zhang, Chau Yuen, Mérouane Debbah

    Abstract: Millimeter-wave (mmWave) technology is increasingly recognized as a pivotal technology of the sixth-generation communication networks due to the large amounts of available spectrum at high frequencies. However, the huge overhead associated with beam training imposes a significant challenge in mmWave communications, particularly in urban environments with high background noise. To reduce this high…

    Submitted 1 May, 2024; originally announced May 2024.

  10. arXiv:2404.19750  [pdf, other]

    cs.IT eess.SP

    A Joint Communication and Computation Design for Distributed RISs Assisted Probabilistic Semantic Communication in IIoT

    Authors: Zhouxiang Zhao, Zhaohui Yang, Chongwen Huang, Li Wei, Qianqian Yang, Caijun Zhong, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, the problem of spectral-efficient communication and computation resource allocation for distributed reconfigurable intelligent surfaces (RISs) assisted probabilistic semantic communication (PSC) in industrial Internet-of-Things (IIoT) is investigated. In the considered model, multiple RISs are deployed to serve multiple users, while PSC adopts compute-then-transmit protocol to reduc…

    Submitted 30 April, 2024; originally announced April 2024.

  11. arXiv:2404.18081  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C…

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  12. TIUP: Effective Processor Verification with Tautology-Induced Universal Properties

    Authors: Yufeng Li, Yiwei Ci, Qiusong Yang

    Abstract: Design verification is a complex and costly task, especially for large and intricate processor projects. Formal verification techniques provide advantages by thoroughly examining design behaviors, but they require extensive labor and expertise in property formulation. Recent research focuses on verifying designs using the self-consistency universal property, reducing verification difficulty as it…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted by ASP-DAC 2024, please note that this is not the final camera-ready version

  13. arXiv:2404.12170  [pdf, other]

    eess.SP cs.IT

    Secure Semantic Communication for Image Transmission in the Presence of Eavesdroppers

    Authors: Shunpu Tang, Chen Liu, Qianqian Yang, Shibo He, Dusit Niyato

    Abstract: Semantic communication (SemCom) has emerged as a key technology for the forthcoming sixth-generation (6G) network, attributed to its enhanced communication efficiency and robustness against channel noise. However, the open nature of wireless channels renders them vulnerable to eavesdropping, posing a serious threat to privacy. To address this issue, we propose a novel secure semantic communication…

    Submitted 18 April, 2024; originally announced April 2024.

  14. arXiv:2404.03172  [pdf, other]

    cs.SE cs.AR eess.SY

    SEPE-SQED: Symbolic Quick Error Detection by Semantically Equivalent Program Execution

    Authors: Yufeng Li, Qiusong Yang, Yiwei Ci, Enyuan Tian

    Abstract: Symbolic quick error detection (SQED) has greatly improved efficiency in formal chip verification. However, it has a limitation in detecting single-instruction bugs due to its reliance on the self-consistency property. To address this, we propose a new variant called symbolic quick error detection by semantically equivalent program execution (SEPE-SQED), which utilizes program synthesis techniques…

    Submitted 6 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted by DAC 2024, please note that this is not the final camera-ready version

  15. arXiv:2404.00612  [pdf, other]

    cs.IT eess.SP

    Resource Allocation for Green Probabilistic Semantic Communication with Rate Splitting

    Authors: Ruopeng Xu, Zhaohui Yang, Zhouxiang Zhao, Qianqian Yang, Zhaoyang Zhang

    Abstract: In this paper, the energy efficient design for probabilistic semantic communication (PSC) system with rate splitting multiple access (RSMA) is investigated. Basic principles are first reviewed to show how the PSC system works to extract, compress and transmit the semantic information in a task-oriented transmission. Subsequently, the process of how multiuser semantic information can be represented…

    Submitted 31 March, 2024; originally announced April 2024.

  16. arXiv:2403.20237  [pdf, other]

    eess.SP

    Evolving Semantic Communication with Generative Model

    Authors: Shunpu Tang, Qianqian Yang, Deniz Gündüz, Zhaoyang Zhang

    Abstract: Recently, learning-based semantic communication (SemCom) has emerged as a promising approach in the upcoming 6G network and researchers have made remarkable efforts in this field. However, existing works have yet to fully explore the advantages of the evolving nature of learning-based systems, where knowledge accumulated during transmission has the potential to enhance system performance. In this…

    Submitted 29 March, 2024; originally announced March 2024.

  17. arXiv:2403.20198  [pdf, other]

    cs.IT eess.SY

    Minimizing End-to-End Latency for Joint Source-Channel Coding Systems

    Authors: Kaiyi Chi, Qianqian Yang, Yuanchao Shu, Zhaohui Yang, Zhiguo Shi

    Abstract: While existing studies have highlighted the advantages of deep learning (DL)-based joint source-channel coding (JSCC) schemes in enhancing transmission efficiency, they often overlook the crucial aspect of resource management during the deployment phase. In this paper, we propose an approach to minimize the transmission latency in an uplink JSCC-based system. We first analyze the correlation betwe…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 7 Pages, 5 Figures, accepted by 2024 IEEE ICC Workshop

  18. arXiv:2403.20058  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks

    Authors: Luoyu Wang, Yitian Tao, Qing Yang, Yan Liang, Siwei Liu, Hongcheng Shi, Dinggang Shen, Han Zhang

    Abstract: Simultaneous functional PET/MR (sf-PET/MR) presents a cutting-edge multimodal neuroimaging technique. It provides an unprecedented opportunity for concurrently monitoring and integrating multifaceted brain networks built by spatiotemporally covaried metabolic activity, neural activity, and cerebral blood flow (perfusion). Albeit high scientific/clinical values, short in hardware accessibility of P…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 11 pages

  19. arXiv:2403.18992  [pdf]

    eess.IV

    Tractography with T1-weighted MRI and associated anatomical constraints on clinical quality diffusion MRI

    Authors: Tian Yu, Yunhe Li, Michael E. Kim, Chenyu Gao, Qi Yang, Leon Y. Cai, Susan M. Resnick, Lori L. Beason-Held, Daniel C. Moyer, Kurt G. Schilling, Bennett A. Landman

    Abstract: Diffusion MRI (dMRI) streamline tractography, the gold standard for in vivo estimation of brain white matter (WM) pathways, has long been considered indicative of macroscopic relationships with WM microstructure. However, recent advances in tractography demonstrated that convolutional recurrent neural networks (CoRNN) trained with a teacher-student framework have the ability to learn and propagate…

    Submitted 27 March, 2024; originally announced March 2024.

  20. arXiv:2403.06439  [pdf, other]

    physics.optics eess.IV

    Wide-Field, High-Resolution Reconstruction in Computational Multi-Aperture Miniscope Using a Fourier Neural Network

    Authors: Qianwan Yang, Ruipeng Guo, Guorong Hu, Yujia Xue, Yunzhe Li, Lei Tian

    Abstract: Traditional fluorescence microscopy is constrained by inherent trade-offs among resolution, field-of-view, and system complexity. To navigate these challenges, we introduce a simple and low-cost computational multi-aperture miniature microscope, utilizing a microlens array for single-shot wide-field, high-resolution imaging. Addressing the challenges posed by extensive view multiplexing and non-lo…

    Submitted 30 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  21. arXiv:2403.05772  [pdf, other]

    cs.SD cs.NE eess.AS

    sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks

    Authors: Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li

    Abstract: Speech applications are expected to be low-power and robust under noisy conditions. An effective Voice Activity Detection (VAD) front-end lowers the computational need. Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient. However, SNN-based VADs have yet to achieve noise robustness and often require large models for high performance. This paper introduces a no…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by ICASSP 2024

  22. arXiv:2403.01250  [pdf]

    eess.SY

    Resilient Mobile Energy Storage Resources Based Distribution Network Restoration in Interdependent Power-Transportation-Information Networks

    Authors: Jian Zhong, Chen Chen, Qiming Yang, Dafu Liu, Wentao Shen, Chenlin Ji, Zhaohong Bie

    Abstract: The interactions between power, transportation, and information networks (PTIN) are becoming more profound with the advent of smart city technologies. Existing mobile energy storage resource (MESR)-based power distribution network (PDN) restoration schemes often neglect the interdependencies among PTIN; thus, efficient PDN restoration cannot be achieved. This paper outlines the interacting factor…

    Submitted 2 March, 2024; originally announced March 2024.

  23. arXiv:2403.00434  [pdf, other]

    cs.IT eess.SP

    Probabilistic Semantic Communication over Wireless Networks with Rate Splitting

    Authors: Zhouxiang Zhao, Zhaohui Yang, Ye Hu, Qianqian Yang, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, the problem of joint transmission and computation resource allocation for probabilistic semantic communication (PSC) system with rate splitting multiple access (RSMA) is investigated. In the considered model, the base station (BS) needs to transmit a large amount of data to multiple users with RSMA. Due to limited communication resources, the BS is required to utilize semantic commu…

    Submitted 1 March, 2024; originally announced March 2024.

  24. arXiv:2402.13776  [pdf, other]

    eess.IV cs.CV cs.LG

    Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion

    Authors: Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, Yan Liang, Qing Yang, Dinggang Shen, Han Zhang

    Abstract: Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always meets a serious data-missing problem due to participant dropout and failed scans, makin…

    Submitted 21 February, 2024; originally announced February 2024.

  25. arXiv:2402.10728  [pdf, other]

    eess.IV cs.CV

    Semi-weakly-supervised neural network training for medical image registration

    Authors: Yiwen Li, Yunguan Fu, Iani J. M. B. Gayo, Qianye Yang, Zhe Min, Shaheer U. Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton, Matthew J. Clarkson, Dean C. Barratt, Victor A. Prisacariu, Yipeng Hu

    Abstract: For training registration networks, weak supervision from segmented corresponding regions-of-interest (ROIs) has been proven effective for (a) supplementing unsupervised methods, and (b) being used independently in registration tasks in which unsupervised losses are unavailable or ineffective. This correspondence-informing supervision entails cost in annotation that requires significant specialis…

    Submitted 16 February, 2024; originally announced February 2024.

  26. arXiv:2402.07729  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

    Authors: Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, Yuanjun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou

    Abstract: Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded advancements in this field. Previous models primarily focus on assessing different fundamental tasks, such as Automatic Speech Recognition (ASR), and lack an assessment of the ope…

    Submitted 12 February, 2024; originally announced February 2024.

  27. arXiv:2401.13980  [pdf, other]

    cs.IT eess.IV

    A Nearly Information Theoretically Secure Approach for Semantic Communications over Wiretap Channel

    Authors: Weixuan Chen, Shuo Shao, Qianqian Yang, Zhaoyang Zhang, Ping Zhang

    Abstract: This paper addresses the challenge of achieving information-theoretic security in semantic communication (SeCom) over a wiretap channel, where a legitimate receiver coexists with an eavesdropper experiencing a poorer channel condition. Despite previous efforts to secure SeCom against eavesdroppers, achieving information-theoretic security in such schemes remains an open issue. In this work, we pro…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 13 pages, 16 figures

  28. arXiv:2401.03060  [pdf]

    eess.IV cs.CV

    Super-resolution multi-contrast unbiased eye atlases with deep probabilistic refinement

    Authors: Ho Hin Lee, Adam M. Saunders, Michael E. Kim, Samuel W. Remedios, Lucas W. Remedios, Yucheng Tang, Qi Yang, Xin Yu, Shunxing Bao, Chloe Cho, Louise A. Mawn, Tonia S. Rex, Kevin L. Schey, Blake E. Dewey, Jeffrey M. Spraggins, Jerry L. Prince, Yuankai Huo, Bennett A. Landman

    Abstract: Purpose: Eye morphology varies significantly across the population, especially for the orbit and optic nerve. These variations limit the feasibility and robustness of generalizing population-wise features of eye organs to an unbiased spatial reference. Approach: To tackle these limitations, we propose a process for creating high-resolution unbiased eye atlases. First, to restore spatial details…

    Submitted 14 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Revised for submission to SPIE Journal of Medical Imaging. 26 pages, 6 figures

  29. arXiv:2401.00658  [pdf, other]

    cs.IT cs.LG cs.MM eess.SP

    Point Cloud in the Air

    Authors: Yulin Shao, Chenghong Bian, Li Yang, Qianqian Yang, Zhaoyang Zhang, Deniz Gunduz

    Abstract: Acquisition and processing of point clouds (PCs) is a crucial enabler for many emerging applications reliant on 3D spatial data, such as robot navigation, autonomous vehicles, and augmented reality. In most scenarios, PCs acquired by remote sensors must be transmitted to an edge server for fusion, segmentation, or inference. Wireless transmission of PCs not only puts an increased burden on the alr…

    Submitted 31 December, 2023; originally announced January 2024.

  30. arXiv:2312.11102  [pdf, ps, other]

    eess.SP

    Holographic Imaging with XL-MIMO and RIS: Illumination and Reflection Design

    Authors: Giulia Torcolacci, Anna Guerra, Haiyang Zhang, Francesco Guidi, Qianyu Yang, Yonina C. Eldar, Davide Dardari

    Abstract: This paper addresses a near-field imaging problem utilizing extremely large-scale multiple-input multiple-output (XL-MIMO) antennas and reconfigurable intelligent surfaces (RISs) already in place for wireless communications. To this end, we consider a system with a fixed transmitting antenna array illuminating a region of interest (ROI) and a fixed receiving antenna array inferring the ROI's scatt…

    Submitted 13 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  31. arXiv:2312.10472  [pdf, other]

    cs.LG cs.AI eess.SY

    Analyzing Generalization in Policy Networks: A Case Study with the Double-Integrator System

    Authors: Ruining Zhang, Haoran Han, Maolong Lv, Qisong Yang, Jian Cheng

    Abstract: Extensive utilization of deep reinforcement learning (DRL) policy networks in diverse continuous control tasks has raised questions regarding performance degradation in expansive state spaces where the input state norm is larger than that in the training environment. This paper aims to uncover the underlying factors contributing to such performance deterioration when dealing with expanded state sp…

    Submitted 31 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

  32. arXiv:2312.10305  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

    Authors: Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

    Abstract: Speech signals are inherently complex as they encompass both global acoustic characteristics and local semantic information. However, in the task of target speech extraction, certain elements of global and local semantic information in the reference speech, which are irrelevant to speaker identity, can lead to speaker confusion within the speech extraction network. To overcome this challenge, we p…

    Submitted 19 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  33. arXiv:2312.06462  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

    Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

    Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral…

    Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Highlight. 13 pages, 10 figures

  34. arXiv:2312.05062  [pdf, ps, other]

    eess.IV

    Deep Learning Enabled Semantic Communication Systems for Video Transmission

    Authors: Zhenguo Zhang, Qianqian Yang, Shibo He, Jiming Chen

    Abstract: Semantic communication has emerged as a promising approach for improving efficient transmission in the next generation of wireless networks. Inspired by the success of semantic communication in different areas, we aim to provide a new semantic communication scheme from the semantic level. In this paper, we propose a novel DL-based semantic communication system for video transmission, which compact…

    Submitted 8 December, 2023; originally announced December 2023.

  35. arXiv:2311.07919  [pdf, other]

    eess.AS cs.CL cs.LG

    Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

    Authors: Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, Jingren Zhou

    Abstract: Recently, instruction-following audio-language models have received broad attention for audio interaction with humans. However, the absence of pre-trained audio models capable of handling diverse audio types and tasks has hindered progress in this field. Consequently, most existing works have only been able to support a limited range of interaction capabilities. In this paper, we develop the Qwen-…

    Submitted 21 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: The code, checkpoints and demo are released at https://github.com/QwenLM/Qwen-Audio

  36. arXiv:2311.03500  [pdf]

    eess.IV cs.CV q-bio.NC

    Predicting Age from White Matter Diffusivity with Residual Learning

    Authors: Chenyu Gao, Michael E. Kim, Ho Hin Lee, Qi Yang, Nazirah Mohd Khairi, Praitayini Kanakaraj, Nancy R. Newlin, Derek B. Archer, Angela L. Jefferson, Warren D. Taylor, Brian D. Boyd, Lori L. Beason-Held, Susan M. Resnick, The BIOCARD Study Team, Yuankai Huo, Katherine D. Van Schaik, Kurt G. Schilling, Daniel Moyer, Ivana Išgum, Bennett A. Landman

    Abstract: Imaging findings inconsistent with those expected at specific chronological age ranges may serve as early indicators of neurological disorders and increased mortality risk. Estimation of chronological age, and deviations from expected results, from structural MRI data has become an important task for developing biomarkers that are sensitive to such deviations. Complementary to structural analysis,…

    Submitted 21 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: SPIE Medical Imaging: Image Processing. San Diego, CA. February 2024 (accepted as poster presentation)

  37. arXiv:2310.14954  [pdf, other]

    cs.SD cs.CL eess.AS

    Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition

    Authors: Peng Fan, Changhao Shan, Sining Sun, Qing Yang, Jianwei Zhang

    Abstract: Recently, Conformer as a backbone network for end-to-end automatic speech recognition achieved state-of-the-art performance. The Conformer block leverages a self-attention mechanism to capture global information, along with a convolutional neural network to capture local information, resulting in improved performance. However, the Conformer-based model encounters an issue with the self-attention m…

    Submitted 28 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: This manuscript has been accepted by IEEE Signal Processing Letters for publication

  38. arXiv:2310.03049  [pdf, other]

    cs.LG eess.IV physics.optics

    QuATON: Quantization Aware Training of Optical Neurons

    Authors: Hasindu Kariyawasam, Ramith Hettiarachchi, Quansan Yang, Alex Matlock, Takahiro Nambara, Hiroyuki Kusaka, Yuichiro Kunai, Peter T C So, Edward S Boyden, Dushan Wadduwage

    Abstract: Optical processors, built with "optical neurons", can efficiently perform high-dimensional linear operations at the speed of light. Thus they are a promising avenue to accelerate large-scale linear computations. With the current advances in micro-fabrication, such optical processors can now be 3D fabricated, but with a limited precision. This limitation translates to quantization of learnable para…

    Submitted 21 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

  39. arXiv:2310.00730  [pdf]

    physics.optics eess.IV

    EventLFM: Event Camera integrated Fourier Light Field Microscopy for Ultrafast 3D imaging

    Authors: Ruipeng Guo, Qianwan Yang, Andrew S. Chang, Guorong Hu, Joseph Greene, Christopher V. Gabel, Sixian You, Lei Tian

    Abstract: Ultrafast 3D imaging is indispensable for visualizing complex and dynamic biological processes. Conventional scanning-based techniques necessitate an inherent trade-off between acquisition speed and space-bandwidth product (SBP). Emerging single-shot 3D wide-field techniques offer a promising alternative but are bottlenecked by the synchronous readout constraints of conventional CMOS systems, thus…

    Submitted 3 April, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  40. arXiv:2310.00593  [pdf, other]

    eess.SP

    Nonlinear Multi-Carrier System with Signal Clipping: Measurement, Analysis, and Optimization

    Authors: Yuyang Du, Liang Hao, Yiming Lei, Qun Yang, Shiqi Xu

    Abstract: Signal clipping is a classic technique for reducing peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It has been widely applied in consumer electronic devices owing to its low complexity and high efficiency. Although clipping reduces the nonlinear distortion caused by power amplifiers (PAs), it induces additional clipping distortion. Optimizing the j…

    Submitted 16 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  41. arXiv:2310.00015  [pdf, other]

    cs.IT eess.SP

    Semantic Communication with Probability Graph: A Joint Communication and Computation Design

    Authors: Zhouxiang Zhao, Zhaohui Yang, Quoc-Viet Pham, Qianqian Yang, Zhaoyang Zhang

    Abstract: In this paper, we present a probability graph-based semantic information compression system for scenarios where the base station (BS) and the user share common background knowledge. We employ probability graphs to represent the shared knowledge between the communicating parties. During the transmission of specific text data, the BS first extracts semantic information from the text, which is repres… ▽ More

    Submitted 5 October, 2023; v1 submitted 16 September, 2023; originally announced October 2023.

  42. arXiv:2309.09392  [pdf, other]

    eess.IV cs.CV

    Deep conditional generative models for longitudinal single-slice abdominal computed tomography harmonization

    Authors: Xin Yu, Qi Yang, Yucheng Tang, Riqiang Gao, Shunxing Bao, Leon Y. Cai, Ho Hin Lee, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leads to different or… ▽ More

    Submitted 17 September, 2023; originally announced September 2023.

  43. arXiv:2309.06787  [pdf, other]

    cs.SD eess.AS

    DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation

    Authors: Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang

    Abstract: In the text-to-speech (TTS) task, the latent diffusion model has excellent fidelity and generalization, but its expensive resource consumption and slow inference speed have always been challenging. This paper proposes Discrete Diffusion Model with Contrastive Learning for Text-to-Speech Generation (DCTTS). The following contributions are made by DCTTS: 1) The TTS diffusion model based on discrete… ▽ More

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: 5 pages, submitted to ICASSP

  44. arXiv:2309.04071  [pdf, other]

    eess.IV cs.CV

    Enhancing Hierarchical Transformers for Whole Brain Segmentation with Intracranial Measurements Integration

    Authors: Xin Yu, Yucheng Tang, Qi Yang, Ho Hin Lee, Shunxing Bao, Yuankai Huo, Bennett A. Landman

    Abstract: Whole brain segmentation with magnetic resonance imaging (MRI) enables the non-invasive measurement of brain regions, including total intracranial volume (TICV) and posterior fossa volume (PFV). Enhancing the existing whole brain segmentation methodology to incorporate intracranial measurements offers a heightened level of comprehensiveness in the analysis of brain structures. Despite its potentia… ▽ More

    Submitted 10 April, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

  45. arXiv:2308.06533  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Knowledge Distilled Ensemble Model for sEMG-based Silent Speech Interface

    Authors: Wenqiang Lai, Qihan Yang, Ye Mao, Endong Sun, Jiangnan Ye

    Abstract: Voice disorders affect millions of people worldwide. Surface electromyography-based Silent Speech Interfaces (sEMG-based SSIs) have been explored as a potential solution for decades. However, previous works were limited by small vocabularies and manually extracted features from raw data. To address these limitations, we propose a lightweight deep learning knowledge-distilled ensemble model for sEM… ▽ More

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: 6 pages, 5 figures

  46. arXiv:2308.04304  [pdf, other]

    cs.IT cs.CR cs.LG eess.IV

    The Model Inversion Eavesdropping Attack in Semantic Communication Systems

    Authors: Yuhao Chen, Qianqian Yang, Zhiguo Shi, Jiming Chen

    Abstract: In recent years, semantic communication has been a popular research topic for its superiority in communication efficiency. As semantic communication relies on deep learning to extract meaning from raw messages, it is vulnerable to attacks targeting deep learning models. In this paper, we introduce the model inversion eavesdropping attack (MIEA) to reveal the risk of privacy leaks in the semantic c… ▽ More

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted by 2023 IEEE Global Communications Conference (GLOBECOM)

  47. arXiv:2308.02282  [pdf, other]

    cs.LG cs.AI eess.SP

    DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization

    Authors: Wang Lu, Jindong Wang, Xinwei Sun, Yiqiang Chen, Xiangyang Ji, Qiang Yang, Xing Xie

    Abstract: Time series remains one of the most challenging modalities in machine learning research. The out-of-distribution (OOD) detection and generalization on time series tend to suffer due to its non-stationary property, i.e., the distribution changes over time. The dynamic distributions inside time series pose great challenges to existing algorithms to identify invariant distributions since they mainly… ▽ More

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: Journal version of arXiv:2209.07027; 17 pages

  48. arXiv:2308.01499  [pdf, other]

    cs.CV eess.IV

    TDMD: A Database for Dynamic Color Mesh Subjective and Objective Quality Explorations

    Authors: Qi Yang, Joel Jung, Timon Deschamps, Xiaozhong Xu, Shan Liu

    Abstract: Dynamic colored meshes (DCM) are widely used in various applications; however, these meshes may undergo different processes, such as compression or transmission, which can distort them and degrade their quality. To facilitate the development of objective metrics for DCMs and study the influence of typical distortions on their perception, we create the Tencent - dynamic colored mesh database (TDMD)… ▽ More

    Submitted 2 August, 2023; originally announced August 2023.

  49. arXiv:2308.01173  [pdf, other]

    eess.IV

    FlexDTI: Flexible diffusion gradient encoding scheme-based highly efficient diffusion tensor imaging using deep learning

    Authors: Zejun Wu, Jiechao Wang, Zunquan Chen, Qinqin Yang, Zhen Xing, Dairong Cao, Jianfeng Bao, Taishan Kang, Jianzhong Lin, Shuhui Cai, Zhong Chen, Congbo Cai

    Abstract: Objective: Most deep neural network-based diffusion tensor imaging methods require the diffusion gradients' number and directions in the data to be reconstructed to match those in the training data. This work aims to develop and evaluate a novel dynamic-convolution-based method called FlexDTI for highly efficient diffusion tensor reconstruction with flexible diffusion encoding gradient scheme. App… ▽ More

    Submitted 21 December, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: 24 pages, 9 figures, 3 tables

  50. Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer Localisation

    Authors: Wen Yan, Bernard Chiu, Ziyi Shen, Qianye Yang, Tom Syer, Zhe Min, Shonit Punwani, Mark Emberton, David Atkinson, Dean C. Barratt, Yipeng Hu

    Abstract: One of the distinct characteristics in radiologists' reading of multiparametric prostate MR scans, using reporting systems such as PI-RADS v2.1, is to score individual types of MR modalities, T2-weighted, diffusion-weighted, and dynamic contrast-enhanced, and then combine these image-modality-specific scores using standardised decision rules to predict the likelihood of clinically significant canc… ▽ More

    Submitted 20 January, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: 30 pages, 6 figures

    MSC Class: 68T07

    Journal ref: Medical Image Analysis, volume 91, 103030, 2024 (Elsevier)